Hi everyone,

At this year's Berlin Buzzwords conference someone (Sematext?)
described a technique called a hot shard. The idea is to have a slim shard to
maximize the update throughput during a day (when millions of docs need to
be posted) and make sure the indexed documents are immediately searchable.
At the end of the day, the day's documents are moved to cold shards. If I'm
not mistaken, this was implemented for ElasticSearch. I'm currently
implementing something similar (but pretty tailored to our logical sharding
use case) for Solr (3.x). The feature set looks roughly like this:

1) The front-end Solr (query router) is aware of the hot shard: it directs
incoming queries to both the hot and the "cold" shards.
2) New incoming documents are directed first to the hot shard and then
periodically (e.g. once a day or once a week) moved over to the cold shard
closest in time. And for that...
3) The hot shard index is partitioned at a low level using Lucene's
IndexReader / IndexWriter, with an implementation based on [1], [2] and
customized for logical (time-based) sharding.
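
To make points 1-3 more concrete, here is a minimal Python sketch of the
routing and partitioning decisions only. It is not the Lucene-level
implementation and does not touch any Solr or Lucene API. The monthly
"cold-YYYY-MM" shard naming, the function names and the `hot_since` cutoff
(the time of the last hot-to-cold move) are all assumptions made for
illustration.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: a single hot shard and one cold shard per calendar month.
HOT_SHARD = "hot"


def cold_shard_for(ts):
    """Name of the (hypothetical) cold shard that owns this timestamp."""
    return f"cold-{ts.year:04d}-{ts.month:02d}"


def months_between(start, end):
    """Yield (year, month) for every calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def shards_for_query(start, end, hot_since):
    """Point 1: shards a time-range query has to be sent to.

    Documents newer than hot_since still live in the hot shard;
    everything older has already been moved to a cold shard.
    """
    shards = []
    if start < hot_since:
        cold_end = min(end, hot_since)
        shards.extend(
            f"cold-{y:04d}-{m:02d}" for y, m in months_between(start, cold_end)
        )
    if end >= hot_since:
        shards.append(HOT_SHARD)
    return shards


def partition_hot_shard(docs, cutoff):
    """Points 2 and 3: decide which hot-shard docs move to which cold shard.

    docs is an iterable of (doc_id, timestamp) pairs. Docs older than
    cutoff are grouped by target cold shard; the rest stay in the hot shard.
    """
    to_move = defaultdict(list)
    keep = []
    for doc_id, ts in docs:
        if ts < cutoff:
            to_move[cold_shard_for(ts)].append(doc_id)
        else:
            keep.append(doc_id)
    return dict(to_move), keep


if __name__ == "__main__":
    docs = [
        ("a", datetime(2012, 5, 30)),
        ("b", datetime(2012, 6, 2)),
        ("c", datetime(2012, 6, 20)),
    ]
    print(partition_hot_shard(docs, datetime(2012, 6, 10)))
    print(shards_for_query(datetime(2012, 4, 15), datetime(2012, 6, 15),
                           datetime(2012, 6, 10)))
```

In the real setup the grouping step would be done at the index level (per
point 3) rather than on (id, timestamp) pairs; the sketch only shows which
documents and queries go where.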


The question is: is low-level index partitioning a good way of
implementing the hot shard concept? That is, is there anything better
operationally, from the point of view of disaster recovery / search
cluster support? Am I missing some obvious Solr-ish solution?
The alternative, periodically cleaning the hot shard and re-posting its
source documents to the closest cold shard, is less modular and hence more
complicated operationally for us.

Please let me know if you need more details or if the problem isn't clear
enough. Thanks.

[1] http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html
[2] https://github.com/HON-Khresmoi/hash-based-index-splitter

-- 
Regards,

Dmitry Kan
