Hi everyone, at this year's Berlin Buzz words conference someone (sematext?) have described a technique of a hot shard. The idea is to have a slim shard to maximize the update throughput during a day (when millions of docs need to be posted) and make sure the indexed documents are immediately searchable. In the end of the day the day's documents are moved to cold shards. If I'm not mistaken, this was implemented for ElasticSearch. I'm currently implementing something similar (but pretty tailored to our logical sharding use case) for Solr (3.x). The feature set looks roughly like this:
1) front end solr (query router) is aware of the hot shard: it directs the incoming queries to the hot and "cold" shards. 2) new incoming documents are directed first to the hot shard and then periodically (like once a day or once a week) moved over to the closest in time cold shard. And for that... 3) hot shard index is being partitioned low level using Lucene's IndexReader / IndexWriter with the implementation based on [1], [2] and customized to logical (time-based) sharding. The question is: is doing index partitioning low-level a good way of implementing the hot shard concept? That is, is there anything better operationally-wise from the point of view of disaster recovery / search cluster support? Am I missing some obvious SOLR-ish solution? Doing instead the periodical hot shard cleaning and re-posting its source documents to the closest cold shard is less modular and hence more complicated operationally for us. Please let me know, if you need more details or if the problem isn't clear enough. Thanks. [1] http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html [2] https://github.com/HON-Khresmoi/hash-based-index-splitter -- Regards, Dmitry Kan