I am considering using SolrCloud, but I am not sure whether it covers my
use case.

I would like to keep an index up to date in real time, but I would also like
to occasionally restate the past. I would do that by batch processing
historical data.

My idea is that I would have the Solr collection sharded by date range. As
I move forward in time I would add more shards.
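
To make the idea concrete, here is a minimal sketch of the date-to-shard
mapping I have in mind. It assumes one shard per calendar month and a
made-up naming scheme (shard_YYYY_MM); neither is anything SolrCloud
prescribes, and the function names are only for illustration.

```python
from datetime import date


def shard_for(day):
    """Return the (hypothetical) shard name for the month containing `day`."""
    return f"shard_{day.year:04d}_{day.month:02d}"


def shards_between(start, end):
    """List shard names for every month from `start` through `end`, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"shard_{year:04d}_{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names
```

New documents would go to shard_for(today), and a restatement job would
rebuild exactly the shards returned by shards_between for the affected
period.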

For restating historical data, I would have a separate process that
indexes a shard's worth of data. (This keeps the servers that are meant for
production search from having to handle the load of indexing historical
data.) I would then move the index files to the Solr servers and register
the newly created index with the server, replacing the existing shards.

I used to be able to do something similar pre-SolrCloud by using the core
admin. But this did not have the benefit of having one search for the
entire "collection". I had to manually query each of the cores to get the
full search index.
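
As a rough illustration of the pre-SolrCloud approach, the swap could be
done with the CoreAdmin SWAP action. The sketch below only builds the
request URL; the host and core names are placeholders, and it assumes the
rebuilt index has already been loaded as a second core on the same server.

```python
from urllib.parse import urlencode


def core_swap_url(base_url, live_core, rebuilt_core):
    """Build a CoreAdmin SWAP request URL that exchanges two cores.

    `base_url` is the Solr root (placeholder, e.g. http://localhost:8983/solr);
    the core names are hypothetical.
    """
    params = {"action": "SWAP", "core": live_core, "other": rebuilt_core}
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"
```

Issuing a GET against the resulting URL would make the rebuilt core answer
under the live core's name, but each core still has to be queried on its
own.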

Essentially, my questions are:
1- Is it possible to shard by date range in this way?
2- Is it possible to swap out the index used by a shard?
3- Is there a different way I should be thinking about this?

Max
