Custom routing is a nice improvement for 4.1, but if I understand you
correctly it is probably not what you want to use.
As I read it, you want a collection with a number of slices - one slice
for each day (or other period) - and then a kind of "sliding window"
where you create a new slice under this collection every day and delete
the slice corresponding to "the oldest day". It is hard to create and
delete slices under a particular collection. It is much easier to delete
an entire collection. Therefore I suggest you make a collection for each
day (or other period) and delete the collection corresponding to "the
oldest day". We do that in our system based on 4.0, although we use one
collection per month.
There is a limit to how much you can put into a single slice/shard
before it becomes slower to index/search - that is part of the reason
for sharding. With a collection-per-day solution you can also put as
many documents into a collection/day as you want - it is just a matter
of slicing into enough slices/shards and throwing enough hardware at it.
If you don't have a lot of data for each day, you can just have one or
two slices/shards per day-collection.
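The collection-per-day window described above can be sketched roughly as follows. This is only an illustration, not the setup used in our system: the host, the "logs_" prefix, the 7-day retention and the helper names are all made up, and the code only builds Collections API URLs (action=CREATE / action=DELETE with name and numShards) without sending them.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical values for illustration; none of them come from the thread.
SOLR_ADMIN = "http://localhost:8983/solr/admin/collections"
PREFIX = "logs_"
DAYS_TO_KEEP = 7


def collection_name(day):
    """Name of the collection holding one day's documents, e.g. logs_20121218."""
    return PREFIX + day.strftime("%Y%m%d")


def create_url(day, num_shards=2):
    """Collections API URL that would create the collection for `day`."""
    params = {"action": "CREATE", "name": collection_name(day), "numShards": num_shards}
    return SOLR_ADMIN + "?" + urlencode(params)


def delete_url(day):
    """Collections API URL that would delete the whole collection for `day`."""
    params = {"action": "DELETE", "name": collection_name(day)}
    return SOLR_ADMIN + "?" + urlencode(params)


def roll_window(today, days_to_keep=DAYS_TO_KEEP):
    """Return (create URL for today, delete URL for the day that fell out).

    With days_to_keep = 7 the window holds today and the 6 days before it,
    so the collection to drop is the one for today - 7 days.
    """
    expired = today - timedelta(days=days_to_keep)
    return create_url(today), delete_url(expired)
```

Run once per day (from cron or similar), roll_window(date.today()) gives the two requests to issue. Pick num_shards per collection according to how much data a day actually holds.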
We are running our Solr cluster across 10 4-CPU-core/4GB-RAM machines
and we are able to index over 1 billion documents (per month) into a
collection with 40 shards (= 40 slices, because we are not using
replication) - 4 shards on each Solr node in the cluster. We still do
not know how the system will behave when we have, and search across,
many collections with 1+ billion documents each (up to 24, since we are
supposed to keep data for 2 years before we can throw it away).
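To put the numbers above in perspective, here is a small back-of-the-envelope sketch. The constants are the figures from this message (taking "1+ billion" as exactly 1 billion); the function names, the base URL and the collection names are hypothetical. The cross-collection query relies on the comma-separated "collection" request parameter as I recall it from the SolrCloud documentation, so treat that part as an assumption rather than a description of how we query.

```python
from urllib.parse import urlencode

# Figures quoted in the message above.
NODES = 10
SHARDS_PER_COLLECTION = 40
DOCS_PER_COLLECTION = 1_000_000_000  # "1+ billion" rounded down
COLLECTIONS_KEPT = 24                # 2 years of monthly collections


def docs_per_shard():
    """Approximate number of documents each shard of one collection holds."""
    return DOCS_PER_COLLECTION // SHARDS_PER_COLLECTION


def shards_per_node(collections):
    """Shards hosted on each node when `collections` collections exist."""
    return collections * SHARDS_PER_COLLECTION // NODES


def cross_search_url(base, collections, query):
    """Build a select URL that asks one node to search several collections.

    Assumption: the `collection` parameter accepts a comma-separated list.
    """
    params = {"q": query, "collection": ",".join(collections)}
    # Keep commas unescaped so the collection list stays readable.
    return base + "/select?" + urlencode(params, safe=",")
```

With these figures, docs_per_shard() is 25 million, shards_per_node(1) is 4, and shards_per_node(COLLECTIONS_KEPT) is 96, which is the load per node we have not yet tested.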
Regards, Per Steffensen
On 12/18/12 8:20 PM, Scott Stults wrote:
I'm going to be building a Solr cluster and I want to have a rolling set of
slices so that I can keep a fixed number of days in my collection. If I
send an update to a particular slice leader, will it always hash the unique
key and (probably) forward the doc to another leader?
Thank you,
Scott