Re: Will SolrCloud always slice by ID hash?
Thanks guys. Yeah, separate rolling collections seem like the better way to go.

-Scott

On Sat, Dec 29, 2012 at 1:30 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
https://issues.apache.org/jira/browse/SOLR-4237
Re: Will SolrCloud always slice by ID hash?
Scott (OP), maybe you are after http://search-lucene.com/m/YBn4w1UAbEB and https://issues.apache.org/jira/browse/SOLR-4237 ?

Otis
--
Solr ElasticSearch Support
http://sematext.com/

On Fri, Dec 21, 2012 at 6:57 AM, Per Steffensen st...@designware.dk wrote:
Custom routing is a nice improvement for 4.1, but if I understand you correctly it is probably not what you want to use. [...]
Re: Will SolrCloud always slice by ID hash?
Custom routing is a nice improvement for 4.1, but if I understand you correctly it is probably not what you want to use.

As I understand it, you want a collection with a number of slices, one slice for each day (or other period), and a kind of sliding window in which you create a new slice under this collection every day and delete the slice corresponding to the oldest day. It is hard to create and delete slices under a particular collection. It is much easier to delete an entire collection. Therefore I suggest you make a collection for each day (or other period) and delete the collection corresponding to the oldest day. We do that in our system based on 4.0, though we use one collection per month.

There is a limit to how much you can put into a single slice/shard before it becomes slower to index and search; that is part of the reason for sharding. With a collection-per-day solution you can also put as many documents into each day's collection as you want: it is just a matter of splitting it into enough slices/shards and throwing enough hardware at it. If you don't have a lot of data for each day, you can just have one or two slices/shards per day-collection.

We are running our Solr cluster across 10 machines, each with 4 CPU cores and 4 GB of RAM, and we are able to index over 1 billion documents per month into a collection with 40 shards (= 40 slices, because we are not using replication), 4 shards on each Solr node in the cluster. We still do not know how the system will behave when we have, and search across, many collections with 1+ billion documents each (up to 24, since we are supposed to keep data for 2 years before we can throw it away).

Regards, Per Steffensen

On 12/18/12 8:20 PM, Scott Stults wrote:
I'm going to be building a Solr cluster and I want to have a rolling set of slices so that I can keep a fixed number of days in my collection. If I send an update to a particular slice leader, will it always hash the unique key and (probably) forward the doc to another leader?

Thank you, Scott
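
The collection-per-period scheme Per describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from anyone in the thread: the base URL, the `logs_` naming scheme and the helper names are hypothetical. The sketch only builds the Collections API URLs (`action=CREATE` and `action=DELETE`) and a search URL that lists several collections in the `collection` parameter; it sends nothing to Solr.

```python
from datetime import date, timedelta
from typing import List, Tuple
from urllib.parse import urlencode

# Assumed base URL and naming scheme; neither comes from the thread.
SOLR = "http://localhost:8983/solr"
PREFIX = "logs_"


def collection_name(day: date) -> str:
    """Name of the collection holding one day's documents, e.g. logs_20121218."""
    return PREFIX + day.strftime("%Y%m%d")


def window(today: date, days_to_keep: int) -> List[str]:
    """Collections that should exist today, oldest first."""
    return [collection_name(today - timedelta(days=n))
            for n in range(days_to_keep - 1, -1, -1)]


def roll(today: date, days_to_keep: int, num_shards: int) -> Tuple[str, str]:
    """Return (create_url, delete_url) for the daily roll-over."""
    newest = collection_name(today)
    # The collection that has just fallen out of the window.
    expired = collection_name(today - timedelta(days=days_to_keep))
    create = SOLR + "/admin/collections?" + urlencode({
        "action": "CREATE",
        "name": newest,
        "numShards": num_shards,
        "replicationFactor": 1,  # one copy of each shard, i.e. no replication
    })
    delete = SOLR + "/admin/collections?" + urlencode({
        "action": "DELETE",
        "name": expired,
    })
    return create, delete


def search_url(today: date, days_to_keep: int, query: str) -> str:
    """Build a query that spans every collection in the window."""
    names = window(today, days_to_keep)
    params = urlencode({"q": query, "collection": ",".join(names)})
    # The request is addressed to the newest collection; the collection
    # parameter lists all collections that should be searched.
    return SOLR + "/" + names[-1] + "/select?" + params
```

Dropping a whole collection per period keeps the expiry step to a single DELETE call, which is the point Per makes about collections being easier to remove than slices.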
Will SolrCloud always slice by ID hash?
I'm going to be building a Solr cluster and I want to have a rolling set of slices so that I can keep a fixed number of days in my collection. If I send an update to a particular slice leader, will it always hash the unique key and (probably) forward the doc to another leader?

Thank you, Scott
Re: Will SolrCloud always slice by ID hash?
On Tue, Dec 18, 2012 at 2:20 PM, Scott Stults sstu...@opensourceconnections.com wrote:
I'm going to be building a Solr cluster and I want to have a rolling set of slices so that I can keep a fixed number of days in my collection. If I send an update to a particular slice leader, will it always hash the unique key and (probably) forward the doc to another leader?

Nope. Flexibility is our middle name ;-) Starting with 4.1 you will be able to do custom sharding. If you send a document to any replica of a shard and don't indicate it's for another shard, then it will assume it's for that shard (and forward it to the leader replica for that shard if it's not the leader).

-Yonik
http://lucidworks.com
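
The difference between the two behaviours discussed above can be modelled with a toy sketch. It is not Solr's implementation: `zlib.crc32` with a modulo is only a stand-in for Solr's real hash and hash-range assignment, shards are reduced to integer indexes, and both function names are hypothetical.

```python
import zlib
from typing import Optional


def hash_route(doc_id: str, num_shards: int) -> int:
    """Toy hash routing: the unique key alone decides the shard.

    crc32 and modulo are stand-ins, not Solr's actual hash or range scheme.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def custom_route(receiving_shard: int, target_shard: Optional[int] = None) -> int:
    """Toy custom sharding: an explicitly indicated shard wins; otherwise
    the document stays on the shard that received it."""
    return receiving_shard if target_shard is None else target_shard
```

With hash routing, the shard that receives the update has no influence on where the document ends up. With custom sharding as Yonik describes it, the receiving shard is the default destination unless the request indicates another one.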