Re: Will SolrCloud always slice by ID hash?

2013-01-07 Thread Scott Stults
Thanks guys. Yeah, separate rolling collections seem like the better way to
go.


-Scott

On Sat, Dec 29, 2012 at 1:30 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 https://issues.apache.org/jira/browse/SOLR-4237


Re: Will SolrCloud always slice by ID hash?

2012-12-28 Thread Otis Gospodnetic
Scott (OP), maybe you are after http://search-lucene.com/m/YBn4w1UAbEB and
https://issues.apache.org/jira/browse/SOLR-4237 ?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Dec 21, 2012 at 6:57 AM, Per Steffensen st...@designware.dk wrote:

 Custom routing is a nice improvement for 4.1, but if I understand you
 correctly, it is probably not what you want to use.

 If I understand you correctly, you want to make a collection with a number
 of slices - one slice for each day (or other period) - and then make a kind
 of sliding window where you create a new slice under this collection every
 day and delete the slice corresponding to the oldest day. It is hard to
 create and delete slices under a particular collection. It is much easier
 to delete an entire collection. Therefore I suggest you make a collection
 for each day (or other period) and delete the collection corresponding to
 the oldest day. We do that in our system, which is based on 4.0, though we
 use one collection per month. There is a limit to how much you can put into
 a single slice/shard before it becomes slower to index/search - that is
 part of the reason for sharding. With a collection-per-day solution you
 also get the opportunity to put as many documents into a collection/day as
 you want - it is just a matter of slicing into enough slices/shards and
 throwing enough hardware at it. If you don't have a lot of data for each
 day, you can just have one or two slices/shards per day-collection.

 We are running our Solr cluster across 10 machines (4 CPU cores / 4 GB RAM
 each) and we are able to index over 1 billion documents (per month) into a
 collection with 40 shards (= 40 slices, because we are not using
 replication) - 4 shards on each Solr node in the cluster. We still do not
 know how the system will behave once we have, and cross-search, many
 collections (up to 24, since we are supposed to keep data for 2 years
 before we can throw it away) with 1+ billion documents each.

 Regards, Per Steffensen


 On 12/18/12 8:20 PM, Scott Stults wrote:

 I'm going to be building a Solr cluster and I want to have a rolling set
 of slices so that I can keep a fixed number of days in my collection. If
 I send an update to a particular slice leader, will it always hash the
 unique key and (probably) forward the doc to another leader?


 Thank you,
 Scott





Re: Will SolrCloud always slice by ID hash?

2012-12-21 Thread Per Steffensen
Custom routing is a nice improvement for 4.1, but if I understand you
correctly, it is probably not what you want to use.


If I understand you correctly, you want to make a collection with a
number of slices - one slice for each day (or other period) - and then
make a kind of sliding window where you create a new slice under this
collection every day and delete the slice corresponding to the oldest
day. It is hard to create and delete slices under a particular
collection. It is much easier to delete an entire collection. Therefore
I suggest you make a collection for each day (or other period) and
delete the collection corresponding to the oldest day. We do that in our
system, which is based on 4.0, though we use one collection per month.
There is a limit to how much you can put into a single slice/shard
before it becomes slower to index/search - that is part of the reason
for sharding. With a collection-per-day solution you also get the
opportunity to put as many documents into a collection/day as you want -
it is just a matter of slicing into enough slices/shards and throwing
enough hardware at it. If you don't have a lot of data for each day,
you can just have one or two slices/shards per day-collection.
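
[Editor's note: a minimal sketch of that create-today / drop-oldest rotation
against the SolrCloud Collections API, in Python. The base URL, the
logs_YYYYMMDD naming scheme, the shard counts and the "myconf" config set are
assumptions, and the exact CREATE parameters vary a bit across 4.x releases.]

    import datetime
    import requests

    SOLR = "http://localhost:8983/solr"  # assumption: any node in the cluster
    CONFIG = "myconf"                    # assumption: config set already in ZooKeeper
    DAYS_TO_KEEP = 30

    def collection_name(day):
        return "logs_" + day.strftime("%Y%m%d")

    today = datetime.date.today()

    # Create today's collection. Parameter names follow the 4.x Collections API;
    # they may differ slightly between versions.
    requests.get(SOLR + "/admin/collections", params={
        "action": "CREATE",
        "name": collection_name(today),
        "numShards": 2,
        "replicationFactor": 1,
        "collection.configName": CONFIG,
    }).raise_for_status()

    # Drop the collection that has fallen out of the retention window
    # (this errors if it never existed, which a real script would tolerate).
    expired = today - datetime.timedelta(days=DAYS_TO_KEEP)
    requests.get(SOLR + "/admin/collections", params={
        "action": "DELETE",
        "name": collection_name(expired),
    }).raise_for_status()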


We are running our Solr cluster across 10 machines (4 CPU cores / 4 GB RAM
each) and we are able to index over 1 billion documents (per month) into a
collection with 40 shards (= 40 slices, because we are not using
replication) - 4 shards on each Solr node in the cluster. We still do not
know how the system will behave once we have, and cross-search, many
collections (up to 24, since we are supposed to keep data for 2 years
before we can throw it away) with 1+ billion documents each.
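
[Editor's note: for the cross-search part, a SolrCloud query can be fanned out
over several collections with the "collection" request parameter. A hedged
sketch with made-up collection names and query; as Per says, how this behaves
over many billion-document collections needs to be load-tested.]

    import requests

    SOLR = "http://localhost:8983/solr"  # assumption: any node in the cluster

    # The month-collections we still keep around (names are made up).
    months = ["logs_201210", "logs_201211", "logs_201212"]

    # Address the query to one collection and list them all in the
    # 'collection' parameter so the search fans out across every one of them.
    resp = requests.get(SOLR + "/" + months[-1] + "/select", params={
        "q": "body:error",
        "collection": ",".join(months),
        "wt": "json",
        "rows": 10,
    })
    resp.raise_for_status()
    print(resp.json()["response"]["numFound"])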


Regards, Per Steffensen

On 12/18/12 8:20 PM, Scott Stults wrote:

I'm going to be building a Solr cluster and I want to have a rolling set of
slices so that I can keep a fixed number of days in my collection. If I
send an update to a particular slice leader, will it always hash the unique
key and (probably) forward the doc to another leader?


Thank you,
Scott





Will SolrCloud always slice by ID hash?

2012-12-18 Thread Scott Stults
I'm going to be building a Solr cluster and I want to have a rolling set of
slices so that I can keep a fixed number of days in my collection. If I
send an update to a particular slice leader, will it always hash the unique
key and (probably) forward the doc to another leader?


Thank you,
Scott


Re: Will SolrCloud always slice by ID hash?

2012-12-18 Thread Yonik Seeley
On Tue, Dec 18, 2012 at 2:20 PM, Scott Stults
sstu...@opensourceconnections.com wrote:
 I'm going to be building a Solr cluster and I want to have a rolling set of
 slices so that I can keep a fixed number of days in my collection. If I
 send an update to a particular slice leader, will it always hash the unique
 key and (probably) forward the doc to another leader?

Nope.  Flexibility is our middle name ;-)

Starting with 4.1 you will be able to do custom sharding.
If you send a document to any replica of a shard and don't indicate
it's for another shard, then it will assume it's for that shard (and
forward it to the leader replica for that shard if it's not the
leader).

-Yonik
http://lucidworks.com
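
[Editor's note: a hedged sketch of the behaviour Yonik describes - posting a
document straight to one shard's core and letting it stay on that shard. The
node URL and the logs_20121218_shard1_replica1 core name are assumptions about
the core layout, and this only applies once custom routing is in play; with
the default setup the document would still be placed by ID hash.]

    import requests

    # Assumption: URL of the specific core backing the shard the document
    # should live in.
    SHARD_URL = "http://solr-node-3:8983/solr/logs_20121218_shard1_replica1"

    doc = {"id": "doc-42", "day": "2012-12-18", "body": "example log line"}

    # Post straight to that shard's update handler; Solr 4.x accepts JSON on
    # /update when the Content-Type is application/json (requests sets it).
    resp = requests.post(SHARD_URL + "/update",
                         params={"commit": "true"},
                         json=[doc])
    resp.raise_for_status()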