Re: Decision on Number of shards and collection

2018-04-14 Thread Shawn Heisey
On 4/13/2018 1:44 AM, neotorand wrote: Let's say I have 5 different entities with 10, 20, 30, 40 and 50 attributes (columns) respectively to be indexed/stored. If I store them in a single collection, will any empty space be created? On the other hand, if I store heterogeneous data items in a

Re: Decision on Number of shards and collection

2018-04-13 Thread Erick Erickson
Having documents without fields doesn't matter much. Solr (well, Lucene actually) is pretty efficient about this. It handles thousands of different field types, although I have to say that when you have thousands of fields it's usually time to revisit the design. It looks like your total field cou

Re: Decision on Number of shards and collection

2018-04-13 Thread neotorand
Hi Shawn, Thanks for the long explanation. So the 2 billion limit can be overcome by using shards. Now, coming back to collections: unless we have a logical or business reason, we should not go for more than one collection. Let's say I have 5 different entities with 10, 20, 30, 40 and 50 attri

Re: Decision on Number of shards and collection

2018-04-12 Thread Shawn Heisey
On 4/12/2018 4:57 AM, neotorand wrote: I read from the link you shared that "Shard cannot contain more than 2 billion documents since Lucene is using integer for internal IDs." In which Java class of the Solr implementation repository can this be found? The 2 billion limit is a *hard* limit from L
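A minimal Python sketch of what this per-shard hard limit means for planning. It assumes the cap is the maximum Java int (2^31 - 1; Lucene's exact constant may be slightly lower) and that documents are spread evenly across shards. The function name min_shards_for_hard_limit is hypothetical, not part of Solr or Lucene. It only yields a floor; practical shard sizes are driven by query latency, as discussed elsewhere in this thread.

```python
import math

# Assumption: per-shard cap taken as the maximum Java int (2**31 - 1).
# Lucene's actual constant may sit slightly below this value.
LUCENE_MAX_DOCS = 2**31 - 1

def min_shards_for_hard_limit(total_docs: int, max_docs: int = LUCENE_MAX_DOCS) -> int:
    """Smallest shard count that keeps each shard under the per-shard cap,
    assuming (hypothetically) a perfectly even spread of documents."""
    if total_docs <= 0:
        return 1
    return math.ceil(total_docs / max_docs)

# Deleted documents keep occupying internal IDs until segments are merged,
# so budget with the total including them, not just the live count.
print(min_shards_for_hard_limit(2_000_000_000))  # 1
print(min_shards_for_hard_limit(5_000_000_000))  # 3
```

Going above this floor is normal; a shard anywhere near two billion documents would usually be far too slow to query long before it hits the hard cap.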

Re: Decision on Number of shards and collection

2018-04-12 Thread neotorand
Emir, I read from the link you shared that "Shard cannot contain more than 2 billion documents since Lucene is using integer for internal IDs." In which Java class of the Solr implementation repository can this be found? Regards, Neo -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068

Re: Decision on Number of shards and collection

2018-04-12 Thread neotorand
Thanks, everyone, for your beautiful explanations and valuable time. Thanks Emir for the nice link (http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html). Thanks Shawn for https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ When

Re: Decision on Number of shards and collection

2018-04-11 Thread SOLR4189
I advise you to read the book Solr in Action. To answer your question, you need to take into account the server resources you have (CPU, RAM and disk), the index size, and the average size of a single document.

Re: Decision on Number of shards and collection

2018-04-11 Thread Emir Arnautović
Hi, Only you can tell what query latency is acceptable (I can tell you the ideal: it is 0 :) Usually you start testing with a single shard, keep adding documents to it and measure query latency. When you get close to the maximum allowed latency, you have your shard size. Then you try to estimate
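Emir's procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a Solr tool: the (doc_count, latency_ms) pairs are hypothetical placeholders rather than real benchmark results, the helper names are made up, and because the archived message is cut off after "Then you try to estimate", the last step is assumed to be dividing the expected total document count by the measured shard size.

```python
import math

def max_docs_within_latency(measurements, max_latency_ms):
    """Largest tested shard size whose measured query latency is still within
    the allowed maximum, or None if no measurement qualifies.

    measurements: (doc_count, latency_ms) pairs gathered by growing a single
    test shard and timing representative queries against it.
    """
    ok = [docs for docs, latency in measurements if latency <= max_latency_ms]
    return max(ok) if ok else None

def shards_needed(expected_total_docs, docs_per_shard):
    # Assumed final step: spread the expected total over shards of the measured size.
    return math.ceil(expected_total_docs / docs_per_shard)

# Hypothetical numbers for illustration only, not real benchmark results.
measured = [(10_000_000, 40), (20_000_000, 70), (30_000_000, 110), (40_000_000, 160)]
shard_size = max_docs_within_latency(measured, max_latency_ms=120)
print(shard_size, shards_needed(200_000_000, shard_size))  # 30000000 7
```

The latency threshold is the input only you can choose; the measured shard size then follows from your own data and queries.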

Re: Decision on Number of shards and collection

2018-04-11 Thread Erick Erickson
50M is a ballpark number I use as a place to _start_ getting a handle on capacity. It's useful solely to answer the "is it bigger than a breadbox and smaller than a house" question. It's totally meaningless without testing. Say I'm talking to a client and we have no data. Some are scared that thei
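As a rough sketch of how the 50M ballpark might be used before any testing, here is a hypothetical helper (ballpark_shards is not a Solr function) that turns a total document count into a first-guess shard count. As Erick stresses, the result only frames the conversation and should be replaced by what testing on real data and queries supports.

```python
import math

# Ballpark docs-per-shard figure quoted in the thread; a starting point only.
BALLPARK_DOCS_PER_SHARD = 50_000_000

def ballpark_shards(total_docs: int, docs_per_shard: int = BALLPARK_DOCS_PER_SHARD) -> int:
    """First-guess shard count for early sizing discussions (hypothetical helper)."""
    return max(1, math.ceil(total_docs / docs_per_shard))

for total in (30_000_000, 120_000_000, 1_000_000_000):
    print(f"{total:>13,} docs -> ~{ballpark_shards(total)} shard(s) as a starting guess")
```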

Re: Decision on Number of shards and collection

2018-04-11 Thread Abhi Basu
*The BKM (best known method) I have read so far (I'm still trying to find the source) says 50 million docs/shard performs well. I have found this in my recent tests as well. But of course it depends on index structure, etc.* On Wed, Apr 11, 2018 at 10:37 AM, Shawn Heisey wrote: > On 4/11/2018 4:15 AM, neotorand wrote: > > I bel

Re: Decision on Number of shards and collection

2018-04-11 Thread Shawn Heisey
On 4/11/2018 4:15 AM, neotorand wrote: > I believe heterogeneous data can be indexed to the same collection and I can have multiple shards for the index to be partitioned. So what's the need for a second collection? Yes, when collection size grows I should look for more collections. What exactly that

Re: Decision on Number of shards and collection

2018-04-11 Thread neotorand
Hi Emir, Thanks a lot for your reply. So when I design a Solr ecosystem, I should start with a rough guess at the number of shards and increase it to improve performance. What is the accepted/ideal response time? There should be a trade-off between response time and the number of shards as

Re: Decision on Number of shards and collection

2018-04-11 Thread Emir Arnautović
Hi Neo, Shard size determines query latency, so you split your index when queries become too slow. Distributed search comes with some overhead, so oversharding is not the way to go either. There is no hard rule for what the best numbers are, but here are some thoughts on how to approach this: http://w

Decision on Number of shards and collection

2018-04-11 Thread neotorand
Hi Team, First of all, I take this opportunity to thank you all for creating a beautiful place where people can explore, learn and debate. I have been on my knees for a couple of days trying to decide on this. When I am creating a SolrCloud ecosystem, I need to decide on the number of shards and collections. Wh