Hi,

I was wondering whether sharding is done on:

1. index terms, with each shard covering the whole document space, or
2. the document space, with each shard holding roughly (number of documents / number of shards) of the documents.

Regards,

Sid.

On Tue, May 31, 2016 at 9:27 AM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:

> Thank you.
>
> On Mon, May 30, 2016 at 11:15 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> You should have:
>> shard1_replica1 + shard2_replica1 = 50 ?
>>
>> On Sat, May 28, 2016 at 9:58 AM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:
>> > Still struggling with this. Bump. :)
>> >
>> > On Thu, May 26, 2016 at 3:53 PM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:
>> >>
>> >> Hi Erick,
>> >>
>> >> Thank you for the reply. What I meant was, suppose I have the config:
>> >>
>> >> 2 shards, each with 1 replica.
>> >>
>> >> Hence, on both servers I have
>> >> 1. shard1_replica1
>> >> 2. shard2_replica1
>> >>
>> >> Suppose I have 50 documents. Then is it
>> >> shard1_replica1 + shard2_replica1 = 50 ?
>> >>
>> >> or shard2_replica1 = 50 && shard1_replica1 = 50 ?
>> >>
>> >> Regards,
>> >>
>> >> Sid.
>> >>
>> >> On Thu, May 26, 2016 at 2:30 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >>>
>> >>> Q1: Not quite sure what you mean. Let's say I have 2 shards, 3
>> >>> replicas each, 16 docs on each. I _think_ you're
>> >>> talking about the "core selector", which shows the docs on that
>> >>> particular core, 16 in our case, not 48.
>> >>>
>> >>> Q2: Yes, that's how SolrCloud is designed. It has to be for HA/DR.
>> >>> Every replica in a shard has all the docs, 16 as above. Otherwise, if
>> >>> one of your machines went down, there could be no guarantee even
>> >>> attempted about there not being data loss.
>> >>>
>> >>> Q3: Yes, indexing will be slower when there is more than one replica
>> >>> per shard, since the raw document is forwarded from the leader to all
>> >>> followers before acking back. In distributed situations, you will have
>> >>> a bunch (potentially) more machines doing indexing, so total throughput
>> >>> can be faster.
>> >>>
>> >>> Why do you care? Is there a problem or is this just general background
>> >>> info? There are a number of techniques for speeding up indexing; the
>> >>> first is to use SolrJ and CloudSolrClient and send batches of docs at
>> >>> once rather than one at a time.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Wed, May 25, 2016 at 1:54 PM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I recently moved to a SolrCloud config. I had a few questions:
>> >>> >
>> >>> > Q1. Does a shard show the cumulative number of documents, or only the
>> >>> > documents present in that particular shard, on the admin console of
>> >>> > the respective shard?
>> >>> >
>> >>> > Q2. If the answer to 1 is non-cumulative, then my shards (on different
>> >>> > servers) are indexing all the documents on each instance of the shard.
>> >>> > Is this natural? I created the shards with compositeId.
>> >>> >
>> >>> > Q3. If the answer to 1 is cumulative, then my indexing was slower than
>> >>> > a single-core instance on the same machine, of which I now have 2
>> >>> > (my shards). What could I be missing while configuring Solr?
>> >>> >
>> >>> > I am using Solr 6.0.0 on Ubuntu 14.04 with external zookeeper.
>> >>> >
>> >>> > Regards,
>> >>> >
>> >>> > Sid.
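
P.S. To make the "shard1_replica1 + shard2_replica1 = 50" point from the thread concrete, here is a rough Python sketch of partitioning by document space. It is only a toy model, not Solr code: `shard_for` and `distribute` are made-up names, CRC32 is a stand-in for whatever hash the router really uses, and a simple modulo replaces Solr's per-shard hash ranges. The only behaviour taken from the thread is that each document lands on exactly one shard and every replica of that shard holds a full copy of it.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from the document id (hypothetical helper).

    Assumption: CRC32 plus modulo stands in for the real router's hashing.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids, num_shards, replicas_per_shard):
    """Return cluster[shard][replica] -> list of doc ids (toy model)."""
    cluster = [[[] for _ in range(replicas_per_shard)] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shard = shard_for(doc_id, num_shards)
        # Every replica of the chosen shard receives the same document.
        for replica in cluster[shard]:
            replica.append(doc_id)
    return cluster


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(50)]
    cluster = distribute(docs, num_shards=2, replicas_per_shard=1)
    per_shard = [len(shard[0]) for shard in cluster]
    # The per-shard counts add up to the total number of documents.
    print(per_shard, sum(per_shard))
```

With more than one replica per shard, each replica in this model reports its shard's count rather than the collection total, which would match the "16 in our case, not 48" remark above.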