Re: SolrCloud Shard console shows roughly same number of documents?

Siddhartha Singh Sandhu Tue, 31 May 2016 08:54:35 -0700

Hi,

I was wondering whether sharding is done on:
1. index terms, with each shard covering the whole document space, or
2. the document space, with each shard holding roughly
(number of documents / number of shards) of the documents.
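
To make option 2 concrete (it is the layout that the quoted reply below
describes with shard1_replica1 + shard2_replica1 = 50), here is a minimal
Python sketch of document-space partitioning. The function names and the
CRC32-modulo routing are assumptions made for illustration only; Solr's
compositeId router uses its own hash of the document id, so the real
per-shard split will differ. The point is only that each document lands on
exactly one shard and every replica of that shard holds the same documents.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical stand-in router: map a document id to exactly one shard.

    This is NOT Solr's real hash; it only mimics "hash the id, pick a shard".
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def core_counts(doc_ids, num_shards, replicas_per_shard):
    """Return {core_name: doc_count} for a document-partitioned collection."""
    per_shard = Counter(shard_for(d, num_shards) for d in doc_ids)
    counts = {}
    for shard in range(num_shards):
        for replica in range(replicas_per_shard):
            # Every replica of a shard holds that shard's full set of documents.
            counts[f"shard{shard + 1}_replica{replica + 1}"] = per_shard[shard]
    return counts


doc_ids = [f"doc-{i}" for i in range(50)]
counts = core_counts(doc_ids, num_shards=2, replicas_per_shard=1)
print(counts)                # per-core counts, as a core selector would show them
print(sum(counts.values()))  # with one replica per shard this is the total, 50
```

With more than one replica per shard, each core still reports only its own
shard's count, so summing over all cores would over-count the collection.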


Regards,

Sid.

On Tue, May 31, 2016 at 9:27 AM, Siddhartha Singh Sandhu <
sandhus...@gmail.com> wrote:

> Thank you.
>
> On Mon, May 30, 2016 at 11:15 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> You should have:
>> shard1_replica1 + shard2_replica1 = 50 ?
>>
>> On Sat, May 28, 2016 at 9:58 AM, Siddhartha Singh Sandhu
>> <sandhus...@gmail.com> wrote:
>> > Still struggling with this. Bump. :)
>> >
>> > On Thu, May 26, 2016 at 3:53 PM, Siddhartha Singh Sandhu
>> > <sandhus...@gmail.com> wrote:
>> >>
>> >> Hi Erick,
>> >>
>> >> Thank you for the reply. What I meant was suppose I have the config:
>> >>
>> >> 2 shards each with 1 replica.
>> >>
>> >> Hence, on both servers I have
>> >> 1.  shard1_replica1
>> >> 2.  shard2_replica1
>> >>
>> >> Suppose I have 50 documents then,
>> >> shard1_replica1 + shard2_replica1 = 50 ?
>> >>
>> >> or shard2_replica1 = 50 && shard1_replica1 = 50 ?
>> >>
>> >> Regards,
>> >>
>> >> Sid.
>> >>
>> >> On Thu, May 26, 2016 at 2:30 PM, Erick Erickson <erickerick...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Q1: Not quite sure what you mean. Let's say I have 2 shards, 3
>> >>> replicas each, 16 docs on each. I _think_ you're
>> >>> talking about the "core selector", which shows the docs on that
>> >>> particular core: 16 in our case, not 48.
>> >>>
>> >>> Q2: Yes, that's how SolrCloud is designed. It has to be for HA/DR.
>> >>> Every replica in a shard has all the docs, 16 as above. Otherwise, if
>> >>> one of your machines went down, Solr could not even attempt to
>> >>> guarantee that no data is lost.
>> >>>
>> >>> Q3: Yes, indexing will be slower when there is more than one replica
>> >>> per shard since the raw document is forwarded from the leader to all
>> >>> followers before acking back. In distributed situations, you will have
>> >>> a bunch (potentially) more machines doing indexing so total throughput
>> >>> can be faster.
>> >>>
>> >>> Why do you care? Is there a problem or is this just general background
>> >>> info? There are a number of techniques for speeding up indexing; the
>> >>> first is to use SolrJ and CloudSolrClient and send batches of docs at
>> >>> once rather than one-at-a-time.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Wed, May 25, 2016 at 1:54 PM, Siddhartha Singh Sandhu
>> >>> <sandhus...@gmail.com> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I recently moved to a SolrCloud config. I had a few questions:
>> >>> >
>> >>> > Q1. Does a shard show the cumulative number of documents, or only the
>> >>> > documents present in that particular shard, on the admin console of
>> >>> > the respective shard?
>> >>> >
>> >>> > Q2. If the answer to 1 is non-cumulative, then my shards (on different
>> >>> > servers) are indexing all the documents on each instance of the shard.
>> >>> > Is this natural? I created the shards with compositeId.
>> >>> >
>> >>> > Q3. If the answer to 1 is cumulative, then my indexing was slower than
>> >>> > on a single-core instance on the same machine, of which I now have 2
>> >>> > (my shards). What could I be missing while configuring Solr?
>> >>> >
>> >>> >
>> >>> > I am using Solr 6.0.0 on Ubuntu 14.04 with external zookeeper.
>> >>> >
>> >>> > Regards,
>> >>> >
>> >>> > Sid.
>> >>
>> >>
>> >
>>
>
>
