Re: SolrCloud High Availability during indexing operation

Erick Erickson Wed, 25 Sep 2013 03:02:51 -0700

And do any of the documents have the same <uniqueKey>, which
is usually called "id"? Subsequent adds of docs with the same
<uniqueKey> replace the earlier one.
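
One way to rule this out on the client side is to count repeated keys in the input before indexing. This is only a minimal sketch: it assumes the documents are already loaded as Python dicts and that the <uniqueKey> field is named "id". The helper names are hypothetical and not part of Solr.

```python
from collections import Counter


def find_duplicate_keys(docs, key_field="id"):
    """Return {key: count} for every key value that occurs more than once."""
    counts = Counter(doc[key_field] for doc in docs)
    return {key: n for key, n in counts.items() if n > 1}


def expected_num_docs(docs, key_field="id"):
    """Number of documents that should survive once duplicates replace each other."""
    return len({doc[key_field] for doc in docs})


# Hypothetical sample input: two of the three docs share the same id.
sample = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
duplicates = find_duplicate_keys(sample)
survivors = expected_num_docs(sample)
```

If expected_num_docs() for the full input set is lower than the number of documents sent, the shortfall is explained by replacement rather than by lost updates.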


What does your admin page show for "maxDoc"? If it's more than
"numDocs" then you have duplicate <uniqueKey>s. This isn't
definitive, because the numbers change as merges happen and old
copies of docs that have been deleted or updated are purged.
NOTE: if you optimize (which you usually shouldn't) then maxDoc
and numDocs will be the same, so if you test this, don't optimize.
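
The same comparison can be scripted instead of read off the admin page. The sketch below assumes the core exposes the Luke request handler at /admin/luke and that its JSON response carries "numDocs" and "maxDoc" under "index". The core URL in the usage comment is a placeholder, and in SolrCloud each shard's core would need to be checked separately.

```python
import json
from urllib.request import urlopen


def fetch_index_stats(core_url):
    """Fetch the 'index' section of the Luke handler response for one core.

    Assumes the handler is reachable at <core_url>/admin/luke.
    """
    url = core_url.rstrip("/") + "/admin/luke?numTerms=0&wt=json"
    with urlopen(url) as resp:
        return json.load(resp)["index"]


def replaced_or_deleted(stats):
    """maxDoc minus numDocs: docs deleted or replaced but not yet merged away."""
    return stats["maxDoc"] - stats["numDocs"]


# Usage (placeholder host and core name; requires a running Solr):
# stats = fetch_index_stats("http://localhost:8983/solr/collection1")
# print(replaced_or_deleted(stats))
```

A non-zero result only hints at duplicates, since a merge may already have purged some of the old copies.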

Best,
Erick


On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood
<wun...@wunderwood.org> wrote:
> Did all of the curl update commands return success? Any errors in the logs?
>
> wunder
>
> On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:
>
>> Is it possible that some of those 80K docs were simply not valid? e.g.
>> had a wrong field, had a missing required field, anything like that?
>> What happens if you clear this collection and just re-run the same
>> indexing process and do everything else the same?  Still some docs
>> missing?  Same number?
>>
>> And what if you take 1 document that you know is valid and index it
>> 80K times, with a different ID, of course?  Do you see 80K docs in the
>> end?
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena <ssax...@gopivotal.com> 
>> wrote:
>>> Doc count did not change after I restarted the nodes. I am doing a single
>>> commit after all 80k docs. Using Solr 4.4.
>>>
>>> Regards,
>>> Saurabh
>>>
>>>
>>> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
>>> otis.gospodne...@gmail.com> wrote:
>>>
>>>> Interesting. Did the doc count change after you started the nodes again?
>>>> Can you tell us about commits?
>>>> Which version? 4.5 will be out soon.
>>>>
>>>> Otis
>>>> Solr & ElasticSearch Support
>>>> http://sematext.com/
>>>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena" <ssax...@gopivotal.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am testing the High Availability feature of SolrCloud. I am using the
>>>>> following setup:
>>>>>
>>>>> - 8 linux hosts
>>>>> - 8 Shards
>>>>> - 1 leader, 1 replica / host
>>>>> - Using Curl for update operation
>>>>>
>>>>> I tried to index 80K documents on replicas (10K/replica in parallel).
>>>>> During the indexing process, I stopped 4 leader nodes. Once indexing was
>>>>> done, only 79808 of the 80K docs were indexed.
>>>>>
>>>>> Is this expected behaviour? In my opinion, the replica should take over
>>>>> indexing if the leader is down.
>>>>>
>>>>> If this is expected behaviour, are there any steps that can be taken from
>>>>> the client side to avoid such a situation?
>>>>>
>>>>> Regards,
>>>>> Saurabh Saxena
>>>>>
>>>>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
