Re: hdfs - documents missing after hard poweroff

Kevin Risden Wed, 31 Oct 2018 17:22:06 -0700

Also, do you have autoAddReplicas turned on for these collections over
HDFS?
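
One way to answer this is to read the collection state from the Collections API. The following is a minimal sketch, not a definitive check. It assumes that CLUSTERSTATUS reports the property as "autoAddReplicas" inside the collection's state; the base URL and collection name you pass in are placeholders for your own.

```python
import json
import urllib.parse
import urllib.request


def cluster_status_url(base_url, collection):
    """Build a Collections API CLUSTERSTATUS URL for one collection."""
    query = urllib.parse.urlencode(
        {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def auto_add_replicas_enabled(status, collection):
    """Return True if the collection state reports autoAddReplicas as true.

    Assumption: the property is stored under the key "autoAddReplicas" as
    either a string or a boolean. A missing key is treated as "off".
    """
    state = status["cluster"]["collections"][collection]
    return str(state.get("autoAddReplicas", "false")).lower() == "true"


def fetch_cluster_status(base_url, collection):
    """Fetch and decode the CLUSTERSTATUS response from a running node."""
    with urllib.request.urlopen(cluster_status_url(base_url, collection)) as resp:
        return json.load(resp)
```

Calling auto_add_replicas_enabled(fetch_cluster_status(base_url, name), name) against any live node should then give a yes/no answer for that collection.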


Kevin Risden


On Wed, Oct 31, 2018 at 8:20 PM Kevin Risden <kris...@apache.org> wrote:

> So I'm definitely curious what is going on here.
>
> Are you still able to reproduce this? Can you check whether files have been
> modified on HDFS? I'd be curious whether the tlogs or the index are changing
> underneath across the different restarts. Since there is no new indexing I
> would guess not, but it is something to check.
>
> Can you run CheckIndex on the index to make sure it's not corrupt when you
> don't get the full result set?
>
> Kevin Risden
>
>
> On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham <kyle.frans...@superna.net>
> wrote:
>
>> Hi,
>>
>> Sometimes after a full poweroff of the solr cloud nodes, we see missing
>> documents from the index. Is there anything about our setup or our recovery
>> procedure that could cause this? Details are below:
>>
>> We see the following (somewhat random) behaviour:
>>
>>  - add 10 documents to index. Commit.
>>  - query for all documents - 10 documents returned.
>>  - restart all solr nodes and reset the collection (procedure is below).
>>  - query for all documents - 10 documents returned.
>>  - restart+reset all again - sometimes 7, 8, 9, or 10 documents returned.
>>
>> To summarize, after a full reboot of all the solr nodes, we are finding
>> that (sometimes) not all documents are in the index. This situation doesn't
>> remedy itself by waiting. Restarting all will sometimes re-add them,
>> sometimes not.
>>
>> Our procedure for recovering from a hard poweroff is:
>>  - manually delete all *.lock files from the index folders on hdfs.
>>  - fully delete the znode from zookeeper.
>>  - re-add an empty znode in zookeeper.
>>  - start up all solr nodes.
>>  - re-add the configset.
>>  - re-issue the collection create command.
>>
>> After doing the above, we find that we are able to see all of the documents
>> in the index about 60% of the time. Other times, we are missing some
>> documents.
>>
>> Some other things about our environment:
>>  - we're doing this test with 1 collection that has 18 shards distributed
>> across 3 solr cloud nodes.
>>  - solr version 7.5.0
>>  - hdfs is not running on the solr nodes, and is not being restarted.
>>
>> Any thoughts or tips are greatly appreciated,
>>
>> Kyle
>>
>
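
To narrow down which shards lose documents between restarts, the reproduction steps quoted above could be paired with a per-core count check. This is a rough sketch under stated assumptions: it queries each core directly with distrib=false so that only that core's own documents are counted, and the core URLs you pass in are placeholders that must come from your own cluster. The helper names are illustrative and not part of Solr.

```python
import json
import urllib.parse
import urllib.request


def count_url(core_url):
    """URL that asks a single core for its own document count only."""
    query = urllib.parse.urlencode(
        {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    )
    return f"{core_url.rstrip('/')}/select?{query}"


def num_found(response):
    """Extract numFound from a decoded /select JSON response."""
    return response["response"]["numFound"]


def fetch_count(core_url):
    """Query one core on a running node and return its document count."""
    with urllib.request.urlopen(count_url(core_url)) as resp:
        return num_found(json.load(resp))


def diff_counts(before, after):
    """Return {core: (before, after)} for every core whose count changed."""
    cores = sorted(set(before) | set(after))
    return {
        core: (before.get(core), after.get(core))
        for core in cores
        if before.get(core) != after.get(core)
    }
```

Collecting {core_url: fetch_count(core_url)} before and after each restart and passing both dictionaries to diff_counts would show whether the missing documents always belong to the same shards, which could then be compared against the tlog and index files on HDFS.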
