I have some docs that I know I've overwritten, but that is fine: it is
caused by duplicate docs with the same data and the same id.
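
A rough sketch of how the overwritten-doc count could be read off per core,
assuming the JSON body comes from Solr's Luke request handler
(/admin/luke?numTerms=0&wt=json) and carries an "index" section with
"numDocs" and "maxDoc". The function name and the sample numbers are made up
for illustration, and the difference only covers overwritten or deleted docs
that have not yet been merged away:

```python
import json


def overwritten_count(luke_response_text):
    """Return maxDoc - numDocs from a Luke handler JSON response body.

    Assumes the response has an "index" object with "maxDoc" and "numDocs".
    """
    index = json.loads(luke_response_text)["index"]
    return index["maxDoc"] - index["numDocs"]


# Hypothetical response body, trimmed to the fields used above.
sample = '{"index": {"numDocs": 990, "maxDoc": 1000, "deletedDocs": 10}}'
print(overwritten_count(sample))  # prints 10
```

With 20 shards this would have to be run against each core and summed.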

I know there is data loss because I know that a doc with a certain id should
be in the index, but it isn't.
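
The check itself can be sketched roughly like this, assuming the uniqueKey
field is called "id", that ids contain no commas, and that the ids of a bulk
are taken from the indexer log. Both helper names are hypothetical; the query
uses Solr's terms query parser ({!terms f=id}) and only builds the request
parameters, it does not send anything:

```python
from urllib.parse import urlencode


def ids_query_params(ids):
    """Build /select query parameters that ask Solr for exactly these ids.

    Assumes the uniqueKey field is named "id" and ids contain no commas.
    """
    return urlencode({
        "q": "{!terms f=id}" + ",".join(ids),
        "fl": "id",
        "rows": len(ids),
        "wt": "json",
    })


def missing_ids(sent_ids, found_ids):
    """Return the ids that were logged as sent but were not found in Solr."""
    found = set(found_ids)
    return [doc_id for doc_id in sent_ids if doc_id not in found]


# ids from one logged bulk vs. the ids Solr returned for that bulk
print(missing_ids(["a", "b", "c"], ["c", "a"]))  # prints ['b']
```

Running this per logged bulk gives the list of docs that were reported as
indexed successfully but are not in the index.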



Upayavira wrote
> Are you adding all new documents? If you are not updating documents at
> all, take a look at your maxDocs vs numDocs; if they are not the same,
> then you have overwritten some documents.
> 
> You may also be right that the exception you've seen could be the cause
> of it, just thought the above is worth checking.
> 
> Upayavira
> 
> On Tue, Aug 4, 2015, at 03:06 PM, adfel70 wrote:
>> Hello,
>> I'm using Solr 5.2.1.
>> I'm indexing a collection with 20 shards;
>> around 1.7 billion docs should be indexed.
>> The indexer is a MapReduce job that runs on YARN with 60 concurrent
>> containers.
>> I index in bulks of 1000 docs and write a log message for each bulk that
>> was indexed.
>> Each such log message lists the ids of all the Solr docs that were in the
>> bulk.
>> 
>> One such indexing process finished without any errors, neither in the
>> indexer nor in Solr.
>> I have a data validation process that checks that Solr has the number of
>> docs it should have.
>> I ran this process and found that some docs are missing.
>> I figured out which docs are missing, went back to my logs, and saw that
>> these docs appeared in the log messages of bulks that succeeded.
>> So I have the following scenario:
>> 1. At a certain time during the indexing, a client used SolrJ to send a
>> bulk of 1000 docs.
>> 2. The client got success for this operation.
>> 3. Solr had no errors.
>> 4. Not all the data was indexed.
>> 
>> Further investigation of the Solr logs brought me to the conclusion that
>> every time I had a bulk with missing docs, Solr had logged the following
>> WARNING:
>> badMessage: java.lang.IllegalStateException: too much data after closed for
>> HttpChannelOverHttp@5432494a
>> 
>> I saw this post:
>> http://lucene.472066.n3.nabble.com/Too-much-data-after-closed-for-HttpChannelOverHttp-td4170459.html
>> 
>> I tried reducing the bulk size from 1000 to 200 as the post suggests
>> (I haven't gone as far as sending each doc in a separate .add call yet),
>> with no success. In this attempt I'm getting the same WARNING, but now I
>> also get regular errors such as NoHttpResponseException, which is fine
>> because the client also gets an error and I can handle this.
>> 
>> 
>> Any input on this WARNING and the data loss issue?
>> 
>> thanks.
>> 
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/serious-data-loss-bug-in-correlation-with-too-much-data-after-closed-tp4220723.html
>> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/serious-data-loss-bug-in-correlation-with-too-much-data-after-closed-tp4220723p4221289.html
Sent from the Solr - User mailing list archive at Nabble.com.