Are you adding only new documents? If you are not updating documents at
all, compare maxDoc with numDocs. If they are not the same, the index
contains deleted documents, which in your case would mean that some
documents were overwritten.

You may well be right that the exception you've seen is the cause of
it; I just thought the above was worth checking.
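
For what it's worth, here is a minimal sketch of that check in Python. It assumes the Luke request handler is reachable at /admin/luke on each core and that its JSON response carries numDocs and maxDoc under "index"; the core URL you pass in and the sample numbers below are hypothetical. In a sharded collection you would run it against every shard's core. Note that segment merges purge deleted documents, so a difference of zero does not prove that nothing was overwritten.

```python
import json
from urllib.request import urlopen


def deleted_doc_count(luke_response: dict) -> int:
    """Return maxDoc - numDocs, i.e. documents marked deleted but not yet merged away."""
    index = luke_response["index"]
    return index["maxDoc"] - index["numDocs"]


def fetch_luke(core_url: str) -> dict:
    """Fetch index statistics for one core (assumes the Luke handler at /admin/luke)."""
    with urlopen(f"{core_url}/admin/luke?numTerms=0&wt=json") as resp:
        return json.load(resp)


# Hypothetical numbers, only to show the arithmetic.
sample = {"index": {"numDocs": 84_999_200, "maxDoc": 85_000_000}}
print(deleted_doc_count(sample))  # 800
```

A non-zero result on a core that should only ever receive new documents suggests that some ids were sent more than once.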

Upayavira

On Tue, Aug 4, 2015, at 03:06 PM, adfel70 wrote:
> Hello,
> I'm using Solr 5.2.1.
> I'm indexing a collection with 20 shards;
> around 1.7 billion docs should be indexed.
> The indexer is a MapReduce job that runs on YARN with 60 concurrent
> containers.
> I index in bulks of 1000 docs and write a log message for each bulk that
> was indexed.
> Each such log message lists all the ids of the Solr docs that were in the
> bulk.
> 
> One such indexing process finished without any errors, neither in the
> indexer nor in Solr.
> I have a data validation process that checks that Solr holds the number
> of docs it should.
> I ran this process and found that some docs were missing.
> I figured out which docs were missing, went back to my logs, and saw that
> these docs appeared in log messages of bulks that had succeeded.
> So I have the following scenario:
> 1. At a certain time during the indexing, a client used SolrJ to send a
>    bulk of 1000 docs.
> 2. The client got success for this operation.
> 3. Solr had no errors.
> 4. Not all the data was indexed.
> 
> Further investigation of the Solr logs brought me to the conclusion that
> every time I had a bulk with missing docs, Solr had logged the following
> WARNING:
> badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@5432494a
> 
> I saw this post:
> http://lucene.472066.n3.nabble.com/Too-much-data-after-closed-for-HttpChannelOverHttp-td4170459.html
> 
> I tried reducing the bulk size from 1000 to 200 as the post suggests
> (I haven't yet gone as far as sending each doc in a separate .add call),
> with no success. In this attempt I get the same WARNING, but now I also
> get regular errors such as NoHttpResponseException, which is fine because
> the client also gets an error and I can handle it.
> 
> 
> Any input on this WARNING and the data-loss issue?
> 
> thanks.
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/serious-data-loss-bug-in-correlation-with-too-much-data-after-closed-tp4220723.html
> Sent from the Solr - User mailing list archive at Nabble.com.
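
The validation step described in the quoted message (going back from the per-bulk logs to see which logged ids never made it into the index) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ids_present` callable, the bulk names, and the sample ids are all hypothetical, and in a real run the lookup would query Solr for the given ids rather than consult an in-memory set.

```python
from typing import Callable, Dict, Iterable, List, Set


def missing_ids_per_bulk(
    logged_bulks: Dict[str, List[str]],
    ids_present: Callable[[Iterable[str]], Set[str]],
) -> Dict[str, List[str]]:
    """For each logged bulk, list the ids that the lookup did not find."""
    missing: Dict[str, List[str]] = {}
    for bulk_name, ids in logged_bulks.items():
        found = ids_present(ids)
        lost = [doc_id for doc_id in ids if doc_id not in found]
        if lost:  # only report bulks that actually lost documents
            missing[bulk_name] = lost
    return missing


# Hypothetical stand-in for a Solr lookup: an in-memory set of indexed ids.
indexed = {"a1", "a2", "b1"}


def fake_lookup(ids: Iterable[str]) -> Set[str]:
    return indexed.intersection(ids)


bulks = {"bulk-1": ["a1", "a2", "a3"], "bulk-2": ["b1"]}
print(missing_ids_per_bulk(bulks, fake_lookup))  # {'bulk-1': ['a3']}
```

Grouping the result by bulk makes it easier to line up each affected bulk's timestamp with the WARNING entries in the Solr logs.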
