Hello,

I'm using Solr 5.2.1 and I'm indexing a collection with 20 shards; around 1.7 billion docs need to be indexed. The indexer is a MapReduce job running on YARN with 60 concurrent containers. I index in bulks of 1000 docs and write a log message for each bulk that was indexed; each such log message contains the ids of all the Solr docs in the bulk.
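To make it concrete, here is a minimal sketch of what each bulk send looks like (simplified; the zkHost, collection name and id scheme are placeholders, not my real job code):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexSketch {
    public static void main(String[] args) throws Exception {
        // placeholder ZooKeeper ensemble and collection name
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
        client.setDefaultCollection("mycollection");

        List<SolrInputDocument> bulk = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            bulk.add(doc);
            if (bulk.size() == 1000) {
                // add() returning without an exception is what the job
                // treats as "success"; the bulk's ids are then logged.
                UpdateResponse rsp = client.add(bulk);
                System.out.println("bulk ok, qtime=" + rsp.getQTime()
                        + ", ids=doc-" + (i - 999) + "..doc-" + i);
                bulk.clear();
            }
        }
        client.close();
    }
}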
The indexing process finished without any errors, neither in the indexer nor in Solr. I have a data validation process that checks that Solr contains the number of docs it should (essentially the id-diff sketch at the end of this mail). I ran it and found that some docs are missing. I figured out which docs were missing, went back to my logs, and saw that those docs appear in log messages of bulks that succeeded. So I have the following scenario:

1. At a certain time during the indexing, a client used SolrJ to send a bulk of 1000 docs.
2. The client got success for this operation.
3. Solr had no errors.
4. Not all the data was indexed.

Further investigation of the Solr logs brought me to the conclusion that every time a bulk had missing docs, Solr logged the following WARNING:

badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@5432494a

I saw this post: http://lucene.472066.n3.nabble.com/Too-much-data-after-closed-for-HttpChannelOverHttp-td4170459.html

I tried reducing the bulk size from 1000 to 200 as the post suggests (I haven't tried sending each doc in a separate .add call yet), with no success. With the smaller bulks I still get the same WARNING, but now I also get regular errors such as NoHttpResponseException, which is fine because the client gets an error too and I can handle it.

Any input on this WARNING and the data loss issue?

Thanks.
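For reference, the per-bulk validation check is essentially the following sketch (again, the collection details are placeholders): it fetches the ids of one logged bulk from Solr via the terms query parser and diffs them against the ids in the log line.

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class BulkVerifySketch {
    // Returns the ids from one logged bulk that Solr does NOT have.
    public static Set<String> missingIds(CloudSolrClient client,
                                         List<String> loggedIds) throws Exception {
        StringBuilder csv = new StringBuilder();
        for (String id : loggedIds) {
            if (csv.length() > 0) csv.append(',');
            csv.append(id);
        }
        // {!terms} avoids building a 1000-clause boolean query
        SolrQuery q = new SolrQuery("{!terms f=id}" + csv);
        q.setFields("id");
        q.setRows(loggedIds.size());
        QueryResponse rsp = client.query(q);

        Set<String> found = new HashSet<String>();
        for (SolrDocument d : rsp.getResults()) {
            found.add((String) d.getFieldValue("id"));
        }
        Set<String> missing = new HashSet<String>(loggedIds);
        missing.removeAll(found); // logged as indexed but absent from Solr
        return missing;
    }
}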