I do have some docs that I know I've overwritten, but that is fine: they are duplicate docs with the same data and the same id.
I know there is data loss because a doc with a specific id should be in the index, but it isn't.

Upayavira wrote
> Are you adding all new documents? If you are not updating documents at
> all, take a look at your maxDocs vs numDocs, if they are not the same,
> then you have overwritten some documents.
>
> You may also be right that the exception you've seen could be the cause
> of it, just thought the above is worth checking.
>
> Upayavira
>
> On Tue, Aug 4, 2015, at 03:06 PM, adfel70 wrote:
>> Hello,
>> I'm using Solr 5.2.1.
>> I'm indexing a collection with 20 shards.
>> Around 1.7 billion docs should be indexed.
>> The indexer is a MapReduce job that runs on YARN, running 60 concurrent
>> containers.
>> I index in bulks of 1000 docs and write a log message for each bulk that
>> was indexed.
>> Each such log message has all the ids of the Solr docs that were in the
>> bulk.
>>
>> Such an indexing process finished without any errors, neither in the
>> indexer nor in Solr.
>> I have a data validation process that validates that Solr has the correct
>> number of docs.
>> I ran this process and found that some docs are missing.
>> I figured out which docs are missing, went back to my logs, and saw that
>> these docs appeared in log messages of bulks that succeeded.
>> So I have the following scenario:
>> 1. At a certain time during the indexing, a client used SolrJ to send a
>> bulk of 1000 docs.
>> 2. The client got success for this operation.
>> 3. Solr had no errors.
>> 4. Not all the data was indexed.
>>
>> Further investigation of the Solr logs brought me to the conclusion that
>> every time I had a bulk with missing docs, Solr logged the following
>> WARNING:
>> badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@5432494a
>>
>> I saw this post:
>> http://lucene.472066.n3.nabble.com/Too-much-data-after-closed-for-HttpChannelOverHttp-td4170459.html
>>
>> I tried reducing the bulk size from 1000 to 200 as the post suggests
>> (I haven't gone as far as sending each doc in a separate .add call yet),
>> with no success. In this attempt I'm getting the same WARNING, but now I
>> also get regular errors such as NoHttpResponseException, which is fine
>> because the client also gets an error and I can handle this.
>>
>> Any input on this WARNING and the data loss issue?
>>
>> thanks.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/serious-data-loss-bug-in-correlation-with-too-much-data-after-closed-tp4220723.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
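
For anyone following the thread, here is a minimal sketch of the two checks discussed above: comparing maxDoc with numDocs, and diffing the ids logged for successful bulks against the ids actually found in the index. It is plain Java with no SolrJ dependency. The class and method names are hypothetical, and it assumes the two counters and the two id lists have already been collected elsewhere (for example from Solr's index statistics, the indexer logs, and a query against the collection).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Solr or SolrJ.
final class IndexValidation {

    // maxDoc still counts documents that were deleted or overwritten but not
    // yet merged away, while numDocs counts only live documents. A non-zero
    // difference therefore means some documents were deleted or overwritten.
    // Assumption: both values are summed over the same set of shards.
    static long deletedOrOverwritten(long maxDoc, long numDocs) {
        if (numDocs > maxDoc) {
            throw new IllegalArgumentException("numDocs cannot exceed maxDoc");
        }
        return maxDoc - numDocs;
    }

    // Returns the ids that were logged as part of a successful bulk but are
    // not present in the index, sorted for stable reporting.
    static Set<String> missingIds(Collection<String> loggedIds,
                                  Collection<String> indexedIds) {
        Set<String> indexed = new HashSet<>(indexedIds);
        Set<String> missing = new TreeSet<>();
        for (String id : loggedIds) {
            if (!indexed.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<String> logged = Arrays.asList("doc-1", "doc-2", "doc-3");
        List<String> indexed = Arrays.asList("doc-1", "doc-3");
        System.out.println("deleted or overwritten: " + deletedOrOverwritten(1000L, 990L));
        System.out.println("missing ids: " + missingIds(logged, indexed));
    }
}
```

Note that a non-zero difference between maxDoc and numDocs only shows that overwrites or deletes happened. It cannot explain an id that is absent from the index altogether, which is why the id diff is the more direct check for the scenario described here.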