Can you share the test please

On Thu, Nov 21, 2019 at 7:02 AM Noble Paul <[email protected]> wrote:
>
> Thanks Colvin, I'll take a look
>
> On Thu, Nov 21, 2019 at 4:24 AM Colvin Cowie <[email protected]>
> wrote:
> >
> > I've identified the change which has caused the problem to materialize,
> > though it shouldn't itself cause a problem.
> >
> > https://github.com/apache/lucene-solr/commit/e45e8127d5c17af4e4b87a0a4eaf0afaf4f9ff4b#diff-7f7f485122d8257bd5d3210c092b967fR52
> > for https://issues.apache.org/jira/browse/SOLR-13682
> >
> > In writeMap, the new BiConsumer unwraps the SolrInputField using getValue()
> > rather than getRawValue() (which the JavaBinCodec calls):
> >
> >     if (o instanceof SolrInputField) {
> >       o = ((SolrInputField) o).getValue();
> >     }
> >
> > As a result the JavaBinCodec will now hit different writer methods
> > depending on the value retrieved from the SolrInputField, rather than
> > simply writing the raw value via
> > org.apache.solr.common.util.JavaBinCodec.writeKnownType(Object):
> >
> >     if (val instanceof SolrInputField) {
> >       return writeKnownType(((SolrInputField) val).getRawValue());
> >     }
> >
> > https://github.com/apache/lucene-solr/blob/branch_8_3/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L362
> >
> > SolrInputField's getValue() goes through
> > org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(Object),
> > while getRawValue() just returns whatever value the SolrInputField holds.
> > So the EntryWriter in the JavaBinCodec hits different paths from the ones
> > it hit before, and those paths must be what non-deterministically produces
> > the garbage data when getValue() is used.
> >
> > Changing *getValue()* to *getRawValue()* in the SolrInputDocument's
> > *writeMap()* appears to "fix" the problem. (With getValue() the test I have
> > reliably fails within 50 iterations of indexing 2500 documents; with
> > getRawValue() it succeeds for the 500 iterations I'm running it for.)
> >
> > I'll see about providing a test that can be shared that demonstrates the
> > problem, and see if we can find what is going wrong in the codec...
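> >
> > For reference, the shape of that test is roughly the following. This is a
> > simplified sketch rather than the shareable test itself: the class name,
> > ZooKeeper address, collection name and field contents are placeholders for
> > our own environment, and it needs a multi-shard SolrCloud collection to
> > stand any chance of showing the problem.
> >
> >     import java.util.ArrayList;
> >     import java.util.Collections;
> >     import java.util.List;
> >     import java.util.Optional;
> >
> >     import org.apache.solr.client.solrj.impl.CloudSolrClient;
> >     import org.apache.solr.common.SolrInputDocument;
> >
> >     public class RepeatedIndexingCheck {
> >       public static void main(String[] args) throws Exception {
> >         try (CloudSolrClient client = new CloudSolrClient.Builder(
> >             Collections.singletonList("localhost:9983"), Optional.empty()).build()) {
> >           for (int iteration = 0; iteration < 500; iteration++) {
> >             List<SolrInputDocument> docs = new ArrayList<>();
> >             for (int i = 0; i < 2500; i++) {
> >               SolrInputDocument doc = new SolrInputDocument();
> >               doc.addField("id", iteration + "-" + i);
> >               doc.addField("testField", "blah");
> >               // a serialized JSON string, similar in shape to our real data
> >               // (the real values are much longer)
> >               doc.addField("jsonField",
> >                   "{\"thing\":{\"abcd\":\"value\"},\"mmm\":\"Some string\"}");
> >               docs.add(doc);
> >             }
> >             // the documents are passed as an Iterator, as in our indexing code;
> >             // when the corruption occurs this add fails with an "unknown field" error
> >             client.add("my_collection", docs.iterator());
> >             client.commit("my_collection");
> >           }
> >         }
> >       }
> >     }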
> >
> > On Tue, 19 Nov 2019 at 13:48, Colvin Cowie <[email protected]>
> > wrote:
> >
> > > Hello
> > >
> > > Apologies for the lack of actual detail in this, we're still digging into
> > > it ourselves. I will provide more detail, and maybe some logs, once I have
> > > a better idea of what is actually happening.
> > > But I thought I might as well ask if anyone knows of changes that were
> > > made in the Solr 8.3 release that are likely to have caused an issue like
> > > this.
> > >
> > > We were on Solr 8.1.1 for several months and moved to 8.2.0 for about 2
> > > weeks before moving to 8.3.0 last week.
> > > We didn't see this issue at all on the previous releases. Since moving to
> > > 8.3 we have had a consistent (but non-deterministic) set of failing tests,
> > > on Windows and Linux.
> > >
> > > The issue we are seeing is that during updates, the data we have sent is
> > > *sometimes* corrupted, as though a buffer has been used incorrectly. For
> > > example, if the well-formed data was
> > >     'fieldName':"this is a long string"
> > > the error we see from Solr might be
> > >     unknown field 'fieldNamis a long string"
> > >
> > > And variations of that kind of behaviour, where part of the data is
> > > missing or corrupted. The data we are indexing does include fields which
> > > store (escaped) serialized JSON strings - if that might have any bearing -
> > > but the error isn't always on those fields.
> > > For example, given a valid document that looks like this when returned
> > > with the json response writer (I've replaced the values by hand, so if the
> > > JSON is messed up here, that's not relevant):
> > >
> > >     {
> > >       "id": "abcd",
> > >       "testField": "blah",
> > >       "jsonField": "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some string\",\"someBool\":true}"
> > >     }
> > >
> > > we've had errors during indexing like:
> > >
> > >     unknown field 'testField:"value","xyz":["abc","def","ghi"],"nnn":"xyz"},"stuff":[{"qqq":"rrr"}],"ttt":0,"mmm":"Some string","someBool":true}���������������������������'
> > >
> > > (those � unprintable characters are part of it)
> > >
> > > So far we've not been able to reproduce the problem on a collection with a
> > > single shard, so it does seem like the problem is only happening
> > > internally when updates are distributed to the other shards... But that's
> > > not been totally verified.
> > >
> > > We've also only encountered the problem on one of the collections we build
> > > (the data within each collection is generally the same though. The ids are
> > > slightly different - but still strings. The main difference is that this
> > > problematic index is built by passing an Iterator<SolrInputDocument> to
> > > the solrj method org.apache.solr.client.solrj.SolrClient.add(String,
> > > Iterator<SolrInputDocument>) - the SolrInputDocuments are not being reused
> > > in the client, I checked that - while the other index is built by
> > > streaming CSVs to Solr.)
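> > >
> > > In case it helps, the call pattern on the problematic index is roughly
> > > the following - a simplified sketch rather than our real code, with the
> > > collection name and the wrapping class as placeholders:
> > >
> > >     import java.util.Iterator;
> > >
> > >     import org.apache.solr.client.solrj.SolrClient;
> > >     import org.apache.solr.common.SolrInputDocument;
> > >
> > >     class BatchIndexer {
> > >       // each SolrInputDocument is created fresh by the iterator and never
> > >       // reused; the whole batch goes through a single add call rather than
> > >       // per-document adds
> > >       static void indexBatch(SolrClient client, Iterator<SolrInputDocument> docs)
> > >           throws Exception {
> > >         client.add("my_collection", docs);
> > >         client.commit("my_collection");
> > >       }
> > >     }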
> > >
> > > We will look into it further, but if anyone has any ideas of what might
> > > have changed in 8.3 from 8.1 / 8.2 that could cause this, that would be
> > > helpful.
> > >
> > > Cheers
> > > Colvin
> > >
>
> --
> -----------------------------------------------------
> Noble Paul

--
-----------------------------------------------------
Noble Paul
