*> the difference is because the _default config has the dynamic schema building in it, which I assume is pushing it down a different code path. *
Also to add to that, I assumed initially that this just meant that it was working because the corrupted field names would just cause it to create a field with the dodgy name (since that's the idea for the dynamic schema), but checking the documents on retrieval showed they all had the right field names... So I assume it's a result of going into a different branch of code instead. On an unrelated matter, I saw this in the logs when running with embedded zookeeper... I don't think I've seen it mentioned anywhere else, so I will raise an issue for it *2019-11-21 17:25:14.292 INFO (main) [ ] o.a.s.c.SolrZkServer STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 99832019-11-21 17:25:14.792 INFO (main) [ ] o.a.s.c.ZkContainer Zookeeper client=localhost:99832019-11-21 17:25:18.833 WARN (Thread-13) [ ] o.a.z.s.a.AdminServerFactory Unable to load jetty, not starting JettyAdminServer => java.lang.NoClassDefFoundError: org/eclipse/jetty/server/Connector at java.lang.Class.forName0(Native Method)java.lang.NoClassDefFoundError: org/eclipse/jetty/server/Connector at java.lang.Class.forName0(Native Method) ~[?:1.8.0_191] at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_191] at org.apache.zookeeper.server.admin.AdminServerFactory.createAdminServer(AdminServerFactory.java:43) ~[?:?] at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:136) ~[?:?] at org.apache.solr.cloud.SolrZkServer$1.run(SolrZkServer.java:121) ~[?:?]Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.server.Connector at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:577) ~[jetty-webapp-9.4.19.v20190610.jar:9.4.19.v20190610] at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_191] ... 5 more2019-11-21 17:25:19.365 INFO (main) [ ] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper2019-11-21 17:25:19.396 INFO (zkConnectionManagerCallback-7-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected2019-11-21 17:25:19.396 INFO (main) [ ] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper* On Thu, 21 Nov 2019 at 17:30, Colvin Cowie <[email protected]> wrote: > I've been a bit snowed under, but I've found the difference is because the > _default config has the dynamic schema building in it, which I assume is > pushing it down a different code path. > > <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" > default="${update.autoCreateFields:true}" > > > processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields"> > > I'm using the vanilla Solr 8.3.0 binary8.3.0 > 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:15:22 with > Eclipse OpenJ9 Eclipse OpenJ9 VM 1.8.0_232 openj9-0.17.0 > and I've checked with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM > 1.8.0_191 25.191-b12 as well > > I've put a testcase and configsets in Google Drive: > https://drive.google.com/open?id=1ibKNWvowT8cXTwSa3bcTwKYLSRNur86U > The configsets are a copy of the _default configset, except the "problem" > configset has autoCreateFields set to false. > I created a collection with 4 shards, replication factor 1 for each > configset. The test case reliably fails on the "problem" collection and > reliably passes against the "no_problem" collection. > > The test (well it's not actually a @Test but still) has static data > (though it was originally generated randomly). The data is a bit mad... but > it was easier to reproduce the problem reliably with this data, than with > the normal documents we use in our product. > Each document has a different (dynamically named) field to index data > into, but it's the same data in each field. > The problem only appears (or probably is just more likely to appear?) when > the field names in the request are of different lengths. > The length / value of the data doesn't appear to matter. Or is less > impactful than variations in the field names. > *If you run the test 10 times you will see a variety of different errors. > i.e. it's not the same error every time.* > I've included some examples of the errors in the Drive folder. One of the > most fundamental (and probably points at the root cause) is this: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > *2019-11-21 17:02:53.720 ERROR > (updateExecutor-3-thread-6-processing-x:problem_collection_shard2_replica_n2 > r:core_node5 null n:10.0.75.1:8983_solr c:problem_collection s:shard2) > [c:problem_collection s:shard2 r:core_node5 > x:problem_collection_shard2_replica_n2] > o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling > SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=ForwardNode: > http://10.0.75.1:8983/solr/problem_collection_shard3_replica_n4/ > <http://10.0.75.1:8983/solr/problem_collection_shard3_replica_n4/> to > http://10.0.75.1:8983/solr/problem_collection_shard3_replica_n4/ > <http://10.0.75.1:8983/solr/problem_collection_shard3_replica_n4/> => > java.lang.StringIndexOutOfBoundsException at > java.lang.String.<init>(String.java:668)java.lang.StringIndexOutOfBoundsException: > null at java.lang.String.<init>(String.java:668) ~[?:1.8.0_232] at > org.noggit.CharArr.toString(CharArr.java:182) ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.lambda$getStringProvider$1(JavaBinCodec.java:966) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec$$Lambda$668.0000000000000000.apply(Unknown > Source) ~[?:?] at > org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr(ByteArrayUtf8CharSequence.java:156) > ~[?:?] at > org.apache.solr.common.util.ByteArrayUtf8CharSequence.toString(ByteArrayUtf8CharSequence.java:235) > ~[?:?] at > org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(ByteArrayUtf8CharSequence.java:215) > ~[?:?] at > org.apache.solr.common.SolrInputField.getValue(SolrInputField.java:128) > ~[?:?] at > org.apache.solr.common.SolrInputDocument.lambda$writeMap$0(SolrInputDocument.java:55) > ~[?:?] at > org.apache.solr.common.SolrInputDocument$$Lambda$743.000000002774E7B0.accept(Unknown > Source) ~[?:?] at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684) > ~[?:1.8.0_232] at > org.apache.solr.common.SolrInputDocument.writeMap(SolrInputDocument.java:59) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:658) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:813) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:411) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:750) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:395) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:248) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253) > ~[?:?] at > org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:167) > ~[?:?] at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102) > ~[?:?] at > org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83) > ~[?:?] at > org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:340) > ~[?:?] at > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231) > ~[?:?]* > > And > > > > > > > > > > > > *java.lang.StringIndexOutOfBoundsException: String index out of range: 39 > at java.lang.String.<init>(String.java:205) at > org.noggit.CharArr.toString(CharArr.java:182) at > org.apache.solr.common.util.JavaBinCodec._readStr(JavaBinCodec.java:929) at > org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:918) at > org.apache.solr.common.util.JavaBinCodec.readExternString(JavaBinCodec.java:1194) > at > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303) > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:281) > at > org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:625) > at > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:340) > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:281) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:321)* > > *...* > > Sometimes the indexing will succeed because of the nature of the dynamic > field, but retrieving the documents show that the field names have been > corrupted: > Exception in thread "main" java.lang.IllegalStateException: Doc 224 does > not have field *name_wmJmiiWghggUHmNiQAg_prop_s* it has [id, > *SomebodymiiWghggUHmNiQAg_prop_s*, _version_] > Which is a concatenation of the data value "Somebody" from some record, > and part of the actual field name *name_wmJ**miiWghggUHmNiQAg_prop_s * > > On Thu, 21 Nov 2019 at 13:16, Jason Gerlowski <[email protected]> > wrote: > >> Very curious what the config change that's related to reproducing this >> looks like. Maybe it's something that is worth adding >> test-randomization around? Just thinking aloud. >> >
