[ https://issues.apache.org/jira/browse/SOLR-13471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-13471:
----------------------------
    Description:
Diagnosis/hypothesis summarized/re-worded from comment below...

https://issues.apache.org/jira/browse/SOLR-13471?focusedCommentId=16840999&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16840999

* Assume request execution happens successfully
* then, when the QueryResponseWriter goes to marshal the response, assume there is some exception writing the results -- perhaps due to the Resolver w/ ResultContext + Searcher (could be an IOException)
** the Exception from attempting to write the response may propagate all the way up to the "try" in HttpSolrCall.call() ... which once caught is then passed to "sendError"
** sendError creates a completely new SolrQueryResponse, sets the exception on it, and asks the QueryResponseWriter to write it out
*** *BUT* the OutputStream has already had a bunch of data written to it ... which may be resulting in a byte sequence that confuses the unmarshal code and may result in weird exceptions -- *or worse: validly structured, but corrupt data*

The fundamental problem is that HttpSolrCall tries to "start over" writing the response, but the JavaBinCodec doesn't have any way to recognize that ... it can read partial data and then be confused by the "new" data that comes after it

*Perhaps there should be a special "TAG" that means "Ignore everything you've already received and start over" that we should emit at the beginning of every marshal() call?*

----
Original bug report
{panel}
I haven't dug into this, or been able to reproduce it (let alone get any additional logging/debugging/breakpoint info) but in a few very sporadic, very rare, instances of running TestReplicationHandlerDiskOverFlow, I triggered ClassCastExceptions in JavaBinCodec unmarshal code -- indicating that there is some disconnect in expectations between the marshal & unmarshal code paths...
{noformat}
[junit4] 2> 13342 ERROR (Thread-19) [ ] o.a.s.h.TestReplicationHandlerDiskOverFlow Query Thread Failure
[junit4] 2>   => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:54505/solr/collection1: class java.lang.Boolean cannot be cast to class java.lang.String (java.lang.Boolean and java.lang.String are in module java.base of loader 'bootstrap')
[junit4] 2>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:622)
[junit4] 2> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:54505/solr/collection1: class java.lang.Boolean cannot be cast to class java.lang.String (java.lang.Boolean and java.lang.String are in module java.base of loader 'bootstrap')
[junit4] 2>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:622) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:987) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1002) ~[java/:?]
[junit4] 2>     at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.lambda$testDiskOverFlow$1(TestReplicationHandlerDiskOverFlow.java:154) ~[test/:?]
[junit4] 2>     at java.lang.Thread.run(Thread.java:834) [?:?]
[junit4] 2> Caused by: java.lang.ClassCastException: class java.lang.Boolean cannot be cast to class java.lang.String (java.lang.Boolean and java.lang.String are in module java.base of loader 'bootstrap')
[junit4] 2>     at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:223) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:298) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:280) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:193) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:620) ~[java/:?]
[junit4] 2>     ... 7 more
{noformat}
It's possible that something about the test code (or code being tested) is doing something it should not be doing, and adding something "unexpected" to the response object -- but either the unmarshal code needs to be _as forgiving_ as marshal code, or the marshal code should fail fast -- not produce a binary stream that the unmarshal code can't parse.
{panel}

    Summary: exceptions during response writing can cause javabin to write a corrupt/misleading response that may cause exceptions or nonsense data when unmarshaled by javabin (was: javabin codec can marshal data that it can't unmarshal (ClassCastException: class java.lang.Boolean cannot be cast to class java.lang.String))

> exceptions during response writing can cause javabin to write a corrupt/misleading response that may cause exceptions or nonsense data when unmarshaled by javabin
> ----------------------------------------------------------------------
>
>                 Key: SOLR-13471
>                 URL: https://issues.apache.org/jira/browse/SOLR-13471
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13471.patch, SOLR-13471.patch
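
The failure mode hypothesized in the description can be illustrated with a small, self-contained sketch. It deliberately uses a toy tagged binary format, not the real JavaBinCodec wire format, and every name in it (PartialWriteDemo, TAG_RESET, Restart, simulate) is an assumption made for illustration. A writer fails after putting a map key on the wire, an error response is then written to the same stream, and the reader decodes the concatenation. The same run is repeated with a hypothetical "start over" marker emitted at the beginning of every marshal() call, roughly along the lines of the suggestion above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy tagged binary format for illustration only; NOT the JavaBinCodec format. */
public class PartialWriteDemo {
    // TAG_RESET is the hypothetical "ignore everything so far" marker.
    static final int TAG_MAP = 1, TAG_STR = 2, TAG_RESET = 0x7f;

    /** Thrown by the reader when it meets the hypothetical reset marker. */
    static final class Restart extends RuntimeException {}

    static void writeStr(DataOutputStream out, String s) throws IOException {
        out.writeByte(TAG_STR);
        out.writeUTF(s);
    }

    /** Writes a one-entry map; optionally fails after the key is already on the wire. */
    static void marshal(DataOutputStream out, String key, String val,
                        boolean emitReset, boolean failBeforeValue) throws IOException {
        if (emitReset) out.writeByte(TAG_RESET);
        out.writeByte(TAG_MAP);
        out.writeInt(1);
        writeStr(out, key);
        if (failBeforeValue) throw new IOException("simulated failure while writing results");
        writeStr(out, val);
    }

    static Object readVal(DataInputStream in) throws IOException {
        int tag = in.readByte();
        switch (tag) {
            case TAG_STR:
                return in.readUTF();
            case TAG_MAP: {
                int n = in.readInt();
                Map<String, Object> m = new LinkedHashMap<>();
                for (int i = 0; i < n; i++) {
                    String k = (String) readVal(in); // reader assumes keys are strings
                    m.put(k, readVal(in));
                }
                return m;
            }
            case TAG_RESET:
                throw new Restart();
            default:
                throw new IOException("unknown tag " + tag);
        }
    }

    static Object unmarshal(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        while (true) {
            try {
                return readVal(in);
            } catch (Restart r) {
                // Marker seen: drop whatever was partially decoded and read again.
            }
        }
    }

    /** Response write fails midway, then an error response goes to the SAME stream. */
    static Object simulate(boolean emitReset) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        try {
            try {
                marshal(out, "numFound", "10", emitReset, true);
            } catch (IOException e) {
                // Stand-in for "sendError": a brand-new response, same output stream.
                marshal(out, "error", e.getMessage(), emitReset, false);
            }
            return unmarshal(wire.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without marker: " + simulate(false));
        System.out.println("with marker:    " + simulate(true));
    }
}
```

Without the marker the reader returns a well-formed map in which the error response has silently become the value of "numFound", which is the "validly structured, but corrupt data" case. With the marker the reader recovers only the error response. In this sketch the marker only helps when the failure lands on a value boundary; a write that dies in the middle of a string payload would still swallow the marker byte as data.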