[jira] [Updated] (SOLR-10115) Corruption in read-side of SOLR-HDFS stack
     [ https://issues.apache.org/jira/browse/SOLR-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Risden updated SOLR-10115:
    Component/s: hdfs
                 Hadoop Integration

> Corruption in read-side of SOLR-HDFS stack
> ------------------------------------------
>
>                 Key: SOLR-10115
>                 URL: https://issues.apache.org/jira/browse/SOLR-10115
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: Hadoop Integration, hdfs
>    Affects Versions: 4.4
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Major
>         Attachments: YCS_HdfsTest.java
>
>
> I've been trying to track down some random AIOOB exceptions in Lucene for a
> customer, and I've managed to reproduce the issue with a unit test of
> sufficient size in conjunction with highly concurrent read requests.
> A typical stack trace looks like:
> {code}
> org.apache.solr.common.SolrException; java.lang.ArrayIndexOutOfBoundsException: 172033655
>         at org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:149)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc(Lucene41PostingsReader.java:455)
>         at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:111)
>         at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:157)
> {code}
> The number of unique stack traces is relatively high, most AIOOB exceptions,
> but some EOF. Most exceptions occur in the term index, however I believe
> this may be just an artifact of where highly concurrent access is most likely
> to occur. The queries that triggered this had many wildcards and other
> multi-term queries.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10115) Corruption in read-side of SOLR-HDFS stack
     [ https://issues.apache.org/jira/browse/SOLR-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-10115:
    Affects Version/s:     (was: 4.10)
                       4.4
[jira] [Updated] (SOLR-10115) Corruption in read-side of SOLR-HDFS stack
     [ https://issues.apache.org/jira/browse/SOLR-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-10115:
    Attachment: YCS_HdfsTest.java

Attaching a unit test that can reproduce the issue. It's not in patch form (and not ready to be committed), since it's the result of porting one of my hacky Lucene tests to Solr and then porting that to Solr master, along with a lot of hacking/experimentation. It only fails about half of the time for me, and due to randomness it sometimes takes entirely too long. It will need work before it can be committed.

But the priority is finding the actual bug(s), so I'm going to start at the lower levels and validate that they work correctly under high concurrency.

If I disable the block cache, the errors seem to disappear. This makes the BlockCache the prime suspect, but it's not a lock: decreased performance due to the missing block cache can also decrease the likelihood of seeing other concurrency issues.
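
For anyone who wants to try the "validate the lower levels under high concurrency" approach, here is a minimal sketch of the idea, not the attached YCS_HdfsTest.java and not Solr's BlockCache. Every name in it (ConcurrentReadCheck, CachedReader, expectedByte, countMismatches) is hypothetical. The assumption is that file contents can be made a pure function of position, so that any byte returned by the layer under test can be verified without a reference copy. The CachedReader here is only a stand-in backed by a ConcurrentHashMap; a real check would put the actual read path in its place.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical harness; none of these names come from Solr or Lucene. */
public class ConcurrentReadCheck {
    static final int BLOCK_SIZE = 8192;
    static final long FILE_LENGTH = 64L * BLOCK_SIZE;

    /** Expected byte at a file position: a pure function, so any read can be verified. */
    static byte expectedByte(long pos) {
        return (byte) (pos * 31 + 7);
    }

    /** Stand-in for the layer under test: a block cache filled on demand. */
    static final class CachedReader {
        private final ConcurrentHashMap<Long, byte[]> blocks = new ConcurrentHashMap<>();

        byte read(long pos) {
            long blockId = pos / BLOCK_SIZE;
            // computeIfAbsent loads each block at most once, atomically per key.
            byte[] block = blocks.computeIfAbsent(blockId, CachedReader::loadBlock);
            return block[(int) (pos % BLOCK_SIZE)];
        }

        private static byte[] loadBlock(long blockId) {
            byte[] data = new byte[BLOCK_SIZE];
            long base = blockId * BLOCK_SIZE;
            for (int i = 0; i < BLOCK_SIZE; i++) {
                data[i] = expectedByte(base + i);
            }
            return data;
        }
    }

    /** Runs random reads from many threads and returns the number of wrong bytes seen. */
    static long countMismatches(int threads, int readsPerThread) throws InterruptedException {
        CachedReader reader = new CachedReader();
        AtomicLong mismatches = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < readsPerThread; i++) {
                    long pos = rnd.nextLong(FILE_LENGTH);
                    if (reader.read(pos) != expectedByte(pos)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("readers did not finish in time");
        }
        return mismatches.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mismatches: " + countMismatches(16, 100_000));
    }
}
```

A correct layer should report zero mismatches no matter how many threads are used. The same caveat as above applies in reverse: a clean run at one level only lowers suspicion for that level; it does not prove the level is free of races.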