[ 
https://issues.apache.org/jira/browse/OAK-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565915#comment-14565915
 ] 

Alex Parvulescu commented on OAK-2934:
--------------------------------------

My proposal is to change the generated query: instead of a 
MultiPhraseQuery built from terms extracted from the index itself, generate a 
BooleanQuery made of several clauses, each either a TermQuery or a WildcardQuery.
Given the original example U=1*, the query would change from
{code}
:fulltext:"u ( [set of all tokens] )"
{code}
to
{code}
+:fulltext:u +:fulltext:1*
{code}
The main benefit is that the WildcardQuery uses 
CONSTANT_SCORE_REWRITE [0], possibly bypassing the expensive, OOME-inducing 
scoring operation.
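
To make the proposed query shape concrete, here is a minimal sketch in plain Java. It only models the query as strings and does not call the Lucene API. The class name, the method names, and the tokenization rule (lowercase, split on anything that is not a letter, digit, '*' or '?') are assumptions made for illustration and are not taken from LuceneIndex.java or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Illustrative only: models the shape of the proposed query as plain strings.
 * It does not use the Lucene API; all names here are hypothetical.
 */
final class FulltextQuerySketch {

    /** Field name as it appears in the query strings quoted in this comment. */
    static final String FIELD = ":fulltext";

    /** Assumed tokenization: lowercase, then split on anything that is not a letter, digit, '*' or '?'. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}*?]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** A token containing '*' or '?' would become a WildcardQuery clause, any other token a TermQuery clause. */
    static boolean isWildcard(String token) {
        return token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
    }

    /** Renders one required (+) clause per token, mirroring the proposed BooleanQuery form. */
    static String toBooleanQuery(String text) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokenize(text)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(FIELD).append(':').append(token);
        }
        return sb.toString();
    }
}
```

Under these assumptions, `FulltextQuerySketch.toBooleanQuery("U=1*")` yields the two required clauses shown above, and `isWildcard` marks only the `1*` token as the one that would map to a WildcardQuery.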
cc [~chetanm], [~teofili], [~mreutegg]

[0] 
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java?view=markup#l37

> Certain searches cause lucene index to hit OutOfMemoryError
> -----------------------------------------------------------
>
>                 Key: OAK-2934
>                 URL: https://issues.apache.org/jira/browse/OAK-2934
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>         Attachments: LuceneIndex.java.patch
>
>
> Certain search terms can get split into very small wildcard tokens that 
> match a huge number of items in the index, eventually resulting in an OOME.
> For example
> {code}
> /jcr:root//*[jcr:contains(., 'U=1*')]
> {code}
> translates into the following Lucene query
> {code}
> :fulltext:"u ( [set of all index terms starting with '1'] )"
> {code}
> This breaks down when Lucene tries to compute the score for the huge 
> set of tokens:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:201)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:155)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.<init>(OakDirectory.java:340)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:345)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:329)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.<init>(Lucene41PostingsReader.java:613)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docsAndPositions(Lucene41PostingsReader.java:252)
>         at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docsAndPositions(BlockTreeTermsReader.java:2233)
>         at org.apache.lucene.search.UnionDocsAndPositionsEnum.<init>(MultiPhraseQuery.java:492)
>         at org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:205)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex$1.loadDocs(LuceneIndex.java:352)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex$1.computeNext(LuceneIndex.java:289)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex$1.computeNext(LuceneIndex.java:280)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex$LucenePathCursor$1.hasNext(LuceneIndex.java:1026)
>         at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.jackrabbit.oak.spi.query.Cursors$PathCursor.hasNext(Cursors.java:198)
>         at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex$LucenePathCursor.hasNext(LuceneIndex.java:1047)
>         at org.apache.jackrabbit.oak.plugins.index.aggregate.AggregationCursor.fetchNext(AggregationCursor.java:88)
>         at org.apache.jackrabbit.oak.plugins.index.aggregate.AggregationCursor.hasNext(AggregationCursor.java:75)
>         at org.apache.jackrabbit.oak.spi.query.Cursors$ConcatCursor.fetchNext(Cursors.java:474)
>         at org.apache.jackrabbit.oak.spi.query.Cursors$ConcatCursor.hasNext(Cursors.java:466)
>         at org.apache.jackrabbit.oak.spi.query.Cursors$ConcatCursor.fetchNext(Cursors.java:474)
>         at org.apache.jackrabbit.oak.spi.query.Cursors$ConcatCursor.hasNext(Cursors.java:466)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
