[jira] [Resolved] (LUCENE-7231) Problem with NGramAnalyzer, PhraseQuery and Highlighter
     [ https://issues.apache.org/jira/browse/LUCENE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe resolved LUCENE-7231.
    Resolution: Fixed
 Fix Version/s: 5.6
                5.5.2

> Problem with NGramAnalyzer, PhraseQuery and Highlighter
> ---
>
>                 Key: LUCENE-7231
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7231
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 5.4.1
>            Reporter: Eva Popenda
>            Assignee: Alan Woodward
>             Fix For: 6.1, 5.5.2, 5.6, 6.0.1
>
>         Attachments: LUCENE-7231.patch
>
>
> Using the Highlighter with an N-gram analyzer and a PhraseQuery, and searching
> for a substring whose length equals N, yields the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Less than 2 subSpans.size():1
>     at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:40)
>     at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:56)
>     at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:292)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:137)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:506)
>     at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
>     at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
> {noformat}
> Below is a JUnit test reproducing this behavior. The problem does not occur
> when searching for a string with more than N characters, or when using
> NGramPhraseQuery.
> Why is more than one subSpan required?
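> {code:java}
> import static org.junit.Assert.assertEquals;
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.ngram.NGramTokenizer;
> import org.apache.lucene.search.PhraseQuery;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleFragmenter;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.util.BytesRef;
> import org.junit.Rule;
> import org.junit.Test;
> import org.junit.rules.ExpectedException;
>
> public class HighlighterTest {
>     @Rule
>     public final ExpectedException exception = ExpectedException.none();
>
>     @Test
>     public void testHighlighterWithPhraseQueryThrowsException()
>             throws IOException, InvalidTokenOffsetsException {
>         // Analyzer that emits only 4-grams (min == max == 4).
>         final Analyzer analyzer = new NGramAnalyzer(4);
>         final String fieldName = "substring";
>         // A search string of exactly N characters yields a single term.
>         final List<BytesRef> list = new ArrayList<>();
>         list.add(new BytesRef("uchu"));
>         final PhraseQuery query =
>                 new PhraseQuery(fieldName, list.toArray(new BytesRef[list.size()]));
>         final QueryScorer fragmentScorer = new QueryScorer(query, fieldName);
>         final SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("", "");
>         // The test documents the bug: highlighting is expected to throw.
>         exception.expect(IllegalArgumentException.class);
>         exception.expectMessage("Less than 2 subSpans.size():1");
>         // TextEncoder appears to be the reporter's own helper (not shown here).
>         final Highlighter highlighter =
>                 new Highlighter(formatter, TextEncoder.NONE.getEncoder(), fragmentScorer);
>         highlighter.setTextFragmenter(new SimpleFragmenter(100));
>         final String fragment = highlighter.getBestFragment(analyzer, fieldName, "Buchung");
>         assertEquals("Buchung", fragment);
>     }
>
>     public final class NGramAnalyzer extends Analyzer {
>         private final int minNGram;
>
>         public NGramAnalyzer(final int minNGram) {
>             super();
>             this.minNGram = minNGram;
>         }
>
>         @Override
>         protected TokenStreamComponents createComponents(final String fieldName) {
>             // Same value for min and max gram size: fixed-length n-grams.
>             final Tokenizer source = new NGramTokenizer(minNGram, minNGram);
>             return new TokenStreamComponents(source);
>         }
>     }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org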
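To illustrate why a search string of length N is the special case here, the sketch below counts fixed-size n-grams in plain Java. It does not use Lucene; the class NGramSketch and its grams helper are hypothetical names introduced only for this illustration, and they merely mimic what a tokenizer with min == max == N would emit. With N = 4 the query "uchu" is a single gram, so the phrase has only one term, which seems to match the "subSpans.size():1" in the stack trace; a longer string such as "uchun" already yields two grams.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Hypothetical helper: every substring of exactly n characters, in order.
    public static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 4;
        // Grams of the highlighted text.
        System.out.println(grams("Buchung", n));      // [Buch, uchu, chun, hung]
        // A query of exactly n characters is one gram, i.e. a one-term phrase.
        System.out.println(grams("uchu", n).size());  // 1
        // One more character gives two grams, i.e. a two-term phrase.
        System.out.println(grams("uchun", n).size()); // 2
    }
}
```

This only shows the term counts involved; how the highlighter turns the phrase into span queries is not modelled.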
[jira] [Resolved] (LUCENE-7231) Problem with NGramAnalyzer, PhraseQuery and Highlighter
     [ https://issues.apache.org/jira/browse/LUCENE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe resolved LUCENE-7231.
    Resolution: Fixed
 Fix Version/s: 6.0.1

> Problem with NGramAnalyzer, PhraseQuery and Highlighter
> ---
>
>             Fix For: 6.1, 6.0.1
[jira] [Resolved] (LUCENE-7231) Problem with NGramAnalyzer, PhraseQuery and Highlighter
     [ https://issues.apache.org/jira/browse/LUCENE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward resolved LUCENE-7231.
---
    Resolution: Fixed
 Fix Version/s: 6.1

Thanks Eva!

> Problem with NGramAnalyzer, PhraseQuery and Highlighter
> ---
>
>             Fix For: 6.1