[jira] [Assigned] (LUCENE-2287) Unexpected terms are highlighted within nested SpanQuery instances

David Smiley (JIRA) Fri, 05 Jan 2018 06:40:17 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley reassigned LUCENE-2287:
------------------------------------

         Assignee: David Smiley
    Fix Version/s: 7.3

> Unexpected terms are highlighted within nested SpanQuery instances
> ------------------------------------------------------------------
>
>                 Key: LUCENE-2287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2287
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 2.9.1
>         Environment: Linux, Solaris, Windows
>            Reporter: Michael Goddard
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 7.3
>
>         Attachments: LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch, 
> LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I haven't yet been able to resolve why I'm seeing spurious highlighting in 
> nested SpanQuery instances.  Briefly, the issue is illustrated by the second 
> instance of "Lucene" being highlighted in the test below, when it doesn't 
> satisfy the inner span.  There's been some discussion about this on the 
> java-dev list, and I'm opening this issue now because I have made some 
> initial progress on this.
> This new test, added to the HighlighterTest class in lucene_2_9_1, 
> illustrates this:
> /*
>  * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
>  */
> public void testHighlightingNestedSpans2() throws Exception {
>   // Problem case: the second "Lucene" is highlighted unexpectedly.
>   String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
>   // Works okay:
>   //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was";
>   String fieldName = "SOME_FIELD_NAME";
>   // Inner span: "lucene" followed by "doug", in order, with a slop of 5.
>   SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
>     new SpanTermQuery(new Term(fieldName, "lucene")),
>     new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);
>   // Outer span: the inner span followed by "hadoop", in order, with a slop of 4.
>   Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
>     new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);
>   String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
>   //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";
>   String observed = highlightField(query, fieldName, theText);
>   // Close the quotation marks around both values so the output is easy to compare.
>   System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");
>   assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected, observed);
> }
> Is this an issue that's arisen before?  I've been reading through the source 
> to QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and 
> NearSpansOrdered, but haven't found the solution yet.  Initially, I thought 
> that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should 
> be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't 
> get me too far.
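
One possible reading of the symptom can be sketched without Lucene at all. The class below is a hypothetical, simplified model: NestedSpanSketch, Span, orderedNear, highlightBySpanRange and highlightByLeaves are illustrative names, not Lucene APIs, and orderedNear is a naive stand-in for NearSpansOrdered. It compares two strategies on the test sentence: marking every query term that falls inside the position range of the top-level span, versus marking only the leaf terms that actually formed the match. Whether WeightedSpanTermExtractor really follows the first strategy is an assumption to check against the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Lucene-free model of nested ordered span matching.
public class NestedSpanSketch {

    /** A match covering token positions [start, end), plus the leaf term positions that formed it. */
    public static final class Span {
        public final int start;
        public final int end;
        public final Set<Integer> leaves;

        Span(int start, int end, Set<Integer> leaves) {
            this.start = start;
            this.end = end;
            this.leaves = leaves;
        }
    }

    /** Lowercases and splits on whitespace; a stand-in for a real analyzer. */
    public static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\s+");
    }

    /** One single-position span per occurrence of the term. */
    public static List<Span> termSpans(String[] tokens, String term) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                Set<Integer> leaf = new TreeSet<>();
                leaf.add(i);
                out.add(new Span(i, i + 1, leaf));
            }
        }
        return out;
    }

    /** Simplified ordered near: right starts at or after left's end, with at most slop positions between. */
    public static List<Span> orderedNear(List<Span> left, List<Span> right, int slop) {
        List<Span> out = new ArrayList<>();
        for (Span l : left) {
            for (Span r : right) {
                int gap = r.start - l.end;
                if (gap >= 0 && gap <= slop) {
                    Set<Integer> leaves = new TreeSet<>(l.leaves);
                    leaves.addAll(r.leaves);
                    out.add(new Span(l.start, r.end, leaves));
                }
            }
        }
        return out;
    }

    /** Strategy A (assumed cause): mark any query term located inside a top-level span's range. */
    public static Set<Integer> highlightBySpanRange(String[] tokens, List<Span> spans, Set<String> terms) {
        Set<Integer> hits = new TreeSet<>();
        for (Span s : spans) {
            for (int i = s.start; i < s.end; i++) {
                if (terms.contains(tokens[i])) {
                    hits.add(i);
                }
            }
        }
        return hits;
    }

    /** Strategy B (what the test expects): mark only the leaf positions that formed the match. */
    public static Set<Integer> highlightByLeaves(List<Span> spans) {
        Set<Integer> hits = new TreeSet<>();
        for (Span s : spans) {
            hits.addAll(s.leaves);
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("The Lucene was made by Doug Cutting and Lucene great Hadoop was");
        List<Span> inner = orderedNear(termSpans(tokens, "lucene"), termSpans(tokens, "doug"), 5);
        List<Span> outer = orderedNear(inner, termSpans(tokens, "hadoop"), 4);
        Set<String> terms = new HashSet<>(Arrays.asList("lucene", "doug", "hadoop"));
        System.out.println("By span range: " + highlightBySpanRange(tokens, outer, terms));
        System.out.println("By leaf terms: " + highlightByLeaves(outer));
    }
}
```

With zero-based token positions, the outer match covers positions 1 through 10. The range-based strategy therefore reports [1, 5, 8, 10], which includes the second "Lucene" at position 8, while the leaf-based strategy reports [1, 5, 10], which matches the expected string in the test.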



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
