[ 
https://issues.apache.org/jira/browse/SOLR-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629248#action_12629248
 ] 

Chris Harris commented on SOLR-195:
-----------------------------------

I just rediscovered this bug for myself and was about to re-report it, but 
then I found this JIRA issue. Even though it's a bit redundant, I'm going to 
paste my bug report here, since A) I think it's a good summary of the problem, 
B) it has a remark for when usePhraseHighlighter=true, and C) it includes a few 
test cases.

****

Highlighting with wildcards (whether * is in the middle of a term or at
the end) doesn't work right now for the standard request handler.
The high-level view of the problem is as follows:

1. Extracting terms is central to highlighting
2. Wildcard queries get parsed into ConstantScoreQuery objects
3. It's not currently possible to extract terms from
   ConstantScoreQuery objects

****

Wildcard queries get turned into ConstantScoreQuery objects. For non-prefix
wildcards (e.g. "l*g"), the query parser directly returns a
ConstantScoreQuery with filter = WildcardFilter. For prefix wildcards
(e.g. "lon*"), the query parser returns a ConstantScorePrefixQuery,
but it gets rewritten (by Query.rewrite(), which gets called in the
highlighting component) into a ConstantScoreQuery with
filter = PrefixFilter.

If usePhraseHighlighter=false, then a key part of highlighting is
Query.extractTerms(). However, ConstantScoreQuery.extractTerms()
is an empty method. The source itself notes that this may not
be good for highlighting: "OK to not add any terms when used for
MultiSearcher, but may not be OK for highlighting."
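
To make the usePhraseHighlighter=false case concrete, here is a toy model of
the situation. It is only a sketch: ToyQuery, ToyTermQuery,
ToyConstantScoreQuery and termsToHighlight are made-up names for illustration
and are not Lucene classes. It just shows that a highlighter that relies on
extractTerms() has nothing to work with when that method is empty.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: these classes are hypothetical stand-ins, not Lucene's API.
public class ExtractTermsSketch {

    /** Minimal stand-in for a query that can report the terms it matches. */
    interface ToyQuery {
        void extractTerms(Set<String> terms);
    }

    /** Stand-in for a plain term query: it reports its single term. */
    static class ToyTermQuery implements ToyQuery {
        private final String term;

        ToyTermQuery(String term) {
            this.term = term;
        }

        public void extractTerms(Set<String> terms) {
            terms.add(term);
        }
    }

    /** Stand-in for a filter-backed constant-score query: it reports nothing. */
    static class ToyConstantScoreQuery implements ToyQuery {
        public void extractTerms(Set<String> terms) {
            // Intentionally empty, mirroring the behaviour described above.
        }
    }

    /** Collects the terms a term-based highlighter would have available. */
    static Set<String> termsToHighlight(ToyQuery query) {
        Set<String> terms = new HashSet<>();
        query.extractTerms(terms);
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(termsToHighlight(new ToyTermQuery("long")));        // [long]
        System.out.println(termsToHighlight(new ToyConstantScoreQuery()));     // []
    }
}
```

With an empty term set there is nothing to mark up, which would be consistent
with the tests further down returning the document but no highlight blocks.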

If usePhraseHighlighter=true, then a key part of highlighting is
WeightedSpanTermExtractor.extract(Query, Map). Now extract() has
a number of different instanceof clauses, each with knowledge about
how to extract terms from a particular kind of query. However, there
is no instanceof clause that matches ConstantScoreQuery.
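
The usePhraseHighlighter=true case can be sketched the same way. Again this is
a toy model under assumptions: the Toy* classes and this extract() method are
hypothetical and much simpler than the real WeightedSpanTermExtractor. The
point is only that a query type with no matching instanceof clause falls
through without contributing any terms.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not the real WeightedSpanTermExtractor.
public class InstanceofDispatchSketch {

    static class ToyTermQuery {
        final String term;

        ToyTermQuery(String term) {
            this.term = term;
        }
    }

    static class ToyPhraseQuery {
        final List<String> terms;

        ToyPhraseQuery(String... terms) {
            this.terms = Arrays.asList(terms);
        }
    }

    static class ToyConstantScoreQuery {
        final String pattern;

        ToyConstantScoreQuery(String pattern) {
            this.pattern = pattern;
        }
    }

    /** Adds the terms of known query types to the map; unknown types add nothing. */
    static void extract(Object query, Map<String, Float> terms) {
        if (query instanceof ToyTermQuery) {
            terms.put(((ToyTermQuery) query).term, 1.0f);
        } else if (query instanceof ToyPhraseQuery) {
            for (String term : ((ToyPhraseQuery) query).terms) {
                terms.put(term, 1.0f);
            }
        }
        // No clause for ToyConstantScoreQuery, so it falls through silently.
    }

    public static void main(String[] args) {
        Map<String, Float> fromPhrase = new LinkedHashMap<>();
        extract(new ToyPhraseQuery("long", "day"), fromPhrase);
        System.out.println(fromPhrase.keySet());   // [long, day]

        Map<String, Float> fromWildcard = new LinkedHashMap<>();
        extract(new ToyConstantScoreQuery("lon*"), fromWildcard);
        System.out.println(fromWildcard.keySet()); // []
    }
}
```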

****

Here are four variants on testDefaultFieldHighlight() that all fail, even
though I think they should pass. (The differences from
testDefaultFieldHighlight() are the hl.usePhraseHighlighter param and the
use of a wildcard in sumLRF.makeRequest().) When I run them, they each return
a document, as expected, but they don't find any highlight blocks.

{code}
  public void testDefaultFieldPrefixWildcardHighlight() {

    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "false");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);
    
    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("lon*"),
            "//lst[@name='highlighting']/lst[@name='1']",
            "//lst[@name='1']/arr[@name='t_text']/str"
            );

  }

  public void testDefaultFieldPrefixWildcardHighlight2() {

    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "true");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);
    
    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("lon*"),
            "//lst[@name='highlighting']/lst[@name='1']",
            "//lst[@name='1']/arr[@name='t_text']/str"
            );

  }

  public void testDefaultFieldNonPrefixWildcardHighlight() {

    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "false");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);
    
    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("l*g"),
            "//lst[@name='highlighting']/lst[@name='1']",
            "//lst[@name='1']/arr[@name='t_text']/str"
            );

  }

  public void testDefaultFieldNonPrefixWildcardHighlight2() {

    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "true");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);
    
    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("l*g"),
            "//lst[@name='highlighting']/lst[@name='1']",
            "//lst[@name='1']/arr[@name='t_text']/str"
            );

  }
{code}

> Wildcard/prefix queries not highlighted
> ---------------------------------------
>
>                 Key: SOLR-195
>                 URL: https://issues.apache.org/jira/browse/SOLR-195
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 1.1.0, 1.2
>            Reporter: Mike Klaas
>            Priority: Minor
>
> Possible bug in query rewrite()ing:
> http://www.nabble.com/return-matched-terms---fuzzy-or-wildcard-searches-tf3452757.html#a9640214

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
