[ https://issues.apache.org/jira/browse/SOLR-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629248#action_12629248 ]
Chris Harris commented on SOLR-195:
-----------------------------------

I just rediscovered this bug for myself, and was about to re-report it, but then I found this JIRA issue. Even though it's a bit redundant, I'm going to paste my bug report here, since A) I think it's a good summary of the problem, B) it has a remark for when usePhraseHighlighter=true, and C) it includes a few test cases.

****

Highlighting with wildcards (whether * is in the middle of a term or at the end) doesn't work right now for the standard request handler. The high-level view of the problem is as follows:

1. Extracting terms is central to highlighting
2. Wildcard queries get parsed into ConstantScoreQuery objects
3. It's not currently possible to extract terms from ConstantScoreQuery objects

****

Wildcard queries get turned into ConstantScoreQuery objects. For non-prefix wildcards (e.g. "l*g"), the query parser directly returns a ConstantScoreQuery with filter = WildcardFilter. For prefix wildcards (e.g. "lon*"), the query parser returns a ConstantScorePrefixQuery, but it gets rewritten (by Query.rewrite(), which gets called in the highlighting component) into a ConstantScoreQuery with filter = PrefixFilter.

If usePhraseHighlighter=false, then a key part of highlighting is Query.extractTerms(). However, ConstantScoreQuery.extractTerms() is an empty method. The source itself notes that this may not be good for highlighting: "OK to not add any terms when used for MultiSearcher, but may not be OK for highlighting."

If usePhraseHighlighter=true, then a key part of highlighting is WeightedSpanTermExtractor.extract(Query, Map). Now extract() has a number of different instanceof clauses, each with knowledge about how to extract terms from a particular kind of query. However, there is no instanceof clause that matches ConstantScoreQuery.

****

Here are four variants on testDefaultFieldHighlight() that all fail, even though I think they should pass.
(The differences from testDefaultFieldHighlight are the hl.usePhraseHighlighter param and the use of wildcard in sumLRF.makeRequest.) When I run them, they each return a document, as expected, but they don't find any highlight blocks.

{code}
  public void testDefaultFieldPrefixWildcardHighlight() {
    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "false");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);

    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("lon*"),
            "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
            "//[EMAIL PROTECTED]'1']/[EMAIL PROTECTED]'t_text']/str"
            );
  }

  public void testDefaultFieldPrefixWildcardHighlight2() {
    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "true");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);

    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("lon*"),
            "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
            "//[EMAIL PROTECTED]'1']/[EMAIL PROTECTED]'t_text']/str"
            );
  }

  public void testDefaultFieldNonPrefixWildcardHighlight() {
    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "false");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);

    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("l*g"),
            "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
            "//[EMAIL PROTECTED]'1']/[EMAIL PROTECTED]'t_text']/str"
            );
  }

  public void testDefaultFieldNonPrefixWildcardHighlight2() {
    // do summarization using re-analysis of the field
    HashMap<String,String> args = new HashMap<String,String>();
    args.put("hl", "true");
    args.put("df", "t_text");
    args.put("hl.fl", "");
    args.put("hl.usePhraseHighlighter", "true");
    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
      "standard", 0, 200, args);

    assertU(adoc("t_text", "a long day's night", "id", "1"));
    assertU(commit());
    assertU(optimize());
    assertQ("Basic summarization",
            sumLRF.makeRequest("l*g"),
            "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
            "//[EMAIL PROTECTED]'1']/[EMAIL PROTECTED]'t_text']/str"
            );
  }
{code}

> Wildcard/prefix queries not highlighted
> ---------------------------------------
>
> Key: SOLR-195
> URL: https://issues.apache.org/jira/browse/SOLR-195
> Project: Solr
> Issue Type: Bug
> Components: highlighter
> Affects Versions: 1.1.0, 1.2
> Reporter: Mike Klaas
> Priority: Minor
>
> Possible bug in query rewrite()ing:
> http://www.nabble.com/return-matched-terms---fuzzy-or-wildcard-searches-tf3452757.html#a9640214

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
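
To make the term-extraction gap described in the comment concrete, here is a minimal, self-contained Java sketch. It is a toy model under stated assumptions, not Lucene code: the Query interface, TermQuery, ConstantScoreQuery, termsOf() and termsByDispatch() below are hypothetical stand-ins written for this illustration. They only imitate the two behaviors described in the comment (an empty extractTerms(), and an instanceof dispatch that has no ConstantScoreQuery branch).

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Toy model only: these classes imitate the shape of the query API
 * described in the comment. They are NOT the real Lucene classes.
 */
public class ExtractTermsSketch {

    /** Hypothetical stand-in for a query that can report its terms. */
    public interface Query {
        void extractTerms(Set<String> terms);
    }

    /** A plain term query knows its single term and reports it. */
    public static final class TermQuery implements Query {
        private final String field;
        private final String text;

        public TermQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public void extractTerms(Set<String> terms) {
            terms.add(field + ":" + text);
        }
    }

    /** Stand-in for a filter-backed wildcard query: it reports no terms. */
    public static final class ConstantScoreQuery implements Query {
        // Kept only so the example reads like a wildcard query; never used for extraction.
        private final String pattern;

        public ConstantScoreQuery(String pattern) {
            this.pattern = pattern;
        }

        @Override
        public void extractTerms(Set<String> terms) {
            // Intentionally empty, imitating the behavior the comment describes.
        }
    }

    /** Imitates the usePhraseHighlighter=false path: rely on extractTerms(). */
    public static Set<String> termsOf(Query query) {
        Set<String> terms = new LinkedHashSet<>();
        query.extractTerms(terms);
        return terms;
    }

    /** Imitates the usePhraseHighlighter=true path: instanceof dispatch. */
    public static Set<String> termsByDispatch(Query query) {
        Set<String> terms = new LinkedHashSet<>();
        if (query instanceof TermQuery) {
            query.extractTerms(terms);
        }
        // No branch for ConstantScoreQuery, so nothing is added for it.
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(termsOf(new TermQuery("t_text", "long")));
        System.out.println(termsOf(new ConstantScoreQuery("lon*")));
        System.out.println(termsByDispatch(new ConstantScoreQuery("l*g")));
    }
}
```

In this model a plain term query yields one term to highlight, while the wildcard stand-in yields none on either path. That matches the symptom in the tests: a document is returned, but no highlight blocks are found.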