[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602541#action_12602541 ]
Mike Klaas commented on SOLR-556:
---------------------------------

Ah, I see what the problem is: although it is impossible for tokens from
different values to appear in the same fragment (due to the semantics of
MultiValuedTokenFilter), the non-token text (typically punctuation) from
different values can bleed into the same fragment, since Lucene's highlighter
can only create a new fragment on token boundaries.

Unfortunately, SOLR-553 was committed a day after you submitted your patch,
and it rearranges the code slightly, so the patch no longer applies. Could you
sync the patch with trunk? I think the basic approach is sound.

> Highlighting of multi-valued fields returns snippets which span multiple
> different values
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-556
>                 URL: https://issues.apache.org/jira/browse/SOLR-556
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 1.3
>         Environment: Tomcat 5.5
>            Reporter: Lars Kotthoff
>            Assignee: Mike Klaas
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: solr-highlight-multivalued-example.xml,
> solr-highlight-multivalued.patch
>
>
> When highlighting multi-valued fields, the highlighter sometimes returns
> snippets which span multiple values, e.g. with values "foo" and "bar" and
> search term "ba" the highlighter will create the snippet "foo<em>ba</em>r".
> Furthermore, it sometimes returns smaller snippets than it should, e.g. with
> value "foobar" and search term "oo" it will create the snippet "<em>oo</em>"
> regardless of hl.fragsize.
> I have been unable to determine the real cause for this, or indeed what
> actually goes on at all. To reproduce the problem, I've used the following
> steps:
> * create an index with multi-valued fields; one document should have at least
> 3 values for these fields (in my case strings of length between 5 and 15
> Japanese characters -- as far as I can tell plain old ASCII should produce
> the same effect though)
> * search for part of a value in such a field with highlighting enabled; the
> additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true,
> hl.mergeContiguous=true (changing the parameters does not seem to have any
> effect on the result though)
> * highlighted snippets should show the effects described above

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
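
For anyone retracing the reproduction steps quoted above, the search request could be assembled roughly as in the sketch below. The three hl.* values (hl.fragsize, hl.requireFieldMatch, hl.mergeContiguous) come from the report; the base URL, the field name "title" and the search term "ba" are placeholders assumed purely for illustration, and the report does not say how the query itself was phrased.

```python
from urllib.parse import urlencode


def build_highlight_query(term, field,
                          base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL with highlighting enabled.

    base_url and field are hypothetical; adjust them to the local setup.
    """
    params = {
        "q": f"{field}:{term}",         # assumed query form, not from the report
        "hl": "true",                   # turn highlighting on
        "hl.fl": field,                 # highlight the multi-valued field
        "hl.fragsize": 70,              # value quoted in the report
        "hl.requireFieldMatch": "true", # value quoted in the report
        "hl.mergeContiguous": "true",   # value quoted in the report
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # "ba" and "title" are placeholder values for illustration only.
    print(build_highlight_query("ba", "title"))
```

This only constructs the request URL; it does not contact a server, so whether the snippets show the bleed-over still has to be checked against an index that contains a document with at least three values in the field.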