[ https://issues.apache.org/jira/browse/LUCENE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032472#comment-13032472 ]
Pierre Gossé edited comment on LUCENE-3087 at 5/12/11 3:59 PM:
---------------------------------------------------------------

Yes, that would be best. But I'm not sure how to do that:
- Check for positions in the token stream? Not sure it "guarantees" anything. :)
- Add some kind of additional property to the TermFreqVector returned by IndexReader.getTermFreqVector(), since it already accesses field infos? Not sure it wouldn't have too much impact.
- Ask the index for field infos from TokenSources.getTokenStream? Not sure it is the right place, but it looks like the least dangerous option.

I don't have much time until the end of the month to take a serious look at this, but I'll try to take some time next month.

      was (Author: pigo):
    Yes, that would be best. But I'm not sure how to do that:
- Check for positions in the token stream? Not sure it "guarantees" anything. :)
- Add some kind of additional property to the token stream returned by IndexReader.getTermFreqVector, since it accesses field infos? Not sure it wouldn't have too much impact.
- Ask the index for field infos from TokenSources.getTokenStream? Not sure it is the right place, but it looks like the least dangerous option.

I don't have much time until the end of the month to take a serious look at this, but I'll try to take some time next month.

> highlighting exact phrase with overlapping tokens fails.
> --------------------------------------------------------
>
>                 Key: LUCENE-3087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3087
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9.4, 3.1
>            Reporter: Pierre Gossé
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3087.patch
>
>
> Fields with overlapping tokens are not highlighted in search results when
> searching for exact phrases with TermVector.WITH_OFFSETS.
> The document built in MemoryIndex for highlighting does not preserve token
> positions in this case.
> Overlapping tokens get "flattened" (position increment always set to 1), so
> the span query used to find the relevant fragment fails to identify the
> correct token sequence because of the position shift.
> I corrected this by adding a position increment calculation in the subclass
> StoredTokenStream, and I added a JUnit test covering this case.
> I used the Eclipse code style from trunk, but it introduces quite a few
> formatting differences between the repository and working-copy files. I tried
> to reduce them, but some line-wrapping rules still don't match.
> Correction patch attached.
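
To illustrate the position shift described above, here is a minimal sketch in plain Java. It does not use Lucene and is not the code from LUCENE-3087.patch. The class name, method names, and the three sample tokens ("quick" and a synonym "fast" at the same position, followed by "fox") are all assumptions made for illustration. It only shows why an increment that is always 1 breaks phrase adjacency, and how an increment derived from stored positions restores it.

```java
import java.util.Arrays;

// Hypothetical illustration only: none of these names come from Lucene.
public class PositionIncrementSketch {

    /** "Flattened" behaviour: every token advances the position by one. */
    static int[] flattenedIncrements(int tokenCount) {
        int[] increments = new int[tokenCount];
        Arrays.fill(increments, 1);
        return increments;
    }

    /**
     * Derives increments from absolute positions given in ascending order.
     * Two tokens at the same position yield an increment of 0 for the second.
     */
    static int[] incrementsFromPositions(int[] positions) {
        int[] increments = new int[positions.length];
        int previous = -1; // a first token at position 0 gets increment 1
        for (int i = 0; i < positions.length; i++) {
            increments[i] = positions[i] - previous;
            previous = positions[i];
        }
        return increments;
    }

    /** Rebuilds absolute positions the way a stream consumer would. */
    static int[] positionsFromIncrements(int[] increments) {
        int[] positions = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Assumed sample: "quick" (0), "fast" (0, overlapping), "fox" (1).
        int[] stored = {0, 0, 1};

        int[] flattened = positionsFromIncrements(flattenedIncrements(stored.length));
        int[] preserved = positionsFromIncrements(incrementsFromPositions(stored));

        // Flattened: "quick" at 0 and "fox" at 2, so "quick fox" is not adjacent.
        System.out.println("flattened: " + Arrays.toString(flattened));
        // Preserved: "quick" at 0 and "fox" at 1, so "quick fox" matches.
        System.out.println("preserved: " + Arrays.toString(preserved));
    }
}
```

With the flattened increments, the phrase "quick fox" ends up two positions apart and an exact-phrase match fails. With increments computed from the stored positions, the overlap is kept and the two terms stay adjacent.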