[jira] [Issue Comment Edited] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.

JIRA Thu, 12 May 2011 09:01:34 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032472#comment-13032472
 ]


Pierre Gossé edited comment on LUCENE-3087 at 5/12/11 3:59 PM:
---------------------------------------------------------------

Yes, that would be the best.

But I'm not sure how to do that:
- Check for positions in the token stream? Not sure it "guarantees" anything. :)
- Add some kind of additional property to the TermFreqVector returned by 
IndexReader.getTermFreqVector(), since it already accesses field infos? Not sure 
it wouldn't have too much impact.
- Ask the index for field infos from TokenSources.getTokenStream? Not sure it 
is the right place, but it looks like the least dangerous option.

I don't have much time until the end of the month to take a serious look at 
this, but I'll try to find some time next month.

      was (Author: pigo):
    Yes, that would be the best.

But I'm not sure how to do that:
- Check for positions in the token stream? Not sure it "guarantees" anything. :)
- Add some kind of additional property to the tokenstream returned by 
IndexReader.getTermFreqVector, since it accesses field infos? Not sure it 
wouldn't have too much impact.
- Ask the index for field infos from TokenSources.getTokenStream? Not sure it 
is the right place, but it looks like the least dangerous option.

I don't have much time until the end of the month to take a serious look at 
this, but I'll try to find some time next month.
  
> highlighting exact phrase with overlapping tokens fails.
> --------------------------------------------------------
>
>                 Key: LUCENE-3087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3087
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9.4, 3.1
>            Reporter: Pierre Gossé
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3087.patch
>
>
> Fields with overlapping tokens are not highlighted in search results when 
> searching for exact phrases with TermVector.WITH_OFFSETS.
> The document built in MemoryIndex for highlighting does not preserve token 
> positions in this case. Overlapping tokens get "flattened" (position 
> increment always set to 1), so the span query used to find the relevant 
> fragment fails to identify the correct token sequence because of the 
> position shift.
> I corrected this by adding a position increment calculation in the subclass 
> StoredTokenStream, and added a JUnit test covering this case.
> I used the Eclipse code style from trunk, but it introduces quite a few 
> formatting differences between the repository and working copy files. I 
> tried to reduce them, but some line-wrapping rules still don't match.
> Patch attached.
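
The position shift described above can be illustrated with a minimal sketch in plain Java. It does not use the Lucene API and is not the code from LUCENE-3087.patch; the class name, method names and sample positions are assumptions made for illustration. It assumes absolute token positions are available and already sorted, and follows the Lucene convention that the position counter starts at -1 before the first token.

```java
import java.util.Arrays;

public class PositionIncrementSketch {

    // Derive position increments from absolute positions that are already
    // sorted in ascending order. The counter starts at -1, so a first token
    // at position 0 gets an increment of 1; overlapping tokens get 0.
    static int[] incrementsFromPositions(int[] positions) {
        int[] increments = new int[positions.length];
        int previous = -1;
        for (int i = 0; i < positions.length; i++) {
            increments[i] = positions[i] - previous;
            previous = positions[i];
        }
        return increments;
    }

    // Rebuild absolute positions the way a consumer of the stream would,
    // by accumulating the increments.
    static int[] positionsFromIncrements(int[] increments) {
        int[] positions = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            positions[i] = current;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical field: "quick" and "fast" overlap at position 0,
        // "fox" follows at position 1.
        int[] stored = {0, 0, 1};

        // Flattened case from the report: every increment is 1.
        int[] flattened = new int[stored.length];
        Arrays.fill(flattened, 1);

        // Flattened: "quick" and "fox" end up two positions apart.
        System.out.println(Arrays.toString(positionsFromIncrements(flattened)));
        // Computed increments: the original positions are restored.
        System.out.println(Arrays.toString(
                positionsFromIncrements(incrementsFromPositions(stored))));
    }
}
```

With flattened increments, "quick" and "fox" are no longer adjacent, so an exact phrase match on "quick fox" would miss. With increments computed from the stored positions they remain adjacent.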

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
