Re: Highlighting + phrase queries

2008-01-10 Thread Mark Miller
I don't think you would see much of a gain. Shoving the TokenStream into the MemoryIndex is actually pretty fast and I wouldn't be surprised if it was much faster than reading from disk. Most of the computational time is spent in reconstructing the TokenStream, whether you use term-vectors or re-

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: That is why the original contrib does not work with PhraseQuerys. It simply matches Tokens from the query with those in the TokenStream. LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. Then, after converting the query to a SpanQuery approximation, getSp
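A rough illustration of what position-sensitive highlighting buys you: only tokens that sit inside an actual phrase match get marked, while stray occurrences of the same terms are left alone. This is a hypothetical plain-Java sketch, not the LUCENE-794 code and not a Lucene API. Instead of a MemoryIndex and SpanQuery spans, it assumes tokens are plain strings whose position is their list index, and it scans those positions directly for an exact phrase (no slop). The class name PhraseSpanHighlighter and the <B>...</B> markup are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of position-sensitive highlighting (not LUCENE-794,
 * not a Lucene API). Token position = index in the list; phrase must match
 * exactly with slop 0.
 */
public class PhraseSpanHighlighter {

    /** Returns [start, end) position pairs where the whole phrase occurs. */
    public static List<int[]> findSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean match = true;
            for (int i = 0; i < phrase.size(); i++) {
                if (!tokens.get(start + i).equals(phrase.get(i))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                spans.add(new int[] {start, start + phrase.size()});
            }
        }
        return spans;
    }

    /** Highlights only tokens whose position falls inside a matching span. */
    public static String highlight(List<String> tokens, List<String> phrase) {
        boolean[] inSpan = new boolean[tokens.size()];
        for (int[] span : findSpans(tokens, phrase)) {
            for (int pos = span[0]; pos < span[1]; pos++) {
                inSpan[pos] = true;
            }
        }
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            String token = tokens.get(pos);
            out.add(inSpan[pos] ? "<B>" + token + "</B>" : token);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Phrase "quick fox" occurs once; "quick" and "fox" also occur on their own.
        List<String> doc = List.of("the", "quick", "fox", "saw", "a", "quick", "dog", "and", "a", "fox");
        System.out.println(highlight(doc, List.of("quick", "fox")));
    }
}
```

Running main prints the quick fox pair highlighted once, with the later standalone "quick" and "fox" left unmarked. A real implementation would get the span positions from the index (which also handles slop and nested span queries) rather than from a linear scan like this.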

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Marjan Celikik wrote: Mark Miller wrote: The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources or you can get it by reanalyzing the document. Each Token from the T

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources or you can get it by reanalyzing the document. Each Token from the TokenStream is checked

Re: Highlighting + phrase queries

2008-01-10 Thread Mark Miller
The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources or you can get it by reanalyzing the document. Each Token from the TokenStream is checked against Tokens in th
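To make the token-by-token check concrete, here is a hypothetical plain-Java sketch (no Lucene classes; NaiveTermHighlighter is a made-up name and tokens are modelled as plain strings) of a highlighter that tests each token of the document's stream against the set of query terms in isolation. Because position is never consulted, every occurrence of a phrase term is marked, which is the behaviour the original question is about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical illustration (not Lucene's Highlighter API): every token that
 * equals some query term is wrapped in <B>...</B>, regardless of position.
 */
public class NaiveTermHighlighter {

    public static String highlight(List<String> tokens, Set<String> queryTerms) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Each token is checked against the query terms on its own,
            // so neighbouring tokens never influence the decision.
            if (queryTerms.contains(token)) {
                out.add("<B>" + token + "</B>");
            } else {
                out.add(token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Phrase query "quick fox", but the terms also occur separately.
        List<String> doc = List.of("the", "quick", "fox", "saw", "a", "quick", "dog", "and", "a", "fox");
        // Prints: the <B>quick</B> <B>fox</B> saw a <B>quick</B> dog and a <B>fox</B>
        System.out.println(highlight(doc, Set.of("quick", "fox")));
    }
}
```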

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: Oh yeah...something that you may not have seen is that this has a dependency on MemoryIndex from contrib. You need that jar as well. - Mark Hm, I need the source code. How do I download the files from https://issues.apache.org/jira/browse/LUCENE-794 (all I see are some .pat

Re: Highlighting + phrase queries

2008-01-10 Thread Mark Miller
Oh yeah...something that you may not have seen is that this has a dependency on MemoryIndex from contrib. You need that jar as well. - Mark Marjan Celikik wrote: Mark Miller wrote: The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive hi

Re: Highlighting + phrase queries

2008-01-10 Thread Mark Miller
It should work no problem with 2.2. What are the compile errors you are getting? If you send me a note directly I will send you a jar. - Mark Marjan Celikik wrote: Mark Miller wrote: The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive highlighting: https://issues.apache.org/jira/browse/LUCENE-794 It seems that the patch does not work with Lucene 2.2 as I get some compile errors. Is this really the

Re: Highlighting + phrase queries

2008-01-09 Thread Mark Miller
It works exactly the same as the standard contrib Highlighter except that it tries not to highlight spurious results for a positional query. This is exact with Span queries, but more approximate for phrase queries. The approximation is pretty darn good, but let me know if you find a case that d

Re: Highlighting + phrase queries

2008-01-09 Thread Marjan Celikik
Mark Miller wrote: The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive highlighting: https://issues.apache.org/jira/browse/LUCENE-794 OK, before trying it out, I would like to know whether the patch works for mixed queries, e.g. "a b" +c -d "

Re: Highlighting + phrase queries

2008-01-09 Thread Mark Miller
The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive highlighting: https://issues.apache.org/jira/browse/LUCENE-794 Marjan Celikik wrote: Dear all, Let's assume I have a phrase query and a document which contains the phrase but also it co

Highlighting + phrase queries

2008-01-09 Thread Marjan Celikik
Dear all, Let's assume I have a phrase query and a document which contains the phrase but also contains separate occurrences of each query term. How does the highlighter know that it should only display fragments which contain phrases and not fragments which contain only the query words (not as