RE: amusing interaction between advanced tokenizers and highlighter
> A question before I dive into coding a fix: can I assume (for all
> analyzers) that the tokens produced by the tokenStream have the
> following property:
>
>     currentToken.startOffset() >= lastToken.startOffset()
>
> The analyzers I have tested the highlighter with so far have the
> property:
>
>     currentToken.startOffset() >= lastToken.endOffset()
>
> so their tokens aren't overlapping, but I understand this isn't the
> case for others (any demonstrable examples of such problem analyzers
> would be appreciated for testing purposes).

There is such an analyzer here: http://savannah.nongnu.org/projects/aramorph

> If I can assume that tokenstreams always produce an increment of zero
> or more in token.startOffset, I think I can design a solution that
> still works using a single pass of the token stream. I suspect an
> additional flushText method will be required on the Formatter
> interface to allow implementations to use a buffer. This buffer would
> be required to accumulate overlapping token scores when trying to
> decide if a section of the original text merited any highlight markup.

I am not familiar with your most recent highlighter package, but I have implemented this myself with some older, rudimentary highlighting code that just uses a Vector to keep track of all tokens for the same offset positions. Highlighting based on the tokens accumulated in the Vector is triggered when

    currentToken.startOffset() > lastToken.startOffset()

is satisfied, after which the token Vector is simply cleared and tracking begins for the new token position.

Don't forget to make sure that the same input/term text isn't output/highlighted more than once when multiple output tokens cover it.

Regards,
RBP

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
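The Vector-based approach described in this message could look roughly like the following. This is only a minimal sketch under stated assumptions, not the highlighter package's code: `Tok`, `highlight`, and `flush` are hypothetical names, a plain class stands in for the analyzer's token type, the "score" is reduced to a yes/no query-term match, and start offsets are assumed never to decrease. Tokens that overlap earlier text but start at a later offset are simply skipped here, which a real solution would need to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OffsetGroupingSketch {

    /** Hypothetical stand-in for an analyzer token: term text plus character offsets. */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Single pass; assumes token start offsets never decrease. */
    static String highlight(String source, List<Tok> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        List<Tok> group = new ArrayList<>(); // tokens sharing one start offset
        int copied = 0;                      // source offset already written to out
        for (Tok t : tokens) {
            // The start offset moved forward: decide on the buffered group, then reset it.
            if (!group.isEmpty() && t.start > group.get(0).start) {
                copied = flush(source, group, queryTerms, out, copied);
                group.clear();
            }
            group.add(t);
        }
        if (!group.isEmpty()) {
            copied = flush(source, group, queryTerms, out, copied);
        }
        out.append(source.substring(copied)); // trailing text after the last token
        return out.toString();
    }

    /** Writes the group's text span once, marked up if any token in it matched. */
    private static int flush(String source, List<Tok> group, Set<String> queryTerms,
                             StringBuilder out, int copied) {
        int start = group.get(0).start;
        int end = start;
        boolean hit = false;
        for (Tok t : group) {
            end = Math.max(end, t.end);
            hit |= queryTerms.contains(t.text);
        }
        if (end <= copied) {
            return copied; // this text was already output; never emit it twice
        }
        start = Math.max(start, copied);
        out.append(source, copied, start); // untouched text between groups
        String span = source.substring(start, end);
        out.append(hit ? "<b>" + span + "</b>" : span);
        return end;
    }

    public static void main(String[] args) {
        // "cat" and "feline" share offsets, as a synonym-injecting analyzer might produce.
        List<Tok> tokens = Arrays.asList(
                new Tok("the", 0, 3), new Tok("cat", 4, 7),
                new Tok("feline", 4, 7), new Tok("sat", 8, 11));
        Set<String> terms = new HashSet<>(Arrays.asList("cat", "feline"));
        System.out.println(highlight("the cat sat", tokens, terms)); // the <b>cat</b> sat
    }
}
```

Even though two tokens at the same position match the query, the source text "cat" is written and marked up only once, which is the duplicate-output concern raised above.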
index and search question
Let's say I index documents using this:

    Document doc = new Document();
    doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
    doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));

and want to do a search like this:

    file1:Word file2:Word2

Basically, I want to do a search using multiple segments, file1 and file2, in the same query. How would this be possible?
Re: index and search question
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
> Let's say I index documents using this:
>
>     Document doc = new Document();
>     doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
>     doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
>
> and want to do a search like this:
>
>     file1:Word file2:Word2
>
> Basically, I want to do a search using multiple segments, file1 and
> file2, in the same query. How would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

    file1:Word file2:Word2

or, e.g.,

    +file1:Word +file2:Word2

etc. Or you can build a boolean query programmatically (if I understood your question).

incze
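To make the difference between the two query strings in this reply concrete, here is a toy model in plain Java. It is not Lucene code and uses no Lucene classes: `has`, `matchesOptional`, and `matchesRequired` are hypothetical helpers, a document is modelled as a map from field name to text, and lower-casing plus splitting on non-word characters is an assumption standing in for whatever the analyzer does. It assumes the parser's default behaviour, where clauses without a prefix are optional and a leading `+` makes a clause required.

```java
import java.util.HashMap;
import java.util.Map;

class FieldClauseSketch {

    /** True if the named field's text contains the term (case-insensitive, whole word). */
    static boolean has(Map<String, String> doc, String field, String term) {
        String text = doc.get(field);
        if (text == null) {
            return false;
        }
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.equals(term.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    /** Models file1:Word file2:Word2, where either clause may match. */
    static boolean matchesOptional(Map<String, String> doc) {
        return has(doc, "file1", "Word") || has(doc, "file2", "Word2");
    }

    /** Models +file1:Word +file2:Word2, where both clauses must match. */
    static boolean matchesRequired(Map<String, String> doc) {
        return has(doc, "file1", "Word") && has(doc, "file2", "Word2");
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("file1", "a Word here");
        doc.put("file2", "nothing relevant");
        System.out.println(matchesOptional(doc)); // true
        System.out.println(matchesRequired(doc)); // false
    }
}
```

A document that contains Word in file1 but not Word2 in file2 is found by the first form and rejected by the second, so the choice depends on whether both files have to match.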