Re: index and search question
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
> Let's say I index documents using this
>
> Document doc = new Document();
> doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
> doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
>
> And want to do a search like this
>
> file1:Word file2:Word2
>
> Basically doing a search using multiple segments, file1 and file2 in the
> same query, how would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

    file1:Word file2:Word2

or, e.g.,

    +file1:Word +file2:Word2

etc. Or you can build a boolean query programmatically (if I understood
your question).

incze

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
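For readers unsure how the two query forms in the reply differ, here is a toy model of the clause semantics. It is a sketch, not Lucene code: the class FieldedQuerySketch, its Clause type, and the parse and matches methods are all hypothetical names made up for illustration, and the "index" is just a map from field name to text. It assumes the usual convention that a clause with a leading '+' is required and that clauses without one are optional, with at least one optional clause having to match when nothing is required. In Lucene itself the real work is done by the QueryParser and a boolean query, as the reply says.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a fielded boolean query. Hypothetical, NOT Lucene's API. */
final class FieldedQuerySketch {

    /** One "field:term" clause; required corresponds to a leading '+'. */
    static final class Clause {
        final String field;
        final String term;
        final boolean required;

        Clause(String field, String term, boolean required) {
            this.field = field;
            this.term = term;
            this.required = required;
        }
    }

    /** Parses whitespace-separated clauses such as "+file1:Word file2:Word2". */
    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            boolean required = part.startsWith("+");
            String body = required ? part.substring(1) : part;
            int colon = body.indexOf(':');
            if (colon <= 0 || colon == body.length() - 1) {
                throw new IllegalArgumentException("expected field:term, got " + part);
            }
            // Terms are lower-cased here, assuming the text is lower-cased too.
            clauses.add(new Clause(body.substring(0, colon),
                    body.substring(colon + 1).toLowerCase(), required));
        }
        return clauses;
    }

    /** doc maps a field name (e.g. "file1") to that field's text. */
    static boolean matches(Map<String, String> doc, List<Clause> clauses) {
        boolean hasRequired = false;
        boolean anyOptionalHit = false;
        for (Clause c : clauses) {
            boolean hit = tokens(doc.get(c.field)).contains(c.term);
            if (c.required) {
                hasRequired = true;
                if (!hit) {
                    return false; // a required clause failed
                }
            } else if (hit) {
                anyOptionalHit = true;
            }
        }
        // With no required clauses, at least one optional clause must match.
        return hasRequired || anyOptionalHit;
    }

    /** Crude tokenizer: lower-case, split on non-word characters. */
    private static Set<String> tokens(String text) {
        if (text == null) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }
}
```

Under these assumptions, a document whose file1 contains "Word" but whose file2 lacks "Word2" matches the first form (file1:Word file2:Word2) and does not match the second (+file1:Word +file2:Word2).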
index and search question
Let's say I index documents using this

    Document doc = new Document();
    doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
    doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));

And want to do a search like this

    file1:Word file2:Word2

Basically doing a search using multiple segments, file1 and file2 in the
same query, how would this be possible?
RE : amusing interaction between advanced tokenizers and highlighter
> A question before I dive into coding a fix: can I assume (for
> all analyzers) that the tokens produced by the tokenStream
> have the following property:
>    currentToken.startOffset() >= lastToken.startOffset()
>
> The analyzers I have tested the highlighter with so far have
> the property:
>    currentToken.startOffset() > lastToken.endOffset()
> so aren't overlapping but I understand this isn't the case for
> others (all demonstrable examples of such "problem" analyzers
> would be appreciated for testing purposes).

There is such an analyzer here: http://savannah.nongnu.org/projects/aramorph

> If I can assume that tokenstreams always produce a zero or more
> increment in token.startOffset I think I can
> design a solution that still works using a single pass of the
> token stream.
> I suspect an additional "flushText" method will be required on
> the Formatter interface to allow implementations
> to use a buffer. This buffer would be required to accumulate
> overlapping token scores when trying to decide if a
> section of the original text merited any highlight markup.

I am not familiar with your most recent highlighter package, but I have
implemented this myself with some older rudimentary highlighting code
that just uses a Vector to keep track of all tokens for the same offset
positions. Highlighting based on those tokens accumulated in the Vector
is triggered when

    currentToken.startOffset() > lastToken.startOffset()

is satisfied, after which the token Vector is simply cleared and the new
token position tracking begins. Don't forget to make sure that the same
input/term text isn't output/highlighted more than once for multiple
output tokens.

Regards,
RBP
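The buffering approach described above can be sketched roughly as follows. This is a minimal illustration, not the original code and not the highlighter package's API: the class OverlapHighlighterSketch, its Tok type, and the <b> markup are hypothetical, and an ArrayList stands in for the Vector. It assumes tokens arrive in non-decreasing startOffset order, which is the property asked about in the quoted question. Tokens sharing a start offset are buffered; the buffer is flushed when a token with a larger start offset arrives, and a "written" cursor ensures no stretch of the input text is emitted twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical single-pass highlighter for token streams with overlapping tokens. */
final class OverlapHighlighterSketch {

    /** Minimal stand-in for an analyzer token: term text plus character offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Assumes tokens are ordered by non-decreasing start offset. */
    static String highlight(String text, List<Tok> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        List<Tok> group = new ArrayList<>(); // tokens sharing one start offset
        int written = 0;                     // input text before this index is already emitted
        for (Tok t : tokens) {
            // A larger start offset closes the current group.
            if (!group.isEmpty() && t.start > group.get(0).start) {
                written = flush(text, group, queryTerms, out, written);
                group.clear();
            }
            group.add(t);
        }
        if (!group.isEmpty()) {
            written = flush(text, group, queryTerms, out, written);
        }
        out.append(text, written, text.length()); // trailing text after the last token
        return out.toString();
    }

    /** Emits the text covered by one group exactly once; returns the new cursor. */
    private static int flush(String text, List<Tok> group, Set<String> queryTerms,
                             StringBuilder out, int written) {
        int start = Math.max(group.get(0).start, written);
        int end = start;
        boolean hit = false;
        for (Tok t : group) {
            end = Math.max(end, t.end);
            hit |= queryTerms.contains(t.term);
        }
        if (start > written) {
            out.append(text, written, start); // gap between groups, e.g. whitespace
        }
        if (end > start) {
            String span = text.substring(start, end);
            out.append(hit ? "<b>" + span + "</b>" : span);
        }
        return Math.max(end, written);
    }
}
```

For example, if an analyzer emits both "books" and its stem "book" at offsets 4 to 9 of "the books are here", a query on "book" marks that span once, however many tokens in the group match.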