RE : amusing interaction between advanced tokenizers and highlighter

2004-06-20 Thread Rasik Pandey

 A question before I dive into coding a fix: can I assume (for
 all analyzers) that the tokens produced by the tokenStream
 have the following property:
currentToken.startOffset() >= lastToken.startOffset()
 
 The analyzers I have tested the highlighter with so far have
 the property:
currentToken.startOffset() >= lastToken.endOffset()
 so they don't overlap, but I understand this isn't the case for
 others (any demonstrable examples of such problem analyzers
 would be appreciated for testing purposes).
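The two properties being discussed can be written as simple checks over a token sequence. This is a minimal sketch, assuming a hypothetical Tok class that holds only the term text and its start/end offsets (a stand-in for Lucene's Token, so the example runs without Lucene on the classpath):

```java
import java.util.List;

// Hypothetical stand-in for a token: term text plus [start, end) offsets.
final class Tok {
    final String text;
    final int start;
    final int end;

    Tok(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

final class OffsetChecks {
    // Weaker property: currentToken.start >= lastToken.start for every pair,
    // so tokens may be stacked or overlap but never move backwards.
    static boolean startsNonDecreasing(List<Tok> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).start < tokens.get(i - 1).start) {
                return false;
            }
        }
        return true;
    }

    // Stronger property: currentToken.start >= lastToken.end for every pair,
    // i.e. no two consecutive tokens overlap.
    static boolean nonOverlapping(List<Tok> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).start < tokens.get(i - 1).end) {
                return false;
            }
        }
        return true;
    }
}
```

A stream that emits several alternative tokens at the same position passes the first check but fails the second.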

There is such an analyzer here: http://savannah.nongnu.org/projects/aramorph

 If I can assume that token streams always produce a zero or
 positive increment in token.startOffset(), I think I can
 design a solution that still works in a single pass over the
 token stream.
 I suspect an additional flushText method will be required on
 the Formatter interface to allow implementations
 to use a buffer. This buffer would be required to accumulate
 overlapping token scores when trying to decide if a
 section of the original text merited any highlight markup.

I am not familiar with your most recent highlighter package, but I have implemented
this myself in some older, rudimentary highlighting code that simply uses a Vector to
keep track of all tokens at the same offset position. Highlighting based on the tokens
accumulated in the Vector is triggered when currentToken.startOffset() >
lastToken.startOffset() is satisfied, after which the Vector is cleared and tracking
of the new token position begins. Don't forget to make sure that the same input/term
text isn't output/highlighted more than once when several output tokens cover it.
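The Vector-based approach described above can be sketched as a single pass that buffers every token sharing a start offset, then flushes the buffer when a token with a larger start offset arrives. This is a minimal sketch, not the Lucene highlighter: SpanToken and GroupingHighlighter are hypothetical names, an ArrayList stands in for the Vector, it assumes start offsets never decrease, a group counts as a hit if any of its terms is in the query term set, and the <B> tags are an arbitrary choice. Each group's source text is written once, however many tokens cover it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical token type: term text plus [start, end) offsets into the original text.
final class SpanToken {
    final String term;
    final int start;
    final int end;

    SpanToken(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
    }
}

final class GroupingHighlighter {
    // Single pass over tokens whose start offsets never decrease. Tokens sharing a
    // start offset are buffered and flushed once a larger start offset arrives.
    static String highlight(String text, List<SpanToken> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        List<SpanToken> group = new ArrayList<>();
        int written = 0; // output has been produced for text[0, written)
        for (SpanToken t : tokens) {
            if (!group.isEmpty() && t.start > group.get(0).start) {
                written = flush(text, group, queryTerms, out, written);
                group.clear();
            }
            group.add(t);
        }
        if (!group.isEmpty()) {
            written = flush(text, group, queryTerms, out, written);
        }
        out.append(text, written, text.length()); // text after the last token
        return out.toString();
    }

    // Emits the source text covered by one group exactly once, highlighted if any
    // token in the group is a query term. Returns the new "written" position.
    private static int flush(String text, List<SpanToken> group, Set<String> queryTerms,
                             StringBuilder out, int written) {
        int start = Math.max(group.get(0).start, written); // never re-emit earlier text
        int end = start;
        boolean hit = false;
        for (SpanToken t : group) {
            end = Math.max(end, t.end);
            hit |= queryTerms.contains(t.term);
        }
        if (end <= start) {
            return written; // nothing new to emit for this group
        }
        out.append(text, written, start); // unhighlighted gap before the group
        String span = text.substring(start, end);
        out.append(hit ? "<B>" + span + "</B>" : span);
        return end;
    }
}
```

For example, with the text "the quick fox", the tokens quick and fast both at offsets 4 to 9, and the query term fast, the result is "the <B>quick</B> fox": the shared span is emitted once rather than once per token.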

Regards,
RBP 





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



index and search question

2004-06-20 Thread Dmitrii PapaGeorgio
Let's say I index documents using this:
 Document doc = new Document();
 doc.add(Field.Text("file1", new InputStreamReader(is)));
 doc.add(Field.Text("file2", new InputStreamReader(is2)));
And I want to do a search like this:
file1:Word file2:Word2
Basically, a search across multiple fields (file1 and file2) in the
same query. How would this be possible?



Re: index and search question

2004-06-20 Thread Incze Lajos
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
 Let's say I index documents using this:
 
  Document doc = new Document();
  doc.add(Field.Text("file1", new InputStreamReader(is)));
  doc.add(Field.Text("file2", new InputStreamReader(is2)));
 
 And I want to do a search like this:
 
 file1:Word file2:Word2
 
 Basically, a search across multiple fields (file1 and file2) in the
 same query. How would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

file1:Word file2:Word2

which matches documents where either clause matches, or e.g.

+file1:Word +file2:Word2

which requires both, etc.

Or you can build a BooleanQuery programmatically (if I understood
your question).
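To make the difference between the two query strings concrete, here is a toy model of the clause semantics in plain Java. It is not Lucene's API: Clause and ToyBooleanQuery are hypothetical names, and analysis such as lowercasing is ignored. Required (+) clauses must all match; when there are none, at least one optional clause must match.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, NOT Lucene's API: one clause such as "file1:word" or "+file2:word2".
final class Clause {
    final String field;
    final String term;
    final boolean required; // true for a "+" clause

    Clause(String field, String term, boolean required) {
        this.field = field;
        this.term = term;
        this.required = required;
    }
}

final class ToyBooleanQuery {
    // A document is modelled as field name -> set of indexed terms.
    // Every required clause must match; if there are no required clauses,
    // at least one optional clause must match. (With required clauses present,
    // optional ones would only affect scoring, which this model ignores.)
    static boolean matches(Map<String, Set<String>> doc, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyOptionalHit = false;
        for (Clause c : clauses) {
            Set<String> terms = doc.get(c.field);
            boolean hit = terms != null && terms.contains(c.term);
            if (c.required) {
                anyRequired = true;
                if (!hit) {
                    return false;
                }
            } else if (hit) {
                anyOptionalHit = true;
            }
        }
        return anyRequired || anyOptionalHit;
    }
}
```

A document whose file1 field contains word but whose file2 field lacks word2 matches the first query and fails the second.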

incze
