date:20040620

Re: index and search question

2004-06-20 Thread Incze Lajos

On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
> Let's say I index documents using this
> 
>  Document doc = new Document();
>  doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
>  doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
> 
> And want to do a search like this
> 
> file1:Word file2:Word2
> 
> Basically doing a search using multiple segments, file1 and file2 in the
> same query, how would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

file1:Word file2:Word2  or e.g.
+file1:Word +file2:Word2  etc.

Or you can build a boolean query programmatically (if I understood
your question).

incze
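
A minimal sketch of what the two query forms above mean, assuming unprefixed clauses are treated as optional and "+" marks a clause as required. This is a toy model in plain Java, not the Lucene API: FieldQuerySketch, Clause, and matches are hypothetical names, and a document is modelled as a map from field name to its set of indexed terms (shown lowercased, as a typical analyzer would store them). In Lucene itself the programmatic route would combine one TermQuery per field in a BooleanQuery.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a boolean query over named fields; not the Lucene API. */
final class FieldQuerySketch {

    /** One clause: a term looked up in one field, either required ("+") or optional. */
    static final class Clause {
        final String field;
        final String term;
        final boolean required;

        Clause(String field, String term, boolean required) {
            this.field = field;
            this.term = term;
            this.required = required;
        }
    }

    /**
     * A document matches when every required clause matches. If there are no
     * required clauses, at least one optional clause must match.
     */
    static boolean matches(Map<String, Set<String>> doc, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyOptionalHit = false;
        for (Clause c : clauses) {
            Set<String> terms = doc.get(c.field);
            boolean hit = terms != null && terms.contains(c.term);
            if (c.required) {
                anyRequired = true;
                if (!hit) {
                    return false; // a required clause failed
                }
            } else if (hit) {
                anyOptionalHit = true;
            }
        }
        return anyRequired || anyOptionalHit;
    }

    public static void main(String[] args) {
        // Hypothetical document: file1 contains "word", file2 does not contain "word2".
        Map<String, Set<String>> doc = new HashMap<String, Set<String>>();
        doc.put("file1", new HashSet<String>(Arrays.asList("word")));
        doc.put("file2", new HashSet<String>(Arrays.asList("other")));

        // file1:word file2:word2   (both clauses optional)
        List<Clause> either = Arrays.asList(
                new Clause("file1", "word", false),
                new Clause("file2", "word2", false));

        // +file1:word +file2:word2 (both clauses required)
        List<Clause> both = Arrays.asList(
                new Clause("file1", "word", true),
                new Clause("file2", "word2", true));

        System.out.println(matches(doc, either)); // prints true
        System.out.println(matches(doc, both));   // prints false
    }
}
```

The same document matches the first form because one field is enough, and fails the second because both fields must contain their term.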

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

index and search question

2004-06-20 Thread Dmitrii PapaGeorgio

Let's say I index documents using this
 Document doc = new Document();
 doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
 doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
And want to do a search like this
file1:Word file2:Word2
Basically doing a search using multiple segments, file1 and file2 in the
same query, how would this be possible?

RE : amusing interaction between advanced tokenizers and highlighter

2004-06-20 Thread Rasik Pandey


> A question before I dive into coding a fix: can I assume (for
> all analyzers) that the tokens produced by the tokenStream
> have the following property:
>     currentToken.startOffset() >= lastToken.startOffset()
> 
> The analyzers I have tested the highlighter with so far have
> the property:
>     currentToken.startOffset() > lastToken.endOffset()
> so aren't overlapping but I understand this isn't the case for
> others (all demonstrable examples of such "problem" analyzers
> would be appreciated for testing purposes).

There is such an analyzer here
http://savannah.nongnu.org/projects/aramorph .

> If I can assume that tokenstreams always produce a zero or more
> increment in token.startOffset I think I can
> design a solution that still works using a single pass of the
> token stream.
> I suspect an additional "flushText" method will be required on
> the Formatter interface to allow implementations
> to use a buffer. This buffer would be required to accumulate
> overlapping token scores when trying to decide if a
> section of the original text merited any highlight markup.

I am not familiar with your most recent highlighter package, but I have implemented 
this myself with some older rudimentary highlighting code that just uses a Vector to 
keep track of all tokens for the same offset positions. Highlighting based on those 
tokens accumulated in the Vector is triggered when currentToken.startOffset() > 
lastToken.startOffset() is satisfied, after which the token Vector is simply cleared 
and the new token position tracking begins. Don't forget to make sure that the same 
input/term text isn't output/highlighted more than once for multiple output tokens.

Regards,
RBP 
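
The buffering approach described above can be sketched roughly as follows. This is an illustration in plain Java under stated assumptions, not the highlighter package's code: OverlapHighlightSketch, Token, highlight, and flush are hypothetical names, tokens are assumed to arrive with non-decreasing start offsets, and the `<B>` markup is only a placeholder. Tokens sharing a start offset are buffered, the buffer is flushed when a token with a larger start offset arrives, and a cursor into the original text ensures the same input text is emitted only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy single-pass highlighter for token streams with overlapping tokens. */
final class OverlapHighlightSketch {

    /** Minimal stand-in for an analyzer token: term text plus offsets into the source. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps each matching span of text in B tags, emitting every character exactly once. */
    static String highlight(String text, List<Token> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        List<Token> group = new ArrayList<Token>(); // tokens sharing one start offset
        int cursor = 0;                             // first character not yet emitted
        for (Token t : tokens) {
            // A larger start offset means the buffered group is complete.
            if (!group.isEmpty() && t.start > group.get(0).start) {
                cursor = flush(text, group, queryTerms, cursor, out);
                group.clear();
            }
            group.add(t);
        }
        if (!group.isEmpty()) {
            cursor = flush(text, group, queryTerms, cursor, out);
        }
        out.append(text.substring(cursor)); // trailing text after the last token
        return out.toString();
    }

    /** Emits the text covered by one group and returns the new cursor position. */
    private static int flush(String text, List<Token> group, Set<String> queryTerms,
                             int cursor, StringBuilder out) {
        int start = Math.max(cursor, group.get(0).start); // never re-emit earlier text
        int end = start;
        boolean hit = false;
        for (Token t : group) {
            end = Math.max(end, t.end);
            hit |= queryTerms.contains(t.term);
        }
        out.append(text, cursor, start); // untokenized text before the group
        if (hit) {
            out.append("<B>").append(text, start, end).append("</B>");
        } else {
            out.append(text, start, end);
        }
        return end;
    }

    public static void main(String[] args) {
        // Hypothetical analyzer output: "books" also yields the stem "book" at the same offsets.
        List<Token> tokens = Arrays.asList(
                new Token("the", 0, 3),
                new Token("books", 4, 9),
                new Token("book", 4, 9),
                new Token("are", 10, 13),
                new Token("here", 14, 18));
        Set<String> query = new HashSet<String>(Arrays.asList("book"));
        System.out.println(highlight("the books are here", tokens, query));
        // prints: the <B>books</B> are here
    }
}
```

Even if both "books" and "book" were query terms, the span would be wrapped once, because the decision is made per buffered group rather than per token.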
