Re: positional token info

2003-10-22 Thread Otis Gospodnetic
Then we agree: it is StopFilter that needs to be patched to take into account the number of removed terms and to add the appropriate positional info to each term.

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote:
> > So "phone boy

Re: positional token info

2003-10-22 Thread Erik Hatcher
On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote:
> So "phone boy" would match documents containing "phone the boy"? That
> doesn't sound right to me, as it assumes what the user is trying to do.

That is correct; currently a match would be found. Here's a little test case I'm wo

Re: positional token info

2003-10-21 Thread Tatu Saloranta
On Tuesday 21 October 2003 17:31, Otis Gospodnetic wrote:
> > It does seem handy to avoid exact phrase matches on "phone boy" when a
> > stop word is removed though, so patching StopFilter to put in the
> > missing positions seems reasonable to me currently. Any objections
> > to that?
>
> So

Re: positional token info

2003-10-21 Thread Otis Gospodnetic
> It does seem handy to avoid exact phrase matches on "phone boy" when a
> stop word is removed though, so patching StopFilter to put in the
> missing positions seems reasonable to me currently. Any objections
> to that?

So "phone boy" would match documents containing "phone the boy"? That d

Re: positional token info

2003-10-21 Thread Erik Hatcher
On Tuesday, October 21, 2003, at 12:53 PM, Doug Cutting wrote: If however you want "phone the boy" to match "phone X boy" where X is any word, then PhraseQuery would have to be extended. It's actually a pretty simple extension. Each term in a PhraseQuery corresponds to a PhrasePositions objec
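
A minimal sketch of the idea under discussion, assuming each query term carries an explicit offset relative to the phrase start. This is not Lucene's PhraseQuery or PhrasePositions code: the class `PhraseOffsetSketch` and the methods `index` and `matches` are hypothetical names invented for illustration, and the position index is a plain map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseOffsetSketch {

    /** Builds a toy position index (term -> positions) for one document. Hypothetical helper. */
    static Map<String, List<Integer>> index(String[] terms, int[] positions) {
        Map<String, List<Integer>> idx = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            idx.computeIfAbsent(terms[i], k -> new ArrayList<>()).add(positions[i]);
        }
        return idx;
    }

    /**
     * Returns true if every query term occurs at (phrase start + its offset).
     * Offsets {0, 1} mean "adjacent"; offsets {0, 2} mean "exactly one position in between".
     */
    static boolean matches(Map<String, List<Integer>> idx, String[] terms, int[] offsets) {
        List<Integer> first = idx.get(terms[0]);
        if (first == null) {
            return false;
        }
        for (int start : first) {
            int base = start - offsets[0];
            boolean all = true;
            for (int i = 1; i < terms.length; i++) {
                List<Integer> ps = idx.get(terms[i]);
                if (ps == null || !ps.contains(base + offsets[i])) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] q = {"phone", "boy"};
        // "phone the boy" with "the" removed and the gap preserved: phone@0, boy@2
        Map<String, List<Integer>> gapKept = index(q, new int[] {0, 2});
        // Same text with the gap collapsed: phone@0, boy@1
        Map<String, List<Integer>> gapLost = index(q, new int[] {0, 1});

        System.out.println(matches(gapKept, q, new int[] {0, 1})); // false
        System.out.println(matches(gapKept, q, new int[] {0, 2})); // true
        System.out.println(matches(gapLost, q, new int[] {0, 1})); // true
    }
}
```

With the gap preserved at index time, an adjacent-terms query for "phone boy" no longer matches "phone the boy", while a query that leaves one position open does.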

Re: positional token info

2003-10-21 Thread Otis Gospodnetic
I think "phone the boy" query should match exactly that, and not "phone X boy", nor "phone boy". To me, entering a query as a phrase query means that the user wants to find documents with _exactly_ that sequence of terms. If you know that your users will be entering phrases with stop words, then

Re: positional token info

2003-10-21 Thread Doug Cutting
Erik Hatcher wrote: Just for fun, I've written a simple stop filter that bumps the position increments to account for the stop words removed: But it's practically impossible to formulate a Query that can take advantage of this. A PhraseQuery, because Terms don't have positional info (only the t

Re: positional token info

2003-10-21 Thread Steve Rowe
Erik, I've submitted a patch (BUG# 23730) very similar to yours, in response to a request to fix phrases matching where they should not:

<http://mail-archive.com/[EMAIL PROTECTED]/msg04349.html>

Bug #23730: <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730>

> But, how would you actually *u

Re: positional token info

2003-10-21 Thread Erik Hatcher
On Tuesday, October 21, 2003, at 03:36 AM, Pierrick Brihaye wrote: The basic idea is to have several tokens at the same position (i.e. setPositionIncrement(0)) which are different possible stems for the same word. Right. Like I said, I recognize the benefits of using a position increment of 0.

Re: positional token info

2003-10-21 Thread Pierrick Brihaye
Hi,

Erik Hatcher wrote:
> Is anyone doing anything interesting with the Token.setPositionIncrement during analysis?

I think so :-) Well... my Arabic analyzer is based on this functionality. The basic idea is to have several tokens at the same position (i.e. setPositionIncrement(0)) which are
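
A minimal sketch of stacking several tokens at one position, assuming the usual convention that a token's position is the running sum of increments, starting from -1. It does not use Lucene and is not the analyzer described in this message: `StackedStemsSketch`, `Tok`, `analyze` and `positions` are hypothetical names, and the stem strings are placeholders rather than real analyzer output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StackedStemsSketch {

    /** Hypothetical stand-in for a token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Emits every candidate stem of each word; alternatives after the first get increment 0. */
    static List<Tok> analyze(List<List<String>> stemsPerWord) {
        List<Tok> out = new ArrayList<>();
        for (List<String> stems : stemsPerWord) {
            for (int i = 0; i < stems.size(); i++) {
                out.add(new Tok(stems.get(i), i == 0 ? 1 : 0));
            }
        }
        return out;
    }

    /** Maps each term to its absolute position (assumes each term occurs once, for brevity). */
    static Map<String, Integer> positions(List<Tok> toks) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        int p = -1;
        for (Tok t : toks) {
            p += t.positionIncrement;
            pos.put(t.term, p);
        }
        return pos;
    }

    public static void main(String[] args) {
        // First word has two candidate stems, second word has one (placeholder strings).
        List<List<String>> words = Arrays.asList(
                Arrays.asList("stemA1", "stemA2"),
                Arrays.asList("stemB"));
        System.out.println(positions(analyze(words)));
        // prints {stemA1=0, stemA2=0, stemB=1}
    }
}
```

Because both candidate stems share position 0, a phrase search using either stem followed by the next word finds the two terms adjacent.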

positional token info

2003-10-20 Thread Erik Hatcher
Is anyone doing anything interesting with the Token.setPositionIncrement during analysis?

Just for fun, I've written a simple stop filter that bumps the position increments to account for the stop words removed:

    public final Token next() throws IOException {
        int increment = 0;
        for (To
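
The code in this message is cut off in the archive. What follows is not a reconstruction of it but a self-contained sketch of the same idea, assuming that each surviving token should carry an increment of 1 plus the number of stop words skipped before it. It does not use Lucene's Token or TokenFilter classes: `PositionAwareStopFilterSketch`, `Tok`, `filter` and `positions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PositionAwareStopFilterSketch {

    /** Hypothetical stand-in for a token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Drops stop words, adding each dropped word's slot to the next surviving token. */
    static List<Tok> filter(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String w : words) {
            if (stopWords.contains(w)) {
                skipped++;          // remember the hole left by the stop word
                continue;
            }
            out.add(new Tok(w, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    /** Absolute positions, taking the position before the first token to be -1. */
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).positionIncrement;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        List<Tok> toks = filter(Arrays.asList("phone", "the", "boy"), stop);
        System.out.println(Arrays.toString(positions(toks))); // prints [0, 2]
    }
}
```

Here "boy" lands at position 2 rather than 1, so the slot once occupied by "the" stays visible to anything that compares positions.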