Then we agree, and it is StopFilter that needs to be patched to take
into account the number of removed terms, and add appropriate
positional info to each term.
Otis
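A minimal sketch of the idea above, not the actual StopFilter patch: each surviving term carries a position increment equal to one plus the number of stop words dropped just before it. The class name StopPositionSketch, the Tok type, and the helper methods are hypothetical stand-ins so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopPositionSketch {

    /** Hypothetical stand-in for a Lucene Token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Drops stop words and adds the number of dropped terms to the next kept term's increment. */
    static List<Tok> removeStopWords(List<String> terms, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;          // remember the gap instead of silently closing it
                continue;
            }
            out.add(new Tok(term, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    /** Converts increments to absolute positions, counting the first position as 0. */
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int current = -1;
        for (int i = 0; i < toks.size(); i++) {
            current += toks.get(i).positionIncrement;
            pos[i] = current;
        }
        return pos;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        List<Tok> toks = removeStopWords(Arrays.asList("phone", "the", "boy"), stop);
        for (Tok t : toks) {
            // "phone" keeps increment 1; "boy" gets increment 2 because "the" was removed
            System.out.println(t.term + " +" + t.positionIncrement);
        }
    }
}
```

With the gap preserved, "phone" and "boy" end up at positions 0 and 2 rather than 0 and 1, so they are no longer adjacent in the index.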
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote:
> > So "phone boy
On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote:
So "phone boy" would match documents containing "phone the boy"? That
doesn't sound right to me, as it assumes what the user is trying to do.
That is correct: currently a match would be found. Here's a little
test case I'm wo
On Tuesday 21 October 2003 17:31, Otis Gospodnetic wrote:
> > It does seem handy to avoid exact phrase matches on "phone boy" when a
> > stop word is removed though, so patching StopFilter to put in the
> > missing positions seems reasonable to me currently. Any objections
> > to that?
>
> So
> It does seem handy to avoid exact phrase matches on "phone boy" when a
> stop word is removed though, so patching StopFilter to put in the
> missing positions seems reasonable to me currently. Any objections
> to that?
So "phone boy" would match documents containing "phone the boy"? That
d
On Tuesday, October 21, 2003, at 12:53 PM, Doug Cutting wrote:
If however you want "phone the boy" to match "phone X boy" where X is
any word, then PhraseQuery would have to be extended. It's actually a
pretty simple extension. Each term in a PhraseQuery corresponds to a
PhrasePositions objec
I think "phone the boy" query should match exactly that, and not "phone
X boy", nor "phone boy". To me, entering a query as a phrase query
means that the user wants to find documents with _exactly_ that
sequence of terms.
If you know that your users will be entering phrases with stop words,
then
Erik Hatcher wrote:
Just for fun, I've written a simple stop filter that bumps the position
increments to account for the stop words removed:
But it's practically impossible to formulate a Query that can take
advantage of this. A PhraseQuery, because Terms don't have positional
info (only the t
Erik,
I've submitted a patch (BUG# 23730) very similar to yours, in response
to a request to fix phrases matching where they should not:
http://mail-archive.com/[EMAIL PROTECTED]/msg04349.html
Bug #23730:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730
> But, how would you actually *u
On Tuesday, October 21, 2003, at 03:36 AM, Pierrick Brihaye wrote:
The basic idea is to have several tokens at the same position (i.e.
setPositionIncrement(0)) which are different possible stems for the
same word.
Right. Like I said, I recognize the benefits of using a position
increment of 0.
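To make the stacked-token idea concrete, here is a rough sketch, assuming each word has already been mapped to one or more candidate stems by some analyzer. The class StackedStemsSketch, the Tok type, and the placeholder stem strings are hypothetical; only the use of a position increment of 0 for the extra stems comes from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StackedStemsSketch {

    /** Hypothetical stand-in for a Lucene Token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Emits every candidate stem of a word at the same position: the first stem
     * advances the position by 1, the remaining stems use an increment of 0.
     * Assumes each inner list holds at least one stem.
     */
    static List<Tok> stack(List<List<String>> stemsPerWord) {
        List<Tok> out = new ArrayList<>();
        for (List<String> stems : stemsPerWord) {
            boolean first = true;
            for (String stem : stems) {
                out.add(new Tok(stem, first ? 1 : 0));
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder stems: word 1 has two candidate stems, word 2 has one.
        List<List<String>> stems = Arrays.asList(
            Arrays.asList("stemA1", "stemA2"),
            Arrays.asList("stemB"));
        for (Tok t : stack(stems)) {
            System.out.println(t.term + " +" + t.positionIncrement);
        }
    }
}
```

Because the alternative stems share a position, a query on any one of them sees the same neighbours as a query on the others.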
Hi,
Erik Hatcher wrote:
Is anyone doing anything interesting with the Token.setPositionIncrement
during analysis?
I think so :-) Well... my Arabic analyzer is based on this functionality.
The basic idea is to have several tokens at the same position (i.e.
setPositionIncrement(0)) which are
Is anyone doing anything interesting with the
Token.setPositionIncrement during analysis?
Just for fun, I've written a simple stop filter that bumps the position
increments to account for the stop words removed:
public final Token next() throws IOException {
int increment = 0;
for (To