ch"~slop
>
> This includes documents with one or more of the terms, but prefers those
> with an edit distance <= the slop.
>
>
> -Original Message-
> From: Joel Halbert
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subje
-Original Message-
From: Joel Halbert
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity search
Date: Sat, 31 Oct 2009 08:38:29 +
Thank you all for your suggestions, I shall have a little think about
the best way forward, and report
nt to
include documents that might only have a subset of words from the
phrase. (e.g. just cheese, or just sandwich, but not both).
-Original Message-
From: Robert Muir
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity s
> I suppose you could precompute the proximity associations by indexing
> n-grams (in this case, Lucene calls them shingles), such that there
> is a single token in your index containing cheese_sandwich (effectively)
>
>
doh, I see Grant already led you in this direction. (sorry for the
dup
yet another thing to look into that might improve things a bit is using
ShingleFilter in contrib.
this way cheese sandwich would form a shingle of "cheese sandwich" and would
get a higher score for the "Toasted Cheese Sandwich" document.
it wouldn't solve the proximity problem in general, but may
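To make the shingle idea concrete, here is a minimal sketch in plain Java. It is not Lucene's ShingleFilter, just an illustration of what the resulting tokens look like: the text is lowercased, split on whitespace, and every run of `size` adjacent words becomes one token. The class name ShingleSketch and the method shingles are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class ShingleSketch {

    // Returns every run of `size` adjacent words as a single lowercased token.
    static List<String> shingles(String text, int size) {
        String[] words = text.trim().toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = shingles("Toasted Cheese Sandwich", 2);
        List<String> query = shingles("cheese sandwich", 2);
        // The query shingle matches a document shingle only when the
        // two words are adjacent in the document.
        System.out.println(doc);
        System.out.println(doc.containsAll(query));
    }
}
```

A document where the two words are separated by other words produces no matching shingle, which is why this rewards adjacency but does not address proximity in general.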
Hi Joel,
You could index every possible word combination in your document text, with one
field for each possible distance. (You would have to write an index-time
analyzer to do this, since AFAIK nothing like this exists currently. Shingles
wouldn't work, since you want to ignore intervening t
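A rough sketch of the "one field per distance" idea in plain Java, without any Lucene classes. It groups ordered word pairs by how many positions apart they are, so each map key could be written to its own field. The names DistancePairSketch and pairsByDistance, and the underscore used to join a pair, are assumptions made for this example; a real version would live in an index-time analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Lucene.
final class DistancePairSketch {

    // Maps each distance d (1 = adjacent) up to maxDistance to the
    // ordered word pairs that sit exactly d positions apart.
    static Map<Integer, List<String>> pairsByDistance(String text, int maxDistance) {
        String[] words = text.trim().toLowerCase().split("\\s+");
        Map<Integer, List<String>> out = new TreeMap<>();
        for (int d = 1; d <= maxDistance; d++) {
            List<String> pairs = new ArrayList<>();
            for (int i = 0; i + d < words.length; i++) {
                pairs.add(words[i] + "_" + words[i + d]);
            }
            out.put(d, pairs);
        }
        return out;
    }

    public static void main(String[] args) {
        // Each key would correspond to one per-distance field.
        System.out.println(pairsByDistance("cheese and ham sandwich", 3));
    }
}
```

At query time, the fields for smaller distances could be given larger boosts, so documents with the terms closer together score higher. Note that the number of indexed tokens grows with maxDistance.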
On Oct 30, 2009, at 5:49 AM, Joel Halbert wrote:
Hi,
Without using a proximity search (i.e. "cheese sandwich"~5), what's the
best way of up-scoring results in which the search terms are closer to
each other?
I'm not aware of any query technique to score based on proximity that
doesn't, it