Re: scoring adjacent terms without proximity search

2009-11-23 Thread liat oren
ch"~slop This includes documents with one or more of the terms, but prefers those with an edit distance <= the slop. -----Original Message----- From: Joel Halbert Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subje
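The quoted suggestion can be made concrete with a small helper that builds such a query string. This is a minimal sketch. It assumes Lucene's classic query-parser syntax with the default OR operator. Under that assumption, the bare terms match documents containing any subset of the words, while the quoted `"cheese sandwich"~5` clause only matches, and so only adds score, when the words fall within the slop. `SloppyBoostQuery` and `build` are hypothetical names. The terms are also assumed to be plain words that need no escaping.

```java
import java.util.List;

// Hypothetical helper: builds a classic Lucene query-parser string such as
//   cheese sandwich "cheese sandwich"~5
// Assumes the parser's default operator is OR, and that terms are plain words
// containing no query-syntax characters (no escaping is done here).
final class SloppyBoostQuery {

    static String build(List<String> terms, int slop) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("need at least one term");
        }
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be non-negative");
        }
        String joined = String.join(" ", terms);
        // Bare terms: optional clauses, so any subset of the words can match.
        // Sloppy phrase: matches only when the words lie within `slop`
        // positions, adding score to those documents.
        return joined + " \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        // Prints: cheese sandwich "cheese sandwich"~5
        System.out.println(build(List.of("cheese", "sandwich"), 5));
    }
}
```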

Re: scoring adjacent terms without proximity search

2009-11-02 Thread Joel Halbert
age- From: Joel Halbert Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: scoring adjacent terms without proximity search Date: Sat, 31 Oct 2009 08:38:29 + Thank you all for your suggestions; I shall have a little think about the best way forward, and report

Re: scoring adjacent terms without proximity search

2009-10-31 Thread Joel Halbert
nt to include documents that might only have a subset of words from the phrase (e.g. just cheese, or just sandwich, but not both). -----Original Message----- From: Robert Muir Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: scoring adjacent terms without proximity s

Re: scoring adjacent terms without proximity search

2009-10-30 Thread Robert Muir
> I suppose you could precompute the proximity associations by indexing n-grams (in this case, Lucene calls them shingles), such that there is a single token in your index containing cheese_sandwich (effectively)

doh, I see Grant already led you in this direction. (sorry for the dup
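Here is a minimal, stdlib-only Java sketch of the shingling idea. Every token is kept as a unigram, and each run of up to `maxSize` adjacent tokens is also emitted as one joined token, so "cheese sandwich" yields the single token `cheese_sandwich`. It illustrates the concept only. It is not Lucene's ShingleFilter, token positions are not tracked, and `ShingleSketch`, `shingles`, and the `_` separator are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of shingling (word n-grams) over an already
// tokenized document; not Lucene's ShingleFilter.
final class ShingleSketch {

    // Emits each token as a unigram, followed by every run of 2..maxSize
    // adjacent tokens starting at that position, joined with `sep`.
    static List<String> shingles(List<String> tokens, int maxSize, String sep) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder(tokens.get(start));
            out.add(sb.toString()); // the unigram itself
            for (int end = start + 1; end < tokens.size() && end - start < maxSize; end++) {
                sb.append(sep).append(tokens.get(end));
                out.add(sb.toString()); // shingle of size end - start + 1
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: [toasted, toasted_cheese, cheese, cheese_sandwich, sandwich]
        System.out.println(shingles(List.of("toasted", "cheese", "sandwich"), 2, "_"));
    }
}
```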

Re: scoring adjacent terms without proximity search

2009-10-30 Thread Robert Muir
Yet another thing to look into that might improve things a bit is using ShingleFilter in contrib. This way, cheese sandwich would form a shingle of "cheese sandwich" and would get a higher score for the "Toasted Cheese Sandwich" document. It wouldn't solve the proximity problem in general, but may
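This hedged sketch shows why the shingle helps the "Toasted Cheese Sandwich" document. It analyzes both the query and each document into unigrams plus adjacent-word bigrams, then counts how many query tokens each document contains. The match count is a crude stand-in for relevance, not Lucene's scoring formula. The lowercase-and-split-on-whitespace analysis and the name `ShingleOverlap` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration: unigrams plus bigram shingles on both the
// query side and the document side.
final class ShingleOverlap {

    // Lowercases, splits on whitespace, and returns the unigrams followed by
    // every pair of adjacent words joined with a space.
    static List<String> analyze(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> tokens = new ArrayList<>(List.of(words));
        for (int i = 0; i + 1 < words.length; i++) {
            tokens.add(words[i] + " " + words[i + 1]);
        }
        return tokens;
    }

    // Counts query tokens that also occur in the document (not Lucene scoring).
    static int matches(String query, String doc) {
        Set<String> docTokens = new HashSet<>(analyze(doc));
        int count = 0;
        for (String token : analyze(query)) {
            if (docTokens.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matches("cheese sandwich", "Toasted Cheese Sandwich")); // 3
        System.out.println(matches("cheese sandwich", "Cheese and ham sandwich")); // 2
    }
}
```

For the query "cheese sandwich", "Toasted Cheese Sandwich" matches both words plus the bigram, while "Cheese and ham sandwich" matches only the two words. As the message says, this rewards exact adjacency only; it does nothing when other words sit between the terms.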

RE: scoring adjacent terms without proximity search

2009-10-30 Thread Steven A Rowe
Hi Joel, You could index every possible word combination in your document text, with one field for each possible distance. (You would have to write an index-time analyzer to do this, since AFAIK nothing like this exists currently. Shingles wouldn't work, since you want to ignore intervening t
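Here is a sketch of the index-time side of this idea, under stated assumptions. Take each pair of tokens in document order whose distance is at most `maxDistance`. Emit the pair as one token into a field named after that distance (`dist_1`, `dist_2`, and so on). Intervening words are skipped, which is the property the message says shingles lack. The field names, the `_` separator, and `DistancePairFields` are hypothetical. Only pairs in document order are produced. A real implementation would be the custom index-time analyzer the message mentions, not a plain map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: one "field" per word distance, each holding
// pair tokens built while ignoring any words in between.
final class DistancePairFields {

    // For every i < j with j - i <= maxDistance, adds "token_i_token_j"
    // to the list stored under the key "dist_" + (j - i).
    static Map<String, List<String>> pairsByDistance(List<String> tokens, int maxDistance) {
        Map<String, List<String>> fields = new TreeMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int d = 1; d <= maxDistance && i + d < tokens.size(); d++) {
                fields.computeIfAbsent("dist_" + d, k -> new ArrayList<>())
                      .add(tokens.get(i) + "_" + tokens.get(i + d));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("cheese", "and", "ham", "sandwich");
        // Prints dist_1 -> [cheese_and, and_ham, ham_sandwich],
        // then dist_2 -> [cheese_ham, and_sandwich], then dist_3 -> [cheese_sandwich]
        pairsByDistance(doc, 3).forEach((field, terms) -> System.out.println(field + " -> " + terms));
    }
}
```

For "cheese and ham sandwich", `cheese_sandwich` lands in `dist_3` even though two words intervene. One could then, for example, search the same pair across several distance fields with larger boosts for smaller distances. That query-side weighting is an assumption, not something stated in the visible part of the message.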

Re: scoring adjacent terms without proximity search

2009-10-30 Thread Grant Ingersoll
On Oct 30, 2009, at 5:49 AM, Joel Halbert wrote:
> Hi,
> Without using a proximity search, i.e. "cheese sandwich"~5, what's the best way of up-scoring results in which the search terms are closer to each other?

I'm not aware of any query technique to score based on proximity that doesn't, it
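The question asks how to up-score documents whose search terms sit closer together. As a hedged illustration of what "closer" can mean, this sketch computes the minimum positional distance between two terms in a token list in a single pass. It then turns that distance into a simple boost of 1/distance, which gives 1.0 for adjacent terms. The boost formula and the `ProximityBoost` names are arbitrary choices for illustration, not a Lucene API.

```java
import java.util.List;

// Hypothetical illustration of a proximity measure over a tokenized document.
final class ProximityBoost {

    // Smallest |posA - posB| over all occurrences of a and b, found in one pass
    // by tracking the most recent position of each; -1 if either is absent.
    static int minDistance(List<String> tokens, String a, String b) {
        int lastA = -1;
        int lastB = -1;
        int best = Integer.MAX_VALUE;
        for (int pos = 0; pos < tokens.size(); pos++) {
            String t = tokens.get(pos);
            if (t.equals(a)) lastA = pos;
            if (t.equals(b)) lastB = pos;
            if (lastA >= 0 && lastB >= 0) {
                best = Math.min(best, Math.abs(lastA - lastB));
            }
        }
        return best == Integer.MAX_VALUE ? -1 : best;
    }

    // Arbitrary boost: 1.0 when adjacent, shrinking as the terms move apart,
    // 0.0 when either term is missing.
    static double boost(List<String> tokens, String a, String b) {
        int d = minDistance(tokens, a, b);
        return d < 0 ? 0.0 : 1.0 / Math.max(1, d);
    }

    public static void main(String[] args) {
        System.out.println(boost(List.of("toasted", "cheese", "sandwich"), "cheese", "sandwich"));
        System.out.println(boost(List.of("cheese", "and", "ham", "sandwich"), "cheese", "sandwich"));
    }
}
```

Because it needs each document's token positions, a function like this would most plausibly serve as a re-ranking step over a small number of top hits. That use is an assumption, not something proposed in the thread.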