Re: Getting unique key of a document inside of a Similarity class.

J-Pro Fri, 20 Feb 2015 07:17:59 -0800

> from all the examples of what you've described, i'm fairly certain all you
> really need is a TFIDF based Similarity where coord(), idf(), tf() and
> queryNorm() return 1 always, and you omitNorms from all fields.
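For reference, the practical scoring function documented for Lucene's TFIDFSimilarity has roughly the shape below. If coord, queryNorm, tf and idf all return 1 and norms are omitted (so the norm factor is 1), the score collapses to the sum of the boosts of the query terms that match the document. That sum is what produces the 9 and 16 in the results that follow.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)
\;\;\longrightarrow\;\;
\mathrm{score}(q,d) = \sum_{t \in q,\; t \in d}\mathrm{boost}(t)

% Query #3: 3 + 3 + 3 = 9      Query #4 (doc1): 7 + (3 + 3 + 3) = 16
```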

Yeah, that's what I did in the very first iteration. It works only for cases #1 and #2. If you try queries #3 and #4 with such a Similarity, you get:


3. place:(34\ High\ Street)^3 => doc1(score=9), doc2(score=9)

4. name:DocumentOne^7 OR place:(34\ High\ Street)^3 => doc1(score=16), doc2(score=9)

That is not what I need. As I described above, when multiple tokens of a field match, SimScorer.score is called X times, where X is the number of matched tokens (in cases #3 and #4 there are 3 tokens), so the score adds up. I need to score only once in this case, regardless of the number of tokens.
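To make the arithmetic concrete, here is a toy model in plain Java. It is not Lucene code, and the class and method names are made up for illustration. It assumes that every matched token of a field clause contributes that clause's boost once, as the all-ones Similarity does, and contrasts this with the desired behaviour of counting each matching field only once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy model of the scores in examples #3 and #4; not Lucene code. */
public class ScoreModel {

    // Each entry maps a field name to {boost, number of query tokens matched in that field}.
    // With tf, idf, coord, queryNorm and norms all fixed at 1, every matched token adds its boost.
    static int sumPerToken(Map<String, int[]> matches) {
        int score = 0;
        for (int[] boostAndCount : matches.values()) {
            score += boostAndCount[0] * boostAndCount[1];
        }
        return score;
    }

    // Desired behaviour: a field adds its boost once if at least one of its tokens matched.
    static int oncePerField(Map<String, int[]> matches) {
        int score = 0;
        for (int[] boostAndCount : matches.values()) {
            if (boostAndCount[1] > 0) {
                score += boostAndCount[0];
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Query #4 against doc1: name:DocumentOne^7 (1 token) OR place:(34 High Street)^3 (3 tokens).
        Map<String, int[]> doc1 = new LinkedHashMap<>();
        doc1.put("name", new int[] {7, 1});
        doc1.put("place", new int[] {3, 3});
        System.out.println(sumPerToken(doc1));   // 16
        System.out.println(oncePerField(doc1));  // 10
    }
}
```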

How to do it? My first idea was a HashSet keyed on the field name, so that after scoring once it doesn't score again. But then only the first document got a score (since the second and subsequent documents have the same field name). So I realized I also need the docID. That worked fine until I found out (thank you for that) that the docID is segment-specific. So now I need the segment ID as well (or something similar).
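A minimal sketch of the (document, field) bookkeeping described above, under one assumption: in Lucene, the per-segment reader context handed to the Similarity when it builds a SimScorer (AtomicReaderContext in 4.x) carries a `docBase` offset, so `docBase + doc` should identify a document uniquely within the top-level reader without needing a separate segment ID. The class `ScoreOnceGuard` and its methods are hypothetical, and the sketch keeps Lucene out of the picture entirely.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper: remembers which (document, field) pairs were already scored so that
 * later token matches for the same pair can contribute 0. Per-segment doc ids are turned into
 * index-wide ids by adding the segment's docBase offset (assumed to come from the reader context).
 */
public class ScoreOnceGuard {
    private final Map<String, Set<Integer>> scoredDocsByField = new HashMap<>();

    /** Returns true only the first time this (global doc, field) pair is seen. */
    public boolean firstHit(int docBase, int segmentDocId, String field) {
        int globalDocId = docBase + segmentDocId;
        return scoredDocsByField
                .computeIfAbsent(field, f -> new HashSet<>())
                .add(globalDocId); // Set.add returns false if the id was already present
    }

    /** Must be called before each new query, otherwise stale keys suppress scores. */
    public void reset() {
        scoredDocsByField.clear();
    }

    public static void main(String[] args) {
        ScoreOnceGuard guard = new ScoreOnceGuard();
        // Segment A starts at docBase 0, segment B at docBase 100 (made-up offsets).
        System.out.println(guard.firstHit(0, 5, "place"));    // true: first token of "34 High Street"
        System.out.println(guard.firstHit(0, 5, "place"));    // false: next token, same doc and field
        System.out.println(guard.firstHit(100, 5, "place"));  // true: local id 5 again, other segment
        System.out.println(guard.firstHit(0, 5, "name"));     // true: same doc, different field
    }
}
```

Note the design trade-off: this state lives outside the per-query scoring machinery, so it has to be reset for every query and is not thread-safe if segments are searched concurrently.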

> (You didn't give any examples of what you expect to happen with exclusion
> clauses in your BooleanQueries)

For my needs I won't need exclusion clauses, but the same thing would happen in that case: it would score according to the weight, because the condition is true:


5. (NOT name:DocumentOne)^7 => doc2(score=7)
