from all the examples of what you've described, i'm fairly certain all you
really need is a TFIDF based Similarity where coord(), idf(), tf() and
queryNorm() always return 1, and you omitNorms from all fields.
Yeah, that's what I did in the very first iteration. It works only for
cases #1 and #2. If you try queries 3 and 4 with such a Similarity, you get:
3. place:(34\ High\ Street)^3 => doc1(score=9), doc2(score=9)
4. name:DocumentOne^7 OR place:(34\ High\ Street)^3 => doc1(score=16), doc2(score=9)
That is not what I need. As I described above, when multiple tokens
match in a field, SimScorer.score is called X times, where X is the
number of matched tokens (3 in cases #3 and #4), so the per-token
scores add up. I need the field to be scored only once in this case,
regardless of the number of tokens.
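To make the arithmetic concrete, here is a minimal sketch in plain Java
(not Lucene code). FlatScoreDemo, Clause and both methods are
hypothetical names used only for illustration. It assumes every
similarity factor is 1, so each matched token contributes just the
clause boost. The "once per field" totals (3 and 10) are my reading of
what the desired scores would be; they are not quoted from any output.

```java
// Hypothetical stand-in for the scoring behaviour described above.
final class FlatScoreDemo {

    /** One query clause as seen by one document: its boost and how many of its tokens matched. */
    static final class Clause {
        final int boost;
        final int matchedTokens;

        Clause(int boost, int matchedTokens) {
            this.boost = boost;
            this.matchedTokens = matchedTokens;
        }
    }

    /** Observed behaviour: the boost is added once per matched token. */
    static int scorePerToken(Clause... clauses) {
        int sum = 0;
        for (Clause c : clauses) {
            sum += c.boost * c.matchedTokens;
        }
        return sum;
    }

    /** Wanted behaviour: the boost is added once per matching field. */
    static int scoreOncePerField(Clause... clauses) {
        int sum = 0;
        for (Clause c : clauses) {
            if (c.matchedTokens > 0) {
                sum += c.boost;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Clause name = new Clause(7, 1);   // name:DocumentOne^7, one token
        Clause place = new Clause(3, 3);  // place:(34 High Street)^3, three tokens

        System.out.println(scorePerToken(place));           // query 3, doc1: 9
        System.out.println(scorePerToken(name, place));     // query 4, doc1: 16
        System.out.println(scoreOncePerField(place));       // query 3, doc1: 3
        System.out.println(scoreOncePerField(name, place)); // query 4, doc1: 10
    }
}
```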
How to do it? My first idea was a HashSet keyed on the field name, so
that after a field is scored once, it isn't scored again. But then only
the first document got a score (since the second and all later documents
have the same field name). So I realized I also need the docID in the
key. That worked fine until I found out (thank you for that) that docID
is segment-specific. So now I need a segmentID as well (or something similar).
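A rough sketch of that idea in plain Java follows. ScoreOnceGuard, Key
and the opaque segmentKey parameter are hypothetical names, and nothing
here calls the Lucene API. As far as I know, the per-segment reader
context exposes a docBase offset, so docBase + docID might serve as an
index-wide document key instead, but treat that as an assumption to
verify against your Lucene version.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical guard that lets each (segment, docID, field) score only once.
final class ScoreOnceGuard {

    /** Composite key; docId alone is ambiguous because it restarts in every segment. */
    static final class Key {
        final Object segmentKey;
        final int docId;
        final String field;

        Key(Object segmentKey, int docId, String field) {
            this.segmentKey = segmentKey;
            this.docId = docId;
            this.field = field;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return docId == k.docId
                    && field.equals(k.field)
                    && Objects.equals(segmentKey, k.segmentKey);
        }

        @Override
        public int hashCode() {
            return Objects.hash(segmentKey, docId, field);
        }
    }

    private final Set<Key> seen = new HashSet<>();

    /** Returns the boost the first time a key is seen and 0 on every later call. */
    int score(Object segmentKey, int docId, String field, int boost) {
        return seen.add(new Key(segmentKey, docId, field)) ? boost : 0;
    }

    /** The set has to be cleared between queries, or later queries would score 0. */
    void reset() {
        seen.clear();
    }

    public static void main(String[] args) {
        ScoreOnceGuard guard = new ScoreOnceGuard();
        int first = 0;
        int second = 0;
        // Three matched tokens per document; both documents have local docID 0.
        for (int token = 0; token < 3; token++) {
            first += guard.score("segmentA", 0, "place", 3);
            second += guard.score("segmentB", 0, "place", 3);
        }
        System.out.println(first + " " + second); // 3 3
    }
}
```

The set holds per-query state, so one guard shared across queries or
threads would need reset() at the right time and some synchronization;
the sketch ignores both.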
(You didn't give any examples of what you expect to happen with exclusion
clauses in your BooleanQueries
I don't need exclusion clauses for my use case, but the same would
happen there: the document would be scored by the clause weight,
because the condition is true:
5. (NOT name:DocumentOne)^7 => doc2(score=7)