TermFrequency in a multi-valued field

Jeff Wartes Wed, 07 Aug 2013 13:06:53 -0700

This might end up being more of a Lucene question, but anyway...

For a multivalued field, it appears that term frequency is calculated as
something a little like:


sum(tf(value1), ..., tf(valueN))

I'd rather my score not give preference based on how *many* of the values
in the multivalued field matched; I want it to give preference based on
the value that matched *best*. In other words, something more like:

max(tf(value1), ..., tf(valueN))


Put another way, I want a search like q=mvf:foo against a document with a
multivalued field: 
mvf: [ "foo" ]
to get scored the exact same as a document with a multivalued field:
mvf: [ "foo", "foo" ]
but worse than a document with a multivalued field:
mvf: [ "foo foo" ]
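To make the difference concrete, here is a minimal sketch in plain Python (not Lucene code) comparing the two aggregations on the three documents above. The helper names tf, summed_tf and max_tf are made up for illustration, and tokenizing by whitespace is an assumption standing in for whatever analyzer the field really uses.

```python
def tf(value, term):
    """Count occurrences of term in a single field value (whitespace tokens)."""
    return value.split().count(term)


def summed_tf(values, term):
    """What the current behavior appears to resemble: sum over all values."""
    return sum(tf(v, term) for v in values)


def max_tf(values, term):
    """The desired behavior: only the best-matching value counts."""
    return max((tf(v, term) for v in values), default=0)


# The three example documents for the query mvf:foo.
docs = {
    "one value": ["foo"],
    "two values": ["foo", "foo"],
    "one value, term repeated": ["foo foo"],
}

for name, values in docs.items():
    print(name, "sum =", summed_tf(values, "foo"), "max =", max_tf(values, "foo"))
```

With the sum, the second and third documents tie and both beat the first. With the max, the first two tie and the third comes out ahead, which is the ordering I'm after.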


I'm guessing this would require a custom Similarity implementation, but I'm
beginning to wonder whether even that operates at a low enough level.
Other thoughts? This seems like a pretty obvious desire.

Thanks.
