A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values.
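To make the concatenation model concrete, here is a minimal Java sketch (Java 16+ for the record). It assumes a plain whitespace tokenizer and a hypothetical `gap` value standing in for the schema's positionIncrementGap setting. `index` and `tf` are illustrative helpers, not Lucene APIs. The sketch shows that ["foo", "foo"] and ["foo foo"] give the query term the same tf. Only the positions differ.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiValuedConcat {
    // A token plus its position in the (conceptually concatenated) field.
    record Token(String term, int position) {}

    // Hypothetical helper: whitespace-tokenizes each value and assigns positions
    // as if all values were one token stream, adding `gap` extra positions
    // between the last term of one value and the first term of the next.
    static List<Token> index(List<String> values, int gap) {
        List<Token> tokens = new ArrayList<>();
        int pos = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) pos += gap; // position gap between adjacent values
            for (String term : values.get(v).trim().split("\\s+")) {
                if (term.isEmpty()) continue; // skip empty values
                pos += 1;
                tokens.add(new Token(term, pos));
            }
        }
        return tokens;
    }

    // Term frequency of a query term: its occurrences anywhere in the field,
    // regardless of which value they came from.
    static int tf(List<Token> tokens, String queryTerm) {
        int n = 0;
        for (Token t : tokens) {
            if (t.term().equals(queryTerm)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Token> twoValues = index(List.of("foo", "foo"), 100);
        List<Token> oneValue = index(List.of("foo foo"), 100);
        System.out.println(tf(twoValues, "foo") + " " + tf(oneValue, "foo")); // 2 2
        System.out.println(twoValues.get(1).position() + " " + oneValue.get(1).position()); // 101 1
    }
}
```

The gap only pushes the second value's terms further away, which matters for phrase and proximity queries; it does not change how many times the term occurs.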

Term frequency is driven by the terms from the query, not the terms from the field (tf(query-term), not tf(field-term)). Your "max" formula doesn't quite make sense in that light.

Why do you have two "foo" values in the same field if you don't mean them to count as... two "foo"s?

You can use the Uniq fields update processor (solr.UniqFieldsUpdateProcessorFactory) to eliminate duplicate values in multivalued fields. It removes a value only when the whole value matches another one, not when individual terms repeat within values.
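As a rough illustration of the "whole value, not individual terms" behavior, here is a plain Java sketch of order-preserving exact-value de-duplication. It is not the Solr processor's code; `uniq` is a hypothetical helper that only mimics the semantics described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class UniqValues {
    // Hypothetical sketch: keeps the first occurrence of each exact value,
    // preserving order. Terms inside a value are never inspected, so
    // "foo foo" stays one value that still contains two "foo" terms.
    static List<String> uniq(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        System.out.println(uniq(List.of("foo", "foo")));        // [foo]
        System.out.println(uniq(List.of("foo foo")));           // [foo foo]
        System.out.println(uniq(List.of("foo", "bar", "foo"))); // [foo, bar]
    }
}
```

After this step, mvf: [ "foo", "foo" ] is indexed like mvf: [ "foo" ], while mvf: [ "foo foo" ] keeps its tf of 2.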

You need to clarify your use case.

-- Jack Krupansky

-----Original Message-----
From: Jeff Wartes
Sent: Wednesday, August 07, 2013 4:05 PM
To: solr-user@lucene.apache.org
Subject: TermFrequency in a multi-valued field


This might end up being more of a Lucene question, but anyway...

For a multivalued field, it appears that term frequency is calculated as
something a little like:

sum(tf(value1), ..., tf(valueN))

I'd rather my score not give preference based on how *many* of the values
in the multivalued field matched; I want it to give preference based on
the value that matched *best*. In other words, something more like:

max(tf(value1), ..., tf(valueN))
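The difference between the two formulas can be checked with a small Java sketch of the arithmetic only; it does not hook into Lucene scoring. It assumes a simple whitespace split per value (a real analyzer may lowercase, stem, and so on), and `tf`, `sumTf`, and `maxTf` are hypothetical helpers.

```java
import java.util.List;

public class PerValueTf {
    // Occurrences of queryTerm within a single value (whitespace split assumed).
    static int tf(String value, String queryTerm) {
        int n = 0;
        for (String term : value.trim().split("\\s+")) {
            if (term.equals(queryTerm)) n++;
        }
        return n;
    }

    // sum(tf(value1), ..., tf(valueN)): the behavior observed above.
    static int sumTf(List<String> values, String queryTerm) {
        int total = 0;
        for (String v : values) total += tf(v, queryTerm);
        return total;
    }

    // max(tf(value1), ..., tf(valueN)): the behavior being asked for.
    static int maxTf(List<String> values, String queryTerm) {
        int best = 0;
        for (String v : values) best = Math.max(best, tf(v, queryTerm));
        return best;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("foo"),
            List.of("foo", "foo"),
            List.of("foo foo"));
        for (List<String> doc : docs) {
            System.out.println(doc + " sum=" + sumTf(doc, "foo") + " max=" + maxTf(doc, "foo"));
        }
    }
}
```

Run against the three example documents, sum gives 1, 2, 2 while max gives 1, 1, 2, which is exactly the ordering wanted: [ "foo" ] ties with [ "foo", "foo" ], and both lose to [ "foo foo" ].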


Put another way, I want a search like q=mvf:foo against a document with a
multivalued field:
mvf: [ "foo" ]
to get scored the exact same as a document with a multivalued field:
mvf: [ "foo", "foo" ]
but worse than a document with a multivalued field:
mvf: [ "foo foo" ]


I'm guessing this'd require a custom Similarity implementation, but I'm
beginning to wonder if even that is low enough level.
Other thoughts? This seems like a pretty obvious desire.

Thanks.
