Re: TermFrequency in a multi-valued field

Jack Krupansky Wed, 07 Aug 2013 13:44:48 -0700

A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values.
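To make the concatenation point concrete, here is a minimal Python sketch of a simplified model. It is not Lucene code: the function name, the whitespace tokenizer, and the gap value of 100 are assumptions for illustration, and the exact position arithmetic in Lucene may differ slightly.

```python
def flatten_multivalued(values, position_gap=100):
    """Model a multivalued field as one token stream with position gaps.

    Simplified assumption: tokens are split on whitespace, positions
    increase by 1 inside a value, and position_gap extra positions are
    skipped between adjacent values.
    """
    postings = []
    next_position = 0
    for value in values:
        for token in value.split():
            postings.append((token, next_position))
            next_position += 1
        # Skip ahead so phrase queries do not match across value boundaries.
        next_position += position_gap
    return postings


# Two separate values: same terms as one concatenated value, different positions.
print(flatten_multivalued(["foo", "foo"]))
print(flatten_multivalued(["foo foo"]))
```

In this model both documents contain the term "foo" twice; only the positions differ, which matters for phrase and proximity matching but not for a plain term count.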

Term frequency is driven by the terms from the query, not the terms from the field (tf(query-term), not tf(field-term)). Your "max" formula doesn't quite make sense in that light.

Why do you have two "foo" in the same field if you don't mean them to be...two "foo"??

You can use the Uniq update processor to eliminate duplicate values in multivalued fields (where the whole value matches, not individual terms within values).
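As a rough illustration of what whole-value de-duplication does and does not do, here is a small Python sketch. It only models the behavior described above; it is not the Solr processor's implementation, and the function name is hypothetical.

```python
def dedupe_values(values):
    """Drop repeated values from a multivalued field, keeping first-seen order.

    Only whole values are compared; repeated terms inside a single
    value are left untouched.
    """
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


print(dedupe_values(["foo", "foo"]))  # ['foo']
print(dedupe_values(["foo foo"]))     # ['foo foo']
```

Under this model, [ "foo", "foo" ] collapses to [ "foo" ] before indexing, while [ "foo foo" ] keeps both occurrences of the term.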


You need to clarify your use case.

-- Jack Krupansky

-----Original Message-----
From: Jeff Wartes
Sent: Wednesday, August 07, 2013 4:05 PM
To: solr-user@lucene.apache.org
Subject: TermFrequency in a multi-valued field

This might end up being more of a Lucene question, but anyway...

For a multivalued field, it appears that term frequency is calculated as
something a little like:

sum(tf(value1), ..., tf(valueN))

I'd rather my score not give preference based on how *many* of the values
in the multivalued field matched, I want it to give preference based on
the value that matched *best*. In other words, something more like:

max(tf(value1), ..., tf(valueN))

Put another way, I want a search like q=mvf:foo against a document with a
multivalued field:
mvf: [ "foo" ]
to get scored the exact same as a document with a multivalued field:
mvf: [ "foo", "foo" ]
but worse than a document with a multivalued field:
mvf: [ "foo foo" ]
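The difference between the two formulas can be sketched in a few lines of Python. This is a toy model under stated assumptions (whitespace tokenization, raw counts with no normalization, hypothetical function names), not how Lucene or any Similarity computes scores.

```python
def tf(value, term):
    """Raw count of term within a single field value."""
    return value.split().count(term)


def summed_tf(values, term):
    """sum(tf(value1), ..., tf(valueN)): every matching value adds to the total."""
    return sum(tf(value, term) for value in values)


def max_tf(values, term):
    """max(tf(value1), ..., tf(valueN)): only the best-matching value counts."""
    return max((tf(value, term) for value in values), default=0)


docs = [["foo"], ["foo", "foo"], ["foo foo"]]
for values in docs:
    print(values, summed_tf(values, "foo"), max_tf(values, "foo"))
```

In this model the sum gives 1, 2, 2 for the three documents, while the max gives 1, 1, 2, which is the ordering asked for here.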

I'm guessing this'd require a custom Similarity implementation, but I'm
beginning to wonder if even that is low enough level.
Other thoughts? This seems like a pretty obvious desire.

Thanks.
