Re: Maximum Term Frequency and Minimum Document Length

Geir Henning Pettersen Thu, 05 Feb 2009 03:43:36 -0800

Hi,

We have spent some time on a similar problem. I am pretty sure that setting a
custom similarity implementation is the best way to do it in Solr too. We
ended up solving our problem outside Solr, but it probably would have
been better to set the similarity the way you describe here.


As far as I know, you can tell Solr to use your similarity class in its
configuration; the schema (schema.xml) has a <similarity> element for this.
Have a look at http://wiki.apache.org/solr/SolrPlugins and
http://wiki.apache.org/solr/SchemaXml in the sections about Similarity.

-- 
Geir H. Pettersen
Technical Development
T-Rank AS

On Thu, Feb 5, 2009 at 1:29 AM, Jonah Schwartz <jonah...@gmail.com> wrote:

> We want to configure Solr so that fields are indexed with a maximum term
> frequency and a minimum document length. If a term appears more than N
> times in a field, it will be considered to have appeared only N times. If a
> document's length is under M terms, it will be considered to have exactly
> M terms. We have done this in the past in raw Lucene by writing a
> Similarity class like this:
>
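> import org.apache.lucene.search.DefaultSimilarity;
>
> public class LimitingSimilarity extends DefaultSimilarity {
>   // Limits M and N; these values are placeholders, not from the original code.
>   private static final int minNumTerms = 10;
>   private static final float maxTermFrequency = 5f;
>   // Treat any field shorter than minNumTerms as having exactly minNumTerms terms.
>   @Override
>   public float lengthNorm(String fieldName, int numTerms) {
>       return super.lengthNorm(fieldName, Math.max(minNumTerms, numTerms));
>   }
>   // Cap the term frequency at maxTermFrequency before it is scored.
>   @Override
>   public float tf(float freq) {
>       return super.tf(Math.min(maxTermFrequency, freq));
>   }
> }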
>
>
> Is there a better way to do this within Solr configuration files?
>
> Thanks,
> Jonah
>
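
To see what the two limits in the quoted class do to the scoring factors, here
is a minimal standalone sketch. It does not depend on Lucene: it assumes the
DefaultSimilarity formulas tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms),
and the class name and the limit values (20 terms, frequency 10) are
hypothetical, chosen only for illustration.

```java
public class LimitingSimilaritySketch {
    // Hypothetical limits (M and N in the question); not values from the thread.
    static final int MIN_NUM_TERMS = 20;
    static final float MAX_TERM_FREQUENCY = 10f;

    // Assumed DefaultSimilarity length norm, 1 / sqrt(numTerms),
    // with fields shorter than MIN_NUM_TERMS padded up to the minimum.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(Math.max(MIN_NUM_TERMS, numTerms)));
    }

    // Assumed DefaultSimilarity term-frequency factor, sqrt(freq),
    // with the frequency capped at MAX_TERM_FREQUENCY.
    static float tf(float freq) {
        return (float) Math.sqrt(Math.min(MAX_TERM_FREQUENCY, freq));
    }

    public static void main(String[] args) {
        // A term repeated 100 times scores the same as one repeated 10 times.
        System.out.println(tf(100f) == tf(10f));
        // A 3-term field gets the same norm as a 20-term field.
        System.out.println(lengthNorm(3) == lengthNorm(20));
        // Fields above the minimum are still penalised for length as usual.
        System.out.println(lengthNorm(80) < lengthNorm(20));
    }
}
```

Below the limits nothing changes: a frequency of 4 still yields sqrt(4), and
fields longer than the minimum keep their normal length penalty. The sketch
ignores the lossy single-byte encoding Lucene applies when it stores norms in
the index.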
