Re: No Effect of omitNorms and omitTermFreqAndPositions when using MLT handler?

Ravish Bhagdev Mon, 21 May 2012 03:17:09 -0700

I found this:

https://issues.apache.org/jira/browse/LUCENE-2236


So it seems that per-field similarity is not supported in Solr 1.4 at all.
Is there any possible workaround?  If not, I'll have to consider splitting
my schema into two, which would be quite a big change :(

- Ravish

On Mon, May 21, 2012 at 11:03 AM, Ravish Bhagdev
<ravish.bhag...@gmail.com> wrote:

> Ahh, this is because I have to override DefaultSimilarity to turn off
> tf/idf scoring?  But this will apply to all the fields and general search
> on text fields as well?  Is there a way to apply custom similarity to
> specific field types or fields only?  Is there no way of turning TF/IDF off
> without this?
>
> Thanks,
> Ravish
>
>
> On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev
> <ravish.bhag...@gmail.com> wrote:
>
>> Hi All,
>>
>> I was wondering whether omitNorms has any effect on the MLT handler at all?
>>
>> I'm using schema version 1.2 with Solr 1.4 and have defined a couple of
>> fields that I want to use for MLT lookup. I don't want factors like
>> field length or TF/IDF to affect the scores.  The definitions are below:
>>
>> <fieldType name="lowercase" class="solr.TextField"
>>            positionIncrementGap="100" omitNorms="true"
>>            omitTermFreqAndPositions="true">
>>   <analyzer>
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <fieldType name="text_nonorms" class="solr.TextField"
>>            positionIncrementGap="100" omitNorms="true"
>>            omitTermFreqAndPositions="true">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>             catenateAll="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory"
>>             protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>             catenateAll="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory"
>>             protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <!-- and the fields that use the above field types are -->
>> <field name="PROFILE_TAGS" type="lowercase" indexed="true"
>>        stored="true" multiValued="true" termVectors="true"/>
>> <field name="PROFILE_TAGS_TXT" type="text_nonorms" indexed="true"
>>        stored="true" multiValued="true" termVectors="true"/>
>>
>> In my solrconfig.xml I have defined the following for my MLT request handler:
>>
>> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>>   <lst name="defaults">
>>     <str name="mlt.fl">PROFILE_TAGS,PROFILE_TAGS_TXT</str>
>>     <str name="mlt.qf">PROFILE_TAGS^10.0 PROFILE_TAGS_TXT^2.0</str>
>>     <int name="mlt.mindf">1</int>
>>     <int name="mlt.mintf">1</int>
>>     <str name="fl">id,score</str>
>>   </lst>
>> </requestHandler>
>>
>>
>> However, when I run my query as follows:
>>
>> http://localhost:9090/solr/mlt?fl=*,score&start=0&q=id:4417454.matchRecord&qt=/mlt&fq=targetDB:ConnectMeDB&rows=1000&debugQuery=on
>>
>> the debug scoring info shows the following:
>>
>> <str name="5042172.matchRecord">
>> 0.17156276 = (MATCH) product of:
>>   1.4296896 = (MATCH) sum of:
>>     0.24737607 = (MATCH) weight(PROFILE_TAGS_TXT:system^5.0 in 1472), product of:
>>       0.06376338 = queryWeight(PROFILE_TAGS_TXT:system^5.0), product of:
>>         5.0 = boost
>>         3.8795946 = idf(docFreq=538, maxDocs=9598)
>>         0.0032871156 = queryNorm
>>       3.8795946 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:system in 1472), product of:
>>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:system)=1)
>>         3.8795946 = idf(docFreq=538, maxDocs=9598)
>>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>>     0.65193653 = (MATCH) weight(PROFILE_TAGS_TXT:adapt^5.0 in 1472), product of:
>>       0.10351306 = queryWeight(PROFILE_TAGS_TXT:adapt^5.0), product of:
>>         5.0 = boost
>>         6.298109 = idf(docFreq=47, maxDocs=9598)
>>         0.0032871156 = queryNorm
>>       6.298109 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:adapt in 1472), product of:
>>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:adapt)=1)
>>         6.298109 = idf(docFreq=47, maxDocs=9598)
>>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>>     0.530377 = (MATCH) weight(PROFILE_TAGS_TXT:optic^5.0 in 1472), product of:
>>       0.093365155 = queryWeight(PROFILE_TAGS_TXT:optic^5.0), product of:
>>         5.0 = boost
>>         5.6806736 = idf(docFreq=88, maxDocs=9598)
>>         0.0032871156 = queryNorm
>>       5.6806736 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:optic in 1472), product of:
>>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:optic)=1)
>>         5.6806736 = idf(docFreq=88, maxDocs=9598)
>>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>>   0.12 = coord(3/25)
>> </str>
>>
>> This seems to suggest that TF/IDF scoring is still being applied to these
>> fields!  Also, does it make any difference whether I specify omitNorms in
>> the <field> definition or in the <fieldType> definition?
>>
>> I would appreciate any help with this.
>>
>> Thanks,
>> Ravish
>>
>
>

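For anyone reading the explain output quoted above, here is a rough Python sketch that reproduces its arithmetic. It assumes the default Lucene similarity of that era, i.e. idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq) and coord = matched clauses / total clauses. The function and variable names are made up for illustration, and the input numbers are copied straight from the debug output. If these assumptions hold, tf and fieldNorm both come out as 1.0, which is what one would expect with omitTermFreqAndPositions and omitNorms in effect, so idf is the only per-term factor still varying the score.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed default formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def clause_weight(boost, doc_freq, max_docs, query_norm,
                  term_freq=1, field_norm=1.0):
    # weight = queryWeight * fieldWeight, as laid out in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(term_freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the debug output in the original message.
QUERY_NORM = 0.0032871156
MAX_DOCS = 9598
DOC_FREQS = {"system": 538, "adapt": 47, "optic": 88}

weights = {
    term: clause_weight(5.0, doc_freq, MAX_DOCS, QUERY_NORM)
    for term, doc_freq in DOC_FREQS.items()
}

coord = 3 / 25  # 3 of the 25 generated query clauses matched
score = sum(weights.values()) * coord

for term, weight in weights.items():
    print(term, weight)
print("score", score)
```

The results should agree with the quoted explain values up to float rounding (Lucene computes in single precision, Python in double). Since the differences between the three clause weights come entirely from docFreq, flattening idf would need a change on the similarity side rather than in the field options.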