Ahh, is this because I have to override DefaultSimilarity to turn off TF/IDF scoring? But won't that apply to all fields, including general search on text fields? Is there a way to apply a custom similarity to specific field types or fields only? Is there no way of turning TF/IDF off without this?
Thanks,
Ravish

On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev <ravish.bhag...@gmail.com> wrote:

> Hi All,
>
> I was wondering if omitNorms will have any effect on the MLT handler at all?
>
> I'm using schema version 1.2 with Solr 1.4 and have defined a couple of
> fields, which I want to use for MLT lookup and don't want factors like
> field length or TF/IDF to affect the scores. The definitions are as below:
>
> <fieldType name="lowercase" class="solr.TextField"
>     positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
>
> <fieldType name="text_nonorms" class="solr.TextField"
>     positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>         catenateAll="0" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>         catenateAll="0" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldType>
>
> <!-- and the fields that use the above field types are -->
> <field name="PROFILE_TAGS" type="lowercase" indexed="true" stored="true"
>     multiValued="true" termVectors="true"/>
> <field name="PROFILE_TAGS_TXT" type="text_nonorms" indexed="true"
>     stored="true" multiValued="true" termVectors="true"/>
>
> In my solrconfig.xml I have defined the following for my MLT request handler:
>
> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>   <lst name="defaults">
>     <str name="mlt.fl">PROFILE_TAGS,PROFILE_TAGS_TXT</str>
>     <str name="mlt.qf">PROFILE_TAGS^10.0 PROFILE_TAGS_TXT^2.0</str>
>     <int name="mlt.mindf">1</int>
>     <int name="mlt.mintf">1</int>
>     <str name="fl">id,score</str>
>     <str name="mlt.fl">PROFILE_TAGS,PROFILE_TAGS_TXT</str>
>   </lst>
> </requestHandler>
>
> However, when I run my query as follows:
>
> http://localhost:9090/solr/mlt?fl=*,score&start=0&q=id:4417454.matchRecord&qt=/mlt&fq=targetDB:ConnectMeDB&rows=1000&&debugQuery=on
>
> the debug scoring info shows the following:
>
> <str name="5042172.matchRecord">
> 0.17156276 = (MATCH) product of:
>   1.4296896 = (MATCH) sum of:
>     0.24737607 = (MATCH) weight(PROFILE_TAGS_TXT:system^5.0 in 1472), product of:
>       0.06376338 = queryWeight(PROFILE_TAGS_TXT:system^5.0), product of:
>         5.0 = boost
>         3.8795946 = idf(docFreq=538, maxDocs=9598)
>         0.0032871156 = queryNorm
>       3.8795946 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:system in 1472), product of:
>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:system)=1)
>         3.8795946 = idf(docFreq=538, maxDocs=9598)
>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>     0.65193653 = (MATCH) weight(PROFILE_TAGS_TXT:adapt^5.0 in 1472), product of:
>       0.10351306 = queryWeight(PROFILE_TAGS_TXT:adapt^5.0), product of:
>         5.0 = boost
>         6.298109 = idf(docFreq=47, maxDocs=9598)
>         0.0032871156 = queryNorm
>       6.298109 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:adapt in 1472), product of:
>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:adapt)=1)
>         6.298109 = idf(docFreq=47, maxDocs=9598)
>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>     0.530377 = (MATCH) weight(PROFILE_TAGS_TXT:optic^5.0 in 1472), product of:
>       0.093365155 = queryWeight(PROFILE_TAGS_TXT:optic^5.0), product of:
>         5.0 = boost
>         5.6806736 = idf(docFreq=88, maxDocs=9598)
>         0.0032871156 = queryNorm
>       5.6806736 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:optic in 1472), product of:
>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:optic)=1)
>         5.6806736 = idf(docFreq=88, maxDocs=9598)
>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>   0.12 = coord(3/25)
> </str>
>
> This seems to suggest that TF/IDF is still being applied to these fields!
> Also, does it make any difference if I specify omitNorms in the <field>
> definition vs. in the <fieldType> definition?
>
> I would appreciate any help with this.
>
> Thanks,
> Ravish
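
For anyone reading the explain output in the quoted message: tf and fieldNorm are reported as 1.0 for all three matched terms, so the only factors that differ between terms are the idf values (which appear in both queryWeight and fieldWeight). That is at least consistent with omitNorms and omitTermFreqAndPositions taking effect while idf is still applied. Below is a rough Python sketch that just redoes the arithmetic from the numbers in that output. The variable and function names are my own, and the formula is only what the explain tree itself shows (weight = queryWeight * fieldWeight, score = coord * sum of weights), not anything taken from the Solr source.

```python
# Numbers copied from the explain output for document 5042172.matchRecord.
QUERY_NORM = 0.0032871156
BOOST = 5.0
COORD = 3 / 25  # coord(3/25) as reported in the explain output

# idf per matched term in PROFILE_TAGS_TXT, as reported by Solr.
IDF = {"system": 3.8795946, "adapt": 6.298109, "optic": 5.6806736}


def term_weight(idf, tf=1.0, field_norm=1.0):
    """Recompute one weight(...) node of the explain tree.

    tf and field_norm default to 1.0 because that is what the
    explain output reports for every matched term.
    """
    query_weight = BOOST * idf * QUERY_NORM   # boost * idf * queryNorm
    field_weight = tf * idf * field_norm      # tf * idf * fieldNorm
    return query_weight * field_weight


weights = {term: term_weight(idf) for term, idf in IDF.items()}
score = COORD * sum(weights.values())

for term, weight in weights.items():
    print(f"{term}: {weight:.6f}")
print(f"score: {score:.6f}")
```

With tf and fieldNorm fixed at 1.0, each term contributes boost * queryNorm * idf^2, so the rarer term (adapt, docFreq=47) dominates the score. That squared idf is the part that would have to go if TF/IDF is to be switched off entirely.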