Re: Can we manipulate termfreq to count as 1 for multiple matches?
: parameter *omitTermFreqAndPositions*

The key thing to remember: if you use this, then by omitting positions you can no longer do phrase queries.

: or you can use a custom similarity class that overrides the term freq and
: return one for only that field.
: http://wiki.apache.org/solr/SchemaXml#Similarity

There is actually a Similarity class already written that is designed to target this specific problem of keyword spamming in text fields...

: Document_1
: Name = Blue Jeans
: Description = This jeans is very soft. Jeans is pretty nice.
:
: Now, If I Search for Jeans then Jeans is found in 2 places in
: Description field.

...first off, it's important to remember that 'tf' doesn't affect things in isolation. Usually there is also a lengthNorm factor that would penalize the score of that document compared to another one that had a short description that only included the word Jeans once (e.g. "These are Red Jeans").

Using the SweetSpotSimilarity, you can specify target values identifying what ideal values (i.e. the "sweet spot") you anticipate in a typical document for both the tf and lengthNorm...

https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html
https://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/SweetSpotSimilarity.html

...so if you want to say that 1 to 4 instances of the term are equally good, and above that start to reward docs more, you could configure the tf function to do that.

(If you really want the same tf() scoring factor for all docs, regardless of how many times the term is mentioned, then you would need to write your own Similarity subclass at the moment.)

-Hoss
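To make the interplay of tf and lengthNorm concrete, here is a small self-contained sketch. It does not use Lucene; it only models the default formulas (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)) and the baseline tf curve as the SweetSpotSimilarity javadoc describes it. The class name TfShapes, the method names, and the min/base argument names are made up for this illustration, and the term counts assume no stopword removal, no index-time boost, and no lossy norm encoding.

```java
// Illustrative model only: these are NOT Lucene classes, just the formulas.
final class TfShapes {

    // Default tf factor: square root of the raw term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Default length normalization, ignoring boosts and byte encoding.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Baseline tf as documented for SweetSpotSimilarity:
    // flat at 'base' up to 'min' occurrences, then growing like a square root.
    static float sweetSpotBaselineTf(float freq, float min, float base) {
        if (freq == 0.0f) {
            return 0.0f;
        }
        return (freq <= min) ? base : (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        // "This jeans is very soft. Jeans is pretty nice." : 9 terms, "jeans" twice
        float longDoc = defaultTf(2) * defaultLengthNorm(9);
        // "These are Red Jeans" : 4 terms, "jeans" once
        float shortDoc = defaultTf(1) * defaultLengthNorm(4);
        System.out.printf("long=%.3f short=%.3f%n", longDoc, shortDoc);

        // With min=4 and base=1, one to four occurrences score the same.
        for (int freq = 1; freq <= 6; freq++) {
            System.out.printf("freq=%d default=%.3f sweetSpot=%.3f%n",
                    freq, defaultTf(freq), sweetSpotBaselineTf(freq, 4f, 1f));
        }
    }
}
```

Under these assumptions the two-mention description gets roughly 0.47 for tf * lengthNorm and the four-word description gets 0.50, so the shorter document is already slightly ahead before any tf tuning.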
Re: Can we manipulate termfreq to count as 1 for multiple matches?
Hi!

Take a look at http://wiki.apache.org/solr/SchemaXml#Common_field_options
parameter *omitTermFreqAndPositions*

or you can use a custom similarity class that overrides the term freq and
returns one for only that field.
http://wiki.apache.org/solr/SchemaXml#Similarity

    <fieldType name="text_dfr" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
      <similarity class="solr.MyCustomSimiliratyWithoutTermFreq">
      </similarity>
    </fieldType>

Best,

On Wed, Mar 13, 2013 at 8:43 PM, roz dev rozde...@gmail.com wrote:

> Hi All
>
> I am wondering if there is a way to alter term frequency of a certain
> field as 1, even if there are multiple matches in that document?
>
> Use Case is:
>
> Let's say that I have a document with 2 fields
>
> - Name and
> - Description
>
> And, there is a document with data like this
>
> Document_1
> Name = Blue Jeans
> Description = This jeans is very soft. Jeans is pretty nice.
>
> Now, If I Search for Jeans then Jeans is found in 2 places in
> Description field.
>
> Term Frequency for Description is 2
>
> I want Solr to count term frequency for Description as 1 even if Jeans
> is found multiple times in this field.
>
> For all other fields, I do want to get the term frequency, as it is.
>
> Is this doable in Solr with any of the functions?
>
> Any inputs are welcome.
>
> Thanks
> Saroj

--
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre
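A rough sketch of what the custom similarity suggested above could look like. To keep it compilable without Lucene on the classpath, it extends a small stand-in class that mirrors the tf(float) hook of Lucene 4.x DefaultSimilarity; in a real deployment the subclass would presumably extend org.apache.lucene.search.similarities.DefaultSimilarity instead and be named in the similarity element of the field type. The class names BaseSimilarityStandIn, FlatTfSimilarity and FlatTfDemo are hypothetical.

```java
// Stand-in for the tf(float) hook of Lucene 4.x DefaultSimilarity.
// Assumption: the real class computes sqrt(freq) here.
class BaseSimilarityStandIn {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Hypothetical similarity that ignores how often a term repeats:
// any positive frequency contributes the same tf factor of 1.
class FlatTfSimilarity extends BaseSimilarityStandIn {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

class FlatTfDemo {
    public static void main(String[] args) {
        BaseSimilarityStandIn standard = new BaseSimilarityStandIn();
        BaseSimilarityStandIn flat = new FlatTfSimilarity();
        // "Jeans" appears twice in the Description field of Document_1.
        System.out.println("default tf(2) = " + standard.tf(2f));
        System.out.println("flat tf(2)    = " + flat.tf(2f));
    }
}
```

Unlike omitTermFreqAndPositions, this approach leaves positions in the index, so phrase queries on the field should keep working. As far as I know, a per-field similarity in Solr 4.x is only honored when the global similarity is solr.SchemaSimilarityFactory, so check that in schema.xml as well.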
Can we manipulate termfreq to count as 1 for multiple matches?
Hi All

I am wondering if there is a way to alter term frequency of a certain field as 1, even if there are multiple matches in that document?

Use Case is:

Let's say that I have a document with 2 fields

- Name and
- Description

And, there is a document with data like this

Document_1
Name = Blue Jeans
Description = This jeans is very soft. Jeans is pretty nice.

Now, If I Search for Jeans then Jeans is found in 2 places in Description field.

Term Frequency for Description is 2

I want Solr to count term frequency for Description as 1 even if Jeans is found multiple times in this field.

For all other fields, I do want to get the term frequency, as it is.

Is this doable in Solr with any of the functions?

Any inputs are welcome.

Thanks
Saroj