Re: Can we manipulate termfreq to count as 1 for multiple matches?

2013-03-22 Thread Chris Hostetter

: parameter *omitTermFreqAndPositions*

the key thing to remember being: if you use this, then by omitting 
positions you can no longer do phrase queries.
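
To see why positions matter for phrase queries, here is a toy sketch in plain Java. It is not Lucene code; the class and method names (PositionsSketch, indexWithPositions, matchesPhrase) are hypothetical and the tokenizer is a naive lowercase/split. A phrase match needs to know that one term sits right after another, and that is exactly the information dropped when positions are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionsSketch {
    // Hypothetical toy index: term -> list of token positions in one field value.
    public static Map<String, List<Integer>> indexWithPositions(String text) {
        Map<String, List<Integer>> postings = new HashMap<>();
        String[] tokens = text.toLowerCase().split("\\W+");
        for (int i = 0; i < tokens.length; i++) {
            postings.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return postings;
    }

    // A two-term phrase matches only if `second` occurs at the position right after `first`.
    public static boolean matchesPhrase(Map<String, List<Integer>> postings,
                                        String first, String second) {
        List<Integer> a = postings.getOrDefault(first, Collections.emptyList());
        List<Integer> b = postings.getOrDefault(second, Collections.emptyList());
        for (int pos : a) {
            if (b.contains(pos + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings =
            indexWithPositions("This jeans is very soft.  Jeans is pretty nice.");
        System.out.println(matchesPhrase(postings, "very", "soft"));
        System.out.println(matchesPhrase(postings, "soft", "very"));
    }
}
```

If only the set of terms were stored (no position lists), "very soft" and "soft very" would be indistinguishable.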

: or you can use a custom similarity class that overrides the term freq and
: return one for only that field.
: http://wiki.apache.org/solr/SchemaXml#Similarity

There is actually a Similarity class already written that is designed to 
target this specific problem of keyword spamming in text fields...

:  Document_1
:  Name = Blue Jeans
:  Description = This jeans is very soft.  Jeans is pretty nice.
: 
:  Now, If I Search for Jeans then Jeans is found in 2 places in
:  Description field.

...first off, it's important to remember that 'tf' doesn't affect things in 
isolation -- usually there is also a lengthNorm factor that would 
penalize the score of that document compared to another one that had a 
short description that only included the word Jeans once (ie: These are 
Red Jeans)
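
A rough back-of-the-envelope sketch of that interaction, in plain Java rather than real Lucene code. It assumes the classic default formulas tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), case-insensitive matching, no stopword removal, and no index-time boosts; it also ignores the lossy encoding of stored norms and every other scoring factor (idf etc. are the same for both docs on a single-term query). The class and method names are hypothetical.

```java
public class TfLengthNormSketch {
    // Classic term-frequency factor (assumed): sqrt(freq).
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Classic length normalization (assumed): 1 / sqrt(number of terms in the field).
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Only the part of the score that differs between two docs for one query term.
    public static double tfTimesNorm(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // "This jeans is very soft.  Jeans is pretty nice." -> 9 terms, "jeans" twice
        double longDoc = tfTimesNorm(2, 9);
        // "These are Red Jeans" -> 4 terms, "jeans" once
        double shortDoc = tfTimesNorm(1, 4);
        System.out.println("long description:  " + longDoc);
        System.out.println("short description: " + shortDoc);
    }
}
```

Under those assumptions the long description gets sqrt(2)/3 (about 0.47) and the short one gets 1/2, so the doc that mentions Jeans twice does not automatically win.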

Using the SweetSpotSimilarity, you can specify target values identifying 
what ideal values (ie: sweet spot) you anticipate in a typical document 
for both the tf and lengthNorm ... 

https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html
https://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/SweetSpotSimilarity.html

...so if you want to say that 1 to 4 instances of the term are equally 
good, and above that start to reward docs more, you could configure the tf 
function to do that.
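
A sketch of what such a tf function looks like, in plain Java with no Lucene dependency. The formula follows the "baseline" tf described in the SweetSpotSimilarity javadoc as best I recall it, (x <= min) ? base : sqrt(x + base^2 - min), so check the linked docs before relying on it. The values min=4 and base=1 are just illustrative picks for the "1 to 4 instances" example, and the class name is hypothetical.

```java
public class SweetSpotTfSketch {
    // Plateau-style tf: every freq in (0, min] scores `base`;
    // above min the value grows like a square root again.
    public static float baselineTf(float freq, float min, float base) {
        if (freq == 0.0f) {
            return 0.0f;
        }
        return (freq <= min)
            ? base
            : (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        float min = 4.0f;   // illustrative: 1..4 occurrences count the same
        float base = 1.0f;  // illustrative: tf value on the plateau
        for (int freq = 0; freq <= 8; freq++) {
            System.out.println(freq + " -> " + baselineTf(freq, min, base));
        }
    }
}
```

With these values, 1 through 4 occurrences all yield 1.0, and 5 occurrences yields sqrt(2), so extra mentions only start to count beyond the sweet spot.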

(If you really want the same tf() scoring factor for all docs, regardless 
of how many times the term is mentioned -- then you would need to write 
your own Similarity subclass at the moment)

-Hoss


Re: Can we manipulate termfreq to count as 1 for multiple matches?

2013-03-14 Thread Felipe Lahti
Hi!

Take a look at
http://wiki.apache.org/solr/SchemaXml#Common_field_options
parameter *omitTermFreqAndPositions*

or you can use a custom similarity class that overrides the term freq and
returns one for only that field.
http://wiki.apache.org/solr/SchemaXml#Similarity

  <fieldType name="text_dfr" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.MyCustomSimiliratyWithoutTermFreq">
    </similarity>
  </fieldType>


Best,

On Wed, Mar 13, 2013 at 8:43 PM, roz dev rozde...@gmail.com wrote:

 Hi All

 I am wondering if there is a way to make the term frequency of a certain
 field count as 1, even if there are multiple matches in that document?

 Use Case is:

 Let's say that I have a document with 2 fields

 - Name and
 - Description

 And, there is a document with data like this

 Document_1
 Name = Blue Jeans
 Description = This jeans is very soft.  Jeans is pretty nice.

 Now, If I Search for Jeans then Jeans is found in 2 places in
 Description field.

 Term Frequency for Description is 2

 I want Solr to count term frequency for Description as 1 even if Jeans is
 found multiple times in this field.

 For all other fields, I do want to get the term frequency as it is.

 Is this doable in Solr with any of the functions?

 Any inputs are welcome.

 Thanks
 Saroj




-- 
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre


Can we manipulate termfreq to count as 1 for multiple matches?

2013-03-13 Thread roz dev
Hi All

I am wondering if there is a way to make the term frequency of a certain
field count as 1, even if there are multiple matches in that document?

Use Case is:

Let's say that I have a document with 2 fields

- Name and
- Description

And, there is a document with data like this

Document_1
Name = Blue Jeans
Description = This jeans is very soft.  Jeans is pretty nice.

Now, If I Search for Jeans then Jeans is found in 2 places in
Description field.

Term Frequency for Description is 2

I want Solr to count term frequency for Description as 1 even if Jeans is
found multiple times in this field.

For all other fields, I do want to get the term frequency as it is.

Is this doable in Solr with any of the functions?

Any inputs are welcome.

Thanks
Saroj