Hi Walter,

Thanks for the idea.  However, in my case the volume of data requires I 
take advantage of Sphinx if possible (the site is http://teenormous.com).

The previous ranker did not add to the weight score when duplicates were 
within the same search field.  So, my previous approach to limit keyword 
stuffing was to put all the fields I wanted to have the same impact on the 
results into one combined search_data field.  However, this is no longer 
working as I am seeing duplicate keywords cause a slightly higher weight 
score.
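
For reference, if the data were small enough to preprocess, Walter's split/uniq/join idea applied to a combined field could look roughly like the sketch below. The method name dedupe_keywords and the sample values are made up for illustration, and lowercasing before comparing is my own assumption about how duplicates should be matched.

```ruby
# Hypothetical helper: merge several text fields into one keyword string
# in which each word appears only once (case-insensitive).
def dedupe_keywords(*fields)
  fields.compact          # skip nil fields
        .join(' ')        # combine everything into one string
        .downcase         # treat "Star" and "star" as the same word
        .split            # split on whitespace
        .uniq             # keep the first occurrence of each word
        .join(' ')
end

puts dedupe_keywords("Star Wars Star Wars", "star wars tee")
```

The result could then be stored in the combined search_data column and indexed in place of the raw merchant text.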

I just found the answer as I was typing this.  The solution (for me) to 
ignore duplicate words is to use the :proximity ranker, rather than the 
default :proximity_bm25.  For example:

Article.search "pancakes", :ranker => :proximity

The clue came from this 
thread: http://sphinxsearch.com/forum/view.html?id=3675.  Specifically, this 
portion about the weight calculation for the default 
:proximity_bm25 ranker: 

"the last three digits, are the 3 most significant decimal places from the 
BM25 ranking algorithm (whose aim is to rank rare words higher, or many 
instances of a word in a single document higher)."

In other words, the BM25 part is what rewards repeated words, and the 
:proximity ranker leaves it out.
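
To make that concrete, here is a rough sketch of how a :proximity_bm25 weight could be pulled apart, assuming (based on the quote above) that the last three digits are the BM25 part and everything above them is the phrase proximity part. The helper name and the two sample weights are invented for illustration; they are not real output from my index.

```ruby
# Hypothetical helper: split a proximity_bm25 weight into its two parts,
# assuming weight = proximity_part * 1000 + bm25_part (bm25_part in 0..999).
def split_proximity_bm25(weight)
  proximity_part, bm25_part = weight.divmod(1000)
  { proximity: proximity_part, bm25: bm25_part }
end

# Made-up weights for two tees that differ only in keyword repetition.
tee1 = split_proximity_bm25(2612)
tee2 = split_proximity_bm25(2641)

puts "Tee 1: #{tee1.inspect}"
puts "Tee 2: #{tee2.inspect}"
puts "Same proximity part? #{tee1[:proximity] == tee2[:proximity]}"
```

If that assumption holds, the two tees only differ in the BM25 digits, which would explain why switching to :proximity makes them tie.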


Thanks,

Tom

On Wednesday, April 8, 2015 at 8:06:24 AM UTC-4, Walter Davis wrote:
>
> I don't know how to do this in Sphinx per se, but what about adding a 
> keywords column to your table, and only index that. Then you could control 
> what makes it in there with some logic (split and uniq and join).
>
> Walter
>
> On Apr 7, 2015, at 5:58 PM, Tom Davies <[email protected]> wrote:
>
> I just recently upgraded ThinkingSphinx from 2.0.14 to 3.1.3 and 
> Sphinx from 2.0.4 to 2.2.8, and for the most part everything is working well.
>
> However, I noticed a slight change to the default weight calculation based 
> on the new ranker.  The new default ranker appears to be adding additional 
> weight for repeated phrases within the same field.
>
> Is there an easy way to only apply a weight for one occurrence in each 
> field?
>
> For example, if I have two t-shirts:
>
> Tee 1:  
>
> name: "Star Wars"
>
> Tee 2:
>
> name: "Star Wars Star Wars"
>
> I would like them to end up with the same weight.  In my case, I have 
> t-shirt data from a variety of merchants and many of them are prone to 
> stuffing keywords over and over.  This is causing them to get an extra bump 
> as a result.
>
> Thanks!
> Tom
>
>  -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.
>
>
