Good to know you found a way through, Tom :)

> On 8 Apr 2015, at 11:12 pm, Tom Davies wrote:
>
> Hi Walter,
>
> Thanks for the idea. However, in my case the volume of data requires I take
> advantage of Sphinx if possible (the site is http://teenormous.com).
>
> The previous ranker did not add to the weight score when duplicates were
> within the same search field. So, my previous approach to limit keyword
> stuffing was to put all the fields I wanted to have the same impact on the
> results into one combined search_data field. However, this is no longer
> working, as I am seeing duplicate keywords cause a slightly higher weight
> score.
>
> I just found the answer as I was typing this. The solution (for me) to
> ignore duplicate words is to use the :proximity ranker, rather than the
> default :proximity_bm25. For example:
>
>     Article.search "pancakes", :ranker => :proximity
>
> The clue came from this thread:
> http://sphinxsearch.com/forum/view.html?id=3675. Specifically this portion
> talking about the weight calculation for the default :proximity_bm25 ranker:
>
> "the last three digits, are the 3 most significant decimal places from the
> BM25 ranking algorithm (whose aim is to rank rare words higher, or many
> instances of a word in a single document higher)."
>
> Thanks,
>
> Tom
>
> On Wednesday, April 8, 2015 at 8:06:24 AM UTC-4, Walter Davis wrote:
> I don't know how to do this in Sphinx per se, but what about adding a
> keywords column to your table, and only indexing that? Then you could control
> what makes it in there with some logic (split and uniq and join).
>
> Walter
>
> On Apr 7, 2015, at 5:58 PM, Tom Davies wrote:
>
>> I just recently upgraded ThinkingSphinx from 2.0.14 to 3.1.3 and Sphinx
>> from 2.0.4 to 2.2.8, and for the most part everything is working well.
>>
>> However, I noticed a slight change to the default weight calculation based
>> on the new ranker. The new default ranker appears to be adding additional
>> weight for repeated phrases within the same field.
>>
>> Is there an easy way to only apply a weight for one occurrence in each field?
>>
>> For example, if I have two t-shirts:
>>
>> Tee 1:
>>
>> name: "Star Wars"
>>
>> Tee 2:
>>
>> name: "Star Wars Star Wars"
>>
>> I would like them to end up with the same weight. In my case, I have
>> t-shirt data from a variety of merchants and many of them are prone to
>> stuffing keywords over and over. This is causing them to get an extra bump
>> as a result.
>>
>> Thanks!
>> Tom
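
For anyone finding this thread later: Walter's "split and uniq and join" idea can be sketched in plain Ruby roughly as follows. This is only a minimal illustration, not code from either poster. The helper name `dedupe_keywords` is hypothetical, and lowercasing before comparing is an assumption so that "Star" and "star" count as the same keyword.

```ruby
# Hypothetical helper (not part of Thinking Sphinx or Sphinx):
# collapse repeated keywords so that a stuffed field such as
# "Star Wars Star Wars" contributes the same terms as "Star Wars".
def dedupe_keywords(text)
  text.to_s      # tolerate nil input
      .downcase  # assumption: treat "Star" and "star" as the same keyword
      .split     # split on whitespace, ignoring leading/trailing blanks
      .uniq      # keep only the first occurrence of each word
      .join(" ") # rebuild a single space-separated string
end

tee_1 = dedupe_keywords("Star Wars")
tee_2 = dedupe_keywords("Star Wars Star Wars")
puts tee_1 == tee_2 # both tees now carry identical keyword text
```

The result could be stored in a separate keywords column and indexed in place of the raw field, as Walter suggests. Note that this also collapses legitimate repeats and discards the original word order after the first occurrence, so Tom's `:ranker => :proximity` option remains the simpler route when changing the ranker is acceptable.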
-- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/thinking-sphinx. For more options, visit https://groups.google.com/d/optout.
