I'm most worried about losing ordering, and I think I can just order the items 
A, B, C by convention.

When using Mahout for clustering, we used to add the title two or three times 
to get artificial boosting without fields. The technique works and may be worth 
an experiment later, thanks.

BTW it looks like similarity and TFIDF are pluggable in Solr and seem pretty 
easy to change. Planning to use cosine for the first cut since it's the default.

On Jul 24, 2013, at 4:10 AM, Michael Sokolov <[email protected]> 
wrote:

On 7/23/13 7:26 PM, Pat Ferrel wrote:
> Honestly not trying to make this more complicated but…
> 
> 
> 
> From past experience I strongly suspect item similarity rank is not something 
> we want to lose so unless someone has a better idea I'll just order the IDs 
> in the fields and call it good for now.
> 
> 
If I understand you correctly, you are concerned about just throwing all the 
items in without regard to order (or weight).  I think Ted's suggestion was not 
to worry about that, but if you do have time and want to tackle this, one thing 
you can do is to add an item multiple times.  For example, suppose you have 
items A, B, C, ... with A ranked highest.  Then index a "document" in Solr like 
this:

A A A B B C

This will end up giving A a higher frequency count in the index.

The number of repeats would be kind of arbitrary.  You might want to make it a 
linear function of rank or a quantized version of the similarity score.
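
As a rough illustration of the two options, here is a minimal Python sketch. 
The function names and the max_repeats cap are hypothetical (nothing here comes 
from Solr or Mahout), and the score variant assumes similarity scores already 
scaled to the range 0..1. The result is just the whitespace-separated text you 
would put in the field.

```python
import math


def repeats_for_rank(rank, n_items, max_repeats=3):
    """Linear in rank: rank 0 gets max_repeats, the last rank gets 1."""
    if n_items <= 1:
        return max_repeats
    step = (max_repeats - 1) / (n_items - 1)
    return max(1, round(max_repeats - rank * step))


def repeats_for_score(score, max_repeats=3):
    """Quantized score: assumes score is in [0, 1]; always at least 1 repeat."""
    clamped = min(1.0, max(0.0, score))
    return max(1, math.ceil(clamped * max_repeats))


def boosted_field(ranked_ids, max_repeats=3):
    """Build the field text from item IDs ordered best-first."""
    tokens = []
    for rank, item_id in enumerate(ranked_ids):
        count = repeats_for_rank(rank, len(ranked_ids), max_repeats)
        tokens.extend([item_id] * count)
    return " ".join(tokens)


if __name__ == "__main__":
    print(boosted_field(["A", "B", "C"]))  # A A A B B C
```

With three items and a cap of 3 this reproduces the example above; a larger 
cap or a longer list changes how steeply the repeat count falls off.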

But this might end up being a noise-level effect ... it's probably not worth 
losing sleep over.  On the other hand, it's probably less useful to order the 
IDs since once they get put in the index the token "order" is stored as a 
"position" which isn't (usually) used for scoring, although I suppose some 
custom scorer could do that, too.

-Mike
