I'm most worried about losing ordering and I think I can just order the items A B C by convention.
Using Mahout to do clustering, we used to add the title two or three times to get artificial boosting without fields. The technique works and may be worth an experiment later, thanks.

BTW, it looks like similarity and TF-IDF are pluggable in Solr and seem pretty easy to change. I'm planning to use cosine for the first cut since it's the default.

On Jul 24, 2013, at 4:10 AM, Michael Sokolov <[email protected]> wrote:

On 7/23/13 7:26 PM, Pat Ferrel wrote:
> Honestly not trying to make this more complicated but…
>
> From past experience I strongly suspect item similarity rank is not something
> we want to lose so unless someone has a better idea I'll just order the IDs
> in the fields and call it good for now.

If I understand you correctly, you are concerned about just throwing all the items in without regard to order (or weight). I think Ted's suggestion was not to worry about that, but if you do have time and want to tackle this, one thing you can do is add an item multiple times. For example, suppose you have items A, B, C, ... with A ranked highest. Then index a "document" in Solr like this:

    A A A B B C

This will end up giving A a higher frequency count in the index. The number of repeats would be kind of arbitrary. You might want to make it a linear function of rank or a quantized version of the similarity score. But this might end up being a noise-level effect ... it's probably not worth losing sleep over.

On the other hand, it's probably less useful to order the IDs, since once they get put in the index the token "order" is stored as a "position", which isn't (usually) used for scoring, although I suppose some custom scorer could do that, too.

-Mike
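
For what it's worth, the repeat-by-rank idea discussed above could be sketched roughly as follows. This is only an illustration in Python, not anything from Solr or Mahout: the function names, the `max_repeats` and `levels` parameters, and the assumption that similarity scores fall in [0, 1] are all made up for the example. The resulting string is what would go into a whitespace-tokenized field.

```python
import math


def boosted_field(ranked_ids, max_repeats=3):
    """Repeat each ID as a linear function of its rank (rank 0 is the best).

    Hypothetical helper: the top item gets max_repeats copies, each following
    item gets one fewer, and every item appears at least once.
    """
    tokens = []
    for rank, item_id in enumerate(ranked_ids):
        repeats = max(1, max_repeats - rank)
        tokens.extend([item_id] * repeats)
    return " ".join(tokens)


def boosted_field_from_scores(scored_ids, levels=3):
    """Repeat each ID according to a quantized similarity score.

    Assumes scores lie in [0, 1]; each score is mapped to 1..levels copies.
    """
    tokens = []
    for item_id, score in scored_ids:
        repeats = min(levels, max(1, math.ceil(score * levels)))
        tokens.extend([item_id] * repeats)
    return " ".join(tokens)


if __name__ == "__main__":
    print(boosted_field(["A", "B", "C"]))
    print(boosted_field_from_scores([("A", 0.9), ("B", 0.5), ("C", 0.2)]))
```

Both calls in the example produce the "A A A B B C" document from the message above. Whether the extra term frequency makes a measurable difference to scoring would still need an experiment.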
