You mean that there are two items with near-identical data, and one shows up in recs and the other doesn't? I can make a general guess: it comes down to the fact that your similarity data isn't "transitive". This comes up in a minor and a major way.
The minor way is just theoretical: these metrics aren't by nature transitive, and your modification may be even less so. A, B, and C may each be fairly similar pairwise, but the similarities don't necessarily obey a "triangle inequality". Still, a good metric ought to be transitive-ish, given enough good data.

The more major point is practical: A, B, and C may all be very similar pairwise, but your data may not contain enough information to establish one of the pairs. Maybe A and B are similar, and A and C are similar, and B and C are too, but the B-C similarity can't be computed, or can't be computed well, due to sparsity or missing data. Then a user who rates C might see only A recommended, even though A and B are intuitively nearly equally good. B is good too; the data just may not be able to know it. There's a toy sketch of this below the quoted message.

On Sun, Jan 29, 2012 at 10:05 PM, Nick Jordan <[email protected]> wrote:
> Hey There,
>
> Quick question about the expected behavior when using a custom item
> similarity model that extends AbstractItemSimilarity within a
> KnnItemBasedRecommender. Basically I have some logic in doItemSimilarity()
> that returns 1.0 in certain cases.
>
> When looking at actual recommendations for a user, I notice that some of
> the items that should have perfect similarity based on the above logic
> have wildly varying recommendations. I'm not sure whether this is expected
> behavior, but if it is, I'd like to understand why two items with perfect
> similarity would have different estimated preferences for the same user.
> The user has not actually rated any of the items in question.
>
> Thanks in advance.
>
> Nick
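To make the sparsity point concrete, here's a toy sketch in plain Java. It is not Mahout code, and every name in it is made up; it's just a similarity-weighted estimator of the kind item-based recommenders use, with Double.NaN standing in for "similarity unknown":

// Toy sketch (hypothetical, not Mahout code): two items with a perfect
// pairwise similarity can still get different estimates when one
// item-item similarity is missing from the data.
import java.util.HashMap;
import java.util.Map;

public class SparsitySketch {

  // Look up a symmetric similarity; Double.NaN means "not computable
  // from the data", e.g. because no users co-rated the pair.
  static double similarity(String a, String b, Map<String, Double> sims) {
    Double s = sims.get(a + "-" + b);
    if (s == null) {
      s = sims.get(b + "-" + a);
    }
    return s == null ? Double.NaN : s;
  }

  // Similarity-weighted average over the user's rated items, skipping
  // pairs whose similarity is unknown, roughly what an item-based
  // estimator does.
  static double estimate(String target,
                         Map<String, Double> userRatings,
                         Map<String, Double> sims) {
    double num = 0.0;
    double den = 0.0;
    for (Map.Entry<String, Double> e : userRatings.entrySet()) {
      double s = similarity(target, e.getKey(), sims);
      if (!Double.isNaN(s)) {
        num += s * e.getValue();
        den += Math.abs(s);
      }
    }
    return den == 0.0 ? Double.NaN : num / den;
  }

  public static void main(String[] args) {
    Map<String, Double> sims = new HashMap<String, Double>();
    sims.put("A-B", 1.0);  // A and B look identical...
    sims.put("A-C", 0.9);  // ...and A-C is known...
    // ...but B-C is absent: too sparse to compute.

    Map<String, Double> ratings = new HashMap<String, Double>();
    ratings.put("C", 4.0); // the user has rated only C

    System.out.println("estimate(A) = " + estimate("A", ratings, sims)); // 4.0
    System.out.println("estimate(B) = " + estimate("B", ratings, sims)); // NaN
  }
}

A gets an estimate of 4.0 through the known A-C similarity, while B gets NaN, because its only candidate pair, B-C, is missing. The real recommender is doing something like this, just over much more data.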
