Hello Sean,

Thanks very much for the detailed response! proximity() is actually a similarity metric, not a distance. In an earlier implementation I used tfidf.distance, which is why the comment you saw in the code still says "distance".
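Since Sean's point below is that a similarity and a distance go opposite ways, it may help to show one common way to put them on the same footing before merging. This is a minimal sketch in plain Java (not Mahout API); the 1/(1+d) mapping is just one conventional choice for a non-negative distance:

```java
// Sketch (illustrative, not Mahout API): mapping a non-negative distance
// into a similarity in (0, 1], so two scores point the same way before
// they are merged. d = 0 maps to 1, and d -> infinity maps toward 0.
public class SimilarityUtils {
    static double distanceToSimilarity(double distance) {
        return 1.0 / (1.0 + distance);
    }

    public static void main(String[] args) {
        System.out.println(distanceToSimilarity(0.0)); // 1.0
        System.out.println(distanceToSimilarity(3.0)); // 0.25
    }
}
```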
I am working on decomposing the content-based implementation from the sales-based one, so thank you for that. As for merging the scores, I need an OR rule, which translates to addition. If I used AND, the likelihood would become smaller because the probabilities would be multiplied, which would restrict the clusters to items that appear in the intersection of content-based similarity AND sales correlation. Does this sound right to you?

A very important issue I am facing now is evaluation. How do we evaluate the clusters produced by a TreeClusteringRecommender? I would appreciate any insight.

Thanks so much for this lively discussion!

-Ahmed

On Thu, Mar 22, 2012 at 6:17 PM, Sean Owen <[email protected]> wrote:
> Yes, but you can't use it as both things at once. I meant that you
> swap them at the broadest level -- at your original input. So all
> "items" are really users and vice versa. At the least you need two
> separate implementations, encapsulating two different notions of
> similarity.
>
> Similarity is item-item or user-user, not item-user. It makes some
> sense to implement item-item similarity based on tags, so the first
> half of the method looks OK (except that I'd expect you to implement
> itemSimilarity()).
>
> I think the other half makes more sense if you are calling
> getUsersForItem() -- input is an item, output is users.
>
> As for the final line -- my original comment stands, though it's right
> for the wrong reason. You are not combining two distances here. You're
> combining a similarity value and a distance (right? proximity is a
> distance function?), and that's definitely not right. They go opposite
> ways: a big distance means a small similarity.
>
> If you handle two similarities, the simple thing that is in the
> ballpark of theoretically sound is to take their product.
>
> On Thu, Mar 22, 2012 at 9:48 PM, Ahmed Abdeen Hamed
> <[email protected]> wrote:
> > You are correct.
> > In a previous post, I inquired about using the
> > TreeClusteringRecommender, which is based on a UserSimilarity metric. My
> > question was whether I could use it for ItemSimilarity, and your answer
> > was yes: just feed the itemID as a userID and vice versa. That is what
> > this method is doing.
> >
> > The purpose of this method is to derive a similarity that is based on
> > item attributes (name, brand, category) in addition to what the
> > log-likelihood offers, so I am guaranteed to get recommendations for
> > items such as ("The Matrix" and "The Matrix Reloaded") even if they
> > never co-occur in the data model. This is why I need to merge the two
> > scores somehow.
> >
> > Thanks again!
> > Ahmed
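The two merge rules discussed in the thread can be sketched concretely. This is plain Java, not Mahout API, and it assumes both inputs are similarities already normalized into [0, 1]; note that for OR, a probabilistic sum (s1 + s2 - s1*s2) behaves like plain addition for small scores but stays within [0, 1], whereas raw addition can exceed 1:

```java
// Sketch (illustrative only) of AND-like vs OR-like merging of two
// similarity scores, each assumed to lie in [0, 1].
public class ScoreMerge {
    // AND-like merge (Sean's suggestion): the product is high only when
    // BOTH scores are high, restricting results to the intersection.
    static double and(double s1, double s2) {
        return s1 * s2;
    }

    // OR-like merge: probabilistic sum, which rewards either signal being
    // high while remaining bounded by 1.
    static double or(double s1, double s2) {
        return s1 + s2 - s1 * s2;
    }

    public static void main(String[] args) {
        // Items similar by content (0.5) but weakly correlated in sales (0.25):
        System.out.println(and(0.5, 0.25)); // 0.125 -- AND suppresses the pair
        System.out.println(or(0.5, 0.25));  // 0.625 -- OR keeps it
    }
}
```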

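The "feed the itemID as a userID and vice versa" trick from the quoted exchange can also be sketched without any Mahout classes. The interface below is a stand-in for a similarity function, NOT Mahout's UserSimilarity; the point is only that a similarity keyed by item IDs can be handed unchanged to a component expecting user IDs, provided the swap is applied consistently at the data-model level too:

```java
import java.util.function.ToDoubleBiFunction;

// Sketch (illustrative only) of reusing an item-item similarity as a
// "user-user" similarity by swapping the roles of the IDs.
public class SwapSketch {
    // Hypothetical item-item similarity keyed by item ID.
    static final ToDoubleBiFunction<Long, Long> ITEM_SIMILARITY =
        (itemA, itemB) -> itemA.equals(itemB) ? 1.0 : 0.5;

    public static void main(String[] args) {
        // A user-based consumer receives what it believes are user IDs,
        // but by convention they are really item IDs; the function itself
        // needs no change.
        ToDoubleBiFunction<Long, Long> userSimilarity = ITEM_SIMILARITY;
        System.out.println(userSimilarity.applyAsDouble(42L, 42L)); // 1.0
        System.out.println(userSimilarity.applyAsDouble(42L, 7L));  // 0.5
    }
}
```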