Yes. You can turn the normal item-item relationships around to get this. What you have is an item x feature matrix. Normally, cooccurrence analysis starts from a user x item matrix and produces an item x item matrix.
If you treat the features as "users" in the computation, the resulting indicator matrix is just what you want. The basic idea is that items are related if they share features: two items that have the same feature are said to co-occur on that feature. To find items that co-occur on many features, you need to find anomalous cooccurrence.

This works by building a small 2x2 matrix that relates item A and item B. The entries are feature counts. The upper left entry is the number of features that A and B both have, the upper right is the number of features that B has that A does not, and so on. Put another way, the columns represent features that A has or does not have, respectively, and the rows represent features that B has or does not have, respectively.

Items that give high root log-likelihood ratio (root LLR) values should be considered connected. Those with small root LLR values should be considered not connected. The root LLR value should be discarded after thresholding and should not be treated as a measure of the strength of the relationship.

I would recommend the same down-sampling that the rowSimilarityJob already does.

On Sun, Sep 29, 2013 at 3:40 AM, Mridul Kapoor <[email protected]> wrote:

> Hi
>
> I have records - items - with many features.
> Something like
>
> ID, feature1, feature2, ..., featureN
>
> Can I leverage Mahout's log-likelihood similarity metrics for calculating
> the K-Most similar items to a given item X?
>
> -
> Thanks
> Mridul
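P.S. To make the 2x2 table concrete, here is a rough sketch in plain Python rather than Mahout. It uses the usual entropy form of the log-likelihood ratio. The function names, the toy data, and the threshold of 1.0 are all made up for illustration, and the down-sampling step is left out. The score is used only to pick the candidates and is then thrown away, so the result is a list of item IDs with no strengths attached.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 table of counts.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    # Signed square root: negative when A and B share fewer
    # features than independence would predict.
    root = math.sqrt(llr(k11, k12, k21, k22))
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        root = -root
    return root


def contingency(a, b, n_features):
    # Columns: features A has / does not have.
    # Rows: features B has / does not have.
    k11 = len(a & b)   # both A and B
    k12 = len(b - a)   # B but not A
    k21 = len(a - b)   # A but not B
    k22 = n_features - k11 - k12 - k21  # neither
    return k11, k12, k21, k22


def similar_items(item_features, target, k=10, threshold=1.0):
    # threshold is an arbitrary illustrative cut-off, not a recommended value.
    n_features = len(set().union(*item_features.values()))
    a = item_features[target]
    scored = []
    for item, b in item_features.items():
        if item == target:
            continue
        score = root_llr(*contingency(a, b, n_features))
        if score >= threshold:
            scored.append((score, item))
    scored.sort(reverse=True)
    # Keep only the IDs; the scores are discarded after thresholding.
    return [item for _, item in scored[:k]]


# Hypothetical toy data: item ID -> set of features it has.
items = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f6", "f7", "f8"},
    "D": {"f1", "f9", "f10"},
}
neighbours = similar_items(items, "A", k=2)
```

Each item is stored as a set of its features, so the four cells of the table are just set intersections and differences against the total number of distinct features.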
