Hello,

I have a problem with finding similarities between items in Mahout: most of the similarity values are NaN.

In my work, I want to calculate the similarity between research papers that users bookmark in their libraries. The users are connected through three different implicit social networks based on their bookmarking behavior in CiteULike. I want to show either which social network connects the most similar users (user-based) or that the connected users share similar information (item-based), so I need to compute the similarities between users. Since there are no explicit ratings, I tried LogLikelihood and Tanimoto in Mahout, but I am getting lots of NaN values with both the user-based and the item-based approach.

I am new to Mahout, so I am not sure my code is 100% correct. I am especially unsure about the item-based case, because I do not know whether Mahout builds the inverted (item-item) matrix itself.
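To be concrete about what I mean by the inverted view, here is a plain-Java sketch that turns (user, paper) bookmark rows into a paper-to-users map, which is the structure I would expect item-based similarity to work from. It does not use Mahout at all; the class name, the method name, and the sample IDs are my own, for illustration only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {

    /**
     * Inverts bookmark rows into a paper -> users map.
     * Each row is {userId, paperId}, like one "userID,itemID" line of the CSV.
     */
    static Map<Long, Set<Long>> usersByPaper(long[][] rows) {
        // TreeMap/TreeSet only to keep the printed order stable.
        Map<Long, Set<Long>> index = new TreeMap<>();
        for (long[] row : rows) {
            long userId = row[0];
            long paperId = row[1];
            index.computeIfAbsent(paperId, k -> new TreeSet<>()).add(userId);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical data: user 1 saved papers 10 and 11, user 2 saved paper 10.
        long[][] rows = { {1, 10}, {2, 10}, {1, 11} };
        System.out.println(usersByPaper(rows)); // prints {10=[1, 2], 11=[1]}
    }
}
```

In this toy data, paper 11 is bookmarked only once, which is the kind of paper for which I see NaN.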
For the item-based case, I tried to build the model using:

    DataModel model = new GenericBooleanPrefDataModel(
        GenericBooleanPrefDataModel.toDataMap(
            new FileDataModel(new File("FILENAME.csv"))));

Then I calculate the similarity using:

    ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);

Then I used the list of paper IDs to go through the matrix and print the similarities. For instance, if I have paper IDs 1, 2, 3, 4, etc., I tried to print the similarity between paper 1 and paper 2 as:

    System.out.println("item similarity:" + similarity.itemSimilarity(1, 2));

When I checked the NaN values, it seems that I get NaN whenever a paper is not bookmarked at least twice in the dataset.

For the user-based case, I used the following:

    DataModel model = new FileDataModel(new File("FILENAME.csv"));
    UserSimilarity similarity = new LogLikelihoodSimilarity(model);
    UserSimilarity jaccsimilarity = new TanimotoCoefficientSimilarity(model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(5, similarity, model);

Then, from the list of user IDs, I tried to print the similarities between users who are connected in the social network as follows:

    System.out.println("user Similarity:" + String.valueOf(similarity.userSimilarity(user1, user2)));

Could you please help me understand why I am getting lots of NaN values, and how I can deal with them in order to compare the average similarities of the three social networks? Should I replace them with zero? (Mathematically, if the intersection is zero in the Tanimoto coefficient or in log-likelihood, I would expect to get zero.)

Thanks,
Shaikhah
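
P.S. To make the last question concrete, here is a plain-Java sketch of the two averaging options I am weighing. It does not use Mahout: the class and method names are my own, and the Tanimoto function follows the textbook definition (intersection size divided by union size), which may not match how Mahout treats edge cases.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TanimotoNaNSketch {

    /** Textbook Tanimoto: |A and B| / |A or B|. NaN only when both sets are empty (0/0). */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? Double.NaN : (double) intersection.size() / unionSize;
    }

    /** Option 1: count every NaN as zero similarity, so all pairs stay in the average. */
    static double meanTreatingNaNAsZero(double[] sims) {
        if (sims.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double s : sims) {
            sum += Double.isNaN(s) ? 0.0 : s;
        }
        return sum / sims.length;
    }

    /** Option 2: drop NaN pairs and average only the defined values. */
    static double meanSkippingNaN(double[] sims) {
        double sum = 0.0;
        int count = 0;
        for (double s : sims) {
            if (!Double.isNaN(s)) {
                sum += s;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical libraries: each set holds the paper IDs one user bookmarked.
        Set<Long> u1 = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> u2 = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        Set<Long> u3 = new HashSet<>(Arrays.asList(5L));

        System.out.println(tanimoto(u1, u2)); // 2 shared out of 4 distinct: prints 0.5
        System.out.println(tanimoto(u1, u3)); // disjoint libraries: prints 0.0

        // A made-up list of pair similarities for one network, including one NaN.
        double[] sims = { 0.5, Double.NaN, 0.0 };
        System.out.println(meanTreatingNaNAsZero(sims)); // 0.5 / 3
        System.out.println(meanSkippingNaN(sims));       // prints 0.25
    }
}
```

The two options give different averages on the same values, so whichever one is used would have to be applied to all three networks for the comparison to be fair.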