Hi,

My RowSimilarityJob returns a DRM with some rows missing. The input file is very sparse: there are about 600 columns, but each row has a value in only 1 to 6 of them. The rows missing from the output are ones with only 1 or 2 values filled. Not all rows with 1 or 2 values are missing, just some of them, and the missing rows are not always the same from one RowSimilarityJob execution to the next.
What I would like to achieve is to find the relative strength of the relationship between rows. For example, if there are 600 books and user1 and user2 each like only one book (the same book), then there should be a correlation between these 2 users. But my RowSimilarityJob output file seems to skip some of the users with sparse preferences. I am running the job locally with 4 options: input, output, SIMILARITY_LOGLIKELIHOOD, and temp dir. What would be the right approach to pick up similarity between users with sparse preferences?

Thanks!
Edith
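
P.S. To make the book example concrete, here is a minimal sketch of the log-likelihood ratio I would expect for that pair of users. It is plain Python, not the job's own code: it uses the standard entropy form of the log-likelihood ratio on a 2x2 table, the function names are made up for illustration, and the final mapping to a 0..1 score (1 - 1/(1 + LLR)) is an assumption about how the score is normalized.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: columns set in both rows, k12: only in row A,
    # k21: only in row B, k22: columns set in neither row.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# 600 books; user1 and user2 each like exactly one book, the same one.
llr = log_likelihood_ratio(1, 0, 0, 599)

# Assumed normalization of the ratio into a 0..1 similarity score.
similarity = 1.0 - 1.0 / (1.0 + llr)
print(llr, similarity)
```

By this calculation the ratio for the pair is greater than zero, which is why I expected both users to show up in the output.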
