RowSimilarityJob with sparse matrix skips rows

Edith Au Tue, 22 Jul 2014 09:00:06 -0700

Hi

My RowSimilarityJob returns a DRM with some rows missing.  The input file
is very sparse: there are about 600 columns, but each row has a value in
only 1 to 6 of them.  The missing rows are ones with only 1 or 2 values
filled in.  Not all rows with 1 or 2 values are missing, just some of them,
and the missing rows are not always the same from one RowSimilarityJob
execution to the next.


What I would like to achieve is to find the relative strength between
rows.  For example, if there are 600 books, and user1 and user2 each like
only one book (the same book), then there should be a correlation between
these two users.
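
To make the example concrete, here is a minimal sketch of the score I would
expect for that pair, using the standard log-likelihood ratio on a 2x2
contingency table.  It is an assumption that SIMILARITY_LOGLIKELIHOOD
computes its score this way; the book id 42 and all function names are made
up for illustration.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy (entropy scaled by the total count).
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: books liked by both users
    # k12: books liked only by the first user
    # k21: books liked only by the second user
    # k22: books liked by neither user
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative rounding errors.
    return max(0.0, 2.0 * (row + col - mat))


num_books = 600   # total number of columns, as in the example above
user1 = {42}      # hypothetical: user1 likes only book 42
user2 = {42}      # hypothetical: user2 likes only the same book

k11 = len(user1 & user2)
k12 = len(user1 - user2)
k21 = len(user2 - user1)
k22 = num_books - len(user1 | user2)

score = log_likelihood_ratio(k11, k12, k21, k22)
print(score)
```

With these counts the score is strictly positive, so I would expect this
pair of users to appear in the output rather than be dropped.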

But my RowSimilarityJob output file seems to skip some of the users with
sparse preferences.  I am running the job locally with 4 options: input,
output, SIMILARITY_LOGLIKELIHOOD, and temp dir.  What would be the right
approach to pick up similarity between users with sparse preferences?

Thanks!

Edith
