A less simple but better way to cluster would be to run the vectors in the DRM 
through SSVD (stochastic singular value decomposition) and cluster the 
factorized vectors. This turns vectors that are sometimes very sparse into 
dense, dimensionally reduced vectors, which can improve the clusters. The same 
applies to the item vectors. I've also been told that streaming-kmeans works 
better for very sparse vectors. I'm planning to try this for clustering items 
shortly.
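
To make the reduction step concrete, here is a minimal single-machine sketch 
in Python with NumPy. It is not Mahout code: the exact SVD from 
numpy.linalg.svd stands in for the stochastic SSVD job, and the reduce_rows 
helper, the rank of 2, and the sample matrix are assumptions made up for 
illustration.

```python
import numpy as np

def reduce_rows(matrix, rank):
    """Return dense rank-`rank` row vectors (U_k * S_k) for a matrix.

    Hypothetical helper: uses an exact SVD as a stand-in for SSVD.
    """
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the top left singular vectors by their singular values.
    return u[:, :rank] * s[:rank]

# Made-up preference matrix: rows are users, columns are items.
prefs = np.array([
    [5.0, 4.0, 5.0, 0.0, 0.0, 0.0],
    [4.0, 5.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 2.0, 3.0],
    [0.0, 0.0, 0.0, 2.0, 3.0, 2.0],
])

user_vecs = reduce_rows(prefs, rank=2)    # one dense vector per user
item_vecs = reduce_rows(prefs.T, rank=2)  # one dense vector per item

# Users with similar preferences should end up close together.
same_group = np.linalg.norm(user_vecs[0] - user_vecs[1])
other_group = np.linalg.norm(user_vecs[0] - user_vecs[2])
```

The reduced user_vecs and item_vecs would then be handed to whatever 
clusterer you use in place of the original sparse rows.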

On Sep 18, 2013, at 6:15 PM, Pat Ferrel <[email protected]> wrote:

The simplest way to cluster users would be to take the output of 
PreparePreferenceMatrixJob, which creates a DistributedRowMatrix (DRM) of all 
user prefs. The rows are users, the columns are items, and the values are 
preference values. Cluster the rows to get user clusters. Transpose the matrix 
and cluster its rows to get item clusters--nifty.
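
As a rough illustration of the row/transpose idea, here is a minimal 
single-machine sketch in Python with NumPy. It is not Mahout code: a small 
in-memory array stands in for the DRM, and the build_pref_matrix and kmeans 
helpers, the sample triples, and k=2 are all made up for illustration.

```python
import numpy as np

def build_pref_matrix(triples):
    """Build a users-by-items matrix from (user, item, pref) triples."""
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    u_idx = {u: n for n, u in enumerate(users)}
    i_idx = {i: n for n, i in enumerate(items)}
    matrix = np.zeros((len(users), len(items)))
    for u, i, p in triples:
        matrix[u_idx[u], i_idx[i]] = p
    return matrix, users, items

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = rows[rng.choice(len(rows), size=k, replace=False)]
    labels = np.zeros(len(rows), dtype=int)
    for _ in range(iters):
        # Distance from every row to every center.
        dists = np.linalg.norm(rows[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center alone
                centers[c] = rows[labels == c].mean(axis=0)
    return labels

# Made-up preferences: users a and b like x and y; c and d like z and w.
triples = [
    ("a", "x", 5), ("a", "y", 4), ("b", "x", 4), ("b", "y", 5),
    ("c", "z", 5), ("c", "w", 4), ("d", "z", 4), ("d", "w", 5),
]
prefs, users, items = build_pref_matrix(triples)

user_labels = kmeans(prefs, k=2)    # cluster the rows: user clusters
item_labels = kmeans(prefs.T, k=2)  # cluster the transpose: item clusters
```

Here users comes back as ['a', 'b', 'c', 'd'] and items as 
['w', 'x', 'y', 'z'], so the labels line up with those orderings.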

On Sep 17, 2013, at 1:41 PM, "Martin, Nick" <[email protected]> wrote:

Hi all,

I'm looking for the best way to get user clusters from my recommendation 
output. The idea is that I have my recommended items for users (user, item, 
score) based on their preferences, but I want to see how the users were 
clustered together (and how similar they are) so I can run some other 
analytics on those clusters. I found some discussion on this here 
(http://lucene.472066.n3.nabble.com/Turning-Preference-Files-Into-Vectors-td640035.html),
but I'm not sure whether any updates since that thread would make this a bit 
easier. If not, is the approach discussed in the thread my best option?

Hope that makes sense...

Thanks,
Nick

