Re: KMeans & clustering experiments

Lance Norskog Mon, 15 Nov 2010 20:13:00 -0800


Ted Dunning wrote:

On Mon, Nov 15, 2010 at 1:54 AM, Lance Norskog <[email protected]> wrote:

I have some questions about KMeans and clustering. I'm generating matrices
from recommendation data models.


What does "generating matrices" mean?

The output is a pair of matrices, each a separate set of vectors: one matrix has
a vector for each item and the other has a vector for each user.

For this project I create a set of canopies from the item matrix with the CanopyClusterer. Then I run KMeans using the canopy cluster set as its initial clusters. This approach is suggested in Mahout in Action, Section 9.1.5.
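For readers without the book at hand, here is a minimal Python sketch of the canopy-then-KMeans pipeline. It is not Mahout's CanopyClusterer or KMeans API. It assumes Euclidean distance and two hypothetical thresholds t1 > t2: points closer than t1 to a seed join its canopy, and points closer than t2 are removed from further consideration. Each canopy's center is taken as the mean of its members (an assumption), and those centers then seed plain Lloyd's k-means, so the number of canopies becomes k.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def canopies(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).
    Each canopy's center is the mean of its members (an assumption of this sketch)."""
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        members, survivors = [seed], []
        for p in remaining:
            d = dist(seed, p)
            if d < t1:
                members.append(p)      # loosely close: joins this canopy
            if d >= t2:
                survivors.append(p)    # not tightly close: may seed or join a later canopy
        remaining = survivors
        centers.append(mean(members))
    return centers

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda k: dist(p, centers[k]))

def kmeans(points, centers, max_iters=20):
    """Lloyd's k-means started from the given centers (here, the canopy centers)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            updated.append(mean(members) if members else c)  # empty cluster keeps its center
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]

# Toy "item vectors" (hypothetical data): two well-separated groups.
items = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
seeds = canopies(items, t1=3.0, t2=1.5)
centers, labels = kmeans(items, seeds)
print(len(seeds), labels)  # prints: 2 [0, 0, 0, 1, 1, 1]
```

Because k comes straight from the canopy count, the choice of t1 and t2 largely decides how many clusters KMeans can find.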

To decide whether the generated matrices contain interesting data, I'm
generating and charting KMeans clusters. Next, I'm mapping all of the
vectors in the matrix to the nearest "corner" and then clustering those
corners.
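The post doesn't define a "corner". As an assumption only, the sketch below treats it as a point on a regular grid: each coordinate is snapped to the nearest multiple of a hypothetical step size, and vectors that snap to the same corner are grouped together, so several items can share one corner.

```python
from collections import defaultdict

def to_corner(vec, step=1.0):
    """Snap each coordinate to the nearest multiple of `step` (hypothetical quantizer).
    Python's round() sends exact .5 ties to the even neighbour."""
    return tuple(round(x / step) * step for x in vec)

def group_by_corner(vectors, step=1.0):
    """Map each corner to the indices of the vectors that snap to it."""
    groups = defaultdict(list)
    for i, v in enumerate(vectors):
        groups[to_corner(v, step)].append(i)
    return dict(groups)

# Toy item vectors (hypothetical data).
item_vectors = [[0.1, 0.9], [0.2, 1.2], [2.7, -0.4]]
print(group_by_corner(item_vectors))
# prints: {(0.0, 1.0): [0, 1], (3.0, 0.0): [2]}
```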

This mapping sounds like assignment to a randomly generated cluster.

Why would clustering those corners give you different results before and
after mapping vectors to the corners?  Does the mapping change the corners?

Ah! I'm not using KMeans on random clusters; I'm using it on the canopy output. I make the canopies from the training set. I then run KMeans on the test set using the canopies from the training set. You mentioned recently that this should come out very different. I also ran a random "item vector" matrix using the same canopies, and they look as wrong as the KMeans output from the test set.
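One way to put a number on "looks as wrong" (a suggested diagnostic, not what was done in this thread): keep the training-set centers fixed and measure the mean distance from each vector to its nearest center, then run k-means seeded with those centers on the new data and see how far each center drifts from its training seed. The centers and vectors below are made-up toy values, and every function is a hypothetical helper rather than a Mahout call.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centers):
    """Index of the center closest to p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda k: dist(p, centers[k]))

def mean_nearest_distance(points, centers):
    """Average distance from each point to its closest center (lower = better fit)."""
    return sum(dist(p, centers[nearest(p, centers)]) for p in points) / len(points)

def seeded_kmeans(points, centers, max_iters=20):
    """Lloyd's k-means started from fixed seeds, e.g. canopies built on the training set."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else c)
        if updated == centers:
            break
        centers = updated
    return centers

def center_drift(seeds, final):
    """How far each center moved away from its training-set seed."""
    return [dist(s, f) for s, f in zip(seeds, final)]

# Toy values (hypothetical): centers from training, then test and random vectors.
train_centers = [(0.0, 0.0), (5.0, 5.0)]
test_vectors = [(0.1, 0.0), (0.0, 0.1), (5.0, 5.1), (5.1, 5.0)]
random_vectors = [(2.5, 2.5), (-3.0, 4.0), (8.0, -1.0), (1.0, 6.0)]
for name, data in [("test", test_vectors), ("random", random_vectors)]:
    final = seeded_kmeans(data, train_centers)
    print(name, round(mean_nearest_distance(data, train_centers), 2),
          [round(d, 2) for d in center_drift(train_centers, final)])
```

On these toy values the test vectors give a small distance and small drift, while the random vectors give much larger values on both.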

Now, to the corner concept. I quantize the training set vectors. The output is just corners, and there may be several items at the same corner. I then ran KMeans on the quantized vectors, again using the canopies from the training set. In other words, I just made a lower-information version of the training set and clustered it according to the more precise canopies. This is what made the really crazy heart-shaped spiral.
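Since several items can land on the same corner, running KMeans on the quantized training set is equivalent to weighted k-means over the distinct corners, with each corner weighted by how many items it holds. The sketch below uses toy quantized vectors and seeds (assumptions, not data from the thread) and checks that both views give the same centers. It also shows where the information goes: items sharing a corner are indistinguishable to KMeans.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centers):
    """Index of the center closest to p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda k: dist(p, centers[k]))

def weighted_kmeans(weighted_points, centers, max_iters=20):
    """Lloyd's k-means where each distinct point carries a count (its multiplicity)."""
    weighted_points = list(weighted_points)
    centers = [tuple(c) for c in centers]
    dim = len(centers[0])
    for _ in range(max_iters):
        sums = [[0.0] * dim for _ in centers]
        totals = [0] * len(centers)
        for p, w in weighted_points:
            k = nearest(p, centers)
            totals[k] += w
            for i, x in enumerate(p):
                sums[k][i] += w * x
        updated = [tuple(s / totals[k] for s in sums[k]) if totals[k] else centers[k]
                   for k in range(len(centers))]
        if updated == centers:
            break
        centers = updated
    return centers

# Toy quantized vectors (hypothetical): three items share the corner (0.0, 1.0).
quantized = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
seeds = [(0.0, 0.0), (3.0, 0.0)]
collapsed = Counter(quantized)  # distinct corner -> number of items at it
print(weighted_kmeans(collapsed.items(), seeds))
print(weighted_kmeans([(p, 1) for p in quantized], seeds))
# both print: [(0.0, 1.0), (3.0, 0.5)]
```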


Oh well, thanks for your time.

Lance
