Ted Dunning wrote:
On Mon, Nov 15, 2010 at 1:54 AM, Lance Norskog <[email protected]> wrote:

I have some questions about KMeans and clustering. I'm generating matrices
from recommendation data models.

What does "generating matrices" mean?

The output is a pair of matrices. Each is a separate set of vectors: one
matrix has a vector for each item, the other a vector for each user.

For this project I create a set of canopies with the CanopyClusterer from the item matrix. Then, I run KMeans using the Canopy cluster set. This approach is suggested in Mahout In Action, Section 9.1.5.
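A minimal sketch of that two-step pipeline, in plain Python rather than Mahout's Java API. The function names, the toy item vectors, and the T1/T2 thresholds are all hypothetical, and the canopy pass is a simplified version of the usual T1 > T2 scheme, not the CanopyClusterer implementation itself:

```python
import math

def distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    # Component-wise mean of a non-empty list of vectors.
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def canopy_centers(points, t1, t2):
    # Simplified canopy pass (assumes t1 > t2 > 0): pick a point, gather
    # everything within t1 into its canopy, then drop everything within t2
    # from further consideration.
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        members = [p for p in remaining if distance(center, p) < t1]
        centers.append(mean(members))
        remaining = [p for p in remaining if distance(center, p) >= t2]
    return centers

def kmeans(points, centers, iterations=10):
    # Standard Lloyd iterations, seeded with the given centers.
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Hypothetical item vectors: two well-separated groups.
items = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]

seeds = canopy_centers(items, t1=3.0, t2=1.5)
clusters = kmeans(items, seeds)
```

The canopy pass only decides how many clusters there are and roughly where they start; KMeans then refines those positions. Running `kmeans` on a different data set with the same `seeds` corresponds to the training/test comparison.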

To decide whether the generated matrices have interesting data, I'm
generating and charting KMeans clusters. Next, I'm mapping all of the
vectors in the matrix to a nearest "corner" and then clustering those
corners.

This mapping sounds like assignment to a randomly generated cluster.

Why does clustering those corners give you any different results before or
after mapping vectors to the corners?  Does the mapping change the corner?

Ah! I'm not using KMeans on random clusters; I'm using it on the canopy output. I make the canopies from the training set. I then run KMeans on the test set using the canopies from the training set. You mentioned recently that this should come out very different. I also ran a random "item vector" matrix using the same canopies, and the result looks as wrong as the KMeans output from the test set.

Now, to the corner concept. I quantize the training set vectors. The output is just corners, and there may be several items at the same corner. I then ran KMeans on the quantized vectors, again using the canopies from the training set. In other words, I just made a lower-information version of the training set and clustered it according to the more precise canopies. This is what made the really crazy heart-shaped spiral.
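To make the quantization step concrete, here is a small Python sketch. It assumes the vectors live in the unit hypercube and that a "corner" means rounding every coordinate to 0 or 1; the 0.5 threshold, the function name, and the sample vectors are illustrative assumptions, not the actual quantizer used here:

```python
from collections import Counter

def nearest_corner(vector, threshold=0.5):
    # Snap each coordinate to 0.0 or 1.0, i.e. to a corner of the unit
    # hypercube (assumed interpretation of "corner").
    return tuple(1.0 if x >= threshold else 0.0 for x in vector)

# Hypothetical item vectors.
vectors = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.6], [0.4, 0.1]]

corners = [nearest_corner(v) for v in vectors]

# Several items can land on the same corner, so the quantized set
# carries less information than the original vectors.
counts = Counter(corners)
```

Under these assumptions the quantized points take at most 2^d distinct values in d dimensions, so KMeans on them is effectively clustering a small number of weighted points.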

Oh well, thanks for your time.

Lance
