I don't even think that clustering is all that necessary.

The reduced cooccurrence matrix will give you items related to each item.
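One common reading of "reduced" here is a sparsified cooccurrence matrix. You count how often each pair of items appears in the same user's history, then keep only the pairs that a log-likelihood ratio (LLR) test flags as anomalous. The minimal sketch below assumes that reading. It is plain Python, not Mahout code. The entropy-based LLR formula follows the form used for Mahout's log-likelihood similarity. The function names, the `histories` input format, and the `top_k` and `min_llr` parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x):
    # Treat 0 * log(0) as 0.
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both items, k12: first item only,
    k21: second item only, k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def reduced_cooccurrence(histories, top_k=2, min_llr=1e-9):
    """Map each item to its top_k highest-LLR co-occurring items.

    histories: dict of user -> set of items (hypothetical input format).
    min_llr: hypothetical cutoff; pairs scoring at or below it are dropped.
    """
    n_users = len(histories)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in histories.values():
        for i in items:
            item_count[i] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    scored = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n_users - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score > min_llr:
            scored[a].append((score, b))
            scored[b].append((score, a))

    # Keep only the highest-scoring neighbours of each item.
    return {
        item: [other for _, other in sorted(pairs, reverse=True)[:top_k]]
        for item, pairs in scored.items()
    }
```

Suppose three users each touched `a` and `b` and three others each touched `c` and `d`. In that case `reduced_cooccurrence` returns `{'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': ['c']}`. As written, this LLR is large whenever a pair co-occurs much more *or* much less often than chance predicts. A fuller version might also require `k11` to exceed its expected count.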

You can use something like PCA, but SVD is just as good here because the data
has near-zero mean. You could use SSVD or ALS from Mahout to do this analysis
and then run k-means on the right singular vectors (aka the item representation).
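As a rough illustration of the SVD-then-k-means route, the sketch below uses NumPy's dense `np.linalg.svd` as a stand-in for Mahout's distributed SSVD (or ALS), plus a small hand-written Lloyd's k-means. It skips mean-centering, relying on the near-zero-mean argument. It also scales each item's row of V by the singular values. That scaling and the farthest-point seeding are illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def item_factors(A, k):
    """Rank-k item representation from a truncated SVD of A (users x items).

    No mean-centering is done (the step that would make this PCA).
    A dense SVD stands in for Mahout's distributed SSVD here.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Columns of Vt index items; take the top-k right singular vectors
    # and scale them by the corresponding singular values.
    return Vt[:k].T * s[:k]


def kmeans(X, n_clusters, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # Next seed: the point farthest from its nearest existing center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each item to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Consider a toy users x items matrix where users 0-3 touched only items 0-2 and users 4-5 touched only items 3-5. Running `kmeans(item_factors(A, 2), 2)` puts the two item groups in separate clusters. Dropping the `* s[:k]` factor clusters the unscaled right singular vectors instead. Which variant works better on real data is an empirical question.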

What is the high level goal that you are trying to solve with this
clustering?

On Mon, May 6, 2013 at 12:01 PM, Dominik Hübner <[email protected]> wrote:

> And should I run the clustering on the cooccurrence matrix, or do PCA by
> removing eigenvalues/eigenvectors?
>
> On May 6, 2013, at 8:52 PM, Ted Dunning <[email protected]> wrote:
>
> > On Mon, May 6, 2013 at 11:29 AM, Dominik Hübner <[email protected]> wrote:
> >
> >> Oh, and I forgot to mention how the views and sales are used to build
> >> product vectors. As of now, I have implemented binary vectors, vectors
> >> counting the number of views and sales (e.g. 1 view = 1 count, 1 sale =
> >> 10 counts), and ordinary vectors (view => 1, sale => 5).
> >>
> >
> > I would recommend just putting the view and sale in different columns and
> > doing cooccurrence analysis on this.
>
>
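Here is a minimal sketch of the separate-columns suggestion. It assumes events arrive as hypothetical `(user, item, action)` triples, with `action` being `"view"` or `"sale"`. Each `(item, action)` pair becomes its own column. Cooccurrence is counted as the number of users who share each pair of columns, which is the off-diagonal of a binary A^T A. This way no hand-picked weighting such as 1 view = 1 count, 1 sale = 10 counts is needed. The function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_columns(events):
    """Turn (user, item, action) events into per-user sets of columns.

    Each (item, action) pair is its own column, so a view and a sale of
    the same item are never merged into one weighted count.
    """
    rows = defaultdict(set)
    for user, item, action in events:
        rows[user].add((item, action))
    return rows


def cooccurrence(rows):
    """Count how many users have each pair of columns (binary A^T A, off-diagonal)."""
    counts = defaultdict(int)
    for cols in rows.values():
        for a, b in combinations(sorted(cols), 2):
            counts[(a, b)] += 1
    return dict(counts)
```

The resulting counts can then be filtered, for example with an LLR test, like any other cooccurrence counts.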
