Re: Clustering product views and sales

2013-05-07 Thread Pat Ferrel
You will always have a cold start problem for a subset of users: those who are new to a site. Popularity doesn't always work either. Sometimes you have a flat purchase frequency distribution, as I've seen. In these cases a metadata or content-based recommender is nice to fill in. If you have no

Clustering product views and sales

2013-05-06 Thread Dominik Hübner
I am currently working on a dataset containing product views and sales of about 10^7 users and 6000 items for my master's thesis in CS. My goal is to build product clusters from this. As expected, item-(row)-vectors are VERY sparse. My current approach is to implement PCA using the SVDSolver

Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
And running the clustering on the cooccurrence matrix or doing PCA by removing eigenvalues/vectors? On May 6, 2013, at 8:52 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, May 6, 2013 at 11:29 AM, Dominik Hübner cont...@dhuebner.com wrote: Oh, and I forgot how the views and sales are

Re: Clustering product views and sales

2013-05-06 Thread Ted Dunning
I don't even think that clustering is all that necessary. The reduced cooccurrence matrix will give you items related to each item. You can use something like PCA, but SVD is just as good here due to near-zero mean. You could use SSVD or ALS from Mahout to do this analysis and then use k-means on
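
A minimal sketch of the "items related to each item" idea, assuming raw cooccurrence counts over per-user item histories. It does not reproduce whatever reduction of the cooccurrence matrix the message has in mind, and it does not use Mahout; the function name `related_items`, the `top_k` parameter, and the toy histories are hypothetical and only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_items(user_histories, top_k=3):
    """For each item, list the top_k items that most often co-occur with it.

    Assumption: plain cooccurrence counts, with no statistical
    sparsification or down-weighting of very popular items.
    """
    cooccurrence = defaultdict(Counter)
    for items in user_histories:
        # Count each unordered item pair once per user, ignoring repeats.
        for a, b in combinations(sorted(set(items)), 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return {
        item: [
            other
            for other, _ in sorted(
                counts.items(), key=lambda kv: (-kv[1], kv[0])
            )[:top_k]
        ]
        for item, counts in cooccurrence.items()
    }


# Toy data: each inner list is one user's viewed or purchased items.
histories = [
    ["p1", "p2", "p3"],
    ["p1", "p2"],
    ["p2", "p3"],
    ["p1", "p2", "p4"],
]
related = related_items(histories, top_k=2)
print(related["p1"])
```

With 6000 items the full item-by-item count table is small enough to hold in memory, which is one reason a lookup of this kind may be enough without a separate clustering step.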

Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
Well, as you might already have guessed, I am building a product recommender system for my thesis. I am planning to evaluate ALS (both implicit and explicit) as well as item-similarity recommendation for users with at least a few known products. Nevertheless, the majority of users only has

Re: Clustering product views and sales

2013-05-06 Thread Koobas
Since Dominik mentioned item-based and ALS, let me throw in a question here. I believe that one of the Netflix Prize solutions combined KNN and ALS. 1) What is the best way to combine the results of both? 2) Is there really merit to this approach? 3) Are there other combinations that make sense?

Re: Clustering product views and sales

2013-05-06 Thread Ted Dunning
On Mon, May 6, 2013 at 12:50 PM, Koobas koo...@gmail.com wrote: Since Dominik mentioned item-based and ALS, let me throw in a question here. I believe that one of the Netflix Prize solutions combined KNN and ALS. 1) What is the best way to combine the results of both? I think that

Re: Clustering product views and sales

2013-05-06 Thread Koobas
I think I see the picture now. Thanks! On Mon, May 6, 2013 at 5:25 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, May 6, 2013 at 12:50 PM, Koobas koo...@gmail.com wrote: Since Dominik mentioned item-based and ALS, let me throw in a question here. I believe that one of the Netflix

Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
The clustering was mostly intended for tackling the cold start problem for new users. I want to build a recommender based on existing components or, to be precise, a combination of them. Unfortunately, the only product metadata I currently have is the product price. Furthermore, this is a project

Re: Clustering product views and sales

2013-05-06 Thread Sean Owen
It sounds like you don't quite have a cold start problem. You have a few behaviors, a few views or clicks, not zero. So you really just need to find an approach that's quite comfortable with sparse input. A low-rank factorization model like ALS works fine in this case, for example. There's a

Re: Clustering product views and sales

2013-05-06 Thread Ted Dunning
Truly cold start is best handled by recommending the most popular items. If you know *anything* at all such as geo or browser or OS, then you can use that to recommend using conventional techniques (that is, you can recommend for the characteristics rather than for the person). Within a very few
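
A rough sketch of the cold-start fallback described above, assuming the simplest possible reading: recommend the most popular items overall, and when one characteristic of the visitor is known (here a hypothetical `segment` such as a country code), recommend the most popular items within that segment instead. The function names, the `(segment, item)` event format, and the toy events are assumptions for illustration, not part of the original message.

```python
from collections import Counter, defaultdict


def build_popularity(events):
    """events: iterable of (segment, item) pairs.

    Returns overall item counts and item counts per segment.
    """
    global_counts = Counter()
    by_segment = defaultdict(Counter)
    for segment, item in events:
        global_counts[item] += 1
        by_segment[segment][item] += 1
    return global_counts, by_segment


def _top(counts, n):
    # Most frequent first; ties broken by item id for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


def cold_start_recommend(global_counts, by_segment, segment=None, n=2):
    """Use segment popularity when the segment is known, else global popularity."""
    if segment in by_segment:
        return _top(by_segment[segment], n)
    return _top(global_counts, n)


# Toy data: (segment, item) pairs, e.g. the visitor's country and a purchased item.
events = [
    ("DE", "p1"), ("DE", "p1"), ("DE", "p2"),
    ("US", "p3"), ("US", "p3"), ("US", "p3"), ("US", "p1"),
]
global_counts, by_segment = build_popularity(events)
print(cold_start_recommend(global_counts, by_segment))
print(cold_start_recommend(global_counts, by_segment, segment="DE"))
```

An unknown or unseen segment falls back to the global list, so the same call works for a visitor about whom nothing at all is known.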

Re: Clustering product views and sales

2013-05-06 Thread Dominik Hübner
One more thing for now @Ted: What do you refer to with sparsification and reconstruction? On May 7, 2013, at 12:19 AM, Ted Dunning ted.dunn...@gmail.com wrote: Truly cold start is best handled by recommending the most popular items. If you know *anything* at all such as geo or browser or OS,

Re: Clustering product views and sales

2013-05-06 Thread Johannes Schulte
Hi! As a starting point I remember this conversation containing both elements (although the reconstruction part is rather small, hint!) http://markmail.org/message/5cfewal3oyt6vw2k On Tue, May 7, 2013 at 1:00 AM, Dominik Hübner cont...@dhuebner.com wrote: One more thing for now @Ted: What do