Re: [gsoc] Collaborative filtering algorithms

2009-04-01 Thread Ted Dunning
The machinery of SVD is almost always described in terms of least squares matrix approximation without mentioning the probabilistic underpinnings of why least-squares is a good idea. The connection, however, goes all the way back to Gauss' reduction of planetary position observations (this is *why

Re: [gsoc] Collaborative filtering algorithms

2009-04-01 Thread Atul Kulkarni
Thanks David, that helped. On Wed, Apr 1, 2009 at 1:47 AM, David Hall wrote: > On Tue, Mar 31, 2009 at 11:43 PM, Atul Kulkarni > wrote: > > questions in line. > > > > On Wed, Apr 1, 2009 at 1:27 AM, Ted Dunning > wrote: > > > >> Nobody is working on SVD yet, but one GSOC applicant has said th

Re: [gsoc] Collaborative filtering algorithms

2009-03-31 Thread David Hall
On Tue, Mar 31, 2009 at 11:43 PM, Atul Kulkarni wrote: > questions in line. > > On Wed, Apr 1, 2009 at 1:27 AM, Ted Dunning wrote: > >> Nobody is working on SVD yet, but one GSOC applicant has said that they >> would like to work on LDA which is a probabilistic relative of SVD. >> > I do not unde

Re: [gsoc] Collaborative filtering algorithms

2009-03-31 Thread Atul Kulkarni
On Wed, Apr 1, 2009 at 1:30 AM, Ted Dunning wrote: > I would hope that your SVD implementation would not be limited to NetFlix > like problems, but would be applicable to any reasonably sparse matrix-like > data. > Yes, of course. It would apply to any large sparse matrix implementation. > > Like

Re: [gsoc] Collaborative filtering algorithms

2009-03-31 Thread Atul Kulkarni
Questions inline. On Wed, Apr 1, 2009 at 1:27 AM, Ted Dunning wrote: > Nobody is working on SVD yet, but one GSOC applicant has said that they > would like to work on LDA which is a probabilistic relative of SVD. > I do not understand the relation between LDA and SVD. In my limited understanding I u

Re: [gsoc] Collaborative filtering algorithms

2009-03-31 Thread Ted Dunning
I would hope that your SVD implementation would not be limited to NetFlix like problems, but would be applicable to any reasonably sparse matrix-like data. Likewise, I would expect a good SVD implementation to be useful for nearest neighbor methods or direct prediction by smoothing the history vec

Re: [gsoc] Collaborative filtering algorithms

2009-03-31 Thread Ted Dunning
Nobody is working on SVD yet, but one GSOC applicant has said that they would like to work on LDA which is a probabilistic relative of SVD. The approach in your reference (3) is highly amenable to parallel implementation. Large-scale SVD would be a very interesting application for Mahout. On Tue

Re: [gsoc] Collaborative filtering algorithms

2009-03-31 Thread Atul Kulkarni
>I agree that getting a parallel SVD running is in and of itself >probably a good project in terms of size. On the other hand it would >be better to end up with a basic recommender as a final product. But >even if SVD by itself doesn't make up a complete unit by itself for >collaborative filtering

Re: [gsoc] Collaborative filtering algorithms

2009-03-07 Thread Ted Dunning
SVD or a cousin was a very common feature among the leading Netflix entries. SVD is, indeed, very slow if you do a complete decomposition. The point, of course, for large sparse matrices is that you want an approximation so you only compute the first few singular vectors/values. To do this effic
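
As a rough illustration of computing only the leading singular value and vectors of a sparse matrix, here is a minimal sketch using plain power iteration. This is an assumed, simple technique chosen for illustration, not necessarily the method the message goes on to describe. The function name, the dict-of-nonzeros representation, and the `iters` and `seed` parameters are all hypothetical.

```python
import math
import random

def top_singular_triplet(entries, n_rows, n_cols, iters=200, seed=0):
    """Approximate the largest singular value and its vectors by power iteration.

    entries: dict {(row, col): value} holding only the nonzero cells of A
    (an assumed representation; each multiply touches only those cells).
    """
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    u = [0.0] * n_rows
    for _ in range(iters):
        # u <- A v, then normalize; the norm estimates the singular value
        u = [0.0] * n_rows
        for (i, j), a in entries.items():
            u[i] += a * v[j]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            break
        u = [x / sigma for x in u]
        # v <- A^T u, then normalize
        w = [0.0] * n_cols
        for (i, j), a in entries.items():
            w[j] += a * u[i]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return sigma, u, v
```

Each iteration costs one pass over the nonzero entries for A v and one for A^T u, which is why this kind of approach suits large sparse data. Further singular vectors would need deflation or a block method, which this sketch omits.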

Re: [gsoc] Collaborative filtering algorithms

2009-03-07 Thread Ted Dunning
Simple co-occurrence counting is at the heart of most large-scale recommendation systems. Counting plus simple (but sound) statistical filtering suffices for a broad range of recommendation tasks with very high quality results. For statistical filtering, I typically recommend the G^2 statistic as
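
For reference, a minimal sketch of the G^2 (log-likelihood ratio) statistic on a 2x2 co-occurrence table, assuming the common entropy-based formulation. The function names and the argument names are illustrative, not taken from the thread: k11 counts cases with both items, k12 item A without B, k21 item B without A, and k22 neither.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def g2(k11, k12, k21, k22):
    """G^2 = 2 * (H(rows) + H(cols) - H(cells)) on raw counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))
```

A table whose rows and columns are independent, such as g2(10, 10, 10, 10), scores approximately 0, while strongly associated counts score high. Item pairs would then be kept or dropped by comparing the score against a threshold, whose value is left open here.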

Re: [gsoc] Collaborative filtering algorithms

2009-03-05 Thread Jason Rennie
On Thu, Mar 5, 2009 at 4:24 AM, Sean Owen wrote: > This would be a fantastic project, implementing a Recommender based on > this approach. I tried implementing an SVD technique a couple years > ago and it was waaay too slow on one machine. Revisiting with Hadoop > sounds great. SVD (at least h

Re: [gsoc] Collaborative filtering algorithms

2009-03-05 Thread Sean Owen
On Thu, Mar 5, 2009 at 1:08 PM, QIU, Yin wrote: > Glad that you are so positive about this. I just googled and found the > article addressing parallel SVD [1], which was devised by Google. I > shall spend some time reading this. If we are really going to do this > project, implementing only the SV

Re: [gsoc] Collaborative filtering algorithms

2009-03-05 Thread QIU, Yin
Hi Sean, > Really, I have never run this code in a real Hadoop environment. There > could be bugs, or improvements, that fall out from that. For example > there might be some more efficient way to use Hadoop that I don't see. > I don't have anything specific in mind -- these are unknown-unknowns >

Re: [gsoc] Collaborative filtering algorithms

2009-03-05 Thread Sean Owen
On Thu, Mar 5, 2009 at 7:27 AM, QIU, Yin wrote: > I don't know slope one recommender yet. Maybe I should read that first > to know how you manage to divide the tasks. However, a little > explanation in advance would be appreciated. http://en.wikipedia.org/wiki/Slope_One explains slope one pretty
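
For readers who want the idea in code, here is a minimal single-machine sketch of weighted slope one, assuming the formulation described on the linked Wikipedia page. It is not Mahout's implementation, and the function names and the nested-dict ratings layout are hypothetical.

```python
from collections import defaultdict

def slope_one_diffs(ratings):
    """ratings: {user: {item: rating}}.

    Returns the summed rating differences and co-rating counts
    for every ordered item pair (i, j).
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for prefs in ratings.values():
        for i, ri in prefs.items():
            for j, rj in prefs.items():
                if i != j:
                    diff_sum[(i, j)] += ri - rj
                    count[(i, j)] += 1
    return diff_sum, count

def predict(ratings, user, target, diff_sum, count):
    """Predict the user's rating of target, weighting each item by co-rating count."""
    num = 0.0
    den = 0
    for j, rj in ratings[user].items():
        c = count.get((target, j), 0)
        if j != target and c:
            # (average difference + user's own rating of j), weighted by c
            num += (diff_sum[(target, j)] / c + rj) * c
            den += c
    return num / den if den else None
```

The per-user loop in slope_one_diffs only reads that user's own ratings and emits pairwise differences, so it is the part that could plausibly be split across machines, with the sums and counts combined afterwards.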

Re: [gsoc] Collaborative filtering algorithms

2009-03-04 Thread QIU, Yin
Hi! > Yes there is a framework in the code for running a Recommender across > machines in Hadoop, and a Hadoop job which distributes part of the > processing for a slope one recommender. I don't know slope one recommender yet. Maybe I should read that first to know how you manage to divide the ta

Re: [gsoc] Collaborative filtering algorithms

2009-03-04 Thread Sean Owen
(Oops, of course. Didn't mean to imply there should be a side conversation but that's how it came out. I just mean there is definitely at least one person here who could and would 'mentor' such a project.) On 4 Mar 2009, 12:20 PM, "Grant Ingersoll" wrote: On Mar 4, 2009, at 3:55 AM, Sean Owen wr

Re: [gsoc] Collaborative filtering algorithms

2009-03-04 Thread Grant Ingersoll
On Mar 4, 2009, at 3:55 AM, Sean Owen wrote: Yes there is a framework in the code for running a Recommender across machines in Hadoop, and a Hadoop job which distributes part of the processing for a slope one recommender. Both could use testing, refinement and enhancement. I do not know of an

Re: [gsoc] Collaborative filtering algorithms

2009-03-04 Thread Sean Owen
Yes there is a framework in the code for running a Recommender across machines in Hadoop, and a Hadoop job which distributes part of the processing for a slope one recommender. Both could use testing, refinement and enhancement. I do not know of an algorithm which is by nature efficiently distrib

[gsoc] Collaborative filtering algorithms

2009-03-03 Thread QIU, Yin
Hi mahout folks, For this year's GSoC, I'm particularly interested in CF-related algorithms running on MapReduce-like environments. Will anyone tell me about the current status of recommender algorithms in Mahout please? Does it need any improvement? Thanks a lot. -- Yin Qiu