Hi Sanjib,

MAHOUT-542 uses a different algorithmic approach to factorize the matrix, the one described in "Large-scale Parallel Collaborative Filtering for the Netflix Prize":
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
It is not related to MAHOUT-371.
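
For reference, the approach in that paper is alternating least squares
with weighted-lambda-regularization (ALS-WR): fix the item factors and
solve a small regularized least-squares problem per user over the
observed ratings only, then swap sides and repeat. Below is a minimal
single-machine sketch in plain Python. It is not the MAHOUT-542 code;
every function name, the toy ratings, and the parameter values are
assumptions made up for illustration.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_side(observed, fixed, k, lam):
    """Recompute one side's factors while the other side is held fixed.

    observed: {row_id: [(col_id, rating), ...]}, known ratings only.
    """
    out = {}
    for row_id, obs in observed.items():
        n = len(obs)
        # Normal equations: (sum f f^T + lam * n * I) x = sum r * f,
        # where n is the number of ratings in this row (the "weighted" lambda).
        a = [[lam * n if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for col_id, r in obs:
            f = fixed[col_id]
            for i in range(k):
                b[i] += r * f[i]
                for j in range(k):
                    a[i][j] += f[i] * f[j]
        out[row_id] = solve(a, b)
    return out


def als_wr(ratings, k=2, lam=0.01, iterations=20, seed=0):
    """ratings: list of (user, item, rating) triples; returns (users, items) factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_side(by_user, items, k, lam)
        items = update_side(by_item, users, k, lam)
    return users, items


def predict(users, items, u, i):
    return sum(x * y for x, y in zip(users[u], items[i]))


def regularized_loss(ratings, users, items, lam):
    """Squared error on observed ratings plus the count-weighted L2 penalty."""
    n_u, n_i = {}, {}
    for u, i, _ in ratings:
        n_u[u] = n_u.get(u, 0) + 1
        n_i[i] = n_i.get(i, 0) + 1
    err = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    reg = sum(n * sum(v * v for v in users[u]) for u, n in n_u.items())
    reg += sum(n * sum(v * v for v in items[i]) for i, n in n_i.items())
    return err + lam * reg
```

Each half-step solves its least-squares problem exactly, so the
regularized loss cannot increase from one iteration to the next. The
per-user and per-item solves are independent of each other, which is
what makes the method straightforward to parallelize.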

On 24.11.2010 07:28, Sanjib Kumar Das wrote:
From what I understand, MAHOUT-371 tries to address the
DistributedSVDRecommenderJob. Is it fully ready for use?

@Sebastian: The above recommender uses the DistributedLanczosSolver to
compute the SVD. So should the distributed matrix factorization
(MAHOUT-542) you were talking about be integrated with it instead?

I am slightly confused...

On Fri, Nov 19, 2010 at 4:32 PM, Ted Dunning <[email protected]> wrote:

On Fri, Nov 19, 2010 at 2:27 PM, Sebastian Schelter <[email protected]> wrote:

Can I use the new LanczosSolver to achieve this?

The paper "Large-scale Parallel Collaborative Filtering for the Netflix
Prize" says that you can't use Lanczos to factorize a rating matrix
because the matrix is only partially specified. However, someone with
more mathematical expertise than me should validate that statement. I
hope I didn't get that wrong :)

You correctly quoted the statement. But I don't think that the statement
is entirely correct. The difference in practice isn't all that big a deal.


Ted is working on LatentFactorLogLinear models in MAHOUT-525, which can
also be used for recommendations and should be superior to the approach
of MAHOUT-542. They're not distributed, but the authors of the paper
that describes them state that they could train on the MovieLens 1M
dataset in 7 minutes, so they should be fast enough for your test case.

This is where I would push for recommendations. I have a preliminary
implementation available on GitHub, but I don't think it is ready to
commit. It does roughly what it is supposed to do (on one test), but I
don't have enough runtime with it to have any level of confidence yet.

