Re: SVD in Mahout (was: Mahout Lanczos SVD complexity)

Radim Rehurek Wed, 21 Dec 2011 03:14:42 -0800

Hi Sean,

good to hear from a user! :)



> That's great info. Do you have a distributed version of this? :) I was
> actually hoping you would...

Actually, yes. It's only small-scale though (a handful of company/lab 
computers, a few billion non-zeroes), built to scratch my own itch. We never 
had hundreds or thousands of machines, and the robustness requirements are 
quite different at that scale. That's exactly why I'm interested in the Mahout 
experiments. Trust me, I get no kicks out of prodding other implementations; I 
just want to learn about their results, stories and pitfalls. When I say that 
"no evaluation needed, we are different" is not an acceptable answer, it 
doesn't mean that I'm upset.

Also interesting: I'm not sure it's understood, but distributing SVD (the math) 
doesn't bring you anything here. The cost is dominated by the number of passes 
over the input data, and each pass is dominated by I/O (for such small values 
of `k`). This is an area where Mahout can truly shine, because of HDFS -- if 
the data is already pre-distributed to workers, the cost of I/O can be shared. 
If, on the other hand, you'd need to read the data first and then distribute 
it further to nodes for processing, then a sequential algo will be faster.
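
To make the I/O argument concrete, here is a toy cost model. It is only a 
sketch: the function names and every number in it are made up for 
illustration, it ignores CPU time entirely, and the "read then ship" case 
assumes the workers do not cache their shard between passes.

```python
# Toy I/O cost model. All names and figures are hypothetical illustration
# values, not measurements from Mahout or any other system.

def sequential_seconds(data_bytes, passes, disk_bps):
    """One machine streams the whole input from local disk on every pass."""
    return passes * data_bytes / disk_bps


def predistributed_seconds(data_bytes, passes, disk_bps, workers):
    """Each worker streams only its own local shard, so reads run in parallel."""
    return passes * data_bytes / (disk_bps * workers)


def read_then_ship_seconds(data_bytes, passes, disk_bps, network_bps):
    """A single reader streams the input and then ships it to the workers
    on every pass (assumption: workers keep no local copy between passes)."""
    return passes * (data_bytes / disk_bps + data_bytes / network_bps)


if __name__ == "__main__":
    data_bytes = 50 * 10**9      # hypothetical 50 GB input
    passes = 4                   # hypothetical number of passes over the input
    disk_bps = 100 * 10**6       # hypothetical 100 MB/s disk read rate
    network_bps = 100 * 10**6    # hypothetical 100 MB/s network rate
    workers = 10

    print("sequential:     ", sequential_seconds(data_bytes, passes, disk_bps))
    print("pre-distributed:", predistributed_seconds(data_bytes, passes, disk_bps, workers))
    print("read then ship: ", read_then_ship_seconds(data_bytes, passes, disk_bps, network_bps))
```

Under these assumptions the pre-distributed case divides the per-pass read 
time by the number of workers, while the read-then-ship case can never beat 
the sequential one, because it pays the same read plus the network transfer.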


> Your interests are not the same as, say, mine, as a user of the SVD
> for recs. A reconstruction with small error is good all else equal,
> but all else is not equal. The quality of my output does not scale
> proportionally with accuracy. Unfortunately what you suggest simply
> doesn't exist in a form I can use at scale, which is a big issue!


The form is to simply use more power iterations. It already exists, right 
there, in Mahout, bar the numerical issues I pointed out. It's not like I'm 
suggesting some crazy to-epsilon accuracy settings, but I doubt any application 
can be happy with a decomposition that most resembles a baseline of "return 
zero matrices" in quality.
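
For readers who have not seen them, power iterations in a randomized 
(stochastic) SVD can be sketched in NumPy as below. This is a generic 
textbook-style sketch, not Mahout's code; the function name 
`randomized_svd` and the parameters `q`, `oversample` and `seed` are 
illustrative assumptions. Each extra power iteration costs two more products 
with the input (one with `A.T`, one with `A`), which is where the extra 
passes over the data come from.

```python
import numpy as np


def randomized_svd(A, k, q=0, oversample=10, seed=0):
    """Approximate rank-k SVD of A using q power iterations (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, m, n)   # width of the random projection

    # Initial product with A: project onto a random l-dimensional subspace.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, l)))

    # Each power iteration multiplies by A.T and then by A,
    # re-orthonormalizing in between for numerical stability.
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Final product with A: exact SVD of the small projected matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


def relative_error(A, U, s, Vt):
    """Frobenius-norm reconstruction error relative to the norm of A."""
    return np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

Comparing `relative_error` for `q=0` against a few power iterations on a 
matrix with a slowly decaying spectrum is a quick way to see how much 
accuracy the extra passes buy.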


> I don't think anyone pinged you to claim what's in the project now is
> even finished, let alone optimal. For example I'm not sure if you've
> seen the improvement Dmitriy has done in this area on a few open JIRA
> issues? 

Cool -- hence my plea for cc on progress.

Best of luck,
Radim
