Hi Sean, good to hear from a user! :)
> That's great info. Do you have a distributed version of this? :) I was
> actually hoping you would...

Actually, yes. It's only small-scale though (a handful of company/lab computers, a few billion non-zeroes), to scratch my own itch. We never had hundreds or thousands of machines; the robustness requirements are quite different there.

That's exactly why I say I'm interested in the Mahout experiments. Trust me, I get no kicks out of prodding other implementations, I just want to learn about their results, stories and pitfalls. Saying that "no evaluation needed, we are different" is not an acceptable answer doesn't mean that I'm upset.

Also interesting: I'm not sure it's understood, but distributing SVD (the math) doesn't bring you anything here. The cost is dominated by the number of passes over the input data, and each pass is dominated by I/O (for such small values of `k`). This is an area where Mahout can truly shine, because of HDFS -- if the data is already pre-distributed to the workers, the cost of I/O can be shared. If, on the other hand, you need to read the data first and then distribute it further to the nodes for processing, then a sequential algo will be faster.

> Your interests are not the same as, say, mine, as a user of the SVD
> for recs. A reconstruction with small error is good all else equal,
> but all else is not equal. The quality of my output does not scale
> proportionally with accuracy. Unfortunately what you suggest simply
> doesn't exist in a form I can use at scale, which is a big issue!

The form is to simply use more power iterations. It already exists, right there, in Mahout, bar the numerical issues I pointed out. It's not like I'm suggesting some crazy to-epsilon accuracy settings, but I doubt any application can be happy with a decomposition that most resembles a baseline of "return zero matrices" in quality.

> I don't think anyone pinged you to claim what's in the project now is
> even finished, let alone optimal. For example I'm not sure if you've
> seen the improvement Dmitriy has done in this area on a few open JIRA
> issues?

Cool -- hence my plea for cc on progress.

Best of luck,
Radim
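P.S. To make the "more power iterations" point concrete, here is a minimal in-memory NumPy sketch of a randomized SVD with power iterations. It is not Mahout's code or any other project's implementation; the function names, the `oversample` parameter and the synthetic test matrix are assumptions made up for illustration. It counts how many times the input matrix is read, since that is the cost discussed above, and lets you compare the rank-`k` reconstruction error with and without power iterations.

```python
import numpy as np


def randomized_svd(A, k, n_power_iter=0, oversample=10, seed=0):
    """Rank-k randomized SVD sketch; returns (U, s, Vt, passes_over_A)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Random projection onto k + oversample directions (first pass over A).
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + oversample)))
    passes = 1
    for _ in range(n_power_iter):
        # Each power iteration reads A twice; QR after every product
        # keeps the basis numerically stable.
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
        passes += 2
    # Project A onto the basis (last pass) and decompose the small matrix.
    B = Q.T @ A
    passes += 1
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k], passes


def relative_error(A, U, s, Vt):
    """Frobenius-norm reconstruction error relative to the norm of A."""
    return np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)


def make_test_matrix(m=300, n=200, seed=1):
    """Synthetic dense matrix with slowly decaying singular values 1/sqrt(i)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = 1.0 / np.sqrt(np.arange(1, n + 1))
    return (U * s) @ V.T


if __name__ == "__main__":
    A = make_test_matrix()
    for q in (0, 1, 3):
        U, s, Vt, passes = randomized_svd(A, k=10, n_power_iter=q)
        print(f"power_iter={q} passes={passes} "
              f"rel_error={relative_error(A, U, s, Vt):.4f}")
```

With this structure every extra power iteration costs two more passes over the input, so on data that has to be streamed from disk the accuracy/cost trade-off is essentially an accuracy/I/O trade-off.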
