Radim, thank you for your candor.
The estimate of the method's accuracy is already given in the study. I also
studied accuracy on good-decay and bad-decay spectra for in-memory datasets.
I did not compare Frobenius norms, but rather compared eigenvalues against
another method and against synthetic ground truth. For good decay it's great;
for non-existent decay it's bad. So that's a sanity check, and the rest is in
the study (there is an exact formula in there for the accuracy error based on
the singular value decay, along with the probability of attaining it). I did
that sanity check, I also ran the R simulation and compared against it, and I
think some other folks who worked on this did the same. But I don't think
anyone set out to re-validate the original study's estimates with any great
precision.

So that's the method: it trades off some accuracy, especially with bad-decay
spectra, for speed and the ability to take on larger problems. Nobody ever
argued with that, and nobody ever marketed it as an accurate method. In fact,
I have continuously warned users that they will not get good results with a
bad decay spectrum (though pragmatically, why would they want to study
essentially random inputs at all?). It is also the first thing listed in the
trade-offs section on the wiki.

That said, the implementation is believed to be faithful to the method
description. There are probably nuances in which solver to use for the
projected problem at the front end -- whether to solve the SVD of B' or the
eigendecomposition of BB', and whether to use a Cholesky decomposition
instead of QR -- so that may warrant some analysis later. We do simulate
corner cases at small volumes and assert sanity-check accuracy (even in unit
tests). We don't do the same for bigger volumes that we cannot verify against
alternative solvers.

In any case, I feel we need to keep people fully informed, provide references
to the original research for those who want to dig deeper, and humbly hope
that the method will be useful for some pragmatic problems (it has already
been used successfully here and there).
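For concreteness, here is roughly the recipe the method follows, as a tiny
in-core NumPy sketch. This is not the Mahout map-reduce code, just the
generic randomized-projection algorithm; the function name and defaults are
mine, for illustration only.

import numpy as np

def ssvd_sketch(A, k, p=10, q=0):
    """Toy randomized SVD: k = requested rank, p = oversampling,
    q = power-iteration steps. Illustration only."""
    n = A.shape[1]
    # Random projection to capture the dominant column space of A.
    Omega = np.random.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    # Optional power iterations sharpen a slowly decaying spectrum;
    # re-orthogonalizing at every step avoids the numerical issues with
    # naive implementations that Radim mentions below.
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Projected problem: B is only (k+p) x n, so solving it exactly is cheap.
    # Taking the SVD of B here, rather than the eigendecomposition of B*B',
    # is the front-end solver nuance mentioned above.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

With a fast-decaying spectrum the plain projection (q=0) already lands close
to the true factors; with a flat spectrum it does not, which is exactly the
trade-off described above.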
2011/12/21 Radim Rehurek <[email protected]>:
>> From: Ted Dunning <[email protected]>
>> Subject: Re: SVD in Mahout (was: Mahout Lanczos SVD complexity)
>> Date: 19.12.2011 16:58:57
>> ----------------------------------------
>> The users of SVD on Mahout are more interested in their own application
>> level metrics than the metrics associated with the decomposition itself.
>> Moreover, at the scale people have been working with, getting any
>> decomposition at all is refreshing.
>
> Heh. Is that the official Mahout philosophy for other algorithms as well?
> "Who cares about correctness, we are happy we can run it on some data at
> all, so shut up?" I hope you're not serious Ted.
>
> Aren't you afraid people will draw wrong conclusions about an SVD
> application, using your (possibly wildly inaccurate) SVD implementation?
> Retract publications?
>
> By all means, use whatever decomposition suits you. But SVD already has a
> well-established meaning in linear algebra and using that acronym comes
> with certain expectations. People unfamiliar with the pitfalls of your
> implementation may assume they're really getting SVD (or at least a
> version that's "reasonably close" -- in the numerical computing sense). A
> big fat accuracy warning is in order here. Nobody expects more or less
> random vectors, even if these happen to perform better than the real
> truncated SVD in your app [citation needed].
>
>> The examples that you gave in your thread involved walking *way* down the
>> spectrum to the smaller singular values. That is absolutely not the
>> interest with most Mahout users because that would involve fairly massive
>> over-fitting.
>
> Too many opinions, too little data. Instead, I decided to run the English
> wikipedia experiments with factors=10 and oversampling=5, as per your
> concerns.
>
> (cross-posting to the gensim mailing list, as this might be of interest to
> gensim users as well)
>
> Data: English Wikipedia as term-document matrix (0.42B non-zeroes, 3.5M
> documents, 10K features).
> Requesting top 10 factors (1% of the total singular value mass), not 500
> factors like before (15% total mass). Accuracy is evaluated by comparing
> reconstruction error to Lapack's DSYEV in-core routine on A*A^T. Error =
> |A*A^T - U*S^2*U^T| / |A*A^T - U_lapack*Sigma_lapack*U_lapack^T|.
>
> batch algo       error
> ---------------+--------
> baseline*        1.986
> 0 power iters    1.877
> 1 power iter     1.094
> 2 power iters    1.009
> 4 power iters    1.0005
> 6 power iters    1.00009
>
> The results are completely in line with Martinsson et al.'s [1] as well as
> my previous experiments: no power iteration steps with massive truncation
> = rubbish output. Accuracy improves exponentially with increasing no. of
> iteration steps (but see my initial warning re. numerical issues with a
> higher number of steps if implemented naively).
>
> So, your worry that the SVD inaccuracy is somehow due to asking for too
> many factors and irrelevant for thinner SVDs is without substance. Your
> users certainly deserve to know that without power iterations, the SVD
> output is on par with the baseline, which is a "decomposition" where all
> factors are simply zero.
>
> From all the dev replies here -- no users actually replied -- I get the
> vibe that the accuracy discussion annoys you. Now, I dropped by to give a
> friendly hint about possible serious accuracy concerns, based on
> experience with mid-scale (billions of non-zeroes) SVD computations in one
> specific domain (term-document matrices in NLP). And possibly learning
> about your issues on tera-feature scale datasets in return, which I'm very
> interested in. Apparently neither of us is getting anything out of this,
> so I'll stop here.
>
> Best,
> Radim
>
> [1] http://www.sci.ccny.cuny.edu/~szlam/npisvdnipsshort.pdf
>
>>
>> 2011/12/19 Radim Rehurek <[email protected]>
>>
>> > No problem.
>> >
>> > If you decide to run some SSVD accuracy experiments in the future
>> > (don't your users ask for that?), please include me in cc -- just in
>> > case I miss the post on this mailing list.
>> >
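P.S. For anyone who wants to poke at this on a small in-core matrix, the
error ratio Radim uses above can be reproduced roughly along these lines.
This is only a sketch: it reuses the toy ssvd_sketch function from earlier
in this message, uses numpy's symmetric eigensolver in place of a direct
DSYEV call, and assumes the Frobenius norm, which the quoted text does not
actually specify.

import numpy as np

def error_ratio(A, k, q):
    """|A*A^T - U*S^2*U^T| / |A*A^T - U_ref*Lambda_ref*U_ref^T|, with the
    reference taken from the exact top-k eigenpairs of A*A^T."""
    AAt = A @ A.T
    w, V = np.linalg.eigh(AAt)              # eigenvalues in ascending order
    w, V = w[::-1][:k], V[:, ::-1][:, :k]   # keep the top k
    ref = np.linalg.norm(AAt - V @ np.diag(w) @ V.T)
    U, s, _ = ssvd_sketch(A, k, q=q)        # toy sketch from above
    est = np.linalg.norm(AAt - U @ np.diag(s**2) @ U.T)
    return est / ref

# Synthetic matrix with a moderately decaying spectrum; the ratio should
# fall toward 1.0 as the number of power iterations q grows, qualitatively
# matching the table quoted above.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500)) / np.arange(1, 501)
for q in (0, 1, 2, 4):
    print(q, error_ratio(A, k=10, q=q))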
