Radim, I looked at the discussion that you referenced and have a few reservations about your conclusions.
You are correct that the randomized projection methods are of limited accuracy when the singular values are not decreasing rapidly, but I think that you were asking for too many singular values (500) from a relatively small matrix for the results to make much sense. My argument for saying this is that the singular values at this point have become quite small and are likely to be modeling the random variation in your corpus rather than the systematic patterns of occurrence. As such, the "accuracy" of the decomposition is of little interest.

It may well be that using a high number of singular values increases apparent accuracy in your application over using a smaller number, but it is likely that augmenting a small decomposition with random vectors instead of true singular vectors would provide virtually identical performance. If so, whether these extra singular vectors are computed more or less accurately is hardly of the essence. In order to determine whether this is the case, the interesting comparison to make is whether you get higher application accuracy on held-out data from, say, 100 singular vectors + 400 random vectors or from your 500 singular vectors. It is common with LSI-like applications to test 100 singular vectors against 500 singular vectors, but that does not really measure how much information is being extracted by the singular value decomposition. (A rough sketch of this comparison appears below the quoted message.)

Regarding the accuracy of the decomposition, your choice of k is well below the bound suggested by the theoretical considerations in the Halko et al. paper. For small matrices, with sizes measured in the thousands or tens of thousands, and very tight desired accuracies, that bound is not all that helpful, and it isn't surprising that you may have some problems.

In any case, I think that your experience is divergent enough from the applications being considered in Mahout that conclusions should be drawn with some caution.

On Sat, Dec 17, 2011 at 10:00 PM, Radim Rehurek <[email protected]> wrote:

> Ted Dunning wrote:
> > Also, it isn't entirely clear yet whether power iterations are more
> > efficient than simply increasing the fudge factor p. Power iterations are
> > very effective, and increasing p increases costs in the cube, but running
> > MR passes is expensive enough that some increase in p might be sufficient
> > and still faster than a power iteration.
>
> Don't even think about it -- on large matrices (significant truncation),
> the randomized SVD without power iterations is complete rubbish, regardless
> of any sane value of `p`.
>
> (assuming your fudge factor p = oversampling parameter `p` from Halko et
> al.).
>
> You can see some accuracy experiments (wikipedia, images) here:
> http://groups.google.com/group/gensim/msg/1f3106b85837ce9c and here
> http://groups.google.com/group/gensim/msg/240a348b70f18b30 .
>
> Also see the end of that (longish) thread for some notes on accuracy in
> face of many power iterations (~ beware of numeric overflows).
>
> Best,
> Radim
>
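As a minimal sketch of the comparison suggested above, assuming a terms x documents matrix `X`: build one projection from 500 computed singular vectors and another from 100 singular vectors padded out to 500 with random orthonormal directions, then score both on held-out data. The names `X`, `X_heldout`, and `application_accuracy` are placeholders for your own data and evaluation metric, not anything from Mahout or gensim.

    import numpy as np
    from scipy.sparse.linalg import svds

    def augmented_basis(X, k_true, k_total, seed=0):
        """Basis of k_total columns: k_true left singular vectors of X plus
        (k_total - k_true) random directions, orthonormalized together."""
        U, _, _ = svds(X, k=k_true)              # truncated SVD; U is terms x k_true
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((X.shape[0], k_total - k_true))
        Q, _ = np.linalg.qr(np.hstack([U, R]))   # orthonormalize [U | R]
        return Q

    # 500 "true" singular vectors vs. 100 true + 400 random
    U500, _, _ = svds(X, k=500)                  # X: terms x documents (placeholder)
    basis_mixed = augmented_basis(X, k_true=100, k_total=500)

    # Project held-out documents into both spaces and score them with the
    # application's own metric (retrieval precision, classification, ...);
    # application_accuracy is a placeholder for that metric.
    proj_full  = U500.T @ X_heldout
    proj_mixed = basis_mixed.T @ X_heldout
    print(application_accuracy(proj_full), application_accuracy(proj_mixed))

If the two scores are essentially the same, the extra 400 singular vectors are not adding information that the application can use, and their numerical accuracy is beside the point.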

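For reference, a minimal sketch of the randomized SVD under discussion, following the structure in Halko et al., with both knobs exposed: the oversampling parameter `p` and the number of power iterations `q`. This is an illustration only, not the Mahout or gensim implementation; the QR re-orthonormalization between power-iteration steps is the usual guard against the numeric overflow mentioned at the end of the quoted thread.

    import numpy as np

    def randomized_svd(A, k, p=10, q=0, seed=0):
        """Rank-k SVD of A via a randomized range finder with oversampling p
        and q power iterations."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Y = A @ rng.standard_normal((n, k + p))   # sample the range of A
        Q, _ = np.linalg.qr(Y)
        for _ in range(q):                        # power iterations sharpen the
            Q, _ = np.linalg.qr(A.T @ Q)          # estimate when singular values
            Q, _ = np.linalg.qr(A @ Q)            # decay slowly
        B = Q.T @ A                               # small (k+p) x n matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

The trade-off in the quoted exchange is visible in the shape of the computation: each power iteration needs two more passes over A (extra MR passes in Mahout's setting), while increasing p only widens the sample block, at a cost that, as noted above, grows much faster than linearly in k + p.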