2011/11/2 Radim Rehurek <[email protected]>:
> Hi guys,
>
>> From: Olivier Grisel <[email protected]>
>> 2011/11/2 Stéfan van der Walt <[email protected]>:
>> > Hi all,
>> >
>> > Maybe this paper, from the current issue from SIAM Journal on
>> > Scientific Computing is of some interest:
>> >
>> > http://epubs.siam.org/sisc/resource/1/sjoce3/v33/i5/p2580_s1?view=print
>>
>> AFAIK, Radim Rehurek in CC has already implemented this algorithm in
>> gensim. I will read the paper though. Thanks for the link.
>
>
> yes, I implemented a version of this algo that runs streamed (no random
> access to observations) and in O(mk) memory -- unlike the original Halko et
> al. that requires O((m+n)k).
>
> If you decide to implement the randomized PCA, I can offer some observations:
>
> 1. oversampling does little, accuracy comes mostly from the extra power
> iteration steps
> 2. no power iterations result in miserable accuracy
> 3. extra power iteration steps quickly lead to numerical overflows; but QR is
> pretty fast, so in gensim, I orthonormalize the intermediate matrices H after
> each power iteration step. That's exactly the same method that remark 3.3
> refers to.
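Observation 3 can be illustrated with a minimal NumPy sketch of a randomized truncated SVD: a Gaussian probe matrix, a few power iterations with a QR re-orthonormalization after every multiplication by A or A^T, then an exact SVD of the small projected matrix. This is an in-memory sketch assuming a dense NumPy array. It is not gensim's streamed O(mk) variant and not scikit-learn's code. The function name and the parameters `n_oversamples`, `n_iter` and `orthonormalize` are hypothetical.

```python
import numpy as np


def randomized_svd_sketch(A, k, n_oversamples=10, n_iter=3,
                          orthonormalize=True, seed=0):
    """Hypothetical sketch of a randomized truncated SVD (range finder +
    power iterations). Assumes A is a dense 2-D NumPy array."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = k + n_oversamples  # total number of random probe vectors

    # Gaussian test matrix; Y = A @ Omega approximately spans range(A)
    omega = rng.standard_normal((n, p))
    Y = A @ omega
    if orthonormalize:
        Y, _ = np.linalg.qr(Y)

    for _ in range(n_iter):
        # One power iteration, Y <- A A^T Y. Re-orthonormalizing after each
        # product keeps entries from blowing up and stops all columns from
        # collapsing onto the leading singular vector.
        Z = A.T @ Y
        if orthonormalize:
            Z, _ = np.linalg.qr(Z)
        Y = A @ Z
        if orthonormalize:
            Y, _ = np.linalg.qr(Y)

    # Orthonormal basis Q (m x p) for the approximate range of A
    Q, _ = np.linalg.qr(Y)

    # Project A onto the small subspace and take an exact SVD there
    B = Q.T @ A  # shape (p, n)
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    U0, _ = np.linalg.qr(rng.standard_normal((200, 10)))
    V0, _ = np.linalg.qr(rng.standard_normal((100, 10)))
    A = U0 @ np.diag(np.linspace(10.0, 1.0, 10)) @ V0.T  # exact rank 10
    U, s, Vt = randomized_svd_sketch(A, k=3)
    print(s)  # top-3 singular values of A
```

Calling it with `n_oversamples=k` and `orthonormalize=False` roughly mirrors the configuration Olivier describes for scikit-learn (2 * k probe vectors, 3 power iterations, no intermediate QR), so the two settings could serve as a starting point for the experiment he suggests.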
Interesting. The current implementation in scikit-learn (which is neither streamed nor parallel) does quite a bit of oversampling (if k components are required, 2 * k random vectors are used) and uses 3 power iterations by default, but does not do QR inside the power iteration steps:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L346
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/extmath.py#L126

It would be interesting to experiment with reducing the oversampling and using orthonormalization after each power iteration.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
