Hi,
Just to summarize the situation and to avoid confusion.
There are mainly two things I was focusing my attention on.
1 - Nipals PCA (http://en.wikipedia.org/wiki/Principal_component_analysis#The_NIPALS_method)
This is a good alternative to SVD and it is much faster in situations where
we have a lot of variables and we are interested only in a small number of
components.
This is a well-known and well-tested algorithm, and I was actually
surprised to discover that it is not in sklearn. (Maybe it has been
replaced by a faster alternative?) A minimal sketch of the iteration is
included after point 2 below.
2 - "Nipals" Sparse PCA. This is a more recent algorithm described in
this paper
http://www.sciencedirect.com/science/article/pii/S0047259X07000887
and it is very similar to nipals but with an L1 regularization.
This is much faster for high-dimensional problems: try, for example, to
find the first 2 sparse principal components of a matrix with 2000
samples and 10000 variables. A rough sketch of the sparse update is
included further below.
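For concreteness, here is a minimal NumPy sketch of the NIPALS
iteration for point 1. This is my own simplified version (the function
name and defaults are made up), not code taken from sklearn or from my
linked file:

import numpy as np

def nipals_pca(X, n_components=2, max_iter=500, tol=1e-6):
    # Minimal sketch: extract components one at a time with power
    # iterations on the centered data, deflating X after each one.
    X = X - X.mean(axis=0)
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, 0].copy()                     # initial score vector
        for _ in range(max_iter):
            p = np.dot(X.T, t) / np.dot(t, t)  # loading vector
            p /= np.linalg.norm(p)
            t_new = np.dot(X, p)               # updated score vector
            # normalized exit condition (see the note further below)
            if np.linalg.norm(t_new - t) / len(t_new) < tol:
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)                 # deflate
        scores.append(t)
        loadings.append(p)
    return np.array(scores).T, np.array(loadings).T

Only n_components passes over the data are needed, which is where the
speed advantage over a full SVD comes from when n_components is small.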
So both Nipals and "sparse Nipals" are good in situations where we have a
large number of variables but we are interested only in a small number of
principal components.
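Here is a rough sketch of how the sparse variant changes the inner
update: the loading vector is passed through a soft-threshold (the L1
proximal operator) before normalization. This is my own simplification
of the algorithm in the paper, with a penalty parameter (alpha) that I
named myself, so take the details as illustrative only:

import numpy as np

def soft_threshold(v, alpha):
    # elementwise L1 proximal operator
    return np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)

def sparse_nipals_component(X, alpha=0.1, max_iter=500, tol=1e-6):
    # One sparse component: same power iterations as nipals, but the
    # loading vector is soft-thresholded before normalization.
    t = X[:, 0].copy()
    p = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = soft_threshold(np.dot(X.T, t), alpha)
        norm = np.linalg.norm(p)
        if norm == 0.0:            # alpha too large: everything zeroed
            break
        p /= norm
        t_new = np.dot(X, p)
        if np.linalg.norm(t_new - t) / len(t_new) < tol:
            t = t_new
            break
        t = t_new
    return t, p                    # score and sparse loading vector

Further components are obtained by deflating X with np.outer(t, p), as
in the plain nipals sketch above.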
There are very few cases where nipals does not converge. Maybe your
data was one of these cases, or maybe this is due to an error I
discovered in the code.
The exit condition "if np.linalg.norm(v_new - v_old) < tol" should be
replaced by "if np.linalg.norm(v_new - v_old) / len(v_new) < tol".
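To make the change explicit (v_new and v_old being the vectors from
consecutive iterations; the break is just for illustration):

# old condition: the norm of the difference grows with the length of v,
# so a fixed tol behaves differently for different numbers of variables
if np.linalg.norm(v_new - v_old) < tol:
    break

# corrected condition: dividing by len(v_new) makes tol roughly a
# per-element tolerance, independent of the dimensionality
if np.linalg.norm(v_new - v_old) / len(v_new) < tol:
    break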
Let me know about that.
Best,
Luca
> Date: Sat, 31 May 2014 22:19:42 +0200
> From: bthirion <bertrand.thir...@inria.fr>
> Subject: Re: [Scikit-learn-general] PCA nipals and SparsePCA
> To: scikit-learn-general@lists.sourceforge.net
>
> Dear Luca,
>
> If I understand correctly, your approach is deflationary PCA that uses
> the l1 prox to enforce sparsity.
> I am not sure how this compares to the lars-based implementation of the
> scikit (the non-convexity of the problem makes it hard to compare
> algorithms).
> Moreover, I have run your benchmark on a large, yet arguably small
> dataset, with shape = (10000,200).
> Two things happen:
> - your implementation did not converge within the predefined number
> of iterations
> - it is slower than sparsePCA.
>
> In [4]: %run pca.py
> start
> No Convergence. Error!!!
> ('time iterative SPCA=', 104.21303296089172, 'time SparsePCA=',
> 46.90740513801575)
> finish!
>
> I'm not sure what to conclude at the moment, but since there is a huge
> amount of work to reach skl's standard in terms of quality, I would
> focus first on the motivation for such an implementation:
> In the relevant parameter space (n_samples, n_features, n_components)
> what is the region in which the sparse nipals is expected to perform
> better than lasso-based SparsePCA ? Note that there are probably
> publications on that topic.
> Best,
>
> Bertrand
>
>
> On 31/05/2014 01:03, Luca Puggini wrote:
> > Hi Bertrand,
> > I am not familiar with RandomizedPCA, so I do not know if nipals is
> > faster than RandomizedPCA. It is certainly faster than SVD when we are
> > interested only in a few components. My impression is that
> > RandomizedPCA is an approximation of PCA, while nipals should
> > theoretically converge to the same result as SVD.
> >
> > Regarding SparsePCA, I don't think that there is an equivalent of
> > RandomizedPCA in sklearn. Here http://justpaste.it/fobq I have
> > implemented the algorithm described in "Sparse PCA through Low-rank
> > Approximations". The linked file includes a speed comparison with
> > SparsePCA. The value of the penalty alpha seems to have a slightly
> > different effect, but the overall result is almost the same.
> >
> > Let me know about that.
> > Best,
> > Luca
> >
> >
> > Dear Luca,
> >
> >
> >     In terms of efficiency, do you think that nipals outperforms the
> >     RandomizedPCA? I'm not an expert in these methods, but it sounds
> >     like they rely on similar tricks.
> >     My suggestion would be to run a benchmark on some dataset of the
> >     scikit to compare the accuracy/computation time tradeoffs.
> > Best,
> >
> > Bertrand
> >
> > On 28/05/2014 18:45, Luca Puggini wrote:
> > > Hi,
> > > I was looking at the PCA and SparsePCA implementations of sklearn.
> > > They are both based on SVD, but I think that the nipals
> > > implementation of the same algorithm can really increase the speed
> > > in some situations.
> > >
> > > In particular, with sparse PCA we usually use a small number of
> > > components, so its speed can be increased by using nipals to
> > > compute the initial values of u, v (in the dictionary learning
> > > class).
> > >
> > > Along the same lines, there is a nipals-like algorithm for Sparse
> > > PCA. I have already written a python implementation of it, and it
> > > should not be a problem for me to integrate it with sklearn.
> > >
> > > Is this considered useful for the community or is this off topic?
> > > Are there other people already working on it?
> > >
> > > Thanks,
> > > Luca
> > >
> >
> >
> >
>