Dear Luca,
If I understand correctly, your approach is deflationary PCA that uses
the l1 prox to enforce sparsity.
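Just to make sure we are talking about the same thing, here is a rough sketch of the kind of update I have in mind (soft_threshold stands in for the l1 prox; the exact initialization, normalization, and stopping rule in your code may well differ):

import numpy as np

def soft_threshold(v, alpha):
    # l1 prox: shrink entries towards zero and set small ones exactly to zero
    return np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)

def deflationary_sparse_pca(X, n_components, alpha, n_iter=200, tol=1e-6):
    X = np.array(X, dtype=float)
    components = []
    for _ in range(n_components):
        v = X[0] / (np.linalg.norm(X[0]) + 1e-12)        # crude initialization
        for _ in range(n_iter):
            u = np.dot(X, v)
            u /= np.linalg.norm(u) + 1e-12
            v_new = soft_threshold(np.dot(X.T, u), alpha)
            norm = np.linalg.norm(v_new)
            if norm == 0.0:                              # alpha zeroed the whole vector
                break
            v_new /= norm
            if np.linalg.norm(v_new - v) < tol:
                v = v_new
                break
            v = v_new
        components.append(v)
        X -= np.outer(np.dot(X, v), v)                   # deflate before the next component
    return np.array(components)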
I am not sure how this compares to the LARS-based implementation in the
scikit (the non-convexity of the problem makes it hard to compare
algorithms).
Moreover, I have run your benchmark on a larger, yet arguably still small,
dataset with shape = (10000, 200).
Two things happened:
- your implementation did not converge within the predefined number of
iterations;
- it is slower than SparsePCA.
In [4]: %run pca.py
start
No Convergence. Error!!!
('time iterative SPCA=', 104.21303296089172, 'time SparsePCA=', 46.90740513801575)
finish!
I'm not sure what to conclude at the moment, but since there is a huge
amount of work needed to reach skl's quality standards, I would
focus first on the motivation for such an implementation:
in the relevant parameter space (n_samples, n_features, n_components),
what is the region in which the sparse NIPALS is expected to perform
better than the lasso-based SparsePCA? Note that there are probably
publications on that topic.
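For instance, a rough sketch of the kind of benchmark I have in mind (sparse_nipals here is just a placeholder for your implementation, whatever its final signature ends up being):

import time
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
for n_samples in (100, 1000, 5000):
    for n_features in (50, 200, 500):
        for n_components in (2, 5, 10):
            X = rng.randn(n_samples, n_features)
            t0 = time.time()
            SparsePCA(n_components=n_components, alpha=1.0).fit(X)
            t_lasso = time.time() - t0
            # t0 = time.time()
            # sparse_nipals(X, n_components=n_components, alpha=1.0)  # placeholder
            # t_nipals = time.time() - t0
            print((n_samples, n_features, n_components, t_lasso))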
Best,
Bertrand
On 31/05/2014 01:03, Luca Puggini wrote:
Hi Bertrand,
I am not familiar with RandomizedPCA, so I do not know if NIPALS is
faster than RandomizedPCA. It is certainly faster than SVD when we are
interested in only a few components. My impression is that
RandomizedPCA is an approximation of PCA, while NIPALS should be an
algorithm that theoretically converges to the same result as SVD.
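For reference, the plain (non-sparse) NIPALS iteration I have in mind is roughly the following (a minimal sketch, one component at a time with deflation; a real implementation would need better initialization and convergence checks):

import numpy as np

def nipals_pca(X, n_components, n_iter=500, tol=1e-9):
    X = X - X.mean(axis=0)                      # center, as in ordinary PCA
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, 0].copy()                      # initial score vector (a column of X)
        for _ in range(n_iter):
            p = np.dot(X.T, t) / np.dot(t, t)   # loading vector
            p /= np.linalg.norm(p)
            t_new = np.dot(X, p)                # updated score vector
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        scores.append(t)
        loadings.append(p)
        X = X - np.outer(t, p)                  # deflate before the next component
    return np.array(scores).T, np.array(loadings)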
Regarding SparsePCA, I don't think there is an equivalent of
RandomizedPCA in sklearn. Here http://justpaste.it/fobq I have
implemented the algorithm described in "Sparse PCA through Low-rank
Approximations". The linked file includes a speed comparison
with SparsePCA. The value of the penalty alpha seems to have a
slightly different effect, but the overall result is almost the same.
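For example, one quick way to look at the alpha discrepancy is to count the nonzero loadings for the same alpha in both methods (a rough sketch; iterative_spca stands for the function in the linked file):

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(500, 50)
for alpha in (0.1, 1.0, 10.0):
    comps = SparsePCA(n_components=5, alpha=alpha).fit(X).components_
    print((alpha, np.mean(comps != 0)))                        # fraction of nonzero loadings
    # comps_it = iterative_spca(X, n_components=5, alpha=alpha)  # from the linked pca.py
    # print((alpha, np.mean(comps_it != 0)))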
Let me know about that.
Best,
Luca
Dear Luca,
In terms of efficiency, do you think that NIPALS outperforms
RandomizedPCA? I'm not an expert in these methods, but it sounds like
they rely on similar tricks.
My suggestion would be to run a benchmark on one of the scikit's
datasets to compare the accuracy/computation time tradeoffs.
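Something along these lines, for instance (a rough sketch on the digits dataset, using the explained variance ratio as a crude accuracy measure):

import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, RandomizedPCA

X = load_digits().data
for Model in (PCA, RandomizedPCA):
    t0 = time.time()
    model = Model(n_components=10).fit(X)
    # report fit time and total explained variance of the 10 components
    print((Model.__name__, time.time() - t0,
           np.sum(model.explained_variance_ratio_)))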
Best,
Bertrand
On 28/05/2014 18:45, Luca Puggini wrote:
> Hi,
> I was looking at the PCA and SparsePCA implementations in sklearn.
> They are both based on SVD, but I think that the NIPALS implementation
> of the same algorithm can really increase the speed in some situations.
>
> In particular, with sparse PCA we usually use a small number of
> components, so its speed could be increased by using NIPALS to compute
> the initial values of u, v (in the dictionary learning class).
>
> Along the same lines, there is a NIPALS-like algorithm for Sparse PCA.
> I have already written a Python implementation of it, and it should not
> be a problem for me to integrate it into sklearn.
>
> Is this considered useful for the community, or is it off topic?
> Are there other people already working on it?
>
> Thanks,
> Luca
>