Hi all,
I've been using scikit-learn's sparse PCA class to analyze some
data, and I'd like to characterize the amount of variance explained by
each component. I've consulted off-list about this a bit, and from that
correspondence I understand that calculating explained variance for
sparse PCA is more complex than for non-sparse PCA because sparse PCA
ignores (or reduces the priority of) the orthogonality constraint of
non-sparse PCA. However, the original Zou et al. 2006 sparse PCA paper
(http://www.tandfonline.com/doi/abs/10.1198/106186006X113430) indicates
that the problem is not intractable, and they offer a solution (Eq. 3.19).
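For reference, here is a minimal sketch of the adjusted-variance idea as I understand it from Zou et al.: each modified component is credited only with the variance left after projecting out the components before it, which can be read off the diagonal of R in a QR decomposition of the score matrix. The function name, the toy loading vectors, and the choice to form scores as centered X times unit-norm loadings are my own assumptions for illustration; this is not scikit-learn API and not necessarily what SparsePCA.transform returns.

```python
import numpy as np

def adjusted_explained_variance(X, loadings):
    """Fraction of total variance credited to each component, in order.

    X        : (n_samples, n_features) data matrix
    loadings : (n_components, n_features) loading vectors as rows,
               the same layout as `components_` in scikit-learn
    Hypothetical helper, written for illustration only.
    """
    Xc = X - X.mean(axis=0)                        # center each feature
    norms = np.linalg.norm(loadings, axis=1, keepdims=True)
    V = loadings / norms                           # unit-norm rows (assumes no all-zero row)
    Z = Xc @ V.T                                   # scores of the modified components
    # R[j, j]**2 is the squared norm of Z[:, j] after removing its
    # projection onto Z[:, :j], i.e. variance not already counted.
    R = np.linalg.qr(Z, mode="r")
    adjusted = np.diag(R) ** 2
    total = np.sum(Xc ** 2)                        # squared Frobenius norm of centered X
    return adjusted / total

rng = np.random.RandomState(1)
X = rng.randn(50, 20)

# Two hand-made, non-orthogonal sparse loading vectors (made-up values).
loadings = np.zeros((2, 20))
loadings[0, [0, 1, 2]] = [1.0, 1.0, 1.0]
loadings[1, [2, 3]] = [1.0, -1.0]

ratios = adjusted_explained_variance(X, loadings)
print(ratios, ratios.sum())
```

When the loadings are orthonormal and the scores uncorrelated, as in ordinary PCA, this reduces to the usual explained-variance ratio; the result depends on the order of the components when they are correlated.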
The suggestion that I received for calculating explained variance
would be implemented in sklearn as follows:
from sklearn.decomposition import SparsePCA
import numpy as np

np.random.seed(1)
X = np.random.randn(50, 20)
spca = SparsePCA()
Xr = spca.fit_transform(X)  # scores, shape (n_samples, n_components)
# Rank-one reconstruction from component 0: its scores (a column of Xr)
# times its loading vector (a row of components_).
fro_comp0 = np.linalg.norm(np.outer(Xr[:, 0], spca.components_[0]), 'fro')
fro_full = np.linalg.norm(X, 'fro')
var_exp0 = fro_comp0 ** 2. / fro_full ** 2.
print(fro_comp0, fro_full, var_exp0)
This seems to be very much in line with the Zou et al. suggestion,
but my matrix algebra is not up to the task of rigorously evaluating the
implementation. What do you think of this approach?
Many thanks for any help!
Best,
Dave
--
David E. Warren
Associate
Department of Neurology
Carver College of Medicine
University of Iowa Hospitals and Clinics
[email protected]
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general