Hi all,

      I've been using scikit-learn's SparsePCA class to analyze some
data, and I'd like to characterize the amount of variance explained by
each component.  I've consulted off-list about this a bit, and from that
correspondence I understand that calculating explained variance for
sparse PCA is more complex than for ordinary PCA because sparse PCA
does not enforce the orthogonality constraint of ordinary PCA, so the
components can be correlated and their variances do not simply add up.
However, the original Zou et al. 2006 sparse PCA paper
(http://www.tandfonline.com/doi/abs/10.1198/106186006X113430) indicates
that the problem is not intractable, and they offer a solution (Eq. 3.19).
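
      For concreteness, here is a minimal NumPy sketch of the adjustment as I understand it from the paper: compute the component scores, take a QR decomposition of the score matrix, and use the squared diagonal of R as the variance each component adds beyond the earlier ones. This is only my reading and not part of the scikit-learn API. The names adjusted_explained_variance and V_sparse are hypothetical, and the sketch assumes that X is column-centered and that scores are computed as X times unit-length loading vectors.

```python
import numpy as np

def adjusted_explained_variance(X, V):
    """Adjusted variance ratio per component, via QR of the scores.

    X : (n_samples, n_features) data, assumed column-centered.
    V : (n_features, k) loading vectors, one per column; they need
        not be orthogonal.
    """
    # Scale each loading vector to unit length (all-zero columns are left as is).
    norms = np.linalg.norm(V, axis=0)
    V = V / np.where(norms > 0, norms, 1.0)
    Z = X @ V                        # component scores, possibly correlated
    R = np.linalg.qr(Z, mode='r')    # Z = QR with R upper triangular
    adjusted = np.diag(R) ** 2       # part of Z_j not explained by Z_1..Z_{j-1}
    return adjusted / np.sum(X ** 2) # fraction of the total variance of X

rng = np.random.RandomState(1)
X = rng.randn(50, 20)
X = X - X.mean(axis=0)

# Sanity check: with ordinary PCA loadings the scores are orthogonal, so the
# adjusted ratios should match the usual explained-variance ratios.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
ratios_pca = adjusted_explained_variance(X, Vt[:3].T)

# Hypothetical sparse, overlapping (non-orthogonal) loadings.
V_sparse = np.zeros((20, 2))
V_sparse[[0, 1], 0] = 1.0
V_sparse[[1, 2], 1] = 1.0
ratios_sparse = adjusted_explained_variance(X, V_sparse)
print(ratios_pca, ratios_sparse)
```

With a fitted SparsePCA model, spca.components_.T has the shape expected for V, although the scores returned by its transform method may differ from X times the loadings.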

      The suggestion that I received for calculating explained variance
would be implemented in sklearn as follows:

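import numpy as np
from sklearn.decomposition import SparsePCA

np.random.seed(1)
X = np.random.randn(50, 20)
spca = SparsePCA()
Xr = spca.fit_transform(X)  # scores, shape (n_samples, n_components)
# Rank-one reconstruction from component 0: its score column (all samples)
# times its loading vector.  Xr[0] would instead be the first sample's row.
fro_comp0 = np.linalg.norm(np.outer(Xr[:, 0], spca.components_[0]), 'fro')
fro_full = np.linalg.norm(X, 'fro')
var_exp0 = fro_comp0 ** 2 / fro_full ** 2
print(fro_comp0, fro_full, var_exp0)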
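
      As a small aid for the matrix algebra: the Frobenius norm of a rank-one outer product is the product of the two vector norms, so the ratio above reduces to (squared norm of the score column) times (squared norm of the loading vector) over the squared Frobenius norm of X. The check below uses made-up vectors z and v (hypothetical stand-ins for one score column and one loading vector). Because the quantity depends on a single component only, it treats each component in isolation and does not account for correlation between components.

```python
import numpy as np

rng = np.random.RandomState(0)
z = rng.randn(50)   # hypothetical score column for one component
v = rng.randn(20)   # hypothetical loading vector for that component

# Frobenius norm of the rank-one matrix z v^T ...
lhs = np.linalg.norm(np.outer(z, v), 'fro')
# ... equals the product of the Euclidean norms of z and v.
rhs = np.linalg.norm(z) * np.linalg.norm(v)
print(np.isclose(lhs, rhs))
```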

      This seems to be very much in line with the Zou et al. suggestion,
but my matrix algebra is not up to the task of rigorously evaluating the
implementation.  What do you think of this approach?  Many thanks for
any help!  Best,

Dave

--
David E. Warren
Associate
Department of Neurology
Carver College of Medicine
University of Iowa Hospitals and Clinics
david-e-war...@uiowa.edu




_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
