Hi Mahmood,
The information you need is the individual explained variance for each
variable / feature. You can get that information from the hoggorm package (Python):
https://github.com/olivertomic/hoggorm
https://hoggorm.readthedocs.io/en/latest/index.html
Here is one of the PCA examples provided in a Jupyter notebook:
https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb
When you do PCA you get this information by calling, for example:
cumCalExplVar_individualVariable = model.X_cumCalExplVar_indVar() (which gives
you the cumulative calibrated explained variance for each variable, cell 21 in
the notebook)
cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which gives
you the cumulative validated explained variance for each variable, cell 30 in
the notebook)
The component where you get the biggest jump for the variable of interest is
the component you are looking for.
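Roughly, it could look like this (an untested sketch; the exact constructor
arguments and the layout of the arrays that come back are shown in the notebook
linked above, so double-check there):

import numpy as np
import hoggorm as ho

# data: samples x variables array; Xstand=True standardises the columns
model = ho.nipalsPCA(arrX=data, Xstand=True, cvType=["loo"], numComp=5)

# assumed layout: rows correspond to the number of components used,
# columns to the individual variables (see cells 21 and 30 in the notebook)
cumCal = np.array(model.X_cumCalExplVar_indVar())
cumVal = np.array(model.X_cumValExplVar_indVar())

# increase in explained variance per added component, for each variable;
# the component with the biggest jump is the one you are after
jumps = np.diff(cumCal, axis=0)
print(np.argmax(jumps, axis=0))   # 0-based component index per variable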
You could also have a look at the correlation loadings to identify the
component you are looking for.
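For the correlation loadings something along these lines should do (again a
sketch, continuing from the snippet above; check the notebook for the exact
shape of the returned array):

corrLoadings = np.array(model.X_corrLoadings())   # components x variables
# the component on which a variable has the largest absolute correlation
# loading is the one that represents that variable best
print(np.argmax(np.abs(corrLoadings), axis=0))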
cheers
Oliver
---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan <mahmood...@gmail.com>
wrote ----
Hi
Thanks for the replies. I read about the available functions in the
PCA section. Consider the following code
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# x is the raw feature matrix, targets holds the kernel names
x = StandardScaler().fit_transform(x)
pca = PCA()
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents)
loadings = pca.components_
finalDf = pd.concat([principalDf,
                     pd.DataFrame(targets, columns=['kernel'])], axis=1)
print("First and second observations\n", finalDf.loc[0:1])
print("loadings[0:1]\n", loadings[0], loadings[1])
print("explained_variance_ratio_\n", pca.explained_variance_ratio_)
The output looks like
First and second observations
          0         1         2         3         4  kernel
0  2.959846 -0.184307 -0.100236  0.533735 -0.002227   ELEC1
1  0.390313  1.805239  0.029688 -0.502359 -0.002350  ELECT2
loadings[0:1]
[ 0.21808984  0.49137412  0.46511098  0.49735819  0.49728754]
[-0.94878375 -0.01257726  0.29718078  0.07493325  0.07562934]
explained_variance_ratio_
[7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06]
As you can see, for two kernels named ELEC1 and ELEC2 there are five
PCs, numbered 0 to 4.
Now, based on the numbers in the loadings, I expect that loadings[0],
which is the first variable, is better shown on the PC1-PC2 plane
(0.49137412, 0.46511098). However, loadings[1], which is the second
variable, is better shown on the PC0-PC2 plane (-0.94878375, 0.29718078).
Is this understanding correct?
I don't understand what explained_variance_ratio_ is trying to say here.
Regards,
Mahmood
On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug <nio...@gmail.com> wrote:
>
> Hi Mahmood,
>
> There are different pieces of info that you can get from PCA:
>
> 1. How important a given PC is for reconstructing the entire dataset -> this
> is given by explained_variance_ratio_, as Guillaume suggested.
>
> 2. What is the contribution of each feature to each PC (remember that a
> PC is a linear combination of all the features, i.e. PC_1 = X_1 .
> alpha_11 + X_2 . alpha_12 + ... + X_m . alpha_1m). The alpha_ij are what
> you're looking for, and they are given in the components_ matrix, which
> is an n_components x n_features matrix.
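>
> As a minimal sketch of one way to combine those two attributes to see, for
> each feature, which PC it contributes to most (just one way to look at it,
> not the only one):
>
> import numpy as np
> from sklearn.decomposition import PCA
>
> pca = PCA().fit(x)                      # x: the standardised data
> loadings = pca.components_              # shape (n_components, n_features)
>
> # for every feature, the PC on which it loads most strongly
> print(np.argmax(np.abs(loadings), axis=0))
>
> # optionally weight each PC by how much variance it explains
> weighted = np.abs(loadings) * pca.explained_variance_ratio_[:, np.newaxis]
> print(np.argmax(weighted, axis=0))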
>
> Nicolas
>
> On 1/22/21 9:13 AM, Mahmood Naderan wrote:
> > Hi
> > I have a question about PCA and that is, how we can determine, a
> > variable, X, is better captured by which factor (principal
> > component)? For example, maybe one variable has low weight in the
> > first PC but has a higher weight in the fifth PC.
> >
> > When I use the PCA from Scikit, I have to manually work with the PCs,
> > therefore, I may miss the point that although a variable is weak in
> > PC1-PC2 plot, it may be strong in PC4-PC5 plot.
> >
> > Any comment on that?
> >
> > Regards,
> > Mahmood
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn