Hi Oliver,
Thanks for the suggestion. The package looks handy; I will give it a try.


Regards,
Mahmood



On Sun, Jan 24, 2021 at 12:55 PM Oliver Tomic via scikit-learn
<scikit-learn@python.org> wrote:
>
> Hi Mahmood,
>
> the information you need is the explained variance of each individual
> variable / feature. You can get that from the hoggorm package
> (Python):
>
> https://github.com/olivertomic/hoggorm
> https://hoggorm.readthedocs.io/en/latest/index.html
>
> Here is one of the PCA examples provided in a Jupyter notebook:
> https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb
>
>
> When you do PCA with hoggorm, you get that information by calling, for example:
>
> cumCalExplVar_individualVariable = model.X_cumCalExplVar_indVar() (which
> gives you the cumulative calibrated explained variance for each variable,
> cell 21 in the notebook)
>
> cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which
> gives you the cumulative validated explained variance for each variable,
> cell 30 in the notebook)
>
>
> The component at which the cumulative explained variance for your variable
> of interest jumps the most is the component you are looking for.
>
> You could also have a look at the correlation loadings to identify the 
> component you are looking for.
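>
> Roughly, a minimal sketch (untested; X, numComp=4 and cvType=["loo"] are just
> placeholders, and I am assuming the per-variable arrays come back with one
> row per component):
>
> import numpy as np
> import hoggorm as ho
>
> # X: numpy array of shape (n_objects, n_variables)
> model = ho.nipalsPCA(arrX=X, Xstand=True, cvType=["loo"], numComp=4)
>
> # cumulative calibrated / validated explained variance per variable
> cumCal = np.array(model.X_cumCalExplVar_indVar())
> cumVal = np.array(model.X_cumValExplVar_indVar())
>
> # for each variable, the component giving the biggest jump in cumulative
> # explained variance (under the row-per-component assumption above)
> jumps = np.diff(cumCal, axis=0)
> print(np.argmax(jumps, axis=0) + 1)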
>
> cheers
> Oliver
>
>
>
>
>
>
> ---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan 
> <mahmood...@gmail.com> wrote ----
>
> Hi
> Thanks for the replies. I read about the available functions in the
> PCA section. Consider the following code
>
> import pandas as pd
> from sklearn.preprocessing import StandardScaler
> from sklearn.decomposition import PCA
>
> x = StandardScaler().fit_transform(x)
> pca = PCA()
> principalComponents = pca.fit_transform(x)
> principalDf = pd.DataFrame(data=principalComponents)
> loadings = pca.components_
> finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], axis=1)
> print("First and second observations\n", finalDf.loc[0:1])
> print("loadings[0:1]\n", loadings[0], loadings[1])
> print("explained_variance_ratio_\n", pca.explained_variance_ratio_)
>
>
> The output looks like
>
> First and second observations
> 0 1 2 3 4 kernel
> 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1
> 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2
> loadings[0:1]
> [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375
> -0.01257726 0.29718078 0.07493325 0.07562934]
> explained_variance_ratio_
> [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06]
>
>
>
> As you can see, for the two kernels named ELEC1 and ELEC2 there are five
> PCs, numbered 0 to 4.
> Now, based on the numbers in the loadings, I expect that loadings[0],
> which is the first variable, is better shown on the PC1-PC2 plane
> (0.49137412, 0.46511098). However, loadings[1], which is the second
> variable, is better shown on the PC0-PC2 plane (-0.94878375, 0.29718078).
> Is this understanding correct?
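>
> A quick check I could run (assuming the pca object above, and assuming that
> each column of components_ corresponds to one input variable) would be to
> print, for each variable, the PC with the largest absolute loading:
>
> import numpy as np
>
> # one entry per variable: index of the PC with the largest absolute loading,
> # under the column-per-variable assumption above
> print(np.argmax(np.abs(pca.components_), axis=0))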
>
> I don't understand what explained_variance_ratio_ is trying to say here.
>
>
> Regards,
> Mahmood
>
> On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug <nio...@gmail.com> wrote:
> >
> > Hi Mahmood,
> >
> > There are different pieces of info that you can get from PCA:
> >
> > 1. How important is a given PC for reconstructing the entire dataset ->
> > this is given by explained_variance_ratio_, as Guillaume suggested
> >
> > 2. What is the contribution of each feature to each PC (remember that a
> > PC is a linear combination of all the features, i.e. PC_1 = X_1 * alpha_11
> > + X_2 * alpha_12 + ... + X_m * alpha_1m). The alpha_ij are what you're
> > looking for, and they are given in the components_ matrix, which is an
> > n_components x n_features matrix (see the sketch below).
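> >
> > A minimal, self-contained sketch of both points (toy data; the names are
> > just illustrative):
> >
> > import numpy as np
> > from sklearn.decomposition import PCA
> >
> > rng = np.random.default_rng(0)
> > X = rng.normal(size=(50, 5))   # toy data: 50 samples, 5 features
> > pca = PCA().fit(X)
> >
> > # 1. fraction of the total variance explained by each PC
> > print(pca.explained_variance_ratio_)   # sums to 1.0 when all PCs are kept
> >
> > # 2. row i of components_ holds the alpha_ij of PC_i, so each PC score is
> > # a linear combination of the centered features
> > scores = (X - pca.mean_) @ pca.components_.T
> > print(np.allclose(scores, pca.transform(X)))   # True (default whiten=False)
> >
> > # column j then shows how strongly feature j loads on each PC
> > print(np.argmax(np.abs(pca.components_), axis=0))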
> >
> > Nicolas
> >
> > On 1/22/21 9:13 AM, Mahmood Naderan wrote:
> > > Hi
> > > I have a question about PCA: how can we determine which factor
> > > (principal component) best captures a given variable X? For example,
> > > a variable may have a low weight in the first PC but a higher weight
> > > in the fifth PC.
> > >
> > > When I use PCA from scikit-learn, I have to work with the PCs
> > > manually, so I may miss the fact that although a variable is weak in
> > > the PC1-PC2 plot, it may be strong in the PC4-PC5 plot.
> > >
> > > Any comment on that?
> > >
> > > Regards,
> > > Mahmood