Hi Mahmood,
The information you need is the individual explained variance for each
variable / feature. You can get that information from the hoggorm package (Python):
https://github.com/olivertomic/hoggorm
https://hoggorm.readthedocs.io/en/latest/index.html
Here is one of the PCA examples provided in a Jupyter notebook:
https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb
When you do PCA you get this information by calling, for example:
cumCalExplVar_individualVariable = model.X_cumCalExplVar_indVar() (which gives
you the cumulative calibrated explained variance for each variable, cell 21 in
the notebook)
cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which gives
you the cumulative validated explained variance for each variable, cell 30 in
the notebook)
The component where you get the biggest jump for the variable of interest is
the component you are looking for.
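Roughly, it could look like this (an untested sketch; the exact constructor
arguments and the layout of the arrays that come back are shown in the notebook
linked above, so double-check there):

import numpy as np
import hoggorm as ho

# data: samples x variables array; Xstand=True standardises the columns
model = ho.nipalsPCA(arrX=data, Xstand=True, cvType=["loo"], numComp=5)

# assumed layout: rows correspond to the number of components used,
# columns to the individual variables (see cells 21 and 30 in the notebook)
cumCal = np.array(model.X_cumCalExplVar_indVar())
cumVal = np.array(model.X_cumValExplVar_indVar())

# increase in explained variance per added component, for each variable;
# the component with the biggest jump is the one you are after
jumps = np.diff(cumCal, axis=0)
print(np.argmax(jumps, axis=0))   # 0-based component index per variable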
You could also have a look at the correlation loadings to identify the
component you are looking for.
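For the correlation loadings something along these lines should do (again a
sketch, continuing from the snippet above; check the notebook for the exact
shape of the returned array):

corrLoadings = np.array(model.X_corrLoadings())   # components x variables
# the component on which a variable has the largest absolute correlation
# loading is the one that represents that variable best
print(np.argmax(np.abs(corrLoadings), axis=0))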
cheers
Oliver
---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan <mahmood...@gmail.com>
wrote ----
Hi
Thanks for the replies. I read about the available functions in the
PCA section. Consider the following code
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# x is the raw feature matrix, targets holds the kernel names
x = StandardScaler().fit_transform(x)
pca = PCA()
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents)
loadings = pca.components_
finalDf = pd.concat([principalDf,
                     pd.DataFrame(targets, columns=['kernel'])], axis=1)
print("First and second observations\n", finalDf.loc[0:1])
print("loadings[0:1]\n", loadings[0], loadings[1])
print("explained_variance_ratio_\n", pca.explained_variance_ratio_)
The output looks like
First and second observations
          0         1         2         3         4  kernel
0  2.959846 -0.184307 -0.100236  0.533735 -0.002227   ELEC1
1  0.390313  1.805239  0.029688 -0.502359 -0.002350  ELECT2
loadings[0:1]
[ 0.21808984  0.49137412  0.46511098  0.49735819  0.49728754]
[-0.94878375 -0.01257726  0.29718078  0.07493325  0.07562934]
explained_variance_ratio_
[7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06]
As you can see, for two kernels named ELEC1 and ELEC2 there are five
PCs, numbered 0 to 4.
Now, based on the numbers in the loadings, I expect that loadings[0],
which is the first variable, is better shown on the PC1-PC2 plane
(0.49137412, 0.46511098). However, loadings[1], which is the second
variable, is better shown on the PC0-PC2 plane (-0.94878375, 0.29718078).
Is this understanding correct?
I don't understand what explained_variance_ratio_ is trying to say here.
Regards,
Mahmood
On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug <nio...@gmail.com> wrote:
>
> Hi Mahmood,
>
> There are different pieces of info that you can get from PCA:
>
> 1. How important a given PC is for reconstructing the entire dataset -> this
> is given by explained_variance_ratio_, as Guillaume suggested.
>
> 2. What is the contribution of each feature to each PC (remember that a
> PC is a linear combination of all the features, i.e. PC_1 = X_1 .
> alpha_11 + X_2 . alpha_12 + ... + X_m . alpha_1m). The alpha_ij are what
> you're looking for, and they are given in the components_ matrix, which
> is an n_components x n_features matrix.
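>
> As a minimal sketch of one way to combine those two attributes to see, for
> each feature, which PC it contributes to most (just one way to look at it,
> not the only one):
>
> import numpy as np
> from sklearn.decomposition import PCA
>
> pca = PCA().fit(x)                      # x: the standardised data
> loadings = pca.components_              # shape (n_components, n_features)
>
> # for every feature, the PC on which it loads most strongly
> print(np.argmax(np.abs(loadings), axis=0))
>
> # optionally weight each PC by how much variance it explains
> weighted = np.abs(loadings) * pca.explained_variance_ratio_[:, np.newaxis]
> print(np.argmax(weighted, axis=0))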
>
> Nicolas
>
> On 1/22/21 9:13 AM, Mahmood Naderan wrote:
> > Hi
> > I have a question about PCA and that is, how we can determine, a
> > variable, X, is better captured by which factor (principal
> > component)? For example, maybe one variable has low weight in the
> > first PC but has a higher weight in the fifth PC.
> >
> > When I use the PCA from Scikit, I have to manually work with the PCs,
> > therefore, I may miss the point that although a variable is weak in
> > PC1-PC2 plot, it may be strong in PC4-PC5 plot.
> >
> > Any comment on that?
> >
> > Regards,
> > Mahmood
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn