Hi,

Thanks for the replies. I read about the available functions in the
PCA section. Consider the following code:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the features, then fit PCA and project the data
x = StandardScaler().fit_transform(x)
pca = PCA()
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents)
loadings = pca.components_
# Attach the kernel labels to the projected observations
finalDf = pd.concat([principalDf,
                     pd.DataFrame(targets, columns=['kernel'])], axis=1)
print("First and second observations\n", finalDf.loc[0:1])
print("loadings[0:1]\n", loadings[0], loadings[1])
print("explained_variance_ratio_\n", pca.explained_variance_ratio_)

The output looks like:

First and second observations
           0         1         2         3         4  kernel
0   2.959846 -0.184307 -0.100236  0.533735 -0.002227   ELEC1
1   0.390313  1.805239  0.029688 -0.502359 -0.002350  ELECT2
loadings[0:1]
 [ 0.21808984  0.49137412  0.46511098  0.49735819  0.49728754]
 [-0.94878375 -0.01257726  0.29718078  0.07493325  0.07562934]
explained_variance_ratio_
 [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06]

As you can see, for the two kernels named ELEC1 and ELEC2 there are
five PCs, numbered 0 to 4. Based on the numbers in the loadings, I
expect that loadings[0], which I take to be the first variable, is
best shown in the PC1-PC2 plane (0.49137412, 0.46511098), whereas
loadings[1], the second variable, is best shown in the PC0-PC2 plane
(-0.94878375, 0.29718078). Is this understanding correct?

I also don't understand what explained_variance_ratio_ is trying to
say here.

Regards,
Mahmood

On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug <nio...@gmail.com> wrote:
>
> Hi Mahmood,
>
> There are different pieces of information that you can get from PCA:
>
> 1. How important a given PC is for reconstructing the entire dataset.
> This is given by explained_variance_ratio_, as Guillaume suggested.
>
> 2. What the contribution of each feature to each PC is. Remember that
> a PC is a linear combination of all the features, i.e.
> PC_1 = X_1 * alpha_11 + X_2 * alpha_12 + ... + X_m * alpha_1m.
> The alpha_ij are what you're looking for, and they are given in the
> components_ matrix, which is an n_components x n_features matrix.
>
> Nicolas
>
> On 1/22/21 9:13 AM, Mahmood Naderan wrote:
> > Hi
> > I have a question about PCA: how can we determine which factor
> > (principal component) best captures a given variable X? For
> > example, a variable may have a low weight in the first PC but a
> > higher weight in the fifth PC.
> >
> > When I use the PCA from scikit-learn, I have to work with the PCs
> > manually, so I may miss the point that although a variable is weak
> > in a PC1-PC2 plot, it may be strong in a PC4-PC5 plot.
> >
> > Any comment on that?
> >
> > Regards,
> > Mahmood
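P.S. To check this programmatically, here is a minimal sketch
(assuming the pca object fitted above) of how one might find, for each
variable, the PC on which it has the largest absolute weight in
components_. The feature_names list is just a hypothetical set of
labels for my five variables:

import numpy as np

# components_ has shape (n_components, n_features): row i holds the
# weights of every feature in PC i.
abs_weights = np.abs(pca.components_)

# For each feature (column), pick the PC (row) with the largest
# absolute weight.
best_pc = np.argmax(abs_weights, axis=0)

# Hypothetical names for the five variables, for readability only
feature_names = ['var0', 'var1', 'var2', 'var3', 'var4']
for name, pc in zip(feature_names, best_pc):
    print(name, "has its largest absolute weight on PC", pc)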
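P.P.S. Re-reading point 1 above, my current understanding of
explained_variance_ratio_ is that each entry is the fraction of the
total variance captured by the corresponding PC, so the cumulative sum
shows how much of the dataset a truncated set of PCs can reconstruct.
A quick check with the numbers printed above:

import numpy as np

print(np.cumsum(pca.explained_variance_ratio_))
# -> approximately [0.7806 0.9605 0.9855 1.0000 1.0000]
# PC0 alone explains ~78% of the variance; PC0 and PC1 together ~96%,
# so a 2-D PC0-PC1 plot already captures most of this dataset.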
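Lastly, to convince myself of point 2, a small sanity check (again
assuming the fitted pca and the standardized x from above) that the PC
scores really are the linear combinations
PC_i = alpha_i1 * X_1 + ... + alpha_im * X_m:

import numpy as np

# transform() subtracts the fitted mean and projects onto the rows of
# components_, so the same scores can be rebuilt by hand:
manual_scores = (x - pca.mean_) @ pca.components_.T
print(np.allclose(manual_scores, principalComponents))  # expect True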