Hi, I have a test CSV file and I have written code to compute its PCA. To compare the results I also use an Excel add-in (XLSTAT). XLSTAT automatically determines the number of components (factors). However, as I understand it, with scikit-learn I have to specify how many components are needed. For example, XLSTAT shows 5 factors:
Factor scores:

          F1      F2      F3      F4      F5
    A1  -1.293  -0.663  -0.462  -0.713   0.010
    A2  -0.297   0.293  -1.429   0.397   0.056
    A3   2.328   0.069   0.987  -0.108   0.062
    A4  -0.556  -2.273   0.538   0.344  -0.032
    A5   1.823   0.775  -0.597  -0.052  -0.085
    A6  -2.005   1.799   0.963   0.133  -0.011

In the following code, I specified 2 components:

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # x holds the data read from the CSV file (full code at the pastebin link below)
    x = StandardScaler().fit_transform(x)  # standardize each variable
    pca = PCA(n_components=2)
    principalComponents = pca.fit_transform(x)
    print(principalComponents)

The output is:

    [[-1.29292842  0.66325508]
     [-0.29706395 -0.29346337]
     [ 2.32751305 -0.06850045]
     [-0.5558091   2.27288988]
     [ 1.82312052 -0.77527304]
     [-2.0048321  -1.7989081 ]]

As you can see, the first columns from XLSTAT and scikit-learn are the same, but the second columns have opposite signs. For example, for F1 and F2 of A1:

    XLSTAT  => -1.293      -0.663
    scikit  => [-1.29292842  0.66325508]

So, my questions are:

1) Is there a way to use scikit-learn without fixing the number of principal components in advance, so that I can query how many components there are and then draw a scree plot?

2) Treating F1 and F2 as the X and Y coordinates of a scatter plot, why are the Y values from XLSTAT and scikit-learn opposite in sign?

The code I wrote is available at https://pastebin.com/ghJQ6L4C

Any idea?

Regards,
Mahmood
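A minimal sketch touching on both questions, assuming a hypothetical random 6 x 5 matrix in place of the CSV data (the real data is only at the pastebin link). Passing n_components=None keeps all min(n_samples, n_features) components, so pca.n_components_, pca.explained_variance_ (the eigenvalues, i.e. the scree plot heights) and pca.explained_variance_ratio_ can be read off afterwards. A float such as 0.95 together with svd_solver="full" keeps the smallest number of components whose cumulative explained variance exceeds that fraction. On the sign question: a principal axis is only defined up to sign, since if v is an eigenvector of the covariance matrix then so is -v. Flipping a component and its scores together leaves the reconstructed data unchanged, which the last lines check. scikit-learn fixes the sign with its own deterministic convention, which need not match XLSTAT's.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the CSV data: 6 observations x 5 variables.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5))
x_std = StandardScaler().fit_transform(x)

# Question 1: do not fix the number of components up front.
# n_components=None keeps min(n_samples, n_features) components.
pca = PCA(n_components=None)
scores = pca.fit_transform(x_std)

print("components kept:", pca.n_components_)
print("eigenvalues (scree plot heights):", pca.explained_variance_)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("cumulative:", np.cumsum(pca.explained_variance_ratio_))

# Alternatively, let scikit-learn pick the count from a variance threshold.
pca95 = PCA(n_components=0.95, svd_solver="full").fit(x_std)
print("components for >95% variance:", pca95.n_components_)

# Question 2: sign ambiguity. Negate the 2nd component and its scores.
flip = np.ones(pca.n_components_)
flip[1] = -1.0
scores_flipped = scores * flip                      # negates column 2 of the scores
components_flipped = pca.components_ * flip[:, None]  # negates row 2 of the axes

recon = scores @ pca.components_ + pca.mean_
recon_flipped = scores_flipped @ components_flipped + pca.mean_

# Both sign choices reproduce the same standardized data.
print("same reconstruction:", np.allclose(recon, recon_flipped))
```

If the plots should match XLSTAT exactly, one option is simply to multiply the affected score column (and the matching row of pca.components_) by -1; the solution is equally valid either way.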