Hi All,

Hope you are doing well.

We are using the Spark MLlib (1.4.1) PCA functionality for dimensionality reduction. So far we are able to condense n features into k features using https://spark.apache.org/docs/1.4.1/mllib-dimensionality-reduction.html#principal-component-analysis-pca

The requirements, as per our data scientist, are as follows:

a) We need to apply a Varimax rotation (https://en.wikipedia.org/wiki/Varimax_rotation) to the data. I could not find anything on this in the documentation. Could anybody please help with this?

b) We also need to find out which original features are getting combined together, so that we can understand the feature condensation that is taking place. Is there a way to see this? For example, we have a CSV whose header row holds the feature names; the header is dropped but preserved for later use. PCA takes us from n features to k, i.e. n columns to k. What are those k columns made up of, and how can we find that out?

c) Is there a way to preserve the primary key of each row during the analysis? That is, when preparing the feature vector the PK is dropped. (General knowledge question :-))

Any help is appreciated.

Thanks in advance,
~BA
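P.S. For context on (b): mathematically, each of the k new columns is a linear combination of the original n features, and the weights live in the n-by-k principal-components (loadings) matrix that PCA produces. The following is a plain numpy sketch of that linear algebra, not Spark code; the feature names and data are made up for illustration.

```python
import numpy as np

# Toy data: 5 rows, 4 original features (standing in for the CSV columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

# Center the data, then compute PCA via SVD.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
loadings = Vt[:k].T  # shape (n_features, k): row i = weights of original feature i

# For each new column, list the original features sorted by absolute weight,
# which shows which features dominate that component.
feature_names = ["f1", "f2", "f3", "f4"]  # illustrative header names
for j in range(k):
    ranked = sorted(zip(feature_names, loadings[:, j]),
                    key=lambda t: abs(t[1]), reverse=True)
    print(f"PC{j + 1}:", [(name, round(w, 3)) for name, w in ranked])
```

In MLlib the analogous matrix comes back from the PCA computation as a local matrix with one row per original feature and one column per component, so pairing its rows with the preserved CSV header gives the composition of each of the k columns.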
We are using Spark MLLIB (1.4.1) PCA functionality for dimensionality reduction. So far we are able to condense n features into k features using https://spark.apache.org/docs/1.4.1/mllib-dimensionality-reduction.html#principal-component-analysis-pca The requirements, as per our data scientist , are as follows a) We need to find Varimax Rotation ( https://en.wikipedia.org/wiki/Varimax_rotation) for the data - I could not find anything on this in the documentation. Would anybody please help with this. b) We also need to find out what all features are getting clubbed together so that we can understand the feature condensation that is taking place. Is there a way to see this? E.g. we have a CSV with header as feature names, the header is dropped but preserved for later use. We move from n features to k in PCA i.e. n columns to k. What are those k columns made up of? How to find this out? c) Is there a way to preserve the primary key of the row that is getting dropped in the analysis. i.e. when preparing the feature vector the PK is dropped. (General Knowledge question :-)) Any help is appreciated. Thanks in Advance, ~BA