To clarify: which matrix do you want the eigenvectors of, the covariance matrix? And when you talk about the stdev of the components, do you mean the square roots of its eigenvalues?
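If that is what you mean, here is a rough sketch of the relationship in plain NumPy rather than Spark, just to pin down the terms. It assumes the components are the eigenvectors of the sample covariance matrix and that the "stdev of a component" is the square root of the matching eigenvalue. The function name `pca_with_stdev` and the toy data are made up for illustration; none of this is Spark's API.

```python
import numpy as np

def pca_with_stdev(data, k):
    """Return the top-k principal components (as columns) and the stdev along each."""
    # Sample covariance of the features; rows of `data` are observations.
    cov = np.cov(data, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Indices of the k largest eigenvalues, largest first.
    order = np.argsort(eigenvalues)[::-1][:k]
    components = eigenvectors[:, order]  # shape n x k
    # Clip tiny negative round-off values before taking the square root.
    stdevs = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return components, stdevs

rng = np.random.default_rng(0)
# Toy data: m = 500 rows, n = 3 features with very different spreads.
data = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

components, stdevs = pca_with_stdev(data, k=2)

# Project the centered data onto the components; the sample stdev of each
# projected column should match the square root of the matching eigenvalue.
projected = (data - data.mean(axis=0)) @ components
print(components.shape)
print(stdevs)
print(projected.std(axis=0, ddof=1))
```

Only the columns are sorted (by descending variance). Row i of `components` still corresponds to feature i of the input, so the entries in a column are the weights of the original features in that component.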
Looking at https://github.com/apache/spark/blob/1f33e1f2013c508aa86511750f7bd8437154e51a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L462 it seems that the singular values from the SVD aren't returned, so I don't think you can access them directly. You could emulate the same approach in your own code to get at them, though.

The output itself is pretty straightforward: it's the principal components as columns. If you have m rows of n-dimensional data and ask for k principal components, you get an n x k matrix, where the k columns are the n-dimensional principal component vectors.

On Thu, Jul 10, 2014 at 1:46 AM, fintis <fin...@gmail.com> wrote:
> Hi,
>
> Can anyone please shed more light on the PCA implementation in spark? The
> documentation is a bit leaving as I am not sure I understand the output.
> According to the docs, the output is a local matrix with the columns as
> principal components and columns sorted in descending order of covariance.
> This is a bit confusing for me as I need to compute other statistic Like
> standard deviation of the principal components. How do I match the principal
> components to the actual features since there is some sorting? How about
> eigenvectors and eigenvalues?
>
> Please anyone to help shed light on the output, how to use it further and
> pca spark implementation in general is appreciated
>
> Thank you in earnest
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-principal-component-analysis-tp9249.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.