Re: spark1.0 principal component analysis

Sean Owen Thu, 10 Jul 2014 01:07:25 -0700

To clarify, which matrix do you want the eigenvectors of: the covariance
matrix? And when you talk about the stdev of the components, do you mean
the square roots of its eigenvalues?
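
For reference, this is the relationship I have in mind, written out as a
sketch. It assumes the usual sample covariance with an m - 1 denominator;
the symbols (x_i for the rows, \bar{x} for their mean, v_j and \lambda_j for
the j-th eigenvector and eigenvalue) are my own notation, not anything from
the docs:

```latex
\[
C = \frac{1}{m-1}\sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{\top},
\qquad
C\,v_j = \lambda_j\,v_j,
\qquad
\operatorname{sd}\!\left(x^{\top} v_j\right) = \sqrt{\lambda_j}.
\]
```

Since C is symmetric and positive semidefinite, its singular values coincide
with its eigenvalues, which is why an SVD of the covariance matrix gives the
same information.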

Looking at
https://github.com/apache/spark/blob/1f33e1f2013c508aa86511750f7bd8437154e51a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L462
it seems that the singular values from the SVD aren't returned, so I don't
think you can access them directly.

You could reproduce this approach in your own code, though (compute the
covariance matrix, then take its SVD), to get access to them.

But the output is pretty straightforward: it's the principal components as
columns. If you have m rows of n-dimensional data and ask for k principal
components, you get an n x k matrix, where the k columns are the
n-dimensional principal component vectors.
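
A rough sketch of how this might look in practice, assuming the Spark 1.0
MLlib API (RowMatrix.computePrincipalComponents, RowMatrix.multiply and
RowMatrix.computeColumnSummaryStatistics). The object name PcaSketch, the
local[2] master and the data are made up for illustration. Note that this
does not reach into the SVD: it projects the rows onto the components and
takes the column variances of the projection, which should match the
eigenvalues of the covariance matrix up to how the variance is normalized.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical driver object; the name and the data are illustrative only.
object PcaSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pca-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // m = 5 rows of n = 3 features (made-up values).
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(2.0, 4.1, 5.9),
      Vectors.dense(3.0, 6.2, 9.1),
      Vectors.dense(4.0, 7.9, 12.2),
      Vectors.dense(5.0, 9.8, 14.8)))
    val mat = new RowMatrix(rows)

    // Ask for k components; the result is a local n x k matrix.
    val k = 2
    val pc: Matrix = mat.computePrincipalComponents(k)
    val n = pc.numRows
    println("components matrix is " + pc.numRows + " x " + pc.numCols)

    // toArray is column-major, so entry (i, j) sits at index i + j * n.
    // Row i corresponds to input feature i; column j is the j-th component.
    val values = pc.toArray
    for (j <- 0 until k) {
      val loadings = (0 until n).map(i => values(i + j * n))
      println("component " + j + " loadings by feature: " + loadings.mkString(", "))
    }

    // Project the data onto the components (m x k), then take the
    // per-column variance of the projection and its square root.
    val projected = mat.multiply(pc)
    val variances = projected.computeColumnSummaryStatistics().variance.toArray
    val stdevs = variances.map(v => math.sqrt(math.max(v, 0.0)))
    println("stdev per component: " + stdevs.mkString(", "))

    sc.stop()
  }
}
```

The sorting only affects the order of the columns (components), not the
rows, so row i of the result still lines up with feature i of the input.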

On Thu, Jul 10, 2014 at 1:46 AM, fintis <fin...@gmail.com> wrote:

> Hi,
>
> Can anyone please shed more light on the PCA implementation in Spark? The
> documentation is a bit lacking, and I am not sure I understand the output.
> According to the docs, the output is a local matrix with the columns as
> principal components and columns sorted in descending order of covariance.
> This is a bit confusing for me, as I need to compute other statistics, like
> the standard deviation of the principal components. How do I match the
> principal components to the actual features, since there is some sorting?
> How about eigenvectors and eigenvalues?
>
> Any help shedding light on the output, how to use it further, and the Spark
> PCA implementation in general would be appreciated.
>
> Thank you in earnest
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-principal-component-analysis-tp9249.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
