@Miles, the latest SVD implementation in mllib is partially distributed.
Matrix-vector multiplication is computed across all workers, but the right
singular vectors are all stored in the driver. If your symmetric matrix is
n x n and you want the top k eigenvalues, you will need to fit n x k
doubles in the driver's memory. Behind the scenes, it calls ARPACK to compute
the eigendecomposition of A^T A. You can look into the source code for the
details.
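
To make the A^T A idea concrete, here is a minimal single-machine sketch in
plain Python. It is not the mllib code: it uses simple power iteration as a
stand-in for ARPACK, and the function names (matvec, gram_matvec,
top_singular_pair, rayleigh_quotient) are made up for illustration. It only
shows that matrix-vector products are enough to get the top singular pair.
For a symmetric A, the top singular value is the largest absolute eigenvalue,
and the Rayleigh quotient v^T A v recovers the eigenvalue's sign.

```python
import math


def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gram_matvec(A, x):
    """Return A^T (A x) without ever forming A^T A explicitly."""
    Ax = matvec(A, x)
    n_cols = len(A[0])
    return [sum(A[i][j] * Ax[i] for i in range(len(A))) for j in range(n_cols)]


def top_singular_pair(A, iters=500, tol=1e-12):
    """Power iteration on A^T A (illustrative stand-in for ARPACK).

    Returns (sigma, v): the largest singular value of A and a unit-length
    right singular vector. Assumes the uniform start vector is not
    orthogonal to the dominant direction.
    """
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = gram_matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        # With unit v, ||A^T A v|| converges to the top eigenvalue of A^T A.
        converged = abs(norm - lam) <= tol * max(1.0, norm)
        v, lam = [x / norm for x in w], norm
        if converged:
            break
    # Eigenvalues of A^T A are the squared singular values of A.
    return math.sqrt(lam), v


def rayleigh_quotient(A, v):
    """For symmetric A and a unit eigenvector v, v^T A v is the eigenvalue."""
    return sum(a * b for a, b in zip(v, matvec(A, v)))


if __name__ == "__main__":
    # Small symmetric example whose exact eigenvalues are 1 and -4.
    A = [[0.0, 2.0], [2.0, -3.0]]
    sigma, v = top_singular_pair(A)
    print("top singular value:", sigma)
    print("dominant eigenvalue:", rayleigh_quotient(A, v))
    print("eigenvector (up to sign):", v)
```

In the example the singular value comes out positive while the dominant
eigenvalue is negative, which is why the sign has to be recovered separately
when the symmetric matrix is not positive semidefinite.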

@Sean, the SVD++ implementation in graphx does not follow the canonical
definition of SVD. It doesn't have the orthogonality that SVD guarantees. But
we might want to use graphx as the underlying matrix representation for
mllib.SVD to address the problem of skewed entry distribution.


On Thu, Aug 7, 2014 at 10:51 AM, Evan R. Sparks <evan.spa...@gmail.com>
wrote:

> Reza Zadeh has contributed the distributed implementation of (Tall/Skinny)
> SVD (
> http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html),
> which is in MLlib (Spark 1.0), and a distributed sparse SVD is coming in
> Spark 1.1 (https://issues.apache.org/jira/browse/SPARK-1782). If your data
> is sparse (which it often is in social networks), you may have better luck
> with this.
>
> I haven't tried the GraphX implementation, but those algorithms are often
> well-suited for power-law distributed graphs as you might see in social
> networks.
>
> FWIW, I believe you need to square elements of the sigma matrix from the
> SVD to get the eigenvalues.
>
>
>
>
> On Thu, Aug 7, 2014 at 10:20 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> (-incubator, +user)
>>
>> If your matrix is symmetric (and real I presume), and if my linear
>> algebra isn't too rusty, then its SVD is its eigendecomposition. The
>> SingularValueDecomposition object you get back has U and V, both of
>> which have columns that are the eigenvectors.
>>
>> There are a few SVDs in the Spark code. The one in mllib is not
>> distributed (right?) and is probably not an efficient means of
>> computing eigenvectors if you really just want a decomposition of a
>> symmetric matrix.
>>
>> The one I see in graphx is distributed? I haven't used it though.
>> Maybe it could be part of a solution.
>>
>>
>>
>> On Thu, Aug 7, 2014 at 2:21 PM, yaochunnan <yaochun...@gmail.com> wrote:
>> > Our lab needs to do some simulation on online social networks. We need to
>> > handle a 5000*5000 adjacency matrix, namely, to get its largest eigenvalue
>> > and corresponding eigenvector. Matlab can be used but it is time-consuming.
>> > Is Spark effective in linear algebra calculations and transformations?
>> > Later we will have a 5000000*5000000 matrix to process. It seems urgent
>> > that we find some distributed computation platform.
>> >
>> > I see SVD has been implemented and I can get the eigenvalues of a matrix
>> > through this API. But when I want to get both eigenvalues and eigenvectors,
>> > or at least the biggest eigenvalue and the corresponding eigenvector, it
>> > seems that current Spark doesn't have such an API. Is it possible for me to
>> > write an eigenvalue decomposition from scratch? What should I do? Thanks a
>> > lot!
>> >
>> >
>> > Miles Yao
>> >
>> > ________________________________
>> > View this message in context: How can I implement eigenvalue decomposition
>> > in Spark?
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>>
>


-- 
Li
@vrilleup
