On Fri, Dec 20, 2013 at 5:21 PM, Tom Vacek <[email protected]> wrote:

> If you use an RDD[Array[Double]] with a row decomposition of the matrix,
> you can index windows of the rows all you want, but you're limited to 100
> concurrent tasks.  You could instead use a column decomposition and access
> subsets of the columns with a PartitionPruningRDD.  I have to say, though,
> if you're doing dense matrix operations, they will be hundreds of times
> faster on a shared-memory platform.  This particular matrix, at 800 MB,
> could be handled with Breeze on a single node.
>
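For anyone following along, here is a minimal sketch of the column-decomposition
idea above, using Spark's PartitionPruningRDD. The layout and sizes (1000 columns
of 100000 doubles, i.e. roughly 800 MB, in 100 column blocks) are illustrative
assumptions, not taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.PartitionPruningRDD

    val sc = new SparkContext(new SparkConf().setAppName("col-pruning"))

    // Hypothetical column decomposition: 1000 columns of length 100000,
    // spread over 100 partitions, so column j lives in partition j / 10.
    val columns = sc.parallelize(0 until 1000, 100)
      .map(j => (j, Array.fill(100000)(math.random)))

    // To work on columns 200..299 only, prune down to the partitions
    // holding them; the other 90 partitions are never scheduled.
    val wantedPartitions = (200 until 300).map(_ / 10).toSet
    val subMatrix = PartitionPruningRDD.create(columns, wantedPartitions.contains)

    println(subMatrix.count())  // runs tasks only on the 10 pruned partitions

The pruning happens on the driver when the pruned RDD computes its partition
list, so Spark launches tasks only for the selected partitions instead of
filtering inside every task.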

The computation for each submatrix is very expensive; it takes days on a
single node. I was hoping it could be reduced to hours or minutes with
Spark.

Are you saying that Spark is not suitable for this type of job?
