You could transpose your matrix, so that you have a 1,000,000 x 100 matrix; rows 
of the new matrix then correspond to columns of your old matrix.
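
Roughly something like this (untested sketch; it assumes your rows are keyed by 
their row index, and the transpose helper is just for illustration):

import org.apache.spark.SparkContext._   // pair-RDD operations (needed on older Spark)
import org.apache.spark.rdd.RDD

// rows: one (rowIndex, row) pair per row of the original 100 x 1,000,000 matrix
def transpose(rows: RDD[(Long, Array[Double])], numRows: Int): RDD[(Long, Array[Double])] = {
  rows
    .flatMap { case (i, row) =>
      // emit one (columnIndex, (rowIndex, value)) record per matrix entry
      row.iterator.zipWithIndex.map { case (v, j) => (j.toLong, (i, v)) }
    }
    .groupByKey()   // shuffles all entries of one original column together
    .mapValues { entries =>
      val col = new Array[Double](numRows)
      entries.foreach { case (i, v) => col(i.toInt) = v }
      col           // one column of the old matrix = one row of the new one
    }
}

After that you have ~1,000,000 rows of length 100, so you can repartition and run 
the per-column computation with far more than 100 concurrent tasks. (The 
groupByKey is a full shuffle of the 800 MB, which should be cheap next to days of 
compute.)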

On 21 Dec, 2013, at 12:40 am, Aureliano Buendia <[email protected]> wrote:

> 
> On Fri, Dec 20, 2013 at 5:21 PM, Tom Vacek <[email protected]> wrote:
> If you use an RDD[Array[Double]] with a row decomposition of the matrix, you 
> can index windows of the rows all you want, but you're limited to 100 
> concurrent tasks.  You could use a column decomposition and access subsets of 
> the columns with a PartitionPruningRDD.  I have to say, though, if you're 
> doing dense matrix operations, they will be hundreds of times faster on a 
> shared-memory platform.  This particular matrix, at 800 MB, could be handled 
> with Breeze on a single node.
>  
> The computation for every submatrix is very expensive; it takes days on a 
> single node. I was hoping this could be reduced to hours or minutes with Spark.
> 
> Are you saying that Spark is not suitable for this type of job?
