You could transpose your matrix, so you now have a 1,000,000 x 100 matrix; the rows of the new matrix correspond to the columns of your old one. Each of the 1,000,000 rows can then be its own record, so you're no longer limited to 100 concurrent tasks. Something like the sketch below.
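
Here's a rough, untested Scala sketch of that transpose, just to make the idea concrete. It assumes each row of your RDD already carries an explicit row index (i.e. RDD[(Long, Array[Double])] rather than plain RDD[Array[Double]]); adjust to however you actually key your rows.

import org.apache.spark.SparkContext._   // pair-RDD operations (groupByKey, mapValues)
import org.apache.spark.rdd.RDD

// Sketch: each input element is (rowIndex, rowValues) of the original
// 100 x 1,000,000 matrix. Each output element is one column of the old
// matrix, i.e. one row of the new 1,000,000 x 100 matrix, so downstream
// work can be spread over far more than 100 tasks.
def transpose(rows: RDD[(Long, Array[Double])]): RDD[(Long, Array[Double])] =
  rows
    .flatMap { case (rowIdx, row) =>
      row.zipWithIndex.map { case (v, colIdx) =>
        (colIdx.toLong, (rowIdx, v))              // re-key each value by its column
      }
    }
    .groupByKey()                                 // gather the 100 values of one old column
    .mapValues(_.toArray.sortBy(_._1).map(_._2))  // order them by old row index

The groupByKey shuffle moves the whole 800 MB once, which should be cheap compared to a computation that takes days per submatrix.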
On 21 Dec, 2013, at 12:40 am, Aureliano Buendia <[email protected]> wrote:

> On Fri, Dec 20, 2013 at 5:21 PM, Tom Vacek <[email protected]> wrote:
>> If you use an RDD[Array[Double]] with a row decomposition of the matrix,
>> you can index windows of the rows all you want, but you're limited to 100
>> concurrent tasks. You could use a column decomposition and access subsets
>> of the columns with a PartitionPruningRDD. I have to say, though, if you're
>> doing dense matrix operations, they will be 100s of times faster on a
>> shared-mem platform. This particular matrix, at 800 MB, could be a Breeze
>> on a single node.
>
> The computation for every submatrix is very expensive; it takes days on a
> single node. I was hoping this could be reduced to hours or minutes with
> Spark.
>
> Are you saying that Spark is not suitable for this type of job?
