On Fri, Dec 20, 2013 at 5:21 PM, Tom Vacek <[email protected]> wrote:
> If you use an RDD[Array[Double]] with a row decomposition of the matrix,
> you can index windows of the rows all you want, but you're limited to 100
> concurrent tasks. You could use a column decomposition and access subsets
> of the columns with a PartitionPruningRDD. I have to say, though, if
> you're doing dense matrix operations, they will be 100s of times faster on
> a shared mem platform. This particular matrix, at 800 MB, could be a
> Breeze on a single node.

The computation for every submatrix is very expensive; it takes days on a
single node. I was hoping this could be reduced to hours or minutes with
Spark. Are you saying that Spark is not suitable for this type of job?
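To make sure I understand the PartitionPruningRDD suggestion, here is
roughly what I have in mind: a column decomposition where each partition
holds a contiguous block of columns, so a column window maps onto a small
set of partitions. The dimensions, the colsPerPartition figure, and the
random fill data below are made-up placeholders, not my real matrix:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.PartitionPruningRDD

    object SubmatrixPruning {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SubmatrixPruning"))

        val nRows = 10000          // placeholder matrix dimensions
        val nCols = 10000
        val colsPerPartition = 100 // one partition per block of columns

        // Column decomposition: element = (columnIndex, columnData).
        // Parallelizing a Range slices it evenly, so column j lands in
        // partition j / colsPerPartition.
        val columns = sc
          .parallelize(0 until nCols, nCols / colsPerPartition)
          .map(j => (j, Array.fill(nRows)(math.random)))

        // Prune to the partitions covering columns [2000, 3000); no tasks
        // are scheduled on the other partitions.
        val lo = 2000 / colsPerPartition
        val hi = 3000 / colsPerPartition
        val window = PartitionPruningRDD.create(columns, p => p >= lo && p < hi)

        // The expensive per-submatrix computation would go here.
        println(window.map(_._2.sum).sum())
      }
    }

Is that the kind of layout you meant, where pruning keeps the expensive
work from ever touching the rest of the matrix?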
