If you use an RDD[Array[Double]] with a row decomposition of the matrix,
you can index windows of the rows all you want, but with only 100 rows
you're limited to at most 100 concurrent tasks.  Alternatively, you could
use a column decomposition and access subsets of the columns with a
PartitionPruningRDD.  I have to say, though, dense matrix operations like
this can be hundreds of times faster on a shared-memory platform.  This
particular matrix, at 800 MB (100 x 1,000,000 doubles), fits comfortably
in memory, so it could be handled with Breeze on a single node.
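
Roughly, the column-decomposition route could look like the sketch below.
The layout is my own assumption, not something Spark imposes: columns keyed
by index, blocked into partitions of a known size, so that pruning by
partition id selects a column range without touching the rest of the RDD.
All names and sizes are placeholders.

import org.apache.spark.{Partitioner, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

object ColumnWindow {
  val nCols = 1000000
  val blockSize = 1000  // columns per partition; tune to taste

  // Partition p holds columns [p * blockSize, (p + 1) * blockSize).
  class ColumnPartitioner extends Partitioner {
    def numPartitions = nCols / blockSize
    def getPartition(key: Any) = key.asInstanceOf[Int] / blockSize
  }

  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "column-window")

    // cols: (columnIndex, the 100 doubles of that column);
    // the zero arrays stand in for however you actually load the data.
    val cols: RDD[(Int, Array[Double])] =
      sc.parallelize(0 until nCols).map(i => (i, new Array[Double](100)))
        .partitionBy(new ColumnPartitioner)

    // Keep only the partitions that can contain columns [j, j + 50),
    // then filter down to the exact window.  Partitions outside the
    // window are never computed.
    val j = 12345
    val pruned = PartitionPruningRDD.create(cols,
      p => p >= j / blockSize && p <= (j + 49) / blockSize)
    val window = pruned.filter { case (c, _) => c >= j && c < j + 50 }

    println(window.count())  // 50
    sc.stop()
  }
}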
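
For comparison, the single-node version with Breeze is almost trivial,
since a 100 x 50 window is just a column slice of the dense matrix.  Again
a sketch, with placeholder data:

import breeze.linalg._

object LocalWindow {
  def main(args: Array[String]) {
    val rows = 100
    val cols = 1000000

    // 100 x 1,000,000 doubles = 800 MB; run with e.g. -Xmx2g.
    // Zeros stand in for however you actually load the data.
    val m = DenseMatrix.zeros[Double](rows, cols)

    // A contiguous column slice shares the backing array -- no copy.
    val j = 12345
    val window = m(::, j until j + 50)
    println(sum(window))  // any dense op on the 100 x 50 window goes here
  }
}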


On Fri, Dec 20, 2013 at 9:40 AM, Aureliano Buendia <[email protected]> wrote:

> Hi,
>
> I have a 100 x 1,000,000 matrix of double values, and I want to perform
> distributed computing on a 'window' of 100 x 50, where a window starts
> at each column. That is, each task must have access to columns j to j+50.
>
> The Spark examples only show accessing a single row per task. Is it
> possible for a task to access a small part of the matrix?
>
