It depends on the size of your matrix. If it fits in memory on the
driver, then you can build the sparse entries locally and parallelize them:

sparse = sparse_matrix(matrix)  # sparse_matrix is the function you wrote; it returns the (i, j, y) entries
rdd = sc.parallelize(sparse, NUM_OF_PARTITIONS)
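
Here is a rough sketch of both cases, in case it helps. The helper names
(nonzero_entries, to_coordinates, rows_rdd) and the output path are my own
placeholders, not anything from your code, and the distributed version
assumes each element of the RDD is one row and that your PySpark version
provides RDD.zipWithIndex.

```python
def nonzero_entries(indexed_row):
    """Map one (row, i) pair to a list of (i, j, y) for the non-zero y."""
    row, i = indexed_row
    return [(i, j, y) for j, y in enumerate(row) if y]


def sparse_matrix(matrix):
    """Local version: matrix is an in-memory list of rows."""
    return [entry
            for i, row in enumerate(matrix)
            for entry in nonzero_entries((row, i))]


def to_coordinates(rows_rdd):
    """Distributed version: rows_rdd is an RDD whose elements are rows."""
    # zipWithIndex pairs each row with its position, giving (row, i).
    return rows_rdd.zipWithIndex().flatMap(nonzero_entries)


# Usage inside a PySpark session, where `sc` is the SparkContext
# (the output path below is a placeholder):
#   entries = sc.parallelize(sparse_matrix(matrix), NUM_OF_PARTITIONS)
#   entries = to_coordinates(rows_rdd)   # if the matrix is already an RDD
#   entries.saveAsTextFile("hdfs:///path/to/output")
```

The second form never collects the matrix on the driver, so it is the one to
try if the matrix does not fit in memory.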

On Tue, Jul 29, 2014 at 11:39 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> Hi,
>     I have an RDD with n rows and m columns, but most of the entries are 0;
> it is a sparse matrix.
>
> I would like to get only the non-zero entries along with their indices.
>
> The equivalent Python code would be:
>
> for i,x in enumerate(matrix):
>    for j,y in enumerate(x):
>         if y:
>            print i,j,y
>
> Now, what I would like to do is save the i,j,y entries.
> How do I do this in PySpark?
> Thanks
>
>
