It depends on the size of your matrix. If it fits in memory, then you can do:
sparse = sparse_matrix(matrix)  # sparse_matrix is the function you had written
sc.parallelize(sparse, NUM_OF_PARTITIONS)

On Tue, Jul 29, 2014 at 11:39 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Hi,
> I have an rdd with n rows and m columns... but most of them are 0 .. its
> as sparse matrix..
>
> I would like to only get the non zero entries with their index?
>
> Any equivalent python code would be
>
> for i,x in enumerate(matrix):
>     for j,y in enumerate(x):
>         if y:
>             print i,j,y
>
> Now, what I would like to do is save i,j,y entries?
> How do I do this in pyspark.
> Thanks
>
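
For the in-memory case, here is a minimal sketch of what sparse_matrix might look like. The thread never shows that function, so the body below is an assumption based on the quoted loop: it collects (i, j, y) triples instead of printing them. The sample matrix and the output path in the comments are hypothetical, and the Spark calls are left as comments because they need a live SparkContext `sc`.

```python
def sparse_matrix(matrix):
    """Return (i, j, y) triples for the nonzero entries of a row-major matrix."""
    return [(i, j, y)
            for i, row in enumerate(matrix)
            for j, y in enumerate(row)
            if y]  # same truthiness test as the loop in the question


# Hypothetical sample data: a small list-of-lists matrix.
matrix = [[0, 0, 3],
          [0, 0, 0],
          [4, 0, 5]]
entries = sparse_matrix(matrix)
print(entries)

# With an existing SparkContext `sc` (assumed, not created here), the triples
# could then be distributed and saved, for example:
#   rdd = sc.parallelize(entries, NUM_OF_PARTITIONS)
#   rdd.saveAsTextFile("hypothetical/output/path")
```

If the matrix is already an RDD of rows and is too large to collect on the driver, the same idea can probably be expressed on the cluster in PySpark versions that provide RDD.zipWithIndex: pair each row with its index, then flatMap each row to its nonzero (i, j, y) triples. That variant is untested here.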