I am a beginner with Spark and have some simple questions regarding the use of
RDDs in Python.

Suppose I have a matrix called data_matrix; I turn it into an RDD using

RDD_matrix = sc.parallelize(data_matrix)

but I run into a problem if I want to know the dimensions of the matrix in
Spark, because a Spark RDD does not support the Python (NumPy package)
attribute "shape".

In this case, how should I deal with it?

In general, do I need to "translate" all of my Python code into RDD-compatible
syntax so that my Python program can run using PySpark?

Thanks in advance for any help!

Best

Rui
