I am a beginner with Spark and have a couple of simple questions about using RDDs in Python.

Suppose I have a matrix called data_matrix. I create an RDD from it with RDD_matrix = sc.parallelize(data_matrix), but then I run into a problem if I want to know the dimensions of the matrix in Spark, because a Spark RDD does not support the NumPy "shape" attribute. How should I deal with this? A minimal sketch of what I mean is below.
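Here is a small example showing the issue (the toy matrix and the count()/first() workaround at the end are just my own guesses at how to do it, not something I found in the docs):

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext("local", "shape-example")

# A small toy matrix standing in for my real data_matrix
data_matrix = np.arange(12).reshape(3, 4)
print(data_matrix.shape)          # (3, 4) -- works fine in plain NumPy

# Parallelize the matrix row by row into an RDD
RDD_matrix = sc.parallelize(data_matrix)
# RDD_matrix.shape                # AttributeError: RDDs have no 'shape'

# My guess at a workaround: count the rows and measure the first row
n_rows = RDD_matrix.count()
n_cols = len(RDD_matrix.first())
print((n_rows, n_cols))           # (3, 4)
```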
More generally, do I need to "translate" all of my Python code into RDD-compatible syntax so that my program can run with PySpark?

Thanks in advance for any help!

Best,
Rui