Hi all, I would like to parallelize a NumPy array in Python so that I can apply a scikit-learn algorithm on top of Spark. When I call *sc.parallelize()*, I get back an RDD with a different structure.
To be more precise, I am trying to get the following 2-D array:

    X = [[ 0.49426097  1.45106697]
         [-1.42808099 -0.83706377]
         [ 0.33855918  1.03875871]
         ...,
         [-0.05713876 -0.90926105]
         [-1.16939407  0.03959692]
         [ 0.26322951 -0.92649949]]

However, when I call sc.parallelize(X), I get separate 1-D row arrays instead:

    [array([ 0.49426097,  1.45106697]), array([-1.42808099, -0.83706377])]

Has anyone tried this before?
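The output above is probably what you should expect. Iterating over a 2-D ndarray yields its rows, so each RDD element ends up as one 1-D row array. The sketch below imitates this locally with NumPy only. It is a rough stand-in under stated assumptions: `simulate_parallelize` and `stack_partition` are hypothetical helpers, not Spark APIs, and Spark's real partitioning may differ. It shows that the rows can be stacked back into a 2-D block per partition. In actual PySpark the analogous step would go through `rdd.glom()` or `rdd.mapPartitions(...)`, which I have not tested here.

```python
import numpy as np

def simulate_parallelize(X, num_slices=2):
    """Hypothetical local stand-in for sc.parallelize(X, num_slices).

    Assumption: the elements are the rows of X (what iterating a 2-D
    ndarray yields), split into contiguous, roughly equal partitions.
    """
    rows = list(X)  # each element is a 1-D row array, as in the RDD output
    bounds = np.linspace(0, len(rows), num_slices + 1).astype(int)
    return [rows[bounds[i]:bounds[i + 1]] for i in range(num_slices)]

def stack_partition(rows):
    """Rebuild one 2-D block from the row arrays of a single partition."""
    # Guard: np.vstack raises on an empty list, so empty partitions
    # become an empty array instead.
    return np.vstack(rows) if rows else np.empty((0, 0))

X = np.array([[0.49426097, 1.45106697],
              [-1.42808099, -0.83706377],
              [0.33855918, 1.03875871],
              [-0.05713876, -0.90926105],
              [-1.16939407, 0.03959692],
              [0.26322951, -0.92649949]])

partitions = simulate_parallelize(X, num_slices=2)
print(partitions[0][0].shape)      # one element is a single row
blocks = [stack_partition(p) for p in partitions]
print([b.shape for b in blocks])   # [(3, 2), (3, 2)]
X_back = np.vstack(blocks)         # all rows stacked back into one array
print(np.allclose(X_back, X))      # True
```

Each per-partition block is a 2-D array that a scikit-learn estimator can accept. Stacking per partition, rather than collecting everything to the driver, keeps the data distributed.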