Hi, I'm experimenting with a Spark analytic on a 9-node cluster. The Python (PySpark) version runs in about 5 minutes, while the Java version, with identical SparkContext configuration (and everything else equal), takes 40+ minutes.
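
For reference, both versions are set up and structured roughly like the Java sketch below; the app name, input path, memory setting, and the filter-and-count workload are simplified placeholders standing in for the actual analytic:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AnalyticJob {
    public static void main(String[] args) {
        // Same configuration is used for both the PySpark and Java runs
        // (values here are placeholders, not the real settings)
        SparkConf conf = new SparkConf()
                .setAppName("analytic")
                .set("spark.executor.memory", "4g");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Stand-in for the real analytic: a trivial filter-and-count
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input"); // hypothetical path
        long count = lines
                .filter(s -> !s.trim().isEmpty())
                .count();

        System.out.println("non-empty lines: " + count);
        sc.stop();
    }
}
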
Does anyone know what might be causing this performance gap? What is PySpark doing differently?

Thanks,
Mike
