Hi,

I'm experimenting with a Spark analytics job on a 9-node cluster. The Python 
version runs in about 5 minutes, whereas the Java version, with the same 
SparkContext configuration (and everything else being equal), takes 40+ minutes.

Does anyone know what might be causing this performance difference? What is 
PySpark doing differently?

Thanks,
Mike
