Hello, I am using PySpark to develop my big-data application. I have the impression that most of my application's execution time is spent on infrastructure (distributing the code and data across the cluster, and IPC between the Python processes and the JVM) rather than on the computation itself. In particular, I would like to measure the time spent in the IPC between the Python processes and the JVM.
Is there a way to break down the execution time so I can see how much time is actually spent in the different phases of the execution? Some kind of detailed profiling of the execution time would give me more information for fine-tuning the application. Thank you very much for your help and support, Luca
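
For reference, here is a minimal sketch of the kind of per-phase detail I am after, using PySpark's built-in cProfile-based worker profiler (the spark.python.profile configuration plus sc.show_profiles()). As far as I understand, this only accounts for time spent inside the Python worker code itself, so time missing from these stats would be the serialization / Py4J / scheduling overhead I am mainly trying to measure:

    # Sketch: enable PySpark's cProfile-based profiler for Python workers.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("profiling-sketch")
            .set("spark.python.profile", "true"))
    sc = SparkContext(conf=conf)

    rdd = sc.parallelize(range(1000000), numSlices=8)
    result = rdd.map(lambda x: x * x).filter(lambda x: x % 3 == 0).count()

    # Print accumulated cProfile statistics for the profiled RDDs.
    # Time not shown here is spent outside the Python code
    # (serialization, Py4J calls, JVM-side scheduling).
    sc.show_profiles()
    sc.stop()

Is there something comparable that covers the Python-to-JVM communication itself, or do I have to rely on the timing breakdown in the Spark web UI for that part?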