Hi everybody,

I am running a Spark job with multiple map-reduce iterations on a cluster of multi-core machines. Within each machine I observe variable performance: some threads take 20% more time than others on the same machine. I have checked that the input size is the same for all threads, and the computation does not depend on the input values. The behaviour persists even when I run the job on a single machine. It looks like a JVM issue to me, but I hope somebody has already experienced it and can offer some help.

Below I post an example of one iteration, with the red bars showing the duration of each task. The first group of long red bars are mapPartitions tasks running in separate threads, while the short lines that follow are the reduce tasks. In the first group (the long lines), the execution variability is clearly visible.

Has somebody already seen this? What might the cause be?

Thanks everybody,
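In case it helps to show what I mean, here is a minimal, Spark-free sketch (not my actual job; the task body and thread counts are made up for illustration) of the kind of check I did on a single machine: run the same fixed-size, input-independent work in a pool of worker threads and compare per-task wall times.

```python
# Hypothetical sketch: identical CPU-bound tasks run in a thread pool,
# timed individually, to see whether the ~20% per-task spread reproduces
# outside Spark (pointing at the runtime/OS rather than Spark itself).
# Note: on CPython the GIL serializes pure-Python work, so for a faithful
# reproduction of the JVM behaviour you would use processes or a JVM
# microbenchmark instead; the measurement pattern is the same.
import time
from concurrent.futures import ThreadPoolExecutor


def identical_task(_):
    # Same input size for every task, and the computation does not
    # depend on the input values, mirroring the situation above.
    start = time.perf_counter()
    acc = 0
    for i in range(200_000):
        acc += i * i
    return time.perf_counter() - start


def measure(num_tasks=8, num_workers=4):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        durations = list(pool.map(identical_task, range(num_tasks)))
    # e.g. spread == 1.2 would match the "20% more time" observation
    spread = max(durations) / min(durations)
    return durations, spread
```

Even in this stripped-down form I see a spread between identical tasks, which is why I suspect something below Spark (scheduler, CPU frequency scaling, GC pauses).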
Alberto

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26298/anomaly.png>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Variable-performance-in-Spark-threads-tp26298.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.