See the reference on shuffle operations <http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/programming-guide.html#shuffle-operations>, which describes the shuffle as "Spark’s mechanism for re-distributing data so that it’s grouped differently across partitions. This typically involves copying data across executors and machines, making the shuffle a complex and costly operation."
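To make the quoted definition concrete, here is a minimal toy sketch in plain Python. It does not use Spark at all: `narrow_map` and `shuffle_by_key` are hypothetical helpers written for this illustration, and the three lists stand in for partitions held by three executors. It only counts how many records would have to change partition when data is regrouped by key, which is the step where network latency can enter.

```python
# Toy model of a shuffle. No Spark involved; all names here are
# illustrative assumptions, not Spark APIs.

def narrow_map(partitions, fn):
    """Apply fn to every record; no record leaves its partition."""
    return [[fn(rec) for rec in part] for part in partitions]

def shuffle_by_key(partitions):
    """Regroup (key, value) records so equal keys share a partition.

    Returns the new partitions and the number of records that ended up
    in a different partition (a stand-in for network transfers).
    """
    n = len(partitions)
    new_parts = [[] for _ in range(n)]
    moved = 0
    for src, part in enumerate(partitions):
        for key, value in part:
            dst = key % n  # integer keys keep the placement deterministic
            new_parts[dst].append((key, value))
            if dst != src:
                moved += 1
    return new_parts, moved

# Three "executors", each holding records for keys 0, 1 and 2.
partitions = [
    [(0, "a"), (1, "b"), (2, "c")],
    [(0, "d"), (1, "e"), (2, "f")],
    [(0, "g"), (1, "h"), (2, "i")],
]

# A map-style step works on each partition independently.
mapped = narrow_map(partitions, lambda kv: (kv[0], kv[1].upper()))

# A grouping step has to bring equal keys together.
shuffled, moved = shuffle_by_key(mapped)
print("records moved by the shuffle:", moved)
```

In this toy the map step moves nothing, while the grouping step moves 6 of the 9 records. Real Spark fetches shuffle data in blocks rather than record by record, so this is not a cost model, only an illustration of why joins, sorts and grouping are the operations to look at first when latency goes up.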

On Thu, Sep 22, 2016 at 4:14 PM, Soumitra Johri <soumitra.siddha...@gmail.com> wrote:

> If your job involves a shuffle then the compute for the entire batch will
> increase with network latency. What would be interesting is to see how much
> time each task/job/stage takes.
>
> On Thu, Sep 22, 2016 at 5:11 PM Peter Figliozzi <pete.figlio...@gmail.com>
> wrote:
>
>> It seems to me they must communicate for joins, sorts, grouping, and so
>> forth, where the original data partitioning needs to change. You could
>> repeat your experiment for different code snippets. I'll bet it depends on
>> what you do.
>>
>> On Thu, Sep 22, 2016 at 8:54 AM, gusiri <dreame...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> When I increase the network latency among spark nodes,
>>>
>>> I see compute time (=executor computing time in Spark Web UI) also
>>> increases.
>>>
>>> In the graph attached, left = latency 1ms vs right = latency 500ms.
>>>
>>> Is there any communication between worker and driver/master even 'during'
>>> executor computing? or any idea on this result?
>>>
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27779/Screen_Shot_2016-09-21_at_5.png>
>>>
>>> Thank you very much in advance.
>>>
>>> //gusiri