Hello,

I'm seeing strange behavior in a larger data processing pipeline that
consists of multiple steps involving Spark core and GraphX. When I
increase the network transfer rate in the 5-node cluster from 100 Mbit/s
to 1 Gbit/s, the runtime increases as well, from around 15 minutes to
19 minutes. This only holds for large input files. On small files the
faster transfer rate decreases the runtime by around one third.

I tested the network transfer rate by transmitting files from node to
node. At 100 Mbit/s I get 11.7 MByte/s and at 1 Gbit/s I get 67 MByte/s,
so the network itself should not be the cause.
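
For reference, here is a small sketch that compares the measured transfer
rates with the nominal link rates. It assumes decimal prefixes and 8 bits
per byte, which may not match how the transfer tool reports MByte/s, and
the function names are only illustrative.

```python
# Compare measured node-to-node throughput with the nominal link rate.
# Assumption: decimal prefixes (1 Mbit = 10**6 bit, 1 MByte = 10**6 byte).

def nominal_mbyte_per_s(link_mbit_per_s: float) -> float:
    """Convert a nominal link rate in Mbit/s to MByte/s (8 bits per byte)."""
    return link_mbit_per_s / 8.0


def utilization(measured_mbyte_per_s: float, link_mbit_per_s: float) -> float:
    """Fraction of the nominal link rate reached by the measured transfer."""
    return measured_mbyte_per_s / nominal_mbyte_per_s(link_mbit_per_s)


# Measured values from the file transfer test: link rate in Mbit/s -> MByte/s.
measurements = {100: 11.7, 1000: 67.0}

for link, measured in measurements.items():
    print(f"{link} Mbit/s link: {measured} MByte/s measured, "
          f"{utilization(measured, link):.0%} of nominal")
```

Under these assumptions the 100 Mbit/s link reaches about 94% of its
nominal rate and the 1 Gbit/s link about 54%, so the faster link is still
roughly 5.7 times faster in practice.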

My question is: do Spark, and especially GraphX, adapt their behavior to
the available network transfer rate? Does anybody have an idea how a
faster network could decrease performance?

Thank you very much!

Kind regards,
Niklas Wilcke


