Hello, I'm seeing strange behavior in a larger data processing pipeline that consists of multiple steps involving Spark core and GraphX. When I increase the network speed in the 5-node cluster from 100 Mbit/s to 1 Gbit/s, the runtime increases from around 15 minutes to 19 minutes. This only holds for large input files. On small files, the faster network decreases the runtime by around one third.
I tested the network transfer rate by transmitting files from node to node. On 100 Mbit/s I get 11.7 MByte/s and on 1 Gbit/s I get 67 MByte/s, so the network itself does not appear to be the problem. My question is: do Spark, and especially GraphX, adapt their behavior to the available network transfer rate? Does anybody have an idea how a faster network could decrease performance?

Thank you very much!

Kind regards,
Niklas Wilcke
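
P.S. As a quick sanity check on the figures above, here is a minimal sketch that relates the measured copy rates to the nominal line rates. It assumes decimal units and 8 bits per byte; the names `MEASUREMENTS` and `utilization` are purely illustrative and not part of the pipeline.

```python
# Nominal link rate in Mbit/s and measured node-to-node copy rate in MByte/s,
# taken from the figures quoted in this mail.
MEASUREMENTS = {
    "100 Mbit/s": (100.0, 11.7),
    "1 Gbit/s": (1000.0, 67.0),
}


def utilization(nominal_mbit: float, measured_mbyte: float) -> float:
    """Return the fraction of the nominal line rate that was achieved.

    Assumes decimal units and 8 bits per byte (hypothetical helper).
    """
    nominal_mbyte = nominal_mbit / 8.0  # convert Mbit/s to MByte/s
    return measured_mbyte / nominal_mbyte


if __name__ == "__main__":
    for name, (nominal, measured) in MEASUREMENTS.items():
        share = utilization(nominal, measured)
        print(f"{name}: {measured} MByte/s is {share:.1%} of the nominal rate")
```

By this measure the 100 Mbit/s link runs at roughly 94% of its nominal rate, while the 1 Gbit/s link reaches only about 54%. The file copy on the faster link may therefore be limited by something other than the network, for example disk throughput.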