Hi, I have seen in a video from Spark Summit that (when I use HDFS) the data are usually distributed across the whole cluster, and the computation usually goes to the data.
My question is: how does this work when I read data from Amazon S3? Is the whole input dataset read by the master node and then distributed to the slave nodes? Or does the master node only determine which slave should read what, with the reading then performed independently by each of the slaves? Thank you in advance for the clarification.