I am running a model where the workers should not store the data themselves; they are only there for execution. The other cluster (it's just a single node) that I am receiving data from acts purely as a file server, so I could have used some other mechanism like NFS or FTP. I went with HDFS so that I would not have to worry about partitioning the data, and so that it would not affect my experiment.

My question is: when a worker's first task starts, does Spark read all the data up front and then distribute it across the workers' memory, or does each worker read its own chunk and keep only the end result in memory to send back as the final result?
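For context, here is a minimal sketch of the kind of job I mean, assuming a single-node HDFS reachable at a hypothetical hdfs://fileserver:9000 address and a text file under a made-up path; as I understand it, textFile() itself is lazy and each task later reads only its own partition:

import org.apache.spark.sql.SparkSession

object HdfsReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-read-sketch")
      .getOrCreate()

    // Lazy: no data is read here. Spark only records that the file should be
    // split into partitions (by default roughly one per HDFS block).
    val lines = spark.sparkContext.textFile("hdfs://fileserver:9000/data/input.txt")

    // The action triggers the actual read: each task reads and processes its
    // own partition on a worker, and only the per-partition counts are sent
    // back to the driver. Nothing is held cluster-wide unless cached.
    val lineCount = lines.count()

    println(s"lines: $lineCount")
    spark.stop()
  }
}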