> You need to put your data files on a distributed file system (e.g. HDFS/S3) for distributed Spark to work. Otherwise, the workers cannot read files from a single node.
I'm not sure whether it makes a difference, but the same files were in the same folder on both the master and the worker. Next time I'll try HDFS, though. In fact, I already tried it once but ran into a Hadoop configuration problem; I should be able to solve that, given the plentiful Hadoop-related material available.

> The reason `first` works is that for very short actions like `first`/`take`, Spark sometimes launches the action directly on the master.
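To illustrate the point from the quotes, here is a minimal PySpark sketch. The master URL, HDFS namenode address, and file paths are all placeholders for whatever your cluster actually uses; this needs a running Spark standalone cluster and HDFS, so treat it as a sketch rather than something to paste in verbatim:

```python
from pyspark import SparkConf, SparkContext

# Hypothetical cluster address -- adjust to your setup.
conf = SparkConf().setAppName("hdfs-read-example").setMaster("spark://master:7077")
sc = SparkContext(conf=conf)

# Reading from HDFS: every worker can fetch its partitions from the namenode,
# so full actions like count()/collect() work across the cluster.
distributed = sc.textFile("hdfs://namenode:9000/data/input.txt")

# Reading a local path only works if the *same* path exists on every worker;
# otherwise tasks scheduled on other nodes fail with a file-not-found error.
local_only = sc.textFile("file:///home/user/input.txt")

# first()/take(n) need only the first partition and may be evaluated on the
# master, so they can appear to succeed even when a full action such as
# count() would fail on the workers.
print(distributed.first())
print(distributed.count())
```

This is why a job can pass a quick `first()` smoke test and still blow up later: the failure only surfaces once an action forces every worker to read its share of the data.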
