> You need to put your data files on a distributed file system (e.g. HDFS/S3)
> for distributed Spark to work. Otherwise, the workers cannot read files
> that exist only on a single node.

I'm not sure if it makes a difference, but the same files were in the same 
folder on both the master and the worker. Still, I'll try HDFS next time. 
In fact, I already tried it once but ran into a Hadoop configuration 
problem; with the plentiful Hadoop material available, I should be able to 
work through that.

> The reason first works is that for very short actions like first / take, 
> Spark may instead launch the action locally on the master.
