Markus, I'm guessing at your configuration: one or more workers running on
the same node as your master and as your spark-shell, which is why you
expect the workers to be able to read the same relative path.
If that's the case, the reason it doesn't work as you expect is that the
workers have different working directories.
Instead, this should work:
> val fileSizes = fileList.map(file => new File("/the/absolute/path/to/spark/data/" + file).length)
This would also be interesting to you:
> val workingDirs = fileList.map(x => System.getProperty("user.dir"))
> workingDirs.collect
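Putting the two snippets together, here is a minimal spark-shell sketch. It
assumes `fileList` is an RDD[String] of bare file names (built here with
sc.parallelize from a couple of hypothetical names) and that the same
absolute data directory exists on every node; the directory path is a
placeholder, not a real location:

```scala
import java.io.File

// Hypothetical file names; substitute your own list.
val fileList = sc.parallelize(Seq("a.txt", "b.txt"))

// Resolve against an absolute path so every worker reads the same location,
// regardless of its working directory.
val dataDir = "/the/absolute/path/to/spark/data/"
val fileSizes = fileList.map(file => new File(dataDir + file).length)
fileSizes.collect()  // lengths in bytes; File.length returns 0 for a missing file

// Inspect each task's working directory to see why relative paths diverge.
fileList.map(_ => System.getProperty("user.dir")).collect()
```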
--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen
On Tue, Oct 15, 2013 at 11:17 AM, Markus Losoi <[email protected]> wrote:
> > You need to put your data files on a distributed file system (e.g.
> hdfs/s3) for the distributed spark
> > to work. Otherwise, the workers cannot read files from a single node.
>
> I'm not sure if it makes a difference, but the same files were in the same
> folder on both the master and the worker. However, next time I'll try with
> HDFS. In fact, I already tried it once but faced a Hadoop configuration
> problem. I should be able to solve this problem though with the plentiful
> amount of Hadoop related material available.
>
> > The reason first works is because for very short actions like first /
> take, Spark alternatively
> > launches the action on the master.
>
>