Re: Reading files on a cluster / shared file system

2014-01-16 Thread Ognen Duzlevski
Makes sense. Thanks!
Ognen

On Thu, Jan 16, 2014 at 12:54 AM, Tathagata Das tathagata.das1...@gmail.com wrote:
> If you are running a distributed Spark cluster over the nodes, then the reading should be done in a distributed manner. If you give sc.textFile() a local path to a directory in the

Reading files on a cluster / shared file system

2014-01-15 Thread Ognen Duzlevski
On a cluster where the nodes and the master all have access to a shared filesystem, does Spark read a file (for example, one loaded with sc.textFile()) in parallel, with each node reading a different section? Or is the file read sequentially on the master, with the chunks handed to the nodes for processing afterwards? Thanks!

Re: Reading files on a cluster / shared file system

2014-01-15 Thread Tathagata Das
If you are running a distributed Spark cluster over the nodes, then the reading should be done in a distributed manner. If you give sc.textFile() a local path to a directory in the shared file system, then each worker should read a subset of the files in that directory by accessing them locally.
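
To make the "each worker reads a subset of the files" idea concrete, here is a toy sketch in plain Python. It is not Spark code and does not reflect how Spark actually assigns work. It only assumes that every worker can see the same directory, and it divides the file list round-robin among worker threads that each open only their own files. The names assign_files, read_subset, read_directory and demo are hypothetical, made up for this illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def assign_files(paths, num_workers):
    """Round-robin the sorted file list into one bucket per worker (toy scheme)."""
    buckets = [[] for _ in range(num_workers)]
    for i, path in enumerate(sorted(paths)):
        buckets[i % num_workers].append(path)
    return buckets


def read_subset(paths):
    """What one hypothetical worker does: open and read only its own files."""
    lines = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def read_directory(directory, num_workers=3):
    """Read every file in `directory`, split across `num_workers` threads."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    paths = [p for p in paths if os.path.isfile(p)]
    buckets = assign_files(paths, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each thread stands in for a worker node reading the shared path locally.
        return list(pool.map(read_subset, buckets))


def demo():
    """Create a throwaway 'shared' directory with 5 files and read it with 3 workers."""
    with tempfile.TemporaryDirectory() as shared_dir:
        for i in range(5):
            name = os.path.join(shared_dir, f"part-{i}.txt")
            with open(name, "w", encoding="utf-8") as f:
                f.write(f"line A from file {i}\nline B from file {i}\n")
        return read_directory(shared_dir, num_workers=3)


if __name__ == "__main__":
    for worker_id, lines in enumerate(demo()):
        print(worker_id, len(lines))
```

The point of the sketch is only that no single process reads everything first: the file list is divided up, and each worker opens its share directly from the shared path. In a real cluster the scheduler decides the assignment, and it may also split large files into several pieces.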