If you are running a distributed Spark cluster over the nodes, then the reading should be done in a distributed manner. If you give sc.textFile() a "local path" to a directory in the shared file system, then each worker should read a subset of the files in that directory by accessing them locally. Nothing should be read on the master.
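
One way to check this on your own cluster is to have each partition report the host it was read on. The following is only a rough sketch, not something I have run against your setup: the path file:///shared/data/logs is a made-up placeholder for a directory that every node mounts at the same location, the object and application names are invented, and I am assuming the master URL is supplied by however you launch the job. The explicit file:// scheme is there because a path without a scheme is resolved against the default Hadoop filesystem, which may not be the local one.

```scala
import java.net.InetAddress

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical diagnostic job: counts how many lines each host read.
object WhereAreFilesRead {
  def main(args: Array[String]): Unit = {
    // Assumption: the master URL comes from the launch configuration.
    val conf = new SparkConf().setAppName("where-are-files-read")
    val sc = new SparkContext(conf)

    // Placeholder path; replace with a directory visible on every node.
    val lines = sc.textFile("file:///shared/data/logs")

    // Runs once per partition on whichever worker executes that task,
    // so the host name recorded is the machine that read the data.
    val perPartition = lines.mapPartitions { iter =>
      val host = InetAddress.getLocalHost.getHostName
      Iterator((host, iter.size.toLong))
    }.collect()

    // Sum the per-partition counts by host on the driver.
    val perHost = perPartition
      .groupBy { case (host, _) => host }
      .map { case (host, counts) => host -> counts.map(_._2).sum }

    perHost.foreach { case (host, n) => println(s"$host read $n lines") }

    sc.stop()
  }
}
```

If the read is distributed as described, the output should list the worker hosts rather than only the master. How the lines are split between hosts depends on how Spark partitions the input and schedules the tasks.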
TD

On Wed, Jan 15, 2014 at 3:56 PM, Ognen Duzlevski <[email protected]> wrote:

> On a cluster where the nodes and the master all have access to a shared
> filesystem/files - does spark read a file (like one resulting from
> sc.textFile()) in parallel/different sections on each node? Or is the file
> read on master in sequence and chunks processed on the nodes afterwards?
>
> Thanks!
> Ognen
