Re: How to use cluster for large set of linux files

Matei Zaharia Wed, 22 Jan 2014 12:40:40 -0800

Hi Manoj,

You’d have to make the files available at the same path on each machine through 
something like NFS. You don’t need to copy them, though that would also work.
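
A minimal sketch of the shared-path approach, in PySpark. It assumes the CSV
files are mounted at the same absolute path on the driver and on every worker.
The mount point /mnt/shared/csv, the master URL, and the helper names
shared_csv_uri and count_lines are hypothetical, chosen only for illustration.

```python
import posixpath


def shared_csv_uri(shared_dir, pattern="*.csv"):
    """Build a file:// URI with a glob matching CSVs under a shared mount."""
    if not shared_dir.startswith("/"):
        # A relative path could resolve differently on each node.
        raise ValueError("shared_dir must be an absolute path")
    return "file://" + posixpath.join(shared_dir, pattern)


def count_lines(master_url, shared_dir):
    """Read every matching CSV as one RDD of lines and count them."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(master_url, "csv-on-shared-mount")
    try:
        # Workers open the files themselves, so the path must be
        # readable at the same location on every node.
        lines = sc.textFile(shared_csv_uri(shared_dir))
        return lines.count()
    finally:
        sc.stop()
```

For example, count_lines("spark://master-host:7077", "/mnt/shared/csv") would
read file:///mnt/shared/csv/*.csv as a single RDD, with the work split across
the workers.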


Matei

On Jan 22, 2014, at 12:37 PM, Manoj Samel <[email protected]> wrote:

> I have a set of CSV files that I want to read as a single RDD using a 
> standalone cluster. 
> 
> These files reside on one machine right now. If I start a cluster with 
> multiple worker nodes, how do I use these worker nodes to read the files and 
> do the RDD computation? Do I have to copy the files to every worker node?
> 
> Assume that copying these into HDFS is not an option for now.
> 
> Thanks,
