Thanks Matei. One thing I noticed after doing this and starting MASTER=spark://xxxx spark-shell is that everything works, BUT xxx.foreach(println) prints blank lines. All other logic seems to work. If I do xx.count etc., I can see the value; only the println does not seem to work.
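The likely cause: with a cluster master set, foreach runs on the executors, so println writes to each worker's stdout log rather than to the driver's console. A minimal sketch of the usual workaround in spark-shell, with a made-up rdd standing in for xxx above:

    // sc is the SparkContext that spark-shell provides.
    val rdd = sc.parallelize(Seq("a", "b", "c"))

    // Runs on the executors: println output lands in each worker's
    // stdout log, so the shell itself shows nothing.
    rdd.foreach(println)

    // Bring the elements back to the driver first, then print locally.
    rdd.collect().foreach(println)

    // For large RDDs, print a bounded sample instead of collecting all.
    rdd.take(10).foreach(println)

Note that collect() pulls the entire RDD to the driver, so for large data take(n) is the safer choice.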
On Wed, Jan 22, 2014 at 12:39 PM, Matei Zaharia <[email protected]> wrote:

> Hi Manoj,
>
> You'd have to make the files available at the same path on each machine
> through something like NFS. You don't need to copy them, though that would
> also work.
>
> Matei
>
> On Jan 22, 2014, at 12:37 PM, Manoj Samel <[email protected]>
> wrote:
>
> > I have a set of csv files that I want to read as a single RDD using a
> > standalone cluster.
> >
> > These files reside on one machine right now. If I start a cluster with
> > multiple worker nodes, how do I use these worker nodes to read the files
> > and do the RDD computation? Do I have to copy the files on every worker
> > node?
> >
> > Assume that copying these into HDFS is not an option for now ..
> >
> > Thanks,
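Once the files are visible at the same path on every node (an NFS mount, as Matei suggests), a single textFile call with a glob reads them all as one RDD. A minimal sketch in spark-shell; the /mnt/nfs/data path is an assumed example:

    // Assumes every worker mounts the same share at /mnt/nfs/data.
    // The glob pulls all the csv files into one RDD of lines.
    val lines = sc.textFile("/mnt/nfs/data/*.csv")

    // Example computation: split each csv line into fields, count records.
    val fields = lines.map(_.split(","))
    println(fields.count())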
