Thanks Matei. One thing I noticed after doing this and starting MASTER=spark://xxxx spark-shell is that everything works, BUT xxx.foreach(println) prints blank lines. All other logic seems to work. If I do xx.count etc., I can see the value; only the println does not seem to work.
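The likely cause: with a cluster master set, foreach runs on the executors, so println writes to each worker's stdout log rather than to the driver's console. A minimal sketch of the usual workaround in spark-shell, with a made-up rdd standing in for xxx above:

    // sc is the SparkContext that spark-shell provides.
    val rdd = sc.parallelize(Seq("a", "b", "c"))

    // Runs on the executors: println output lands in each worker's
    // stdout log, so the shell itself shows nothing.
    rdd.foreach(println)

    // Bring the elements back to the driver first, then print locally.
    rdd.collect().foreach(println)

    // For large RDDs, print a bounded sample instead of collecting all.
    rdd.take(10).foreach(println)

Note that collect() pulls the entire RDD to the driver, so for large data take(n) is the safer choice.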
On Wed, Jan 22, 2014 at 12:39 PM, Matei Zaharia <[email protected]> wrote:

> Hi Manoj,
>
> You'd have to make the files available at the same path on each machine
> through something like NFS. You don't need to copy them, though that would
> also work.
>
> Matei
>
> On Jan 22, 2014, at 12:37 PM, Manoj Samel <[email protected]>
> wrote:
>
> > I have a set of csv files that I want to read as a single RDD using a
> > standalone cluster.
> >
> > These files reside on one machine right now. If I start a cluster with
> > multiple worker nodes, how do I use these worker nodes to read the files
> > and do the RDD computation? Do I have to copy the files on every worker
> > node?
> >
> > Assume that copying these into HDFS is not an option for now ..
> >
> > Thanks,
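Once the files are visible at the same path on every node (an NFS mount, as Matei suggests), a single textFile call with a glob reads them all as one RDD. A minimal sketch in spark-shell; the /mnt/nfs/data path is an assumed example:

    // Assumes every worker mounts the same share at /mnt/nfs/data.
    // The glob pulls all the csv files into one RDD of lines.
    val lines = sc.textFile("/mnt/nfs/data/*.csv")

    // Example computation: split each csv line into fields, count records.
    val fields = lines.map(_.split(","))
    println(fields.count())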
