Re: SparkFiles.get() returns with driver path Instead of Worker Path

2016-03-08 Thread Tristan Nixon
Based on your code:

sparkContext.addFile("/home/files/data.txt");
List<String> file = sparkContext.textFile(SparkFiles.get("data.txt")).collect();

I’m assuming the file at “/home/files/data.txt” exists and is readable on the 
driver’s filesystem.
Did you try just doing this:

List<String> file = sparkContext.textFile("/home/files/data.txt").collect();

> On Mar 8, 2016, at 1:20 PM, Ashik Vetrivelu wrote:
> 
> Hey, yeah, I also tried sc.textFile() with a local path, and it still 
> throws the exception when I call collect().
> 
> Sorry I am new to spark and I am just messing around with it.
> 
> On Mar 8, 2016 10:23 PM, "Tristan Nixon" wrote:
> My understanding of the model is that you’re supposed to execute 
> SparkFiles.get(…) on each worker node, not on the driver.
> 
> Since you already know where the files are on the driver, you can load 
> them into an RDD with SparkContext.textFile, which will distribute the data 
> out to the workers; there’s no need to use SparkContext.addFile for this.
> 
> If you have functions that run on workers and expect local file 
> resources, then you can use SparkContext.addFile to distribute the files into 
> worker-local storage, and then execute SparkFiles.get separately on each 
> worker to retrieve these local files (it will give a different path on each 
> worker).
> 
> > On Mar 8, 2016, at 5:31 AM, ashikvc wrote:
> >
> > I am experimenting with Apache Spark in cluster mode.
> > My cluster consists of a driver on my machine, and a worker and manager on a
> > separate host machine.
> >
> > I sent a text file using `sparkContext.addFile(filepath)`, where the filepath
> > is the path of my text file on the local machine, and got the following
> > output:
> >
> >INFO Utils: Copying /home/files/data.txt to
> > /tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt
> >
> >INFO SparkContext: Added file /home/files/data.txt at
> > http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649
> >
> > But when I try to access the same file using `SparkFiles.get("data.txt")`, I
> > get the path to the file on my driver instead of the worker.
> > I am setting up my file like this:
> >
> >SparkConf conf = new
> > SparkConf().setAppName("spark-play").setMaster("spark://192.XX.XX.172:7077");
> >conf.setJars(new String[]{"jars/SparkWorker.jar"});
> >JavaSparkContext sparkContext = new JavaSparkContext(conf);
> >sparkContext.addFile("/home/files/data.txt");
> >List<String> file =
> > sparkContext.textFile(SparkFiles.get("data.txt")).collect();
> > I am getting FileNotFoundException here.
> >
> >
> >
> >
> >
> > --
> > View this message in context: 
> > http://apache-spark-user-list.1001560.n3.nabble.com/SparkFiles-get-returns-with-driver-path-Instead-of-Worker-Path-tp26428.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> 



Re: SparkFiles.get() returns with driver path Instead of Worker Path

2016-03-08 Thread Tristan Nixon
My understanding of the model is that you’re supposed to execute 
SparkFiles.get(…) on each worker node, not on the driver.

Since you already know where the files are on the driver, you can load 
these into an RDD with SparkContext.textFile, which will distribute the data 
out to the workers; there’s no need to use SparkContext.addFile for this.

If you have functions that run on workers and expect local file 
resources, then you can use SparkContext.addFile to distribute the files into 
worker-local storage, and then execute SparkFiles.get separately on each 
worker to retrieve these local files (it will give a different path on each 
worker).
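
This split can be sketched as follows. It is a minimal sketch, assuming the Spark 1.x Java API; the master URL and file path are taken from the original post, the app name and line-counting logic are placeholders, and the code is untested outside a real cluster:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkFilesSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("spark-files-sketch") // placeholder app name
                .setMaster("spark://192.XX.XX.172:7077"); // master URL from the original post
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Runs on the driver: ships the file to each executor's local storage.
        sc.addFile("/home/files/data.txt");

        JavaRDD<Integer> lineCounts = sc.parallelize(Arrays.asList(1, 2, 3))
                .map(i -> {
                    // Runs on a worker: SparkFiles.get resolves to the
                    // executor-local copy, whose path differs from the driver's.
                    String localPath = SparkFiles.get("data.txt");
                    return java.nio.file.Files.readAllLines(
                            java.nio.file.Paths.get(localPath)).size();
                });

        List<Integer> counts = lineCounts.collect();
        sc.stop();
    }
}
```

The key difference from the original snippet is where SparkFiles.get runs: called on the driver, it returns a driver-local path that the workers cannot read; called inside the map closure, it resolves each executor's own copy.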

> On Mar 8, 2016, at 5:31 AM, ashikvc wrote:
> 
> I am experimenting with Apache Spark in cluster mode.
> My cluster consists of a driver on my machine, and a worker and manager on a
> separate host machine.
> 
> I sent a text file using `sparkContext.addFile(filepath)`, where the filepath
> is the path of my text file on the local machine, and got the following
> output:
> 
>INFO Utils: Copying /home/files/data.txt to
> /tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt
> 
>INFO SparkContext: Added file /home/files/data.txt at
> http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649
> 
> But when I try to access the same file using `SparkFiles.get("data.txt")`, I
> get the path to the file on my driver instead of the worker.
> I am setting up my file like this:
> 
>SparkConf conf = new
> SparkConf().setAppName("spark-play").setMaster("spark://192.XX.XX.172:7077");
>conf.setJars(new String[]{"jars/SparkWorker.jar"});
>JavaSparkContext sparkContext = new JavaSparkContext(conf);
>sparkContext.addFile("/home/files/data.txt");
>List<String> file =
> sparkContext.textFile(SparkFiles.get("data.txt")).collect();
> I am getting FileNotFoundException here.
> 
> 





SparkFiles.get() returns with driver path Instead of Worker Path

2016-03-08 Thread ashikvc
I am experimenting with Apache Spark in cluster mode.
My cluster consists of a driver on my machine, and a worker and manager on a
separate host machine.

I sent a text file using `sparkContext.addFile(filepath)`, where the filepath
is the path of my text file on the local machine, and got the following
output:

INFO Utils: Copying /home/files/data.txt to
/tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt

INFO SparkContext: Added file /home/files/data.txt at
http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649

But when I try to access the same file using `SparkFiles.get("data.txt")`, I
get the path to the file on my driver instead of the worker.
I am setting up my file like this:

SparkConf conf = new
SparkConf().setAppName("spark-play").setMaster("spark://192.XX.XX.172:7077");
conf.setJars(new String[]{"jars/SparkWorker.jar"});
JavaSparkContext sparkContext = new JavaSparkContext(conf);
sparkContext.addFile("/home/files/data.txt");
List<String> file =
sparkContext.textFile(SparkFiles.get("data.txt")).collect();
I am getting FileNotFoundException here.




