Hi Andrew,

Thanks for your reply. I have another question about using HDFS: when running 
HDFS and the standalone mode on the same cluster, will the Spark workers only 
read data stored on the same server, so as to avoid transferring data over the 
network?

Xiaobo gu

On 2014-01-01 05:37:36, "andrew" <[email protected]> wrote:

Hi Xiaobo,

I would recommend putting the files into an HDFS cluster on the same machines 
instead if possible. If you're concerned about duplicating the data, you can 
set the replication factor to 1 so you don't use more space than before.
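
If it helps, here is a minimal Scala sketch of lowering replication for a file 
that is already in HDFS, via the Hadoop FileSystem API (the path is made up; 
you can also set dfs.replication in hdfs-site.xml to change the cluster-wide 
default):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Connect to HDFS using the cluster's default configuration
    // (picked up from core-site.xml / hdfs-site.xml on the classpath).
    val fs = FileSystem.get(new Configuration())

    // Drop the replication factor of an existing file to 1, so the
    // data takes no more space in HDFS than it did on local disk.
    // The path here is hypothetical -- substitute your own dataset.
    fs.setReplication(new Path("/data/input.txt"), 1.toShort)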
 

In my experience with Spark around version 0.7.0, when reading from a local 
file with sc.textFile("file:///...") you had to have that file at that exact 
path on every Spark worker machine.
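
For example, roughly (the host, port, and paths below are made up):

    // Local path: this exact file must exist at /data/input.txt on
    // the driver and on every worker machine.
    val localRdd = sc.textFile("file:///data/input.txt")

    // HDFS path: blocks are served by HDFS, and Spark will try to
    // schedule each task on a node that holds that block locally.
    val hdfsRdd = sc.textFile("hdfs://namenode:9000/data/input.txt")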
  

Cheers,
Andrew



On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <[email protected]> wrote:
 Hi,


We are going to deploy a standalone mode cluster. We know Spark can read local 
data files into RDDs, but where should we put the data file: on the server from 
which we submit our application, or on the server where the master service 
runs?
 

Regards,


Xiaobo Gu
