Hi Andrew,

Thanks for your reply. I have another question about using HDFS: when HDFS and the Spark standalone cluster run on the same machines, will the Spark workers read only the data stored on their own nodes, so that no data is transferred over the network?
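For reference, here is roughly what we have in mind (the master hostname, NameNode port, and path below are placeholders, not our real setup):

    import org.apache.spark.SparkContext

    // connect to the standalone master
    val sc = new SparkContext("spark://master:7077", "LocalityTest")

    // read from the co-located HDFS; what we hope is that each worker
    // gets scheduled to read the blocks stored on its own disks
    val lines = sc.textFile("hdfs://master:9000/data/input.txt")
    println(lines.count())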
Xiaobo Gu

On 2014-01-01 05:37:36, "andrew" <[email protected]> wrote:

Hi Xiaobo,

I would recommend putting the files into an HDFS cluster on the same machines instead, if possible. If you're concerned about duplicating the data, you can set the replication factor to 1 so you don't use more space than before.

In my experience of Spark around 0.7.0 or so, when reading from a local file with sc.textFile("file:///...") you had to have that file at that exact path on every Spark worker machine.

Cheers,
Andrew

On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <[email protected]> wrote:

Hi,

We are going to deploy a standalone mode cluster. We know Spark can read local data files into RDDs, but the question is where we should put the data file: on the server from which we submit our application, or on the server where the master service runs?

Regards,
Xiaobo Gu
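As an aside, if we go the replication-factor-1 route Andrew suggests, I assume the setup would look roughly like this (dfs.replication is the standard HDFS property; the path is made up):

    <!-- hdfs-site.xml: default replication for newly written files -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    # for files already in HDFS, change replication per path
    # (-w waits until re-replication completes)
    hadoop fs -setrep -w 1 /data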
