Yes, it will. This is called data locality: when Spark plans tasks for an HDFS-backed RDD, it asks the NameNode where each block lives and prefers to schedule each task on a worker whose hostname matches a DataNode holding that block, so the block is read from local disk rather than over the network.
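
For example, a minimal sketch (the master URL, namenode address, and path below are placeholders; substitute your own):

  import org.apache.spark.SparkContext

  // Hypothetical standalone master and HDFS namenode running on the same machines.
  val sc = new SparkContext("spark://master-host:7077", "LocalityDemo")

  // Reading from HDFS: the NameNode reports which DataNodes hold each block,
  // and Spark prefers workers whose hostname matches, so blocks are read
  // from local disk instead of being shipped over the network.
  val lines = sc.textFile("hdfs://namenode-host:9000/data/input.txt")
  println(lines.count())
  sc.stop()

You can check this in the Spark web UI, which shows a locality level (e.g. NODE_LOCAL vs. ANY) for each task.
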
On Wed, Jan 1, 2014 at 2:40 AM, guxiaobo1982 <[email protected]> wrote:

> Hi Andrew,
>
> Thanks for your reply. I have another question about using HDFS: when
> running HDFS and the standalone mode on the same cluster, will the Spark
> workers only read data on the same server, to avoid transferring data
> over the network?
>
> Xiaobo Gu
>
> On 2014-01-01 05:37:36, "andrew" <[email protected]> wrote:
>
> Hi Xiaobo,
>
> I would recommend putting the files into an HDFS cluster on the same
> machines instead, if possible. If you're concerned about duplicating the
> data, you can set the replication factor to 1 so you don't use more space
> than before.
>
> In my experience of Spark around 0.7.0 or so, when reading from a local
> file with sc.textFile("file:///...") you had to have that file in that
> exact path on every Spark worker machine.
>
> Cheers,
> Andrew
>
> On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <[email protected]> wrote:
>
>> Hi,
>>
>> We are going to deploy a standalone mode cluster. We know Spark can read
>> local data files into RDDs, but the question is where should we put the
>> data file: on the server where we submit our application, or the server
>> where the master service runs?
>>
>> Regards,
>>
>> Xiaobo Gu
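
P.S. Regarding the replication-factor suggestion in Andrew's reply: besides the hadoop fs -setrep command, here is a sketch of doing it programmatically through the Hadoop FileSystem API (the namenode address and file path are placeholders):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Hypothetical namenode and file; substitute your own.
  val conf = new Configuration()
  conf.set("fs.defaultFS", "hdfs://namenode-host:9000")
  val fs = FileSystem.get(conf)

  // Set replication to 1 for an existing file so the HDFS copy
  // uses no more disk space than the original local file.
  fs.setReplication(new Path("/data/input.txt"), 1.toShort)
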
