Yes, it will.  This is called data locality: Spark matches each worker's
hostname against the hostnames of the HDFS DataNodes that hold the blocks,
and schedules tasks on the matching machines whenever it can.
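As a quick way to see this in action, here is a minimal Scala sketch (the
namenode address and file path are placeholders) that prints the hosts Spark
prefers for each partition of an HDFS-backed RDD; if those hostnames match
your workers, locality-aware scheduling can kick in:

  // Inspect where Spark thinks each HDFS block lives.
  val rdd = sc.textFile("hdfs://namenode:8020/data/input.txt")
  rdd.partitions.foreach { p =>
    // preferredLocations reports the hosts holding that partition's block
    println("partition " + p.index + " -> " +
      rdd.preferredLocations(p).mkString(", "))
  }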


On Wed, Jan 1, 2014 at 2:40 AM, guxiaobo1982 <[email protected]> wrote:

> Hi Andrew,
>
>
> Thanks for your reply. I have another question about using HDFS: when
> running HDFS and the standalone mode on the same cluster, will the Spark
> workers read only the data on the same server, to avoid transferring data
> over the network?
>
> Xiaobo gu
>
> On 2014-01-01 05:37:36, "andrew"<[email protected]> wrote:
>
> Hi Xiaobo,
>
> I would recommend putting the files into an HDFS cluster on the same
> machines instead, if possible. If you're concerned about duplicating the
> data, you can set the replication factor to 1 so you don't use more space
> than before.
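>
> As a hedged illustration (the namenode URI and path below are placeholders),
> the replication factor of files already in HDFS can be lowered through the
> Hadoop FileSystem API from Scala; hdfs dfs -setrep does the same from the
> shell, and dfs.replication in hdfs-site.xml sets the default for new files:
>
>   import org.apache.hadoop.conf.Configuration
>   import org.apache.hadoop.fs.{FileSystem, Path}
>
>   val conf = new Configuration()
>   conf.set("fs.defaultFS", "hdfs://namenode:8020") // assumed address
>   val fs = FileSystem.get(conf)
>   // keep a single copy of each block to avoid extra disk usage
>   fs.setReplication(new Path("/data/input.txt"), 1.toShort)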
>
> In my experience with Spark around 0.7.0 or so, when reading from a local
> file with sc.textFile("file:///...") you had to have that file at that
> exact path on every Spark worker machine.
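>
> A small sketch of the two read paths (the paths below are placeholders):
>
>   // file://: the file must exist at this exact path on every worker
>   val localRdd = sc.textFile("file:///data/input.txt")
>   // hdfs://: block locations come from the NameNode, no manual copying
>   val hdfsRdd = sc.textFile("hdfs://namenode:8020/data/input.txt")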
>
> Cheers,
> Andrew
>
>
> On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <[email protected]> wrote:
>
>> Hi,
>>
>> We are going to deploy a standalone mode cluster. We know Spark can read
>> local data files into RDDs, but the question is where we should put the
>> data file: on the server from which we submit our application, or on the
>> server where the master service runs?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>
>
