Hi Shushant,

Spark currently makes no effort to request executors based on data locality
(although it does try to schedule tasks within executors based on data
locality).  We're working on adding this capability at SPARK-4352
<https://issues.apache.org/jira/browse/SPARK-4352>.
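
As a side note, even without locality-aware executor requests, the task scheduler's locality fallback behavior can be tuned through the `spark.locality.wait` family of settings. A minimal sketch for spark-defaults.conf — the values below are illustrative, not recommendations:

```
# spark-defaults.conf (illustrative values)
# Base wait before the scheduler falls back to a less-local level
spark.locality.wait          3s
# Per-level overrides: how long to wait for a node-local slot,
# then a rack-local slot, before launching the task anywhere
spark.locality.wait.node     3s
spark.locality.wait.rack     1s
```

Raising these waits makes the scheduler hold out longer for a data-local executor; setting them to 0 disables the waiting entirely.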

-Sandy

On Sun, May 31, 2015 at 7:24 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:

>
> I want to understand how Spark takes care of data localisation in cluster
> mode when run on YARN.
>
> 1. The driver program asks the ResourceManager for executors. Does it tell
> YARN's RM to check the HDFS block locations of the input data and then
> allocate executors accordingly? And do executors remain fixed throughout
> the application, or does the driver program ask for new executors when it
> submits another job in the same application, since in Spark a new job is
> created for each action? If executors are fixed, is achieving data
> localisation for the second job impossible?
>
>
>
> 2. When executors are done with their processing, are they marked as free
> in the ResourceManager's resource queue? And do the executors report this
> directly to the RM instead of via the driver?
>
> Thanks
> Shushant
>