Re: AM creation in yarn client mode

2016-02-10 Thread Steve Loughran

On 10 Feb 2016, at 13:20, Manoj Awasthi wrote:

On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran wrote:

On 10 Feb 2016, at 04:42, praveen S wrote:


Hi,

I have 2 questions when running the spark jobs on yarn in client mode :

1) Where is the AM(application master) created :


in the cluster



A) is it created on the client where the job was submitted? i.e driver and AM 
on the same client?

no


Or B) yarn decides where the AM should be created?

yes

2) Driver and AM run in different processes : is my assumption correct?


yes. the driver runs on your local system, which had better be close to the 
cluster and stay up for the duration of the work

This is not correct. In yarn-cluster mode driver is what runs inside the 
application master and the node on which application master gets allocated is 
decided by yarn.

I agree


In yarn-client mode, there is no application master and driver runs in the 
context of the same unix process as the spark-submit.


I must beg to differ


There is an AM. All YARN apps need an AM, as it is the only way you can get 
containers to run your work. And, except in the special case of an "Unmanaged 
AM", the AM runs in the cluster.

In spark, an AM is always set up by the YARN client and submitted to the cluster

https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L116

The big difference is where the driver lives. With --deploy-mode cluster, it's
in the AM; with --deploy-mode client, it's in the client.

here's the AM making its decision

https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L187

-Steve

(why yes, I have spent too much time staring at AM logs)
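
A rough way to picture the two modes described above is the toy Python model
below. It is only a sketch: the names `Placement` and `placement` are made up
for illustration and are not part of any Spark or YARN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where each Spark-on-YARN process lives for one deploy mode."""
    driver: str  # process that hosts the driver
    am: str      # where the YARN application master runs


def placement(deploy_mode: str) -> Placement:
    """Illustrative model only; not a Spark or YARN API."""
    if deploy_mode == "client":
        # The driver stays in the submitting process; the AM is still
        # launched in a YARN container on a node that YARN picks.
        return Placement(driver="client process", am="YARN container")
    if deploy_mode == "cluster":
        # The AM hosts the driver, so both live in the same container.
        return Placement(driver="AM (YARN container)", am="YARN container")
    raise ValueError(f"unknown deploy mode: {deploy_mode!r}")


for mode in ("client", "cluster"):
    print(mode, placement(mode))
```

In both modes the AM ends up in a YARN container; only the driver's location
changes.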




Re: AM creation in yarn client mode

2016-02-10 Thread Manoj Awasthi
My apologies for writing that "there is no AM". I realize my mistake now! :-) :-)


On Wed, Feb 10, 2016 at 7:14 PM, Steve Loughran 
wrote:

>
> On 10 Feb 2016, at 13:20, Manoj Awasthi  wrote:
>
>
>
> On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran 
> wrote:
>
>>
>> On 10 Feb 2016, at 04:42, praveen S  wrote:
>>
>> Hi,
>>
>> I have 2 questions when running the spark jobs on yarn in client mode :
>>
>> 1) Where is the AM(application master) created :
>>
>>
>> in the cluster
>>
>>
>> A) is it created on the client where the job was submitted? i.e driver
>> and AM on the same client?
>>
>> no
>>
>> Or B) yarn decides where the AM should be created?
>>
>> yes
>>
>> 2) Driver and AM run in different processes : is my assumption correct?
>>
>>
>> yes. the driver runs on your local system, which had better be close to
>> the cluster and stay up for the duration of the work
>>
>
> This is not correct. In yarn-cluster mode driver is what runs inside the
> application master and the node on which application master gets allocated
> is decided by yarn.
>
>
> I agree
>
>
> In yarn-client mode, there is no application master and driver runs in the
> context of the same unix process as the spark-submit.
>
>
>
> I must beg to differ
>
>
> There is an AM. All YARN apps need an AM, as it is the only way you can
> get containers to run your work. And, except in the special case of an
> "Unmanaged AM", the AM runs in the cluster.
>
> In spark, an AM is always set up by the YARN client and submitted to the
> cluster
>
>
> https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L116
>
> The big difference is where the driver lives. With --deploy-mode cluster,
> it's in the AM; with --deploy-mode client, it's in the client.
>
> here's the AM making its decision
>
>
> https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L187
>
> -Steve
>
> (why yes, I have spent too much time staring at AM logs)
>
>
>


Re: AM creation in yarn client mode

2016-02-10 Thread Steve Loughran

On 10 Feb 2016, at 14:18, Manoj Awasthi wrote:


My apologies for writing that "there is no AM". I realize my mistake now! :-) :-)


There is the unmanaged AM option, which was originally written for debugging, 
but has been used in various apps.

Spark doesn't do it. If it did, you'd host the unmanaged AM in the client app,
alongside the driver, and it would speak the AMRM protocol to the YARN
ResourceManager to request containers.

I don't think you'd gain much; you'd still be vulnerable to failure of the
client or loss of connectivity. All it would do is add a third deployment
mode to test and maintain.
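
To make the failure argument concrete, here is a minimal sketch in Python. It
is a toy model, not Spark code: the `HOSTS` table and `survives_client_loss`
are hypothetical names, and the "unmanaged" row describes the deployment mode
discussed above, which Spark does not implement.

```python
# Hypothetical model: which side hosts the driver and the AM in each mode.
# "unmanaged" is the third mode discussed above; Spark does not implement it.
HOSTS = {
    "client":    {"driver": "client",  "am": "cluster"},
    "cluster":   {"driver": "cluster", "am": "cluster"},
    "unmanaged": {"driver": "client",  "am": "client"},
}


def survives_client_loss(mode: str) -> bool:
    """True only if neither the driver nor the AM lives in the client."""
    return all(host == "cluster" for host in HOSTS[mode].values())


for mode in HOSTS:
    print(f"{mode:<10} survives client loss: {survives_client_loss(mode)}")
```

Under these assumptions only cluster mode tolerates losing the client, so an
unmanaged AM would not improve on client mode in that respect.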



On Wed, Feb 10, 2016 at 7:14 PM, Steve Loughran wrote:

On 10 Feb 2016, at 13:20, Manoj Awasthi wrote:

On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran wrote:

On 10 Feb 2016, at 04:42, praveen S wrote:


Hi,

I have 2 questions when running the spark jobs on yarn in client mode :

1) Where is the AM(application master) created :


in the cluster



A) is it created on the client where the job was submitted? i.e driver and AM 
on the same client?

no


Or B) yarn decides where the AM should be created?

yes

2) Driver and AM run in different processes : is my assumption correct?


yes. the driver runs on your local system, which had better be close to the 
cluster and stay up for the duration of the work

This is not correct. In yarn-cluster mode driver is what runs inside the 
application master and the node on which application master gets allocated is 
decided by yarn.

I agree


In yarn-client mode, there is no application master and driver runs in the 
context of the same unix process as the spark-submit.


I must beg to differ


There is an AM. All YARN apps need an AM, as it is the only way you can get 
containers to run your work. And, except in the special case of an "Unmanaged 
AM", the AM runs in the cluster.

In spark, an AM is always set up by the YARN client and submitted to the cluster

https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L116

The big difference is where the driver lives. With --deploy-mode cluster, it's
in the AM; with --deploy-mode client, it's in the client.

here's the AM making its decision

https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L187

-Steve

(why yes, I have spent too much time staring at AM logs)






Re: AM creation in yarn client mode

2016-02-10 Thread Steve Loughran

On 10 Feb 2016, at 04:42, praveen S wrote:


Hi,

I have 2 questions when running the spark jobs on yarn in client mode :

1) Where is the AM(application master) created :


in the cluster



A) is it created on the client where the job was submitted? i.e driver and AM 
on the same client?

no


Or B) yarn decides where the AM should be created?

yes

2) Driver and AM run in different processes : is my assumption correct?


yes. the driver runs on your local system, which had better be close to the 
cluster and stay up for the duration of the work

Regards,
Praveen



Re: AM creation in yarn client mode

2016-02-10 Thread Manoj Awasthi
On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran 
wrote:

>
> On 10 Feb 2016, at 04:42, praveen S  wrote:
>
> Hi,
>
> I have 2 questions when running the spark jobs on yarn in client mode :
>
> 1) Where is the AM(application master) created :
>
>
> in the cluster
>
>
> A) is it created on the client where the job was submitted? i.e driver and
> AM on the same client?
>
> no
>
> Or B) yarn decides where the AM should be created?
>
> yes
>
> 2) Driver and AM run in different processes : is my assumption correct?
>
>
> yes. the driver runs on your local system, which had better be close to
> the cluster and stay up for the duration of the work
>

This is not correct. In yarn-cluster mode driver is what runs inside the
application master and the node on which application master gets allocated
is decided by yarn. In yarn-client mode, there is no application master and
driver runs in the context of the same unix process as the spark-submit.


> Regards,
> Praveen
>
>
>


AM creation in yarn client mode

2016-02-09 Thread praveen S
Hi,

I have two questions about running Spark jobs on YARN in client mode:

1) Where is the AM (application master) created?

A) Is it created on the client where the job was submitted, i.e. are the
driver and AM on the same client?
Or
B) Does YARN decide where the AM should be created?

2) The driver and AM run in different processes: is my assumption correct?

Regards,
Praveen


Re: AM creation in yarn client mode

2016-02-09 Thread ayan guha
It depends on yarn-cluster and yarn-client mode.

On Wed, Feb 10, 2016 at 3:42 PM, praveen S  wrote:

> Hi,
>
> I have 2 questions when running the spark jobs on yarn in client mode :
>
> 1) Where is the AM(application master) created :
>
> A) is it created on the client where the job was submitted? i.e driver and
> AM on the same client?
> Or
> B) yarn decides where the AM should be created?
>
> 2) Driver and AM run in different processes : is my assumption correct?
>
> Regards,
> Praveen
>



-- 
Best Regards,
Ayan Guha


Re: AM creation in yarn-client mode

2016-02-09 Thread Alexander Pivovarov
Here are some pictures that illustrate it:
http://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_running_spark_on_yarn.html

On Tue, Feb 9, 2016 at 10:18 PM, Jonathan Kelly 
wrote:

> In yarn-client mode, the driver is separate from the AM. The AM is created
> in YARN, and YARN controls where it goes (though you can somewhat control
> it using YARN node labels--I just learned earlier today in a different
> thread on this list that this can be controlled by
> spark.yarn.am.nodeLabelExpression). Then what I understand is that the driver
> talks to the AM in order to request additional YARN containers in which to
> run executors.
>
> In yarn-cluster mode, the SparkSubmit process outside of the cluster
> creates the AM in YARN, and then what I understand is that the AM *becomes*
> the driver (by invoking the driver's main method), and then it requests the
> executor containers.
>
> So yes, one difference between yarn-client and yarn-cluster mode is that
> in yarn-client mode the driver and AM are separate, whereas they are the
> same in yarn-cluster.
>
> ~ Jonathan
>
> On Tue, Feb 9, 2016 at 9:57 PM praveen S  wrote:
>
>> Can you explain what happens in yarn client mode?
>>
>> Regards,
>> Praveen
>> On 10 Feb 2016 10:55, "ayan guha"  wrote:
>>
>>> It depends on yarn-cluster and yarn-client mode.
>>>
>>> On Wed, Feb 10, 2016 at 3:42 PM, praveen S  wrote:
>>>
>>>> Hi,
>>>>
>>>> I have 2 questions when running the spark jobs on yarn in client mode :
>>>>
>>>> 1) Where is the AM(application master) created :
>>>>
>>>> A) is it created on the client where the job was submitted? i.e driver
>>>> and AM on the same client?
>>>> Or
>>>> B) yarn decides where the AM should be created?
>>>>
>>>> 2) Driver and AM run in different processes : is my assumption correct?
>>>>
>>>> Regards,
>>>> Praveen
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>


Re: AM creation in yarn client mode

2016-02-09 Thread Diwakar Dhanuskodi
Your 2nd assumption is correct.
There is a YARN client which polls the AM while running in yarn-client mode.


Original message
From: ayan guha <guha.a...@gmail.com>
Date: 10/02/2016 10:55 (GMT+05:30)
To: praveen S <mylogi...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: AM creation in yarn client mode

It depends on yarn-cluster and yarn-client mode. 

On Wed, Feb 10, 2016 at 3:42 PM, praveen S <mylogi...@gmail.com> wrote:
Hi,

I have 2 questions when running the spark jobs on yarn in client mode :

1) Where is the AM(application master) created :

A) is it created on the client where the job was submitted? i.e driver and AM 
on the same client? 
Or 
B) yarn decides where the AM should be created?

2) Driver and AM run in different processes : is my assumption correct?

Regards, 
Praveen




-- 
Best Regards,
Ayan Guha


Re: AM creation in yarn-client mode

2016-02-09 Thread praveen S
Can you explain what happens in yarn client mode?

Regards,
Praveen
On 10 Feb 2016 10:55, "ayan guha"  wrote:

> It depends on yarn-cluster and yarn-client mode.
>
> On Wed, Feb 10, 2016 at 3:42 PM, praveen S  wrote:
>
>> Hi,
>>
>> I have 2 questions when running the spark jobs on yarn in client mode :
>>
>> 1) Where is the AM(application master) created :
>>
>> A) is it created on the client where the job was submitted? i.e driver
>> and AM on the same client?
>> Or
>> B) yarn decides where the AM should be created?
>>
>> 2) Driver and AM run in different processes : is my assumption correct?
>>
>> Regards,
>> Praveen
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>


Re: AM creation in yarn-client mode

2016-02-09 Thread Jonathan Kelly
In yarn-client mode, the driver is separate from the AM. The AM is created
in YARN, and YARN controls where it goes (though you can somewhat control
it using YARN node labels--I just learned earlier today in a different
thread on this list that this can be controlled by
spark.yarn.am.nodeLabelExpression). Then what I understand is that the driver
talks to the AM in order to request additional YARN containers in which to
run executors.

In yarn-cluster mode, the SparkSubmit process outside of the cluster
creates the AM in YARN, and then what I understand is that the AM *becomes*
the driver (by invoking the driver's main method), and then it requests the
executor containers.

So yes, one difference between yarn-client and yarn-cluster mode is that in
yarn-client mode the driver and AM are separate, whereas they are the same
in yarn-cluster.

~ Jonathan
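
For reference, here is a hedged sketch of how the deploy mode and the AM
node-label setting might be passed to spark-submit. `--master yarn`,
`--deploy-mode` and `--conf` are standard spark-submit options, and the
property is spelled `spark.yarn.am.nodeLabelExpression` in Spark's "Running on
YARN" documentation. The helper name `spark_submit_args`, the application file
`my_app.py` and the label `am_nodes` are placeholders, not anything from this
thread.

```python
import shlex
from typing import List, Optional


def spark_submit_args(app: str, deploy_mode: str,
                      am_label: Optional[str] = None) -> List[str]:
    """Build a spark-submit argument list (helper name is hypothetical)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if am_label is not None:
        # Ask YARN to schedule the AM on nodes carrying this label.
        args += ["--conf", f"spark.yarn.am.nodeLabelExpression={am_label}"]
    return args + [app]


# Print a shell-quoted command line for a client-mode submission.
print(shlex.join(spark_submit_args("my_app.py", "client", am_label="am_nodes")))
```

The label only influences where YARN places the AM; in client mode the driver
still runs wherever spark-submit was started.
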
On Tue, Feb 9, 2016 at 9:57 PM praveen S  wrote:

> Can you explain what happens in yarn client mode?
>
> Regards,
> Praveen
> On 10 Feb 2016 10:55, "ayan guha"  wrote:
>
>> It depends on yarn-cluster and yarn-client mode.
>>
>> On Wed, Feb 10, 2016 at 3:42 PM, praveen S  wrote:
>>
>>> Hi,
>>>
>>> I have 2 questions when running the spark jobs on yarn in client mode :
>>>
>>> 1) Where is the AM(application master) created :
>>>
>>> A) is it created on the client where the job was submitted? i.e driver
>>> and AM on the same client?
>>> Or
>>> B) yarn decides where the AM should be created?
>>>
>>> 2) Driver and AM run in different processes : is my assumption correct?
>>>
>>> Regards,
>>> Praveen
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>