[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2019-02-20 Thread Rayman (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773513#comment-16773513
 ] 

Rayman commented on YARN-3554:
--

The RetryUpToMaximumTimeWithFixedSleep policy takes a maxTime and a sleepTime 
as input, and is implemented internally as a 
RetryUpToMaximumCountWithFixedSleep with maxCount = maxTime / sleepTime. 

This has a problem: it does not account for the time spent performing each 
retry attempt. For example, RetryUpToMaximumTimeWithFixedSleep with 
maxTime = 30 sec and sleepTime = 1 sec will take up to 90 seconds if each 
attempt (e.g., a ConnectionTimeout) takes 2 seconds to return: 
30 * (2 + 1) = 90. 

A policy claiming to be RetryUpToMaximumTimeWithFixedSleep should *actually* 
respect the *maximum time*, e.g., by recording a timestamp/timer.
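
To make the arithmetic above concrete, here is a minimal Java sketch that models both behaviours with simulated time. It is not the Hadoop RetryPolicies code: the class RetryBudgetSketch and its method names are hypothetical, and it assumes every attempt fails after the same fixed cost.

```java
/**
 * Hypothetical model of the two behaviours, using simulated time only.
 * Not the Hadoop RetryPolicies code.
 */
final class RetryBudgetSketch {

    /** Count-based policy: maxCount = maxTime / sleepTime, attempt cost ignored. */
    static long countBasedTotalMs(long maxTimeMs, long sleepTimeMs, long attemptCostMs) {
        long maxCount = maxTimeMs / sleepTimeMs;
        // Every retry pays for the failed attempt plus the fixed sleep.
        return maxCount * (attemptCostMs + sleepTimeMs);
    }

    /** Deadline-based policy: stop once the elapsed time reaches maxTime. */
    static long deadlineBasedTotalMs(long maxTimeMs, long sleepTimeMs, long attemptCostMs) {
        long elapsedMs = 0;
        while (elapsedMs < maxTimeMs) {
            elapsedMs += attemptCostMs;      // time spent in the failed attempt
            if (elapsedMs >= maxTimeMs) {
                break;                       // budget used up, do not sleep again
            }
            elapsedMs += sleepTimeMs;        // fixed sleep before the next attempt
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        System.out.println(countBasedTotalMs(30_000, 1_000, 2_000));    // 90000
        System.out.println(deadlineBasedTotalMs(30_000, 1_000, 2_000)); // 30000
    }
}
```

In this model the deadline-based variant can still overshoot maxTime by at most one attempt, because an attempt that is already running is not interrupted.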

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Naganarasimha G R
> Priority: Major
> Labels: BB2015-05-RFC, newbie
> Fix For: 2.8.0, 2.7.1, 2.6.2, 3.0.0-alpha1
>
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-09-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934124#comment-14934124
 ] 

Sangjin Lee commented on YARN-3554:
---

The change applied cleanly to branch-2.6 for 2.6.2.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533959#comment-14533959
 ] 

Naganarasimha G R commented on YARN-3554:
-

Hi [~jlowe], since 3 mins is fine with [~vinodkv], can we get this patch in? 



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534710#comment-14534710
 ] 

Naganarasimha G R commented on YARN-3554:
-

Thanks for reviewing and committing the patch, [~jlowe], [~vinodkv], and 
[~gtCarrera9]. 



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534728#comment-14534728
 ] 

Hudson commented on YARN-3554:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7775 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7775/])
YARN-3554. Default value for maximum nodemanager connect wait time is too high. 
Contributed by Naganarasimha G R (jlowe: rev 
9757864fd662b69445e0c600aedbe307a264982e)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml




[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534554#comment-14534554
 ] 

Jason Lowe commented on YARN-3554:
--

+1, committing this.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534901#comment-14534901
 ] 

Hudson commented on YARN-3554:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2137 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2137/])
YARN-3554. Default value for maximum nodemanager connect wait time is too high. 
Contributed by Naganarasimha G R (jlowe: rev 
9757864fd662b69445e0c600aedbe307a264982e)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java




[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534972#comment-14534972
 ] 

Hudson commented on YARN-3554:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #189 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/189/])
YARN-3554. Default value for maximum nodemanager connect wait time is too high. 
Contributed by Naganarasimha G R (jlowe: rev 
9757864fd662b69445e0c600aedbe307a264982e)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml




[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531329#comment-14531329
 ] 

Li Lu commented on YARN-3554:
-

I think the current conclusion on HADOOP-11398 is that we need to make some 
non-trivial changes in the retry mechanism to make it work accurately. We may 
want to have some quick fix before that. 



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531213#comment-14531213
 ] 

Vinod Kumar Vavilapalli commented on YARN-3554:
---

I am good with either. I think the more important fix is HADOOP-11398 to be 
able to configure things in a predictable manner.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531224#comment-14531224
 ] 

Naganarasimha G R commented on YARN-3554:
-

If there are plans to get HADOOP-11398 in soon, then we can directly set 
yarn.client.nodemanager-connect.max-wait-ms to 3 mins. If not, I feel it would 
be better to set it to 1 min and leave a comment in HADOOP-11398 to change this 
value back to 3 mins once it's in. Thoughts?



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529591#comment-14529591
 ] 

Naganarasimha G R commented on YARN-3554:
-

Hi [~vinodkv], [~jlowe],
So would configuring yarn.client.nodemanager-connect.max-wait-ms as 1 min be 
better? 



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526775#comment-14526775
 ] 

Jason Lowe commented on YARN-3554:
--

YARN-3518 is a separate concern with different ramifications.  We should 
discuss it there and not mix these two.

bq. Maybe set this to a bigger value based on network partition considerations, 
not only NM restart.
What value do you propose?  As pointed out earlier, anything over 10 minutes is 
pointless since the container allocation expires in that time.  Is it common 
for network partitions to take longer than 3 minutes but less than 10 minutes?  
If so we should tune the value for that.  If not then making the value larger 
just slows recovery time.

bq. 3 mins seems dangerous. If the RM fails over and the recovery takes several 
mins, the NM may kill all containers; in a production env, that's not expected.

This JIRA is configuring the amount of time NM clients (i.e.: primarily 
ApplicationMasters and the RM when launching ApplicationMasters) will try to 
connect to a particular NM before failing.  I'm missing how RM failover leads 
to a mass killing of containers due to this proposed change.  This is not a 
property used by the NM, so the NM is not going to start killing all containers 
differently based on an updated value for it.  The only case where the RM will 
use this property is when connecting to NMs to launch AM containers, and it 
will only do so for NMs that have recently heartbeated.  Could you explain how 
this leads to all containers getting killed on a particular node?



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526934#comment-14526934
 ] 

Naganarasimha G R commented on YARN-3554:
-

Hi [~jlowe],
My earlier query about the ideal time and [~sandflee]'s comment relate to 
yarn.resourcemanager.connect.max-wait.ms and, as [~gtCarrera] mentioned, it's 
just for discussion purposes.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526983#comment-14526983
 ] 

Jason Lowe commented on YARN-3554:
--

Ah, thanks [~Naganarasimha], sorry I missed that.  We can continue discussing 
the proper RM connect wait time over at YARN-3518, as obviously I cannot keep 
them straight here. ;-)

Are there still objections to lowering it from 15 mins to 3 mins?  I'm +1 for 
the second patch, but I'll wait a few days before committing to give time for 
alternate proposals.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527072#comment-14527072
 ] 

Vinod Kumar Vavilapalli commented on YARN-3554:
---

HADOOP-11398 and YARN-3238 are relevant in that they caused AM-NM communication 
to take a long time to time out.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527087#comment-14527087
 ] 

Vinod Kumar Vavilapalli commented on YARN-3554:
---

bq. Are there still objections to lowering it from 15 mins to 3 mins? I'm +1 
for the second patch, but I'll wait a few days before committing to give time 
for alternate proposals.
For our users, we explicitly set yarn.client.nodemanager-connect.max-wait-ms to 
60,000 (one minute). As HADOOP-11398 is still not in, this ends up becoming a 
6-minute timeout: assuming each underlying RPC attempt takes 1 sec and is 
retried 50 times (50 secs), plus the 10-second retry interval, each retry takes 
1 min, and there are 6 retries overall.
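
The estimate above can be reproduced with a few lines of Java. This is only a sketch of the arithmetic in this comment, not code from Hadoop: the class NestedRetryEstimate and its parameter names are hypothetical, and the per-RPC cost of 1 sec and the count of 50 RPC-level retries are the assumptions stated in the comment.

```java
/** Hypothetical helper that reproduces the estimate from the comment. */
final class NestedRetryEstimate {

    /**
     * Effective timeout when the outer retry count is derived from
     * maxWait / retryInterval and each outer retry itself runs a full
     * round of RPC-level retries before sleeping.
     */
    static long effectiveTimeoutMs(long maxWaitMs, long retryIntervalMs,
                                   int rpcRetries, long rpcRetryCostMs) {
        long outerRetries = maxWaitMs / retryIntervalMs;                     // 60000 / 10000 = 6
        long perOuterRetryMs = rpcRetries * rpcRetryCostMs + retryIntervalMs; // 50 s + 10 s
        return outerRetries * perOuterRetryMs;
    }

    public static void main(String[] args) {
        long totalMs = effectiveTimeoutMs(60_000, 10_000, 50, 1_000);
        System.out.println(totalMs / 60_000 + " minutes"); // 6 minutes
    }
}
```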



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-02 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525143#comment-14525143
 ] 

sandflee commented on YARN-3554:


Maybe set this to a bigger value based on network partition considerations, 
not only NM restart.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525245#comment-14525245
 ] 

Naganarasimha G R commented on YARN-3554:
-

Hi [~gtCarrera9],
Thanks for commenting on this JIRA, but I did not completely get the intention: 
are you expecting me to merge the changes required for YARN-3518 here?
If so, I have a few questions:
1. YARN-3518 tries to change the default value of 
yarn.resourcemanager.connect.max-wait.ms from 900000 to 600000, which impacts 
not only the AM - RM timeout but also NM - RM and client (CLI, web, application 
report, etc.) - RM. Is that OK? (I am OK with it but just wanted to point it 
out.)
2. Given the current high availability, is it really necessary to wait 10 mins 
to detect that the RM has failed, or shall I decrease that to 3 mins too?

If you confirm, I can merge the changes of YARN-3518 and also update 
yarn-default.xml, which is missing in YARN-3518.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-02 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525264#comment-14525264
 ] 

sandflee commented on YARN-3554:


Hi [~Naganarasimha], 3 mins seems dangerous. If the RM fails over and the 
recovery takes several mins, the NM may kill all containers; in a production 
env, that's not expected.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-02 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525406#comment-14525406
 ] 

Li Lu commented on YARN-3554:
-

Hi [~Naganarasimha], I just wanted to bring that JIRA to your attention. We may 
want to share some discussion between both JIRAs. 



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524197#comment-14524197
 ] 

Li Lu commented on YARN-3554:
-

This is just a quick note that there's another JIRA, YARN-3518, that fixes a 
similar problem for the RM. We may want to consider both of them. 



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519576#comment-14519576
 ] 

Naganarasimha G R commented on YARN-3554:
-

I agree with you, [~jlowe], but what do you feel the ideal timeout should be: 
3 mins or 5 mins? Since you have more experience with large numbers of nodes 
and see frequent NM failures, maybe you can suggest a better value here.




[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519603#comment-14519603
 ] 

Jason Lowe commented on YARN-3554:
--

I suggest we go with 3 minutes.  The retry interval is 10 seconds, so we'll get 
plenty of retries in that time if the failure is fast (e.g.: unknown host, 
connection refused) and still get a few retries in if the failure is slow 
(e.g.: connection timeout).
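
As a rough illustration of "plenty of retries" versus "a few retries", the Java sketch below counts how many attempts fit into a 3-minute budget with a 10-second retry interval. It assumes the retry policy honours the wall-clock budget, and the 20-second cost of a slow failure is an illustrative assumption, not a value taken from this issue. The class RetryCountEstimate is a hypothetical name.

```java
/** Hypothetical helper: how many attempts fit into a fixed time budget. */
final class RetryCountEstimate {

    /** Each attempt costs attemptCostMs and is followed by a sleep of retryIntervalMs. */
    static long attemptsWithin(long budgetMs, long retryIntervalMs, long attemptCostMs) {
        if (retryIntervalMs <= 0 || attemptCostMs < 0) {
            throw new IllegalArgumentException("interval must be > 0 and cost >= 0");
        }
        return budgetMs / (attemptCostMs + retryIntervalMs);
    }

    public static void main(String[] args) {
        long budgetMs = 3 * 60 * 1000;   // proposed 3-minute default
        long intervalMs = 10_000;        // 10-second retry interval
        // Fast failure (e.g. connection refused), modelled as taking no time.
        System.out.println(attemptsWithin(budgetMs, intervalMs, 0));      // 18
        // Slow failure (e.g. connection timeout), assumed to take 20 seconds.
        System.out.println(attemptsWithin(budgetMs, intervalMs, 20_000)); // 6
    }
}
```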



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519840#comment-14519840
 ] 

Hadoop QA commented on YARN-3554:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 7m 35s | There were no new checkstyle issues. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 46s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 0m 30s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. |
| | | 46m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12729227/YARN-3554-20150429-2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7543/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7543/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7543/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7543/console |


This message was automatically generated.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519360#comment-14519360
 ] 

Jason Lowe commented on YARN-3554:
--

I think 10 minutes is still too high.  We didn't even have this functionality 
until 2.6, where it was added because of rolling upgrades, and NMs don't take 
that long to recover in a rolling upgrade.  They recover in tens of seconds 
rather than tens of minutes.  Therefore I don't think it makes much sense to 
spend a lot of time trying to connect to an NM beyond a few minutes.  The 
chances of successfully connecting after a few minutes of trying are going to 
be very low, and NMs fail all the time anyway.  So if we spend all that extra 
time trying for essentially no benefit, all we've done is prolong the 
application recovery time for no good reason.



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518559#comment-14518559
 ] 

Hadoop QA commented on YARN-3554:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 14s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 5m 32s | There were no new checkstyle issues. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 48s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| | | 46m 11s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728970/YARN-3554.20150429-1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c79e7f7 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7531/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7531/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7531/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7531/console |


This message was automatically generated.
