[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-03 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051339#comment-14051339 ]

Hudson commented on YARN-2065:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/602/])
YARN-2065 AM cannot create new containers after restart (stevel: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


 AM cannot create new containers after restart-NM token from previous attempt used
 -

 Key: YARN-2065
 URL: https://issues.apache.org/jira/browse/YARN-2065
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Steve Loughran
Assignee: Jian He
 Fix For: 2.5.0

 Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
 YARN-2065.1.patch


 Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
 create new containers.
 The Slider minicluster test {{TestKilledAM}} can replicate this reliably: it 
 kills the AM, then kills a container while the AM is down, which triggers a 
 reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-03 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051472#comment-14051472 ]

Hudson commented on YARN-2065:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/])
YARN-2065 AM cannot create new containers after restart (stevel: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-03 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051567#comment-14051567 ]

Hudson commented on YARN-2065:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/])
YARN-2065 AM cannot create new containers after restart (stevel: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050272#comment-14050272 ]

Hadoop QA commented on YARN-2065:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653606/YARN-2065-003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4178//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4178//console

This message is automatically generated.


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Jian He (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050286#comment-14050286 ]

Jian He commented on YARN-2065:
---

Thanks for the testing, Steve!


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050521#comment-14050521 ]

Steve Loughran commented on YARN-2065:
--

With Jenkins happy, I'm +1 on this patch; it fixes what it says it does


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050562#comment-14050562 ]

Hudson commented on YARN-2065:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5808 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5808/])
YARN-2065 AM cannot create new containers after restart (stevel: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-06-29 Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047111#comment-14047111 ]

Steve Loughran commented on YARN-2065:
--

I'll try to run my code against this patch this week


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-06-29 Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047171#comment-14047171 ]

Hadoop QA commented on YARN-2065:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653066/YARN-2065-002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4134//console

This message is automatically generated.


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-05-23 Jian He (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008045#comment-14008045 ]

Jian He commented on YARN-2065:
---

Also changed authorizeGetAndStopContainerRequest to check against appId.
bq. The token is generated with the previous container's attempt Id, instead of the current attemptId.
This should actually not be a problem once the two methods check against the appId instead of the attemptId.
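
A minimal sketch of why the attempt baked into the NMToken stops mattering once the NM compares application IDs. It assumes a deliberately simplified model: {{StartAuthSketch}}, {{authorizeStart}} and the {{AppId}}/{{AttemptId}}/{{ContainerId}} records are hypothetical stand-ins for YARN's ApplicationId, ApplicationAttemptId and ContainerId, not the real ContainerManagerImpl code, and {{SecurityException}} stands in for whatever error the NM actually raises.

```java
/**
 * Hypothetical, simplified model of an appId-based authorization check on the NM.
 * The records below are illustrative stand-ins, not YARN's real ID classes.
 */
public class StartAuthSketch {

  record AppId(long clusterTimestamp, int id) {}
  record AttemptId(AppId appId, int attempt) {}
  record ContainerId(AttemptId attemptId, long id) {}

  /** Rejects the request unless the NMToken and the container belong to the same application. */
  static void authorizeStart(AttemptId nmTokenAttempt, ContainerId container) {
    AppId tokenApp = nmTokenAttempt.appId();
    AppId containerApp = container.attemptId().appId();
    if (!tokenApp.equals(containerApp)) {
      // Stand-in for the real NM error; the attempt number is deliberately ignored.
      throw new SecurityException("NMToken for " + tokenApp
          + " cannot be used for " + container);
    }
  }

  public static void main(String[] args) {
    AppId app = new AppId(1400000000000L, 1);
    AttemptId previous = new AttemptId(app, 1);
    AttemptId current = new AttemptId(app, 2);

    // Token minted with the previous attempt's ID, container owned by the current
    // attempt: accepted, because both belong to the same application.
    authorizeStart(previous, new ContainerId(current, 3));
    System.out.println("same application: accepted");

    // A token issued to a different application is still rejected.
    AttemptId otherApp = new AttemptId(new AppId(1400000000000L, 2), 1);
    try {
      authorizeStart(otherApp, new ContainerId(current, 4));
      System.out.println("other application: accepted");
    } catch (SecurityException e) {
      System.out.println("other application: rejected");
    }
  }
}
```

Running {{main}} prints "same application: accepted" followed by "other application: rejected". In this model the NMToken keeps its per-application scope and only loses its per-attempt scope, so a token minted for an earlier attempt remains usable after an AM restart.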



[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-05-16 Jian He (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999094#comment-13999094 ]

Jian He commented on YARN-2065:
---

I looked at the exception posted in SLIDER-34. The problem is that the AM can get
new containers from the RM, but cannot launch them on the NM because of the
following method.
The token is generated with the previous container's attempt ID instead of the
current attempt ID, and the NM checks the attempt ID in the NMToken against the
attempt ID of the container.
{code}
  public NMToken createAndGetNMToken(String applicationSubmitter,
      ApplicationAttemptId appAttemptId, Container container) {
    try {
      this.readLock.lock();
      HashSet<NodeId> nodeSet = this.appAttemptToNodeKeyMap.get(appAttemptId);
      NMToken nmToken = null;
      if (nodeSet != null) {
        if (!nodeSet.contains(container.getNodeId())) {
          LOG.info("Sending NMToken for nodeId : " + container.getNodeId()
              + " for container : " + container.getId());
          // The problem: the token is created for the attempt recorded in the
          // container ID rather than for the current appAttemptId passed in.
          Token token =
              createNMToken(container.getId().getApplicationAttemptId(),
                container.getNodeId(), applicationSubmitter);
          nmToken = NMToken.newInstance(container.getNodeId(), token);
          nodeSet.add(container.getNodeId());
        }
      }
      return nmToken;
    } finally {
      this.readLock.unlock();
    }
  }
{code}
Changing this method will fix this problem. 

But there is another problem:
ContainerManagerImpl#authorizeGetAndStopContainerRequest also requires the AM to
use the previous attempt's NMToken to talk to containers from the previous
attempt, and the current NMToken to talk to containers from the current attempt.
Luckily, it currently just logs error messages instead of throwing an exception.
We also need to change the NM side to check against the applicationId rather
than the attemptId.
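
A rough sketch contrasting the two rules for get/stop requests: the current one, where the NMToken's attempt must equal the container's attempt, and the proposed one, where only the application must match. {{GetStopAuthSketch}}, its two methods and the records are hypothetical simplified stand-ins, not the actual ContainerManagerImpl#authorizeGetAndStopContainerRequest code.

```java
/**
 * Hypothetical sketch of the NM-side check for getContainerStatus/stopContainer
 * requests. The records are simplified stand-ins for YARN's ApplicationId,
 * ApplicationAttemptId and ContainerId, not the real classes.
 */
public class GetStopAuthSketch {

  record AppId(long clusterTimestamp, int id) {}
  record AttemptId(AppId appId, int attempt) {}
  record ContainerId(AttemptId attemptId, long id) {}

  /** Current rule: the NMToken's attempt must equal the container's attempt. */
  static boolean authorizedByAttempt(AttemptId nmTokenAttempt, ContainerId container) {
    return nmTokenAttempt.equals(container.attemptId());
  }

  /** Proposed rule: only the application has to match. */
  static boolean authorizedByApp(AttemptId nmTokenAttempt, ContainerId container) {
    return nmTokenAttempt.appId().equals(container.attemptId().appId());
  }

  public static void main(String[] args) {
    AppId app = new AppId(1400000000000L, 1);
    AttemptId first = new AttemptId(app, 1);
    AttemptId second = new AttemptId(app, 2);
    ContainerId fromFirst = new ContainerId(first, 2);   // survived the AM restart
    ContainerId fromSecond = new ContainerId(second, 5); // allocated after the restart

    // The restarted AM uses its current (second-attempt) token for both containers.
    System.out.println(authorizedByAttempt(second, fromFirst)); // prints false
    System.out.println(authorizedByApp(second, fromFirst));     // prints true
    System.out.println(authorizedByApp(second, fromSecond));    // prints true
  }
}
```

Under the attempt-based rule the restarted AM would have to keep one NMToken per attempt and pick the right one per container; under the application-based rule a single token covers containers from every attempt of the same application.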
