[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646436#comment-13646436 ] Sandy Ryza commented on YARN-600: - (by which I mean YARN doesn't give hints to the OS on how and when to assign cores to processes) Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
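As a rough sketch of the weighting the issue describes, a cgroups resources handler could scale cpu.shares linearly with the container's virtual cores. The class name, the per-vcore weight, and the cgroup file layout below are assumptions for illustration, not the actual patch:

{code:java}
// Hypothetical sketch: weight a container's cgroups CPU shares by its
// allocated virtual cores. Names and the default weight are assumptions.
import java.io.FileWriter;
import java.io.IOException;

public class CpuSharesSketch {
  // cgroups' default for cpu.shares is 1024; use it as the per-vcore weight.
  private static final int CPU_SHARES_PER_VCORE = 1024;

  /** Write cpu.shares for a container's cgroup, scaled by its vcores. */
  public static void setCpuShares(String containerCgroupPath, int virtualCores)
      throws IOException {
    int shares = CPU_SHARES_PER_VCORE * Math.max(1, virtualCores);
    try (FileWriter w = new FileWriter(containerCgroupPath + "/cpu.shares")) {
      w.write(Integer.toString(shares));
    }
  }
}
{code}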
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646435#comment-13646435 ] Sandy Ryza commented on YARN-600: - I don't believe anything special happens when it's not enabled. Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Attachment: YARN-422.1.patch Here's the first patch, which is ready for review. Based on the previous definition file, the patch has the following updates: 1. Refactor some code (add more logs, revise the javadoc, etc.) 2. Rename AMNMClient to NMClient, since not only the AM will use this client. 3. In NMClientAsync, join the threadpool's threads, which are set to non-daemon. 4. Enhance the test cases. As the patch is already big, I suggest deferring the code changes that use the client in the AM and RM to follow-up patches. In addition, I found that Maven seems not to distinguish scope when checking for cyclic dependencies. Therefore, making the resourcemanager project depend on the client (to use NMClient in AMLauncher) will fail the build. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
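Point 3 above (joining NMClientAsync's non-daemon threads on stop) can be pictured with a small sketch; the executor setup and stop sequence here are illustrative assumptions, not the patch's exact code:

{code:java}
// Illustrative sketch of a client whose worker threads are non-daemon and
// are drained (shutdown + awaitTermination) when the client stops.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncClientSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
    Thread t = new Thread(r, "nm-client-async-worker");
    t.setDaemon(false); // non-daemon: the JVM waits for outstanding callbacks
    return t;
  });

  public void submit(Runnable event) {
    pool.execute(event);
  }

  /** Stop the client, waiting for in-flight work to finish. */
  public void stop() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}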
[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646448#comment-13646448 ] Hadoop QA commented on YARN-422: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581346/YARN-422.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestNMClientAsync {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/852//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/852//console This message is automatically generated. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Attachment: YARN-422.2.patch An attempt to fix the test failure. The problem seems to be that the mockNMClient is changed before all the test cases are completed. So, the success and the failure tests are split. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646473#comment-13646473 ] Hadoop QA commented on YARN-422: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581348/YARN-422.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/853//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/853//console This message is automatically generated. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646558#comment-13646558 ] Daryn Sharp commented on YARN-613: -- I just have general concerns with assuming the entire Hadoop environment is trusted and thus introducing weaknesses at a global level. For example, a weakness is introduced every time one entity shares a secret to validate a token created by another entity. Compromising one of hundreds or thousands of nodes shouldn't put the entire cluster at risk. If I can gain access to one NM host and its keytab, I believe I can secretly launch a malicious NM? NMs currently share a global key for container token secrets, but there is a JIRA to move to per-NM secrets, so sharing a global AM secret would be another step backwards. To explore alternate avenues that avoid global trust: is it not feasible to pass, with the launch request, the AM token that is allowed to get status and stop the container? Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since secure authentication uses a container token from the container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646572#comment-13646572 ] Daryn Sharp commented on YARN-617: -- bq. we are trying to change the auth to use AMTokens and authorization will continue to be via ContainerTokens I may have misinterpreted the other jira... I thought the goal was to continue to auth container launches with a container token, but to change status and stop to authenticate with the AM token? Are you saying the goal is to auth container launches with the AM token too? {quote}bq. A RPC server also enables SASL DIGEST-MD5 if a secret manager is active.{quote} bq. Off topic, but this is what I guessed is the reason underlying YARN-626, do you know when this got merged into branch-2? The SASL changes HADOOP-8783/HADOOP-8784 went in Oct 3-4, 2012. The change allowed servers to accept tokens regardless of the security setting if a secret manager is present, and clients to always use a token if present, regardless of the security setting. This didn't change behavior for a secure cluster, so YARN-626 can't be related, because security is enabled and the AM is lacking a token for the RM in its UGI. In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: Pluggable topologies with NodeGroup for YARN.pdf Implementation doc for NodeGroup layer support in YARN. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
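To make the subclassing idea in the description concrete, here is a minimal sketch of a nodegroup-aware extension point; the base class shape and method names are assumptions for illustration, not YARN's actual SchedulerNode API:

{code:java}
// Minimal sketch of a pluggable locality hierarchy: a nodegroup-aware
// subclass inserts one locality level between node-local and rack-local.
abstract class SchedulerNodeBase {
  protected final String hostName;
  protected final String rackName;

  SchedulerNodeBase(String hostName, String rackName) {
    this.hostName = hostName;
    this.rackName = rackName;
  }

  /** Locality names this node can satisfy, most specific first. */
  abstract String[] localityLevels();
}

class SchedulerNodeWithNodeGroup extends SchedulerNodeBase {
  private final String nodeGroupName;

  SchedulerNodeWithNodeGroup(String host, String nodeGroup, String rack) {
    super(host, rack);
    this.nodeGroupName = nodeGroup;
  }

  @Override
  String[] localityLevels() {
    // Prefer node-local, then nodegroup-local, then rack-local.
    return new String[] { hostName, nodeGroupName, rackName };
  }
}
{code}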
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v6.patch Synced the patch to the latest trunk, keeping it consistent with the doc just attached. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646652#comment-13646652 ] Junping Du commented on YARN-18: Thanks, Luke, for the explanation. Hey Arun ([~acmurthy]), I just attached a doc with implementation details for this patch and YARN-19. Hopefully that will be helpful for your review. Your thought to abstract the notion of topology logic in the scheduler makes great sense to me. We already did this for each scheduler, but you mean we need to abstract the common part across all schedulers, which is a little complicated but still doable. Can we do this code refactoring work in a separate JIRA? I am glad to work on it. For a new NodeGroup scheduler, I think it may not be necessary, as it addresses different topologies rather than a different algorithm to isolate/prioritize jobs. So even under a topology with a NodeGroup layer, different users still need different schedulers like Fair, Capacity, etc. Thoughts? Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646659#comment-13646659 ] Hadoop QA commented on YARN-18: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581362/YARN-18-v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/854//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/854//console This message is automatically generated. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Attachment: YARN-582.3.patch Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646806#comment-13646806 ] Jian He commented on YARN-582: -- The new patch addresses the last comments. Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646835#comment-13646835 ] Hadoop QA commented on YARN-582: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581393/YARN-582.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/855//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/855//console This message is automatically generated. Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646914#comment-13646914 ] Xuan Gong commented on YARN-513: Actually, the RMClient.invoke() can be removed. I followed the pattern of how NMProxies create the proxy. So, in the current patch, we just need to expose the proxy object, and the client calls proxy.method(). It should be good to go. I will do further testing. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646993#comment-13646993 ] Chris Riccomini commented on YARN-614: -- bq. One solution could be to move the check from finishAttempt() to createAttempt(). finishAttempt() always enqueues a new attempt. the new attempt creation checks if one can still be created based on failed count etc. This wouldn't fix the problem with RMAppManager.recover(), would it? Whether we enqueue attempts in finishAttempt or createAttempt, if the attempt count ever goes above maxAppAttempts, it seems like RMAppManager would not recover the app, right? Are you proposing we always call appImpl.recover() in RMAppManager, always retry in RMAppImpl.AttemptFailedTransition, and call RMAppImpl.countFailureToAttemptLimit() inside RMAppImpl.createNewAttempt? Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Attachments: YARN-614-0.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
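A tiny sketch of the guard being discussed, i.e. enforcing the attempt limit at attempt creation rather than when an attempt finishes, so recovery can reuse the same path; the class and field names below are placeholders, not the actual RMAppImpl code:

{code:java}
// Placeholder sketch: check maxAppAttempts when an attempt is created,
// so RMAppManager.recover() and the normal retry path share one check.
class AppAttemptsSketch {
  private final int maxAppAttempts;
  private int attemptsSoFar = 0;

  AppAttemptsSketch(int maxAppAttempts) {
    this.maxAppAttempts = maxAppAttempts;
  }

  /** Returns true if a new attempt may be created, and counts it. */
  synchronized boolean tryCreateAttempt() {
    if (attemptsSoFar >= maxAppAttempts) {
      return false; // limit reached: fail the app instead of retrying
    }
    attemptsSoFar++;
    return true;
  }
}
{code}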
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647062#comment-13647062 ] Vinod Kumar Vavilapalli commented on YARN-618: -- +1, checking it in. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647075#comment-13647075 ] Hudson commented on YARN-618: - Integrated in Hadoop-trunk-Commit #3708 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3708/]) YARN-618. Modified RM_INVALID_IDENTIFIER to be -1 instead of zero. Contributed by Jian He. (Revision 1478230) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478230 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
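The commit message above confirms the sentinel moved from 0 to -1: a negative value cannot collide with a default-initialized (zero) or real, non-negative RM identifier. A minimal sketch of the idea (the validity helper is an illustrative assumption):

{code:java}
// Sketch of the sentinel change: -1 can never be confused with a
// default-initialized (0) or legitimate non-negative RM identifier.
final class ResourceManagerConstantsSketch {
  static final long RM_INVALID_IDENTIFIER = -1L;

  static boolean isValidRMIdentifier(long id) {
    return id >= 0; // anything negative, including the sentinel, is invalid
  }
}
{code}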
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647085#comment-13647085 ] Xuan Gong commented on YARN-513: The new patch includes: 1. Removed RMClient and created RMProxy, following the pattern of NameNodeProxy; it includes several static createXXX() methods. This will be much simpler. 2. In NodeManagerUpdaterImpl, there is no RMProxy object anymore; RMProxy.createRMProxy() is used to create the ResourceTracker object directly. 3. Removed the invoke method; proxy.method() is called directly. 4. Changed the test cases, including renaming localRMClient to LocalRMProxy. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
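A hedged sketch of such a static factory, assuming Hadoop's RetryProxy/RetryPolicies utilities and YARN's YarnRPC with their usual shapes; the method signature and retry policy choice are assumptions, not the patch's exact API:

{code:java}
// Sketch of an RMProxy-style factory: create one RPC proxy for an
// RM-facing protocol and wrap it in a retry proxy so callers wait out
// an RM restart instead of failing immediately.
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;
import org.apache.hadoop.yarn.ipc.YarnRPC;

public class RMProxySketch {
  @SuppressWarnings("unchecked")
  public static <T> T createRMProxy(Configuration conf, Class<T> protocol,
      InetSocketAddress rmAddress) {
    // Plain RPC proxy for the requested protocol.
    T proxy = (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
    // Retry with a fixed sleep so a restarting RM has time to come back.
    RetryPolicy retryPolicy = RetryPolicies
        .retryUpToMaximumCountWithFixedSleep(30, 2, TimeUnit.SECONDS);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}
{code}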
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-513: --- Attachment: YARN.513.5.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-618: - Attachment: YARN-618.3-branch-2.patch The patch didn't apply cleanly against branch-2. Here's one that I generated myself; it compiles successfully and passes TestContainerManager, which had the merge conflict. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3-branch-2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647089#comment-13647089 ] Xuan Gong commented on YARN-513: The new patch is YARN.513.5.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Description: Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation. (was: Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation.) Add NM client library - Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Summary: Add NM client library (was: Add AM-NM client library) Add NM client library - Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-639) Make AM of Distributed Shell Use NMClient
Zhijie Shen created YARN-639: Summary: Make AM of Distributed Shell Use NMClient Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Environment: YARN-422 adds Reporter: Zhijie Shen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-639) Make AM of Distributed Shell Use NMClient
[ https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-639: - Description: YARN-422 adds Make AM of Distributed Shell Use NMClient - Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Environment: YARN-422 adds Reporter: Zhijie Shen YARN-422 adds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647109#comment-13647109 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581439/YARN.513.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/856//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/856//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/856//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-639) Make AM of Distributed Shell Use NMClient
[ https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-639: Assignee: Zhijie Shen Make AM of Distributed Shell Use NMClient - Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Environment: YARN-422 adds Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-422 adds NMClient. AM of Distributed Shell should use it instead of using ContainerManager directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-639) Make AM of Distributed Shell Use NMClient
[ https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-639: - Environment: (was: YARN-422 adds ) Make AM of Distributed Shell Use NMClient - Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-422 adds NMClient. AM of Distributed Shell should use it instead of using ContainerManager directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-640) Make AM of M/R Use NMClient
Zhijie Shen created YARN-640: Summary: Make AM of M/R Use NMClient Key: YARN-640 URL: https://issues.apache.org/jira/browse/YARN-640 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-422 adds NMClient. AM of mapreduce should use it instead of using the raw ContainerManager proxy directly. ContainerLauncherImpl needs to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-638) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-638: - Summary: Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart (was: Add DelegationTokens back to DelegationTokenSecretManager after RM Restart) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-638.1.patch This was missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-606) negative queue metrics apps Failed
[ https://issues.apache.org/jira/browse/YARN-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-606: --- Assignee: nemon lou negative queue metrics apps Failed - Key: YARN-606 URL: https://issues.apache.org/jira/browse/YARN-606 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: nemon lou Assignee: nemon lou Priority: Minor The queue metric apps Failed can be negative in some cases (more than one attempt for an application can cause this). It's confusing if we use this metric directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-638) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-638: - Description: This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager (was: This was missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-638.1.patch This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-617: -- Assignee: Omkar Vinit Joshi (was: Vinod Kumar Vavilapalli) In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-617: --- Attachment: YARN-617.20130501.patch In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-617: --- Attachment: YARN-617.20130501.1.patch In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647211#comment-13647211 ] Omkar Vinit Joshi commented on YARN-617: I am attaching the patch (JUnit tests not included). I will update the patch with tests soon. * At present the master key is exchanged between the RM and NM only if the environment is secured. I am updating this to make sure that the RM and NM exchange the master key in both scenarios, secured and unsecured: ** during NM registration ** during NM heartbeat (status updater, only if the key is updated, as it is today) * At present the master key is not generated/sent during container launch in the unsecured case. Now making sure that it is sent as part of the payload from AMLauncher to the NodeManager. On the NodeManager this token will be used to verify the container start request: ** for the secured case, retrieving the token from remoteUgi ** for the unsecured case, retrieving the token from the passed-in container payload. There are some other changes related to this patch: * startContainer requires the UGI username to be that of the container-id ... I still have not understood why (ContainerLauncherImpl) * Making sure that NMContainerTokenSecretManager is created in both cases. In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
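A hedged sketch of the verification the NM can perform once it holds the RM's master key in both secure and unsecure mode: recompute the HMAC over the container token's identifier bytes and compare it to the token's password. The class, names, and HMAC algorithm below are placeholders, not the actual NMContainerTokenSecretManager code:

{code:java}
// Placeholder sketch: verify a container start request against the master
// key the RM shared at NM registration/heartbeat.
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ContainerTokenCheckSketch {
  private final SecretKeySpec masterKey; // shared by the RM with this NM

  public ContainerTokenCheckSketch(byte[] masterKeyBytes) {
    this.masterKey = new SecretKeySpec(masterKeyBytes, "HmacSHA1");
  }

  /** True if the token's password matches the HMAC of its identifier. */
  public boolean verify(byte[] tokenIdentifier, byte[] tokenPassword)
      throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(masterKey);
    byte[] expected = mac.doFinal(tokenIdentifier);
    return MessageDigest.isEqual(expected, tokenPassword); // constant-time
  }
}
{code}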
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647229#comment-13647229 ] Xuan Gong commented on YARN-629: bq: Need a specific test which validates that the final exception is same as the one thrown on the remote side. Can you check if TestNodeManagerResync and TestContainerManager can be changed to validate this? Look for refs to YARN-142 in those tests. Should it be YarnRemoteException? Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647228#comment-13647228 ] Xuan Gong commented on YARN-629: bq: Need a specific test which validates that the final exception is same as the one thrown on the remote side. Can you check if TestNodeManagerResync and TestContainerManager can be changed to validate this? Look for refs to YARN-142 in those tests. bq: Please create a sister JIRA in MapReduce, I'll review MR changes there but commit that together with this patch. Created the MR ticket: MAPREDUCE-5204. bq: Investigated the test-failures? They are all passing now. bq: TestClientTokens: Explicitly catch YarnRemoteException and fail instead of instance checks? Can't understand the changes in TestClientRMService, explain please why we need to only now do e.getCause().getMessage() to capture the remote exception message. YarnRemoteException is not rooted at IOException now. ugi.doAs() throws IOException, InterruptedException, UndeclaredThrowableException, etc., but does not throw YarnRemoteException, so we cannot explicitly catch YarnRemoteException. I think if YarnRemoteException is thrown inside doAs(), it will be wrapped, so calling getCause() will get it back. Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
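A hedged sketch of the unwrapping discussed above: once YarnRemoteException is no longer an IOException, UserGroupInformation.doAs() surfaces it as the cause of an UndeclaredThrowableException, so a test must look at getCause() to check the remote message. The helper below is illustrative, not the patch's test code:

{code:java}
// Illustrative helper: run an RPC call under a UGI and recover the
// message of a non-IOException thrown on the remote side.
import java.lang.reflect.UndeclaredThrowableException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsUnwrapSketch {
  public static String callAndGetRemoteMessage(UserGroupInformation ugi,
      PrivilegedExceptionAction<Void> rpcCall) throws Exception {
    try {
      ugi.doAs(rpcCall);
      return null; // no exception came back from the remote side
    } catch (UndeclaredThrowableException e) {
      // doAs() wraps checked exceptions it cannot rethrow; unwrap it.
      return e.getCause().getMessage();
    }
  }
}
{code}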
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v6.1.patch Addressed a minor javadoc issue in the v6.1 patch. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so that it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
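The pluggability being proposed boils down to letting a deployment insert an extra locality tier by subclassing. A self-contained toy model of that pattern follows; Host, Assigner, and the tier names are illustrative, while the real extension points are classes like SchedulerNode, RMNodeImpl, and ScheduledRequests.
{code:java}
// Toy model of pluggable locality tiers; not YARN's actual classes.
public class LocalityTiers {
  static class Host {
    final String name, nodeGroup, rack;
    Host(String name, String nodeGroup, String rack) {
      this.name = name; this.nodeGroup = nodeGroup; this.rack = rack;
    }
  }

  /** Default behaviour: node-local, then rack-local, then off-switch. */
  static class Assigner {
    String tierFor(Host node, Host dataHost) {
      if (node.name.equals(dataHost.name)) return "NODE_LOCAL";
      if (node.rack.equals(dataHost.rack)) return "RACK_LOCAL";
      return "OFF_SWITCH";
    }
  }

  /** Plugged-in subclass inserting a nodegroup tier between node and rack. */
  static class NodeGroupAssigner extends Assigner {
    @Override
    String tierFor(Host node, Host dataHost) {
      if (!node.name.equals(dataHost.name)
          && node.nodeGroup.equals(dataHost.nodeGroup)) {
        return "NODEGROUP_LOCAL";
      }
      return super.tierFor(node, dataHost);
    }
  }

  public static void main(String[] args) {
    Host n1 = new Host("n1", "g1", "r1");
    Host n2 = new Host("n2", "g1", "r1");
    System.out.println(new Assigner().tierFor(n1, n2));          // RACK_LOCAL
    System.out.println(new NodeGroupAssigner().tierFor(n1, n2)); // NODEGROUP_LOCAL
  }
}
{code}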
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647242#comment-13647242 ] Vinod Kumar Vavilapalli commented on YARN-617: --
bq. Are you saying the goal is to auth container launches with the AM token too?
Yes. All communication with the NM is to be authenticated by the AMToken. We could keep it separate from startContainer() and stop/getStatus, but we want to solve YARN-613 too. Having the authentication via the container-token forces us to create a connection per container. You must have seen the gory MR ContainerLauncher resorting to tricks like creating lots of threads and opening and closing connections immediately to avoid hitting ulimits, etc. Some of that ugliness will go away if we perform all authentication using AMTokens and use ContainerTokens for authorization. Thanks for the tip on HADOOP-8783/HADOOP-8784.
In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
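Restated as code: a toy model of the authentication/authorization split being argued for, with one authentication check when the AM's connection to the NM is set up (standing in for DIGEST auth with the AMToken) and a cheap per-request check for startContainer() (the ContainerToken). Every name here is illustrative, not a real NM interface.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: authenticate the channel once, authorize each launch,
// so many containers can be started over a single authenticated channel.
public class AuthnAuthzSplit {
  private final Map<String, String> amTokens =
      new ConcurrentHashMap<String, String>();

  /** Called on the RM-driven path to register an AM's secret with this NM. */
  public void registerAmToken(String appAttemptId, String amToken) {
    amTokens.put(appAttemptId, amToken);
  }

  /** Authentication: once, when the AM opens its connection to this NM. */
  public boolean authenticateConnection(String appAttemptId, String presented) {
    String expected = amTokens.get(appAttemptId);
    return expected != null && expected.equals(presented);
  }

  /** Authorization: per startContainer() request, via the container token. */
  public boolean authorizeStartContainer(byte[] tokenIdentifier,
                                         byte[] tokenPassword,
                                         TokenVerifier verifier) {
    return verifier.verify(tokenIdentifier, tokenPassword);
  }

  /** Pluggable verifier, e.g. the shared-key HMAC check sketched earlier. */
  public interface TokenVerifier {
    boolean verify(byte[] identifier, byte[] password);
  }
}
{code}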
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647249#comment-13647249 ] Hadoop QA commented on YARN-18: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581468/YARN-18-v6.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/857//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/857//console This message is automatically generated.
Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so that it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647263#comment-13647263 ] Vinod Kumar Vavilapalli commented on YARN-613: --
bq. I just have general concerns with assuming the entire hadoop environment is trusted and thus introducing weaknesses at a global level. E.g. a weakness is introduced every time one entity shares a secret to validate a token created by another entity. Compromising one of hundreds or thousands of nodes shouldn't put the entire cluster at risk.
Agree with you in general. Read on.
bq. If I can gain access to one NM host and its keytab, I believe I can secretly launch a malicious NM?
That is true in general, and I am not sure how we can even contain such a break-in. I suppose going the way of the DataNode and starting the server on privileged ports would contain it [1]. If one can get hold of the keytab (owned by the YARN user), I suppose at that point one can launch the container-executor binary too, which gives root access. So it's all predicated on the secure setup not doing stupid things.
bq. NMs currently share a global key for container token secrets, but there is a JIRA to move to per-NM secrets, so sharing a global AM secret would be another step backwards.
Agreed.
bq. Exploring alternate avenues to avoid global trust: is passing, with the launch request, the AM token that is allowed to get status and stop the container not feasible?
Maybe it isn't clear in my proposal, but let me state it again anyway, mostly repeating what I just commented on YARN-617:
- Having the authentication via container-token forces us to create a connection per container.
- MR's ContainerLauncher, for example, resorts to tricks like creating lots of threads and opening and closing connections immediately to avoid hitting ulimits, etc.
- Most of that ugliness will go away if we perform all authentication using AMTokens for *all* AM-NM APIs and use ContainerTokens for authorization of startContainer() requests.
Maybe we should just do [1] above (privileged ports). To sum it up, I am open to suggestions. My fundamental requirements are:
- If possible, AMs should open only one connection (a secure one) to each NM, not one per container.
- All connections (all APIs) between the AM and NM should be authenticated (DIGEST-based at best here), and if possible without AMs having to latch on to things like ContainerTokens for potentially long periods.
Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
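The first requirement, one connection per NM, amounts to keying the proxy cache by NM address rather than by container. A hypothetical sketch of that caching, where NmProxy and newAuthenticatedProxy() stand in for the real RPC plumbing:
{code:java}
import java.net.InetSocketAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: one cached, AM-token-authenticated proxy per NM
// address, not one per container.
public class NmProxyCache {
  private final ConcurrentMap<InetSocketAddress, NmProxy> cache =
      new ConcurrentHashMap<InetSocketAddress, NmProxy>();

  public NmProxy getProxy(InetSocketAddress nmAddress) {
    NmProxy proxy = cache.get(nmAddress);
    if (proxy == null) {
      NmProxy created = newAuthenticatedProxy(nmAddress);
      NmProxy raced = cache.putIfAbsent(nmAddress, created);
      proxy = (raced == null) ? created : raced;
    }
    // launching many containers on the same node now reuses this channel;
    // each startContainer() still carries its own ContainerToken, but for
    // authorization only
    return proxy;
  }

  private NmProxy newAuthenticatedProxy(InetSocketAddress addr) {
    // the real code would perform DIGEST authentication with the AM token
    // while setting up the connection
    return new NmProxy(addr);
  }

  /** Stand-in for an RPC proxy handle. */
  public static class NmProxy {
    public final InetSocketAddress address;
    NmProxy(InetSocketAddress address) { this.address = address; }
  }
}
{code}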
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647272#comment-13647272 ] Vinod Kumar Vavilapalli commented on YARN-629: --
bq. YarnRemoteException is not rooted at IOException now. For ugi.doAs(), it throws IOException, InterruptedException, UndeclaredThrowableException, etc., but does not throw YarnRemoteException, so we cannot explicitly catch YarnRemoteException.
Then why was this code catching YarnRemoteException *before* your patch? Try changing client.ping() to throw YarnRemoteException, like we do in real protocols.
bq. I think if YarnRemoteException is thrown inside doAs(), it will be wrapped, so calling getCause() will get it back.
Why didn't we need this before and only now?
bq. Should it be YarnRemoteException?
No, see NMNotYetReadyException and InvalidContainerException.
Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
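For the "validate the final exception matches the remote one" test being asked for here, the shape could be as simple as the following self-contained JUnit 4 sketch; FakeClient.ping() is a hypothetical stand-in for a real RPC stub, and this does not use the actual TestNodeManagerResync/TestContainerManager machinery.
{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import org.junit.Test;

public class TestRemoteExceptionPropagation {
  private static final String REMOTE_MSG = "boom from the remote side";

  /** Stand-in that fails the way a real RPC stub would. */
  static class FakeClient {
    void ping() throws Exception {
      throw new Exception(REMOTE_MSG); // real code: YarnRemoteException
    }
  }

  @Test
  public void testRemoteMessageSurvivesTheTrip() {
    FakeClient client = new FakeClient();
    try {
      client.ping();
      fail("expected an exception from the remote side");
    } catch (Exception e) {
      // the client-side exception must carry the remote message intact
      assertEquals(REMOTE_MSG, e.getMessage());
    }
  }
}
{code}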