[jira] [Updated] (YARN-964) Give a parameter that can set AM retry interval

2013-07-26 Thread qus-jiawei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qus-jiawei updated YARN-964:


Affects Version/s: 2.3.0

 Give a parameter that can set AM retry interval
 

 Key: YARN-964
 URL: https://issues.apache.org/jira/browse/YARN-964
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: qus-jiawei

 Our AM retry number is 4.
 When one nodemanager's disk is full, the AM container cannot be allocated on 
 that nodemanager, yet the RM retries the AM on the same NM every 3 seconds.
 I think there should be a parameter to set the AM retry interval.
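 As a rough illustration of the request, here is a minimal sketch of how an 
 RM-side retry path could read such an interval from the configuration. The 
 property name and the 3-second default below are hypothetical placeholders 
 for the knob this issue asks for, not an existing YARN key.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch only: the property name below is a placeholder for the
// parameter proposed by this issue, not an existing YARN configuration key.
public class AmRetryIntervalSketch {
  static final String AM_RETRY_INTERVAL_MS =
      "yarn.resourcemanager.am.retry-interval-ms"; // placeholder name

  public static long getRetryIntervalMs(Configuration conf) {
    // Fall back to the current 3-second behaviour described in the report.
    return conf.getLong(AM_RETRY_INTERVAL_MS, 3000L);
  }
}
{code}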

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720483#comment-13720483
 ] 

Alejandro Abdelnur commented on YARN-937:
-

[~bikassaha], [~vinodkv], anything else to be addressed? 

 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-07-26 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-978:
---

Summary: [YARN-321] Adding ApplicationAttemptReport and Protobuf 
implementation  (was: Adding ApplicationAttemptReport and Protobuf 
implementation)

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal

 We don't have an ApplicationAttemptReport and its Protobuf implementation.
 Adding that.
 Thanks,
 Mayank
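 For orientation, here is a minimal sketch of the kind of read-only record this 
 sub-task describes. The accessor names are illustrative assumptions, not the 
 committed API.

{code:java}
// Illustrative sketch of an application-attempt report; field names are
// assumptions, not the final ApplicationAttemptReport API.
public abstract class ApplicationAttemptReportSketch {
  public abstract String getApplicationAttemptId();
  public abstract String getHost();         // node where the AM of this attempt ran
  public abstract int getRpcPort();         // AM RPC port
  public abstract String getTrackingUrl();  // AM tracking URL
  public abstract String getDiagnostics();  // diagnostics, useful on failure
  public abstract String getState();        // attempt state, e.g. FINISHED/FAILED
}
{code}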

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-978) Adding ApplicationAttemptReport and Protobuf implementation

2013-07-26 Thread Mayank Bansal (JIRA)
Mayank Bansal created YARN-978:
--

 Summary: Adding ApplicationAttemptReport and Protobuf 
implementation
 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal


We don't have an ApplicationAttemptReport and its Protobuf implementation.

Adding that.

Thanks,
Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-07-26 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-978:
---

Attachment: YARN-978-1.patch

Attaching patch.

Thanks,
Mayank

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-978-1.patch


 We don't have an ApplicationAttemptReport and its Protobuf implementation.
 Adding that.
 Thanks,
 Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-925) Read Interface of HistoryStorage for AHS

2013-07-26 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-925:
---

Attachment: YARN-925-2.patch

Updating Interface.

Thanks,
Mayank

 Read Interface of HistoryStorage for AHS
 

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-925-1.patch, YARN-925-2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720551#comment-13720551
 ] 

Hadoop QA commented on YARN-978:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594345/YARN-978-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1589//console

This message is automatically generated.

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-978-1.patch


 We don't have an ApplicationAttemptReport and its Protobuf implementation.
 Adding that.
 Thanks,
 Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-979) [YARN-321] Adding application attempt and container to ApplicationHistoryProtocol

2013-07-26 Thread Mayank Bansal (JIRA)
Mayank Bansal created YARN-979:
--

 Summary: [YARN-321] Adding application attempt and container to 
ApplicationHistoryProtocol
 Key: YARN-979
 URL: https://issues.apache.org/jira/browse/YARN-979
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal


 Adding application attempt and container to ApplicationHistoryProtocol

Thanks,
Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-925) Read Interface of HistoryStorage for AHS

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720559#comment-13720559
 ] 

Hadoop QA commented on YARN-925:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594346/YARN-925-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1590//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1590//console

This message is automatically generated.

 Read Interface of HistoryStorage for AHS
 

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-925-1.patch, YARN-925-2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-18) Configurable Hierarchical Topology for YARN

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Summary: Configurable Hierarchical Topology for YARN  (was: Make locality 
in YARN's container assignment and task scheduling pluggable for other 
deployment topology)

 Configurable Hierarchical Topology for YARN
 ---

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309.patch, 
 MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, 
 MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, 
 Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, 
 YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, 
 YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, 
 YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, 
 YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, 
 YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch


 Several classes in YARN’s container assignment and task scheduling algorithms 
 that relate to data locality were updated to give preference to running a 
 container at localities besides node-local and rack-local (such as 
 nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Attachment: HierachicalTopologyForYARNr1.pdf

 Make locality in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309.patch, 
 MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, 
 MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, 
 Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, 
 YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, 
 YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, 
 YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, 
 YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, 
 YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch


 Several classes in YARN’s container assignment and task scheduling algorithms 
 that relate to data locality were updated to give preference to running a 
 container at localities besides node-local and rack-local (such as 
 nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-18) Configurable Hierarchical Topology for YARN

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Description: Per discussion in the design lounge of Hadoop Summit 2013, we 
agreed to change the design of “Pluggable topologies with NodeGroup for YARN” 
to support a configurable hierarchical topology that makes adding additional 
locality layers simple. Please refer to the attached doc 
HierachicalTopologyForYARNr1.pdf for details.  (was: Several classes in 
YARN’s container assignment and task scheduling algorithms that relate to 
data locality were updated to give preference to running a container at 
localities besides node-local and rack-local (such as nodegroup-local). This 
proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, 
RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level 
class so it would be easier to create a subclass, 
ScheduledRequestsWithNodeGroup.)

 Configurable Hierarchical Topology for YARN
 ---

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309.patch, 
 MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, 
 MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, 
 Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, 
 YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, 
 YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, 
 YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, 
 YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, 
 YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch


 Per discussion in the design lounge of Hadoop Summit 2013, we agreed to 
 change the design of “Pluggable topologies with NodeGroup for YARN” to 
 support a configurable hierarchical topology that makes adding additional 
 locality layers simple. Please refer to the attached doc 
 HierachicalTopologyForYARNr1.pdf for details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-762) Javadoc params not matching in ContainerManagerImpl.authorizeRequest method

2013-07-26 Thread Niranjan Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Singh reassigned YARN-762:
---

Assignee: Niranjan Singh

 Javadoc params not matching in ContainerManagerImpl.authorizeRequest method
 ---

 Key: YARN-762
 URL: https://issues.apache.org/jira/browse/YARN-762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Niranjan Singh
Assignee: Niranjan Singh
Priority: Minor
 Attachments: YARN-762.patch


 In the ContainerManagerImpl.authorizeRequest method, four parameters are 
 passed, whereas the Javadoc documents only three @param tags.
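 For illustration only, a made-up method showing the kind of Javadoc/parameter 
 mismatch described here and its corrected form; the method and parameter names 
 are hypothetical, not the actual ContainerManagerImpl signature.

{code:java}
public class JavadocParamExample {
  /**
   * Authorizes an incoming request.
   *
   * @param remoteUgi     caller identity
   * @param containerId   container the request refers to
   * @param launchContext launch context supplied by the caller
   * @param remoteAddress address the request came from (the kind of tag that goes missing)
   */
  void authorize(Object remoteUgi, Object containerId,
                 Object launchContext, Object remoteAddress) {
    // Body omitted; only the one-to-one Javadoc/parameter correspondence matters here.
  }
}
{code}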

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-18) Configurable Hierarchical Topology for YARN

2013-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720603#comment-13720603
 ] 

Junping Du commented on YARN-18:


Hi, according to our discussion in the design lounge at Hadoop Summit 2013, we 
agreed to change the previous design of “Pluggable topologies with NodeGroup 
for YARN” to something new: a configurable hierarchical topology that makes 
adding additional locality layers simple. I have attached the new version of 
the proposal, HierachicalTopologyForYARNr1.pdf; please help review and 
comment. Thx!

 Configurable Hierarchical Topology for YARN
 ---

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309.patch, 
 MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, 
 MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, 
 Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, 
 YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, 
 YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, 
 YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, 
 YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, 
 YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch


 Per discussion in the design lounge of Hadoop Summit 2013, we agreed to 
 change the design of “Pluggable topologies with NodeGroup for YARN” to 
 support a configurable hierarchical topology that makes adding additional 
 locality layers simple. Please refer to the attached doc 
 HierachicalTopologyForYARNr1.pdf for details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Raghu C Doppalapudi (JIRA)
Raghu C Doppalapudi created YARN-980:


 Summary: Nodemanager is shutting down while executing a mapreduce 
job
 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: CDH4.3
Reporter: Raghu C Doppalapudi
Priority: Critical


2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:662)
2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
stopped.
2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
SelectChannelConnector@0.0.0.0:8042
2013-07-24 11:00:26,686 INFO service.AbstractService - 
Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-977) Interface for users/AM to know actual usage by the container

2013-07-26 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720810#comment-13720810
 ] 

Timothy St. Clair commented on YARN-977:


Usage statistics can also be reported via cgroups.

 Interface for users/AM to know actual usage by the container
 

 Key: YARN-977
 URL: https://issues.apache.org/jira/browse/YARN-977
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Omkar Vinit Joshi

 Today we allocate resources (memory and CPU) and the node manager starts the 
 container with the requested resources [I am assuming they are using cgroups]. 
 But there is definitely a possibility of users requesting more than they 
 actually need during the execution of their container/job-task. If we add a 
 way for users/the AM to know the actual usage of a requested/completed 
 container, they can optimize it for the next run.
 This will be helpful for the AM to optimize CPU/memory resource requests by 
 querying the NM/RM for the avg/max CPU/memory usage of the container, or maybe 
 of the containers belonging to the application.
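 As a hint of how the cgroups route mentioned in the comment above could look, 
 here is a minimal sketch that reads a container's memory usage from the cgroup 
 v1 memory controller. It assumes the NM placed the container in a per-container 
 cgroup; the path layout is an assumption, not the NM's actual layout.

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal sketch, assuming cgroup v1 and a per-container cgroup created by the
// NM; the directory layout below is an assumption for illustration.
public class CgroupUsageSketch {
  public static long memoryUsageBytes(String containerId) throws Exception {
    String path = "/sys/fs/cgroup/memory/hadoop-yarn/" + containerId
        + "/memory.usage_in_bytes";
    return Long.parseLong(new String(Files.readAllBytes(Paths.get(path))).trim());
  }
}
{code}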

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly

2013-07-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720845#comment-13720845
 ] 

Jason Lowe commented on YARN-107:
-

bq. I think the easiest way to differentiate the error is based on the 
exception types it catches. And for the user side, the easiest way is to 
differentiate the error based on the exit code, if we set different exit codes 
for different types of error instead of just simply throwing the exceptions.

IMHO the most common case for this API is to make sure the application is no 
longer running, and the caller isn't so much worried about the exact final 
state as long as it's a terminal state.  That means for the common case, users 
are going to have to wrap calls to this in a try..catch just so they can ignore 
the corner-case exception.  Sounds like a pain.  Do we really need to throw an 
exception in this case?  Is the client really going to care and want to field 
said exception?  Same with the CLI, callers would need to check for explicit 
exit codes to make sure what looks like an error really is an error.  If the 
caller really cares about distinguishing between killing a running app and 
killing an already terminated app, can't they just check the state first?  
Regardless of whether they do the check or this API does it for them, there 
will always be a race where the app completes before it is killed.

Maybe I'm in the minority here, and that's fine.  I just don't want the API to 
be difficult to wield in the common case if there's a way for the caller to 
cover the corner case in another way.
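
A caller-side sketch of the common case described above: make sure the app is 
no longer running while tolerating the "already finished" corner case. This 
assumes the YarnClient API of the 2.1.x line; it is an illustration of the 
usage pattern, not part of the proposed patch.

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class KillIfRunningSketch {
  // Ensure the application is not running; ignore errors from apps that have
  // already reached a terminal state, since that satisfies this caller anyway.
  public static void killIfRunning(YarnClient client, ApplicationId appId) {
    try {
      client.killApplication(appId);
    } catch (Exception e) {
      // Already-finished / already-killed apps land here; nothing to do.
    }
  }
}
{code}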

 ClientRMService.forceKillApplication() should handle the non-RUNNING 
 applications properly
 --

 Key: YARN-107
 URL: https://issues.apache.org/jira/browse/YARN-107
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Devaraj K
Assignee: Xuan Gong
 Attachments: YARN-107.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720913#comment-13720913
 ] 

Junping Du commented on YARN-873:
-

Hi [~bikassaha] and [~jeffgx619], TestNonExistentJob on trunk fails after 
this patch went in. MAPREDUCE-5421 tries to fix it. Would you help take a look 
at it? Thx!

 YARNClient.getApplicationReport(unknownAppId) returns a null report
 ---

 Key: YARN-873
 URL: https://issues.apache.org/jira/browse/YARN-873
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.1.0-beta

 Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch, 
 YARN-873.4.patch


 How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-974) RMContainer should collect more useful information to be recorded

2013-07-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720948#comment-13720948
 ] 

Zhijie Shen commented on YARN-974:
--

bq. you may have to consider during the run and after the run too (log 
aggregation) ..

Thanks for the reminder. Since the history information is recorded and used 
after the completion of an application, IMHO we just need the post-run log. 
Please correct me if I'm wrong.

 RMContainer should collect more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 To record the history of a container, users may also be interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.
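 A simple holder sketch for the six fields listed above; the names and types 
 are illustrative, not the eventual AHS data model.

{code:java}
// Illustrative container-history record covering the six items above.
public class ContainerHistorySketch {
  long startTime;            // 1. start time (ms since epoch)
  long finishTime;           // 2. stop time
  String diagnostics;        // 3. diagnostic information
  String logUrl;             // 4. URL to the log file
  String allocatedResource;  // 5. actually allocated resource, e.g. "<2048 MB, 1 vcore>"
  String assignedNodeId;     // 6. actually assigned node
}
{code}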

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-974) RMContainer should collect more useful information to be recorded

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720962#comment-13720962
 ] 

Omkar Vinit Joshi commented on YARN-974:


bq. Thanks for the reminder. Since the history information is recorded and used 
after the completion of an application, IMHO we just need the post-run log. 
Please correct me if I'm wrong.
So you are saying we won't get any info till the application finishes? I 
thought we would get this information once the container starts, but it may or 
may not have finished, and the application may or may not have finished. So we 
will have transitions like: nothing - present locally on the node manager - 
present on remote HDFS.

 RMContainer should collect more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 To record the history of a container, users may also be interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Raghu C Doppalapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu C Doppalapudi updated YARN-980:
-

Environment: (was: CDH4.3)

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721011#comment-13721011
 ] 

Omkar Vinit Joshi commented on YARN-980:


There must be an additional exception which has resulted in this. Can you 
please attach the NM logs for this? Is it related to YARN-573?

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-960) TestMRCredentials and TestBinaryTokenFile are failing on trunk

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721064#comment-13721064
 ] 

Vinod Kumar Vavilapalli commented on YARN-960:
--

This is a long-standing bug. I always saw these tests failing on my local 
setup, but thought it was just me as Jenkins never reported these.

Anyway, the patch looks good to me. +1. The two tests pass for now. The 
single-node-cluster pi example also gets past localization.

[~tucu00], this seems to be a problem only on your node - looks like a 
different bug.
bq. 2013-07-24 16:58:19,691 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 DEBUG: FAILED { 
hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar,
 1374710294773, PATTERN, (?:classes/|lib/).* }, rename destination 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/filecache/12
 already exists.

Are you hitting this consistently?

 TestMRCredentials and  TestBinaryTokenFile are failing on trunk
 ---

 Key: YARN-960
 URL: https://issues.apache.org/jira/browse/YARN-960
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Daryn Sharp
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-960.patch


 Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
 Making it a blocker until full impact of the issue is scoped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721074#comment-13721074
 ] 

Vinod Kumar Vavilapalli commented on YARN-937:
--

bq.  org.apache.hadoop.mapreduce.v2.TestNonExistentJob
Filed MAPREDUCE-5424

 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-960) TestMRCredentials and TestBinaryTokenFile are failing on trunk

2013-07-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721118#comment-13721118
 ] 

Kihwal Lee commented on YARN-960:
-

I am no expert, but looking at all isSecurityEnabled() calls in YARN, many of 
them don't look right.

 TestMRCredentials and  TestBinaryTokenFile are failing on trunk
 ---

 Key: YARN-960
 URL: https://issues.apache.org/jira/browse/YARN-960
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Daryn Sharp
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-960.patch


 Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
 Making it a blocker until full impact of the issue is scoped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-960) TestMRCredentials and TestBinaryTokenFile are failing on trunk

2013-07-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721120#comment-13721120
 ] 

Kihwal Lee commented on YARN-960:
-

[~daryn] and [~vinodkv]. Can you go over them?

 TestMRCredentials and  TestBinaryTokenFile are failing on trunk
 ---

 Key: YARN-960
 URL: https://issues.apache.org/jira/browse/YARN-960
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Daryn Sharp
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-960.patch


 Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
 Making it a blocker until full impact of the issue is scoped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-960) TestMRCredentials and TestBinaryTokenFile are failing on trunk

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721124#comment-13721124
 ] 

Vinod Kumar Vavilapalli commented on YARN-960:
--

I just quickly did. Other than ClientToAMToken, JHSToken and LocalizerToken, 
which only work in secure mode, we are good.
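
To make the point concrete, here is a sketch of the pattern in question: 
post-YARN-701 the AMRMToken flows in both secure and non-secure mode, so gating 
its handling on isSecurityEnabled() is exactly the kind of check that can look 
wrong. The token-kind strings below are illustrative, not the real kind 
constants.

{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class SecurityGateSketch {
  // Illustrative token kinds; the real constants differ.
  static boolean tokenRequired(String tokenKind) {
    if ("AMRMToken".equals(tokenKind)) {
      return true; // required in both secure and non-secure mode post-YARN-701
    }
    // Kinds like ClientToAMToken, JHSToken and LocalizerToken are secure-only,
    // per the comment above, so the security check is appropriate for them.
    return UserGroupInformation.isSecurityEnabled();
  }
}
{code}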

 TestMRCredentials and  TestBinaryTokenFile are failing on trunk
 ---

 Key: YARN-960
 URL: https://issues.apache.org/jira/browse/YARN-960
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Daryn Sharp
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-960.patch


 Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
 Making it a blocker until full impact of the issue is scoped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-980:


Assignee: Vinod Kumar Vavilapalli

Didn't know this before: the default number of parallel downloads is 4. Can you 
increase yarn.nodemanager.localizer.fetch.thread-count and try again? It's an 
NM config, so you have to restart the NMs after this change.

Seems like you have lots of public distributed cache files; the thread count 
has to be increased depending on that.
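
For reference, the programmatic equivalent of the suggested yarn-site.xml 
change; the value 20 is just an example and should be sized to the number of 
public distributed-cache files.

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LocalizerThreadsSketch {
  public static YarnConfiguration withMoreFetchThreads() {
    YarnConfiguration conf = new YarnConfiguration();
    // Default is 4 parallel public-localizer downloads; raise it and restart the NMs.
    conf.setInt("yarn.nodemanager.localizer.fetch.thread-count", 20);
    return conf;
  }
}
{code}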

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Raghu C Doppalapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721216#comment-13721216
 ] 

Raghu C Doppalapudi commented on YARN-980:
--

Yes Vinod, you are correct: we had a lot of files in the downloading state 
before this incident happened. I will update you with our findings after the 
suggested config change.
I will also post the NM logs.

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721228#comment-13721228
 ] 

Alejandro Abdelnur commented on YARN-937:
-

bq. Since the AMRMToken already has the service field populated we dont need to 
override anything. So we dont need to lookup any address from config in the 
YARNClient code. Later, for HA if we need to do some translation, then it 
should probably happen via the RMProxy layer. Does that work for you?

Well, [~daryn] has been doing a lot of work to ensure the service of a token is 
set not by the server but by the client. Doing what you suggest would go 
against that.

This has to be done in the client (for example, in the case of a multi-homed 
setup it would not work otherwise, as the RM does not know the hostname/IP 
visible to the user).

Also, looking at the conf is exactly what {{ClientRMProxy}} is doing within 
{{getRMAddress()}}.

I think the current patch is the right approach.
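
A sketch of the client-side convention referred to here: the client, not the 
server, stamps the token's service from an address it resolves locally (which 
is what matters for multi-homed setups). The address argument stands in for 
whatever ClientRMProxy derives from the configuration; this is an illustration, 
not the patch itself.

{code:java}
import java.net.InetSocketAddress;

import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

public class ClientSideTokenServiceSketch {
  // Sets token.service to the client-visible host:port; the RM cannot know the
  // address the client sees, so this has to happen on the client side.
  public static void stampService(Token<?> amrmToken, InetSocketAddress rmAddr) {
    SecurityUtil.setTokenService(amrmToken, rmAddr);
  }
}
{code}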



 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Raghu C Doppalapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721231#comment-13721231
 ] 

Raghu C Doppalapudi commented on YARN-980:
--

Also, this incident is not happening every time; it is infrequent. Following 
is the entire stack trace, Omkar:


2013-07-25 23:06:09,851 INFO  containermanager.ContainerManagerImpl - Start 
request for container_1374637086174_0533_01_02 by user testuser
2013-07-25 23:06:09,851 INFO  containermanager.ContainerManagerImpl - Creating 
a new application reference for app application_1374637086174_0533
2013-07-25 23:06:09,851 INFO  nodemanager.NMAuditLogger - USER=testuser  
IP=10.224.111.21OPERATION=Start Container Request   
TARGET=ContainerManageImpl  RESULT=SUCCESS  
APPID=application_1374637086174_0533
CONTAINERID=container_1374637086174_0533_01_02
2013-07-25 23:06:09,852 INFO  application.Application - Application 
application_1374637086174_0533 transitioned from NEW to INITING
2013-07-25 23:06:09,853 INFO  application.Application - Adding 
container_1374637086174_0533_01_02 to application 
application_1374637086174_0533
2013-07-25 23:06:09,944 INFO  application.Application - Application 
application_1374637086174_0533 transitioned from INITING to RUNNING
2013-07-25 23:06:09,948 INFO  container.Container - Container 
container_1374637086174_0533_01_02 transitioned from NEW to LOCALIZING
2013-07-25 23:06:09,948 INFO  containermanager.AuxServices - Got event 
APPLICATION_INIT for appId application_1374637086174_0533
2013-07-25 23:06:09,948 INFO  containermanager.AuxServices - Got 
APPLICATION_INIT for service mapreduce.shuffle
2013-07-25 23:06:09,948 INFO  mapred.ShuffleHandler - Added token for 
job_1374637086174_0533
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/locale-sh-0.0.3.jar
 transitioned from INIT to DOWNLOADING
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/hadoop-mapreduce-client-app-2.0.0-cdh4.2.1.jar
 transitioned from INIT to DOWNLOADING
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/mahout-utils-0.5.jar
 transitioned from INIT to DOWNLOADING
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/avro-1.3.0-rc1-sfdc-patch1.jar
 transitioned from INIT to DOWNLOADING
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/jung-samples-2.0.1.jar
 transitioned from INIT to DOWNLOADING
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/commons-math-2.1.jar
 transitioned from INIT to DOWNLOADING
2013-07-25 23:06:09,948 INFO  localizer.LocalizedResource - Resource 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/vtd-xml-2.6.jar
 transitioned from INIT to DOWNLOADING

….. around 30 jars are in downloading state.

2013-07-25 23:06:09,957 INFO  localizer.ResourceLocalizationService - 
Downloading public rsrc:{ 
hdfs://internal-EMPTY-gfist1/testuser-wsl/test_case/PreDriverTest/testJobWithFakeQueries/10383925454001385/sfdc_lib/10381068948677006/locale-sh-0.0.3.jar,
 1374793533752, FILE, null }
2013-07-25 23:06:09,957 FATAL event.AsyncDispatcher - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
at 

[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-744:
---

Attachment: YARN-744-20130726.1.patch

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
Priority: Minor
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, 
 YARN-744-20130726.1.patch, YARN-744.patch


 Looks like the lock taken in this is broken. It takes a lock on lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get a new lastResponse object and will 
 be able to take its lock and enter the critical section. Presumably we want 
 to limit one response per app attempt. So the lock could be taken on the 
 ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721282#comment-13721282
 ] 

Omkar Vinit Joshi commented on YARN-744:


Thanks [~bikassaha] ...

bq. AllocateResponseWrapper res
How about AllocateResponseLock?

bq. If the wrapper exists then how can the lastResponse be null?
You are right; we no longer need this, so I am removing it.

Yeah, the test won't actually be able to simulate the race condition mentioned 
above. I can't think of any other test. Attaching the patch without a test.
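
Here is a sketch of the locking pattern under discussion: synchronize on a 
stable per-attempt holder (the AllocateResponseLock idea) rather than on the 
response object itself, which is replaced on every allocate call. This is only 
an illustration of the idea, not the actual ApplicationMasterService code.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AllocateLockSketch<K, R> {
  // Stable per-attempt monitor; only its contents are swapped, never the lock.
  static final class ResponseLock<T> {
    private T lastResponse;
    T get() { return lastResponse; }
    void set(T r) { lastResponse = r; }
  }

  private final ConcurrentMap<K, ResponseLock<R>> locks =
      new ConcurrentHashMap<K, ResponseLock<R>>();

  public void register(K attemptId) {
    locks.putIfAbsent(attemptId, new ResponseLock<R>());
  }

  public R allocate(K attemptId, R newResponse) {
    ResponseLock<R> lock = locks.get(attemptId);
    if (lock == null) {
      return null; // unknown or unregistered attempt
    }
    synchronized (lock) {      // other threads for the same attempt block here
      R previous = lock.get();
      lock.set(newResponse);   // replacing the response does not change the monitor
      return previous;
    }
  }
}
{code}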

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
Priority: Minor
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, 
 YARN-744-20130726.1.patch, YARN-744.patch


 Looks like the lock taken in this is broken. It takes a lock on lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get a new lastResponse object and will 
 be able to take its lock and enter the critical section. Presumably we want 
 to limit one response per app attempt. So the lock could be taken on the 
 ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721300#comment-13721300
 ] 

Hadoop QA commented on YARN-744:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12594461/YARN-744-20130726.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1591//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1591//console

This message is automatically generated.

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
Priority: Minor
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, 
 YARN-744-20130726.1.patch, YARN-744.patch


 Looks like the lock taken in this is broken. It takes a lock on lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get a new lastResponse object and will 
 be able to take its lock and enter the critical section. Presumably we want 
 to limit one response per app attempt. So the lock could be taken on the 
 ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721327#comment-13721327
 ] 

Bikas Saha commented on YARN-937:
-

We are on the same page that this translation should happen on the client side. 
We tried our best to remove static RM address translation code from all clients 
and NM code and moved the logic into a common RMProxy layer. Hence, I am trying 
to avoid putting static RM address lookup back into YarnClient code. At some 
point in the near future, we may be doing the address translation for tokens 
inside the RMProxy layer like HDFS already does. My suggestion was a compromise 
that allows things to work correctly as of now while at the same time not 
regressing on the effort we made to remove RM static address translation code. 
I hope this clarifies my concerns. Thoughts?

 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-981) YARN/MR2/Job history /logs and /metrics link do not have correct content

2013-07-26 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-981:
--

 Summary: YARN/MR2/Job history /logs and /metrics link do not have 
correct content
 Key: YARN-981
 URL: https://issues.apache.org/jira/browse/YARN-981
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721333#comment-13721333
 ] 

Omkar Vinit Joshi commented on YARN-980:


Yeah, it can occur either if you have hit the thread limit or if the NM crashed and 
called shutdown on the executor before it received the new request. I have faced this 
issue in the past, where an NM crash resulted in exec.shutdown() followed by this 
exception. NM logs and AM logs will definitely help. [~raghu.hb...@gmail.com], can 
you please attach the logs?

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721338#comment-13721338
 ] 

Omkar Vinit Joshi commented on YARN-980:


[~raghu.hb...@gmail.com] did you see this message in the logs?
{code}
LOG.info("Public cache exiting");
{code}
before or after the above stack trace?

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721345#comment-13721345
 ] 

Omkar Vinit Joshi commented on YARN-948:


bq. If some of the release requests are valid and some are invalid, we should 
accept the valid requests?
bq. If so, please modify the test to validate these multiple success/failure 
cases.
Probably not. We should have the same behavior as for ask. Thoughts?

bq. To indicate its non-scheduler-specificity, 
validateContainerReleaseRequest() could be in RMServerUtils?
Other validate calls are present in SchedulerUtils. Let me know whether I should 
move all of them or just this one.

bq. Shouldn't be using InvalidResourceRequestException for invalid 
release-requests. Don't know if we are over-killing it, but a new exception?
Yeah, I thought about it but then stuck with the existing exception; I don't think 
we need a separate one for this scenario. Let me know and I will modify it.
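
For context, a rough sketch of what such a validation could look like; the method name follows the discussion, but the signature, the exception used, and the home class (SchedulerUtils vs. RMServerUtils) are exactly what is being debated, so treat everything here as illustrative:
{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative only: reject any released container that does not belong to
// the application attempt making the allocate call.
public final class ReleaseValidationSketch {
  private ReleaseValidationSketch() {
  }

  public static void validateContainerReleaseRequest(
      List<ContainerId> release, ApplicationAttemptId appAttemptId)
      throws YarnException {
    for (ContainerId containerId : release) {
      if (!containerId.getApplicationAttemptId().equals(appAttemptId)) {
        throw new YarnException("Cannot release container " + containerId
            + ": it is not owned by application attempt " + appAttemptId);
      }
    }
  }
}
{code}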

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {

     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721349#comment-13721349
 ] 

Vinod Kumar Vavilapalli commented on YARN-937:
--

Let's move, folks. Either
 - Keep it on the server side for now (which was always the case) and fix it 
separately to be in RMProxy later.
 - Or fix it now itself in RMProxy.
 - Or just leave it as is in the patch.

We all agree it needs to be on the client side, and that too in RMProxy. It's 
okay to compromise in any of the above ways to move this blocker ahead. Thanks.

 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (YARN-885) TestBinaryTokenFile (and others) fail

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reopened YARN-885:
-


It seems TestBinaryTokenFile is still failing in a recent Jenkins job on trunk 
(please refer to 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//testReport/), so I 
am reopening this JIRA. Kam, are you working on this?

 TestBinaryTokenFile (and others) fail
 -

 Key: YARN-885
 URL: https://issues.apache.org/jira/browse/YARN-885
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Kam Kasravi

 Seeing the following stack trace, and the unit test goes into an infinite loop:
 2013-06-24 17:03:58,316 ERROR [LocalizerRunner for 
 container_1372118631537_0001_01_01] security.UserGroupInformation 
 (UserGroupInformation.java:doAs(1480)) - PriviledgedActionException 
 as:kamkasravi (auth:SIMPLE) cause:java.io.IOException: Server asks us to fall 
 back to SIMPLE auth, but this client is configured to only allow secure 
 connections.
 2013-06-24 17:03:58,317 WARN  [LocalizerRunner for 
 container_1372118631537_0001_01_01] ipc.Client (Client.java:run(579)) - 
 Exception encountered while connecting to the server : java.io.IOException: 
 Server asks us to fall back to SIMPLE auth, but this client is configured to 
 only allow secure connections.
 2013-06-24 17:03:58,318 ERROR [LocalizerRunner for 
 container_1372118631537_0001_01_01] security.UserGroupInformation 
 (UserGroupInformation.java:doAs(1480)) - PriviledgedActionException 
 as:kamkasravi (auth:SIMPLE) cause:java.io.IOException: java.io.IOException: 
 Server asks us to fall back to SIMPLE auth, but this client is configured to 
 only allow secure connections.
 java.lang.reflect.UndeclaredThrowableException
 at 
 org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
 at 
 org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:56)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:247)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:181)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:859)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721362#comment-13721362
 ] 

Bikas Saha commented on YARN-937:
-

Alejandro, 
We can keep my suggestion and remove the translation code from YARNClient now. 
Things should continue to work if we simply remove that code since the token 
already has the address and there is no need to re-populate it. 

Or we can continue to keep the config address lookup code in YarnClientImpl. 
Later, when we improve RMProxy to do the right thing, we will have to remember 
to come back and remove this code.

Your call.

+1


 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721366#comment-13721366
 ] 

Vinod Kumar Vavilapalli commented on YARN-948:
--

bq. Probably not. We should have the same behavior as for ask. Thoughts?
Ask definitely cannot be handled per resource-request, as there are implicit 
dependencies between host and rack requests. For blacklist and release, there 
are no dependencies, so handling them individually is fine. But I'm okay doing 
it the way you have, mainly because reporting partial successes and failures 
means more API changes. *sigh*

bq. Other validate calls are present in SchedulerUtils. Let me know if I should 
move all or this?
Yeah, let's move'em if they aren't really tied to any scheduler.

bq. Yeah thought about it but but then sticked to it don't think we should have 
separate exception for this scenario. Let me know I will modify.
It is clearly not InvalidResourceRequest, so let's create a new one and add 
documentation too.

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {

     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-974) RMContainer should collection more useful information to be recorded

2013-07-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721376#comment-13721376
 ] 

Zhijie Shen commented on YARN-974:
--

bq. So we will have transitions like: nothing -> present locally on the node 
manager -> present on remote HDFS.

Good point. The log URL that points to the NM web UI can be made available as 
soon as the container is launched. The URL should then be updated to the 
AHS web UI when the container is finished, as the aggregated log is supposed to 
be accessed via the AHS in the future. 
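
As a rough illustration of the data being discussed, a hypothetical read-only view; the getter names are made up for the sketch and are not the attached patch's API:
{code}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical shape of the extra per-container history information.
public interface ContainerHistorySketch {
  long getStartTime();            // when the container started
  long getFinishTime();           // when the container finished
  String getDiagnosticsInfo();    // diagnostics reported at completion
  String getLogURL();             // NM web URL while running, AHS URL afterwards
  Resource getAllocatedResource();
  NodeId getAllocatedNode();
}
{code}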

 RMContainer should collection more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 To record the history of a container, users may be also interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-974) RMContainer should collection more useful information to be recorded

2013-07-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-974:
-

Attachment: YARN-974.1.patch

Make RMContainer collect more information. The missing part is that the URL is 
not updated after the container finishes, because that requires accessing the 
aggregated log via the AHS web UI, which is not completed yet. I will do that afterwards.

 RMContainer should collection more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-974.1.patch


 To record the history of a container, users may be also interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721389#comment-13721389
 ] 

Bikas Saha commented on YARN-937:
-

Another option is to add a getRMCurrentAddress() to RMProxy and use that 
instead of statically reading from config. We keep the client side address 
logic while still maintaining the RMProxy as the source of truth.

 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-347:


Attachment: YARN-347-v3.patch

Sync up with latest trunk.

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721403#comment-13721403
 ] 

Alejandro Abdelnur commented on YARN-937:
-


Regarding {{getRMCurrentAddress()}}: we would have to pass the config, and a more 
appropriate name would be {{getRMSchedulerAddress()}}.

Also, shouldn't the {{ApplicationClientProtocol}} instance be the one returning 
the address?

And this seems like it will affect other things as well. I think we should take 
care of this in another JIRA.

Are you OK with committing the current patch and following up with another JIRA 
(making it a blocker for the release after 2.1.0)?
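
A minimal sketch of the helper under discussion, assuming it simply reads the scheduler address from the passed-in configuration; the name and eventual placement (RMProxy or elsewhere) are the open questions above, so this is illustrative only:
{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative config-backed lookup; the discussion is about whether RMProxy,
// rather than individual callers, should own this logic.
public final class RMAddressSketch {
  private RMAddressSketch() {
  }

  public static InetSocketAddress getRMSchedulerAddress(Configuration conf) {
    return conf.getSocketAddr(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
  }
}
{code}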



 Fix unmanaged AM in non-secure/secure setup post YARN-701
 -

 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-937.patch, YARN-937.patch, YARN-937.patch, 
 YARN-937.patch, YARN-937.patch, YARN-937.patch


 Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
 will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-974) RMContainer should collection more useful information to be recorded

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721408#comment-13721408
 ] 

Hadoop QA commented on YARN-974:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594478/YARN-974.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1592//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1592//console

This message is automatically generated.

 RMContainer should collection more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-974.1.patch


 To record the history of a container, users may be also interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721410#comment-13721410
 ] 

Hadoop QA commented on YARN-347:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594482/YARN-347-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1593//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1593//console

This message is automatically generated.

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721418#comment-13721418
 ] 

Prashant Kommireddi commented on YARN-980:
--

[~vinodkv] regarding
{quote}
Didn't know this before, the default number of parallel downloads is 4, can you 
increase yarn.nodemanager.localizer.fetch.thread-count and try again?
{quote}

What is a good way to determine the ideal value for this config?

Question for everyone - what is the reasoning behind killing a process when a 
threshold is met instead of throttling or something else to that effect? It 
makes sense in a few cases, but killing in this case seems quite drastic, no?
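
On the thread-count question above, a minimal illustration of raising yarn.nodemanager.localizer.fetch.thread-count; the value 8 is arbitrary, the right number depends on the workload, and on a real cluster this would normally be set in yarn-site.xml on each NodeManager:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: programmatic equivalent of the yarn-site.xml setting.
public class LocalizerThreadCountSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.setInt("yarn.nodemanager.localizer.fetch.thread-count", 8); // default is 4
    System.out.println(
        conf.getInt("yarn.nodemanager.localizer.fetch.thread-count", 4));
  }
}
{code}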


 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-974) RMContainer should collection more useful information to be recorded

2013-07-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-974:
-

Attachment: YARN-974.2.patch

Fix the test failure. BTW, the patch should be applicable to trunk as well.

 RMContainer should collection more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-974.1.patch, YARN-974.2.patch


 To record the history of a container, users may be also interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-980) Nodemanager is shutting down while executing a mapreduce job

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721435#comment-13721435
 ] 

Vinod Kumar Vavilapalli commented on YARN-980:
--

Okay, I was wrong. The queue is unbounded; the default of 4 threads is just the core 
pool size. [~ojoshi] is right, it must have been caused by some other problem, so 
sharing the logs will be helpful.
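
To illustrate why an unbounded queue points at shutdown rather than saturation, a small self-contained sketch (this is not the NodeManager's actual pool setup, just the same executor classes that appear in the stack trace):
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A fixed-size pool over an unbounded LinkedBlockingQueue never rejects work
// because the queue is "full"; a RejectedExecutionException from submit()
// therefore indicates the executor was already shut down.
public class RejectionSketch {
  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        4, 4, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    CompletionService<String> queue = new ExecutorCompletionService<String>(pool);

    queue.submit(new Callable<String>() {   // accepted: the pool is live
      public String call() {
        return "ok";
      }
    });

    pool.shutdown();                        // e.g. the service is being stopped
    try {
      queue.submit(new Callable<String>() { // rejected: the pool is shut down
        public String call() {
          return "late";
        }
      });
    } catch (RejectedExecutionException expected) {
      System.out.println("submit() after shutdown() was rejected");
    }
  }
}
{code}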

 Nodemanager is shutting down while executing a mapreduce job
 

 Key: YARN-980
 URL: https://issues.apache.org/jira/browse/YARN-980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Raghu C Doppalapudi
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 2013-07-24 11:00:26,582 FATAL event.AsyncDispatcher - Error in dispatcher 
 thread
 java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at 
 java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:621)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:516)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:458)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)
 2013-07-24 11:00:26,582 INFO event.AsyncDispatcher - Exiting, bbye..
 2013-07-24 11:00:26,583 INFO service.AbstractService - Service:Dispatcher is 
 stopped.
 2013-07-24 11:00:26,585 INFO mortbay.log - Stopped 
 SelectChannelConnector@0.0.0.0:8042
 2013-07-24 11:00:26,686 INFO service.AbstractService - 
 Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-974) RMContainer should collection more useful information to be recorded

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721438#comment-13721438
 ] 

Hadoop QA commented on YARN-974:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594490/YARN-974.2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1594//console

This message is automatically generated.

 RMContainer should collection more useful information to be recorded
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-974.1.patch, YARN-974.2.patch


 To record the history of a container, users may be also interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721439#comment-13721439
 ] 

Luke Lu commented on YARN-347:
--

The patch lgtm. +1. Will commit shortly.

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721443#comment-13721443
 ] 

Junping Du commented on YARN-347:
-

Thanks Luke for review!

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721446#comment-13721446
 ] 

Vinod Kumar Vavilapalli commented on YARN-347:
--

For the memory values, 0MB instead of 0M ?

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-948:
---

Attachment: YARN-948-20130726.1.patch

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {

     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721450#comment-13721450
 ] 

Omkar Vinit Joshi commented on YARN-948:


Thanks, [~vinodkv].
bq. Ask definitely cannot be handled per resource-request, as there are implicit 
dependencies between host and rack requests. For blacklist and release, there 
are no dependencies, so handling them individually is fine. But I'm okay doing 
it the way you have, mainly because reporting partial successes and failures 
means more API changes. *sigh*
yeah... :D

bq. Yeah, let's move'em if they aren't really tied to any scheduler.
Moved all three to RMServerUtils.

bq. It is clearly not InvalidResourceRequest, so let's create a new one and add 
documentation too.
Yeah, I created one. Is InvalidContainerReleaseRequestException a good name?

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {

     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721454#comment-13721454
 ] 

Junping Du commented on YARN-347:
-

So we replace all 'M' with 'MB' here? Sounds reasonable to me.

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-347:


Attachment: YARN-347-v4.patch

Thanks for the review, Vinod! The v4 patch incorporates your comments. Please help 
review it again. Thanks!

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch, 
 YARN-347-v4.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info besides memory info in node status

2013-07-26 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu updated YARN-347:
-

Summary: YARN node CLI should also show CPU info besides memory info in 
node status  (was: YARN node CLI should also show CPU info as memory info in 
node status)

 YARN node CLI should also show CPU info besides memory info in node status
 --

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch, 
 YARN-347-v4.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 just as it shows memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721461#comment-13721461
 ] 

Bikas Saha commented on YARN-948:
-

Where is appAttemptId coming from? The token? If it's coming from the client 
request object itself, then can a client send a different app's attempt id with 
matching container ids?

InvalidContainerReleaseException sounds better to me.
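
A minimal sketch of the proposed exception, assuming it extends YarnException like the other request-validation exceptions; the final name and package follow this review:
{code}
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical sketch; the committed class may differ in name and location.
public class InvalidContainerReleaseException extends YarnException {

  private static final long serialVersionUID = 1L;

  public InvalidContainerReleaseException(String message) {
    super(message);
  }
}
{code}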

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {

     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-347) YARN CLI should show CPU info besides memory info in node status

2013-07-26 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu updated YARN-347:
-

Summary: YARN CLI should show CPU info besides memory info in node status  
(was: YARN node CLI should also show CPU info besides memory info in node 
status)

 YARN CLI should show CPU info besides memory info in node status
 

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch, 
 YARN-347-v4.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info in 
 addition to memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721466#comment-13721466
 ] 

Omkar Vinit Joshi commented on YARN-948:


bq. Where is appAttemptId coming from? The token? If it's coming from the client 
request object itself, then can a client send a different app's attempt id and 
matching container ids?

No, we have removed appAttemptId from the request now; it comes from the 
AMRMToken (the auth method retrieves and returns it).
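
For illustration, a minimal sketch of deriving the attempt id from the caller's 
AMRMToken instead of trusting anything in the request; the helper class is 
hypothetical, though AMRMTokenIdentifier#getApplicationAttemptId is the relevant 
accessor:
{code}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

// Hypothetical helper showing where a trusted attempt id could come from.
public class CallerAttemptId {

  // Returns the attempt id carried by the caller's authenticated AMRMToken.
  public static ApplicationAttemptId fromCurrentUser() throws IOException {
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    for (TokenIdentifier identifier : caller.getTokenIdentifiers()) {
      if (identifier instanceof AMRMTokenIdentifier) {
        return ((AMRMTokenIdentifier) identifier).getApplicationAttemptId();
      }
    }
    throw new IOException("No AMRMToken found for " + caller.getUserName());
  }
}
{code}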

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {
     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN CLI should show CPU info besides memory info in node status

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721467#comment-13721467
 ] 

Vinod Kumar Vavilapalli commented on YARN-347:
--

+1, pending Jenkins.

 YARN CLI should show CPU info besides memory info in node status
 

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch, 
 YARN-347-v4.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info in 
 addition to memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721471#comment-13721471
 ] 

Hadoop QA commented on YARN-948:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12594493/YARN-948-20130726.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1595//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1595//console

This message is automatically generated.

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {
     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN CLI should show CPU info besides memory info in node status

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721474#comment-13721474
 ] 

Hadoop QA commented on YARN-347:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594494/YARN-347-v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1596//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1596//console

This message is automatically generated.

 YARN CLI should show CPU info besides memory info in node status
 

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch, 
 YARN-347-v4.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info in 
 addition to memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN CLI should show CPU info besides memory info in node status

2013-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721482#comment-13721482
 ] 

Junping Du commented on YARN-347:
-

The test failure is unrelated, as discussed previously (YARN-906 addresses this).

 YARN CLI should show CPU info besides memory info in node status
 

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch, YARN-347-v3.patch, 
 YARN-347-v4.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info in 
 addition to memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-84) Use Builder to get RPC server in YARN

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-84:
---

Attachment: YARN-84-branch2.patch

Backporting to branch-2, as this blocks HADOOP-9756, which removes 
RPC.getServer() completely. Brandon, would you help review it? Thanks!

 Use Builder to get RPC server in YARN
 -

 Key: YARN-84
 URL: https://issues.apache.org/jira/browse/YARN-84
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Brandon Li
Assignee: Brandon Li
Priority: Minor
 Fix For: 3.0.0

 Attachments: MAPREDUCE-4628.patch, YARN-84-branch2.patch


 In HADOOP-8736, a Builder is introduced to replace all the getServer() 
 variants. This JIRA covers the corresponding change in YARN.
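 As a rough illustration of the Builder pattern in question (the protocol, 
 instance, and handler values below are placeholders, not the actual YARN call 
 sites):
 {code}
 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.ipc.RPC;

 // Sketch only: the fluent RPC.Builder from HADOOP-8736 replaces the many
 // RPC.getServer(...) overloads.
 public class RpcBuilderExample {

   public static RPC.Server buildServer(Configuration conf, Class<?> protocol,
       Object instance, String bindAddress, int port) throws IOException {
     return new RPC.Builder(conf)
         .setProtocol(protocol)
         .setInstance(instance)
         .setBindAddress(bindAddress)
         .setPort(port)
         .setNumHandlers(10)   // placeholder handler count
         .setVerbose(false)
         .build();
   }
 }
 {code}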

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them

2013-07-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721486#comment-13721486
 ] 

Sandy Ryza commented on YARN-948:
-

+1 to InvalidContainerReleaseException

 RM should validate the release container list before actually releasing them
 

 Key: YARN-948
 URL: https://issues.apache.org/jira/browse/YARN-948
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch


 At present we are blindly passing the allocate request containing containers 
 to be released to the scheduler. This may result in one application 
 releasing another application's container.
 {code}
   @Override
   @Lock(Lock.NoLock.class)
   public Allocation allocate(ApplicationAttemptId applicationAttemptId,
       List<ResourceRequest> ask, List<ContainerId> release,
       List<String> blacklistAdditions, List<String> blacklistRemovals) {
     FiCaSchedulerApp application = getApplication(applicationAttemptId);

     // Release containers
     for (ContainerId releasedContainerId : release) {
       RMContainer rmContainer = getRMContainer(releasedContainerId);
       if (rmContainer == null) {
         RMAuditLogger.logFailure(application.getUser(),
             AuditConstants.RELEASE_CONTAINER,
             "Unauthorized access or invalid container", "CapacityScheduler",
             "Trying to release container not owned by app or with invalid id",
             application.getApplicationId(), releasedContainerId);
       }
       completedContainer(rmContainer,
           SchedulerUtils.createAbnormalContainerStatus(
               releasedContainerId,
               SchedulerUtils.RELEASED_CONTAINER),
           RMContainerEventType.RELEASED);
     }
 {code}
 Current checks are not sufficient and we should prevent this. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-934) Defining Writing Interface of HistoryStorage

2013-07-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-934:
-

Attachment: YARN-934.3.patch

Thanks Vinod for the comments. Updated the patch to make it a pure interface. 
Please review again.
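
Purely as an illustration of the "pure interface" shape being aimed for (all 
names below are invented placeholders, not the actual patch):
{code}
import java.io.IOException;

// Invented names, illustrative only: a writer contract with no implementation
// details, so different storage backends can be plugged in behind it.
public interface HistoryWriter {

  // Record that an application has started.
  void applicationStarted(String applicationId) throws IOException;

  // Record that an application has finished.
  void applicationFinished(String applicationId) throws IOException;
}
{code}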

 Defining Writing Interface of HistoryStorage
 

 Key: YARN-934
 URL: https://issues.apache.org/jira/browse/YARN-934
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-934) Defining Writing Interface of HistoryStorage

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721502#comment-13721502
 ] 

Hadoop QA commented on YARN-934:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594508/YARN-934.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1597//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1597//console

This message is automatically generated.

 Defining Writing Interface of HistoryStorage
 

 Key: YARN-934
 URL: https://issues.apache.org/jira/browse/YARN-934
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira