[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759973#comment-13759973 ] Bikas Saha commented on YARN-1027: -- The patch looks clean overall. I would suggest keeping the haEnabled concept within the HAServiceProtocol service instead of mixing it between the ResourceManager and HAServiceProtocol. Thus the RM always does addService(HAServiceProtocol). HAServiceProtocol is the one that checks haEnabled in serviceStart(). If enabled, it transitions to standby and waits for the active signal. If not, it directly transitions to active.

Shouldn't we simply call transitionToStandby() here? That would ensure getServiceStatus() returns a non-active status for anyone who cares to know.
{code}
+  public void serviceStop() throws Exception {
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
{code}
This is fine for now but we might have to invest in a better health check in a different jira. Any ideas?
{code}
  public synchronized void monitorHealth() throws HealthCheckFailedException {
+    if (rm.haState == HAServiceState.ACTIVE && !rm.areActiveServicesRunning()) {
{code}
We probably want the log before the if stmt. Should we change state to standby before we stop services? Assuming that HA-aware services would need to know about this earlier rather than later so that they can stop signaling Active services and allow them to be drained/stopped.
{code}
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
+    }
+
+    LOG.info("Transitioning to standby");
+    rm.haState = HAServiceState.STANDBY;
{code}
Didn't quite get this comment. Is this to do with the change being requested by user/admin/ZKFC?
{code}
+  public void transitionToActive(StateChangeRequestInfo reqInfo) {
+    // TODO: When automatic failover is enabled, check if transition should
+    // be allowed for this request
{code}
What are the pros of making haState a member of ResourceManager instead of HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one place.

Why is there a lock used in ResourceManager.startActive() etc.? Why are these methods protected? If for testing, then let's add a @VisibleForTesting annotation. Is there a way to confirm that the active service objects are all being GC'd?

testStartAndTransitions() - How about calling getServiceStatus() and monitorHealth(), in addition to checking the internal members, in all places where internal members are being checked? That way we can test and exercise those methods too. How about completing Active-Standby-Active-Standby-Active-RM.serviceStop()? This would fully simulate multiple full cycles of transitions and also verify the shutdown case. We can also issue some requests like createApplication() to the RM, when in the active state, and verify that the RM is really working.

TestRMHADisabled: It's confusing to read that the RM has started but its haState == INITIALIZING. Also, we can probably move this test into TestRMHA.java to keep related tests in one place.

Minor nits: LOG instead of print?
{code}
+    } catch (Exception e) {
+      e.printStackTrace();
{code}
RM_HA_PREFIX instead of HA_PREFIX

Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common.
This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
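To make the suggestion above concrete, here is a minimal sketch of an HA service that owns the haEnabled check in serviceStart(); the class and method names (RMHAProtocolService, transitionToActive/transitionToStandby) are assumptions for illustration, not the attached patch:
{code}
import org.apache.hadoop.service.AbstractService;

// Hypothetical sketch: names and structure are illustrative, not the patch.
public class RMHAProtocolService extends AbstractService {
  private boolean haEnabled; // would be read from configuration in serviceInit()

  public RMHAProtocolService() {
    super("RMHAProtocolService");
  }

  @Override
  protected void serviceStart() throws Exception {
    if (haEnabled) {
      // HA enabled: come up as standby and wait for an external
      // transitionToActive() from the admin/ZKFC.
      transitionToStandby();
    } else {
      // HA disabled: go straight to active.
      transitionToActive();
    }
    super.serviceStart();
  }

  private void transitionToStandby() { /* stop active services, set STANDBY */ }
  private void transitionToActive() { /* start active services, set ACTIVE */ }
}
{code}
With this shape the RM unconditionally does addService() on the HA service and never branches on haEnabled itself.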
[jira] [Commented] (YARN-1059) '\n' or ' ' or '\t' should be ignored for some configuration parameters
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760042#comment-13760042 ] Tsuyoshi OZAWA commented on YARN-1059: -- Attached a patch to HADOOP-9869.

'\n' or ' ' or '\t' should be ignored for some configuration parameters --- Key: YARN-1059 URL: https://issues.apache.org/jira/browse/YARN-1059 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Environment: Ubuntu 12.04, hadoop 2.0.5 Reporter: rvller Priority: Minor Labels: newbie

Here is the stack trace while starting the YARN resource manager:
{code}
2013-08-12 12:53:29,319 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 10.245.1.30:9030 (configuration property 'yarn.resourcemanager.resource-tracker.address')
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
	at org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
	at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
{code}
And here is the yarn-site.xml:
{code:xml}
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>10.245.1.30:9010</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>10.245.1.30:9020</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>10.245.1.30:9030</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>10.245.1.30:9040</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>10.245.1.30:9050</value>
    <description></description>
  </property>
  <!-- Site specific YARN configuration properties -->
</configuration>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
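For reference, HADOOP-9869 pushes the trimming into Configuration itself; Configuration already exposes getTrimmed(), which strips leading/trailing whitespace (including '\n' and '\t') before the value is parsed. A small sketch of the difference, with the key and polluted value below mimicking the report:
{code}
import org.apache.hadoop.conf.Configuration;

public class TrimmedConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Simulate a value polluted with whitespace, as in the reported yarn-site.xml.
    conf.set("yarn.resourcemanager.resource-tracker.address", "\n10.245.1.30:9030\t");

    // get() returns the raw value, whitespace included...
    String raw = conf.get("yarn.resourcemanager.resource-tracker.address");
    // ...while getTrimmed() strips it, yielding a parseable host:port.
    String trimmed = conf.getTrimmed("yarn.resourcemanager.resource-tracker.address");

    System.out.println("raw=[" + raw + "] trimmed=[" + trimmed + "]");
  }
}
{code}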
[jira] [Commented] (YARN-1132) QueueMetrics.java has wrong comments
[ https://issues.apache.org/jira/browse/YARN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760045#comment-13760045 ] Tsuyoshi OZAWA commented on YARN-1132: -- Then, should this be closed as a duplicate of YARN-1090?

QueueMetrics.java has wrong comments Key: YARN-1132 URL: https://issues.apache.org/jira/browse/YARN-1132 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Priority: Minor Labels: newbie

I found that o.a.h.yarn.server.resourcemanager.scheduler.QueueMetrics.java has wrong comments:
{code}
@Metric("# of reserved memory in MB") MutableGaugeInt reservedMB;
@Metric("# of active users") MutableGaugeInt activeApplications;
{code}
They should be fixed as follows:
{code}
@Metric("Reserved memory in MB") MutableGaugeInt reservedMB;
@Metric("# of active applications") MutableGaugeInt activeApplications;
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-910) Allow auxiliary services to listen for container starts and completions
[ https://issues.apache.org/jira/browse/YARN-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-910: Attachment: YARN-910.patch

Thanks Sandy, Vinod. In the latest patch I've taken care of all the changes except the following.

bq. Split AuxServicesEvent into a AuxServicesAppEvent and AuxServicesContainerEvent? Don't like nulls like that.

The patch is only adding a new property to the event, container, which is NULL for App events. All the other NULLs were already there. Regardless, I've tried refactoring AuxServicesEvent into an AuxServicesAppEvent and an AuxServicesContainerEvent, but the patch gets much bigger, as the necessary changes are not just different names but the way AuxiliaryServices handle() would take care of these 2 events. We should introduce a parent event class for those. If you still want to do this break-up, I'd prefer to do it as part of another JIRA that only does the refactoring, without adding new functionality.

Allow auxiliary services to listen for container starts and completions --- Key: YARN-910 URL: https://issues.apache.org/jira/browse/YARN-910 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Alejandro Abdelnur Attachments: YARN-910.patch, YARN-910.patch, YARN-910.patch Making container start and completion events available to auxiliary services would allow them to be resource-aware. The auxiliary service would be able to notify a co-located service that is opportunistically using free capacity of allocation changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
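For context, a sketch of the kind of consumer this event plumbing enables; the hook names and signatures below are hypothetical, since the exact API shape is what is being discussed on this JIRA:
{code}
// Hypothetical sketch: hook names/types are illustrative, not the committed API.
public class ResourceAwareAuxService {

  // Would be called by the NM when a container starts on this node.
  public void containerStarted(String containerId, long memoryMb) {
    // e.g. tell a co-located service that free capacity shrank.
    notifyCoLocatedService(-memoryMb);
  }

  // Would be called by the NM when a container completes.
  public void containerCompleted(String containerId, long memoryMb) {
    // Capacity freed up again.
    notifyCoLocatedService(memoryMb);
  }

  private void notifyCoLocatedService(long deltaMb) {
    System.out.println("capacity delta on this node: " + deltaMb + " MB");
  }
}
{code}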
[jira] [Created] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
Alejandro Abdelnur created YARN-1159: Summary: NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta

When running MR PI, which runs successfully, the NM log reports:
{code}
2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000
2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143
2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:722)
2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-910) Allow auxiliary services to listen for container starts and completions
[ https://issues.apache.org/jira/browse/YARN-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760107#comment-13760107 ] Hadoop QA commented on YARN-910: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601802/YARN-910.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1854//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1854//console This message is automatically generated. Allow auxiliary services to listen for container starts and completions --- Key: YARN-910 URL: https://issues.apache.org/jira/browse/YARN-910 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Alejandro Abdelnur Attachments: YARN-910.patch, YARN-910.patch, YARN-910.patch Making container start and completion events available to auxiliary services would allow them to be resource-aware. The auxiliary service would be able to notify a co-located service that is opportunistically using free capacity of allocation changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760128#comment-13760128 ] Steve Loughran commented on YARN-1155: -- This would need to work for clusters where DNS doesn't resolve the hostnames, but instead /etc/hosts does the work. This is a not unusual setup in virtualized clusters where the VMs are on a virtual subnet. RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-1160) allow admins to force app deployment on a specific host
[ https://issues.apache.org/jira/browse/YARN-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved MAPREDUCE-4277 to YARN-1160: - Component/s: (was: mrv2) resourcemanager Affects Version/s: (was: trunk) (was: 2.0.0-alpha) 3.0.0 Key: YARN-1160 (was: MAPREDUCE-4277) Project: Hadoop YARN (was: Hadoop Map/Reduce) allow admins to force app deployment on a specific host --- Key: YARN-1160 URL: https://issues.apache.org/jira/browse/YARN-1160 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Priority: Minor Currently you ask YARN to get slots on a host and it finds a slot on that machine -or, if unavailable or there is no room, on a host nearby as far as the topology is concerned. People with admin rights should have the option to deploy a process on a specific host and have it run there even if there are no free slots -and to fail if the machine is not available. This would let you deploy admin-specific process across a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files
[ https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760146#comment-13760146 ] Steve Loughran commented on YARN-1151: -- It gets more complex once you add in failure handling: how does the NM react to the aux service process failing? How does the NM shut it down? Where do the logs go? Who does it run as? We effectively do have a system for doing this: it is called YARN. What sounds needed here is a way to tell *something* that an NM has started, and then give it the option of creating and deploying a container on it. That something should, obviously, be a YARN app itself, since they are set up to build up command lines, copy in JARs, handle failures, etc. What we don't have is:
# anything that starts a specific long-lived YARN AM service on cluster startup.
# a way for an AM to list all the hosts and demand a container on every one, irrespective of what is already there. (You could probably do it by asking for 0 RAM and vcores, but the min resource config options are designed to stop users doing this.) YARN-1160 covers that problem.

Ability to configure auxiliary services from HDFS-based JAR files - Key: YARN-1151 URL: https://issues.apache.org/jira/browse/YARN-1151 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta Reporter: john lilley Priority: Minor Labels: auxiliary-service, yarn I would like to install an auxiliary service in Hadoop YARN without actually installing files/services on every node in the system. Discussions on the user@ list indicate that this is not easily done. The reason we want an auxiliary service is that our application has some persistent-data components that are not appropriate for HDFS. In fact, they are somewhat analogous to the mapper output of MapReduce's shuffle, which is what led me to auxiliary-services in the first place. It would be much easier if we could just place our service's JARs in HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1049) ContainerExistStatus should define a status for preempted containers
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1049: - Description: With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. was: ContainerExitStatus defines a few constants with special exit status values (0, -1000, -100, -101). This is incorrect; we should not define any special constants and should limit ourselves to returning the actual process exit status code. ContainerState should include PREEMPTED (when preempted by YARN) and LOST (when the NM crashes). With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Marking it as a blocker for 2.1.0 as this is an API/behavior change. Fix Version/s: (was: 2.3.0) 2.1.1-beta Assignee: Alejandro Abdelnur Summary: ContainerExistStatus should define a status for preempted containers (was: ContainerExistStatus and ContainerState are defined incorrectly) ContainerExistStatus should define a status for preempted containers Key: YARN-1049 URL: https://issues.apache.org/jira/browse/YARN-1049 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.1-beta With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
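A sketch of how an AM might consume the new constant once it lands, assuming it is exposed as ContainerExitStatus.PREEMPTED per the description (ABORTED already exists for framework-killed containers, e.g. a lost NM):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Sketch: classify a completed container using the proposed exit status.
public class CompletedContainerHandler {
  public boolean shouldRetry(ContainerStatus status) {
    switch (status.getExitStatus()) {
      case ContainerExitStatus.PREEMPTED: // -102, added by this JIRA
        // Preempted by the scheduler: not the task's fault, retry freely.
        return true;
      case ContainerExitStatus.ABORTED:   // framework-killed, e.g. NM lost
        return true;
      default:
        // A real process exit code: only retry on failure.
        return status.getExitStatus() != ContainerExitStatus.SUCCESS;
    }
  }
}
{code}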
[jira] [Updated] (YARN-1049) ContainerExistStatus should define a status for preempted containers
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1049: - Attachment: YARN-1049.patch ContainerExistStatus should define a status for preempted containers Key: YARN-1049 URL: https://issues.apache.org/jira/browse/YARN-1049 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1049.patch With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1049) ContainerExistStatus should define a status for preempted containers
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760172#comment-13760172 ] Hadoop QA commented on YARN-1049: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601813/YARN-1049.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1855//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1855//console This message is automatically generated. ContainerExistStatus should define a status for preempted containers Key: YARN-1049 URL: https://issues.apache.org/jira/browse/YARN-1049 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1049.patch With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1161) branch-2.1-beta compilation fails
Devaraj K created YARN-1161: --- Summary: branch-2.1-beta compilation fails Key: YARN-1161 URL: https://issues.apache.org/jira/browse/YARN-1161 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker
{code:xml}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
[ERROR] D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
[ERROR] symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
[ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1161) branch-2.1-beta compilation fails
[ https://issues.apache.org/jira/browse/YARN-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-1161: Attachment: YARN-1161.patch branch-2.1-beta compilation fails - Key: YARN-1161 URL: https://issues.apache.org/jira/browse/YARN-1161 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker Attachments: YARN-1161.patch
{code:xml}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
[ERROR] D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
[ERROR] symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
[ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1144) Unmanaged AMs registering a tracking URI should not be proxy-fied
[ https://issues.apache.org/jira/browse/YARN-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1144: - Attachment: YARN-1144.patch Unmanaged AMs registering a tracking URI should not be proxy-fied - Key: YARN-1144 URL: https://issues.apache.org/jira/browse/YARN-1144 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta Attachments: YARN-1144.patch Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1161) branch-2.1-beta compilation fails
[ https://issues.apache.org/jira/browse/YARN-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760202#comment-13760202 ] Hadoop QA commented on YARN-1161: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601822/YARN-1161.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1857//console This message is automatically generated. branch-2.1-beta compilation fails - Key: YARN-1161 URL: https://issues.apache.org/jira/browse/YARN-1161 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker Attachments: YARN-1161.patch
{code:xml}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
[ERROR] D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
[ERROR] symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
[ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760205#comment-13760205 ] Thomas Graves commented on YARN-1153: - What are the rest of your queue settings? With one user, the user limit factor comes into effect. http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single node cluster using the DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
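For reference, a hedged example of the relevant knob from the CapacityScheduler docs linked above; the queue path root.a is illustrative:
{code:xml}
<!-- capacity-scheduler.xml: let a single user in queue "a" use up to
     4x the queue's configured capacity (still bounded by max-capacity). -->
<property>
  <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
  <value>4</value>
</property>
{code}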
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff An invalid application state now throws a BadRequestException. I went for the message "Invalid application-state INVALID_test specified. It should be one of [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHING, FINISHED, FAILED, KILLED]". Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
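For illustration, the filter would then be exercised with a single call such as the following (assuming the parameter lands as a comma-separated states list, as in this patch series):
{code}
GET http://<rm http address:port>/ws/v1/cluster/apps?states=ACCEPTED,RUNNING
    -> only applications currently in ACCEPTED or RUNNING

GET http://<rm http address:port>/ws/v1/cluster/apps?states=INVALID_test
    -> 400 Bad Request (BadRequestException, per the comment above)
{code}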
[jira] [Commented] (YARN-1160) allow admins to force app deployment on a specific host
[ https://issues.apache.org/jira/browse/YARN-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760208#comment-13760208 ] Alejandro Abdelnur commented on YARN-1160: -- You can already ask for container exactly on a specific node setting relaxLocality to FALSE in the ResourceRequest. Though, this does not allow you to get a container if there is no capacity in the node. allow admins to force app deployment on a specific host --- Key: YARN-1160 URL: https://issues.apache.org/jira/browse/YARN-1160 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Priority: Minor Currently you ask YARN to get slots on a host and it finds a slot on that machine -or, if unavailable or there is no room, on a host nearby as far as the topology is concerned. People with admin rights should have the option to deploy a process on a specific host and have it run there even if there are no free slots -and to fail if the machine is not available. This would let you deploy admin-specific process across a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
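For reference, a sketch of such a node-specific request through AMRMClient, assuming the 2.1 ContainerRequest constructor that takes a relaxLocality flag; the resource size and priority here are arbitrary:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class NodeSpecificRequest {
  // Sketch: ask for a container strictly on one host; with relaxLocality
  // false the request is never widened to the rack or to ANY.
  public static ContainerRequest onHost(String host) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    capability.setVirtualCores(1);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(1);
    return new ContainerRequest(
        capability, new String[] { host }, null /* racks */, priority,
        false /* relaxLocality */);
  }
}
{code}
As noted above, though, such a request simply waits in the queue if the node has no capacity, which is exactly the gap this JIRA is about.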
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1162) NM auxiliary service invocations should be try/catch
Alejandro Abdelnur created YARN-1162: Summary: NM auxiliary service invocations should be try/catch Key: YARN-1162 URL: https://issues.apache.org/jira/browse/YARN-1162 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta The {{AuxiliaryServices#handle()}} should try/catch all invocations of auxiliary services to isolate failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
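A minimal sketch of the isolation being proposed; the callback interface below is illustrative and stands in for the real per-service invocations inside AuxServices#handle() that this JIRA would wrap:
{code}
import java.util.List;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch: isolate aux-service callbacks so one misbehaving
// service cannot take down the NM event dispatcher.
public class SafeAuxDispatch {
  private static final Log LOG = LogFactory.getLog(SafeAuxDispatch.class);

  interface AuxCallback { void invoke() throws Exception; }

  public void dispatchAll(List<AuxCallback> callbacks) {
    for (AuxCallback cb : callbacks) {
      try {
        cb.invoke();
      } catch (Throwable t) {
        // Log and continue: the remaining services still get the event.
        LOG.error("Auxiliary service invocation failed", t);
      }
    }
  }
}
{code}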
[jira] [Commented] (YARN-1144) Unmanaged AMs registering a tracking URI should not be proxy-fied
[ https://issues.apache.org/jira/browse/YARN-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760209#comment-13760209 ] Hadoop QA commented on YARN-1144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601823/YARN-1144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1856//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1856//console This message is automatically generated. Unmanaged AMs registering a tracking URI should not be proxy-fied - Key: YARN-1144 URL: https://issues.apache.org/jira/browse/YARN-1144 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta Attachments: YARN-1144.patch Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1160) allow admins to force app deployment on a specific host
[ https://issues.apache.org/jira/browse/YARN-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760210#comment-13760210 ] Steve Loughran commented on YARN-1160: -- Yes, and if you don't get that container it just stays in the queue - no notification to the AM. This is about being able to force things in without that wait and irrespective of space. allow admins to force app deployment on a specific host --- Key: YARN-1160 URL: https://issues.apache.org/jira/browse/YARN-1160 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Priority: Minor Currently you ask YARN to get slots on a host and it finds a slot on that machine -or, if unavailable or there is no room, on a host nearby as far as the topology is concerned. People with admin rights should have the option to deploy a process on a specific host and have it run there even if there are no free slots -and to fail if the machine is not available. This would let you deploy admin-specific process across a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760216#comment-13760216 ] Hadoop QA commented on YARN-696: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601828/YARN-696.diff against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1858//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1858//console This message is automatically generated. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1163) Cleanup code for AssignMapsWithLocality() in RMContainerAllocator
Junping Du created YARN-1163: Summary: Cleanup code for AssignMapsWithLocality() in RMContainerAllocator Key: YARN-1163 URL: https://issues.apache.org/jira/browse/YARN-1163 Project: Hadoop YARN Issue Type: Improvement Components: applications Reporter: Junping Du Assignee: Junping Du Priority: Minor In RMContainerAllocator, AssignMapsWithLocality() is a very important method that assigns map tasks to allocated containers while conforming to different levels of locality (dataLocal, rackLocal, etc.). However, this method mixes separate code paths to handle the different locality types, even though their behaviours are largely similar. This is hard to maintain as well as to extend with other locality types, so we need clearer code here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1164) maven Junit dependency should be test only
Steve Loughran created YARN-1164: Summary: maven Junit dependency should be test only Key: YARN-1164 URL: https://issues.apache.org/jira/browse/YARN-1164 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Steve Loughran Priority: Minor The maven dependencies for the YARN artifacts don't restrict JUnit to test scope, so it gets picked up by all downstream users. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1164) maven Junit dependency should be test only
[ https://issues.apache.org/jira/browse/YARN-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-1164: - Attachment: HADOOP-9935-001.patch Patch from André Kelpe for HADOOP-9935; this JIRA is to test the YARN section. maven Junit dependency should be test only -- Key: YARN-1164 URL: https://issues.apache.org/jira/browse/YARN-1164 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Steve Loughran Priority: Minor Attachments: HADOOP-9935-001.patch The maven dependencies for the YARN artifacts don't restrict JUnit to test scope, so it gets picked up by all downstream users. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
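The fix pattern is the standard Maven one: mark the dependency with test scope so it stops leaking into downstream compile/runtime classpaths. A generic illustration, not the exact hunk from HADOOP-9935-001.patch:
{code:xml}
<!-- junit is only needed to compile and run this module's own tests,
     so it must not appear on consumers' classpaths. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <scope>test</scope>
</dependency>
{code}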
[jira] [Updated] (YARN-1152) Invalid key to HMAC computation error when getting application report for completed app attempt
[ https://issues.apache.org/jira/browse/YARN-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1152: - Target Version/s: 2.1.1-beta (was: 0.23.10, 2.1.1-beta) Affects Version/s: (was: 0.23.10) Turns out this does not affect 0.23 because master keys are created per app instead of app-attempt and not removed. Invalid key to HMAC computation error when getting application report for completed app attempt --- Key: YARN-1152 URL: https://issues.apache.org/jira/browse/YARN-1152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1152.txt On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760338#comment-13760338 ] Tsuyoshi OZAWA commented on YARN-1159: -- Should we change the state machine to accept the CONTAINER_KILLED_ON_REQUEST event in the CONTAINER_CLEANEDUP_AFTER_KILL state?

NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta

When running MR PI, which runs successfully, the NM log reports:
{code}
2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000
2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143
2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:722)
2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
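That suggestion would amount to registering the event as a valid (typically no-op) arc in ContainerImpl's state machine. A hedged sketch of the generic StateMachineFactory pattern, using a reduced state/event set rather than the real ContainerImpl transition table:
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

// Sketch: a self-loop arc makes a late CONTAINER_KILLED_ON_REQUEST legal
// in CONTAINER_CLEANEDUP_AFTER_KILL instead of throwing.
public class ContainerStateSketch {
  enum MyState { KILLING, CONTAINER_CLEANEDUP_AFTER_KILL }
  enum MyEvent { CONTAINER_RESOURCES_CLEANEDUP, CONTAINER_KILLED_ON_REQUEST }

  private static final StateMachineFactory<ContainerStateSketch, MyState, MyEvent, MyEvent>
      FACTORY =
      new StateMachineFactory<ContainerStateSketch, MyState, MyEvent, MyEvent>(MyState.KILLING)
          .addTransition(MyState.KILLING, MyState.CONTAINER_CLEANEDUP_AFTER_KILL,
              MyEvent.CONTAINER_RESOURCES_CLEANEDUP)
          // The proposed fix: accept the late kill notification as a no-op
          // self-loop instead of raising InvalidStateTransitonException.
          .addTransition(MyState.CONTAINER_CLEANEDUP_AFTER_KILL,
              MyState.CONTAINER_CLEANEDUP_AFTER_KILL,
              MyEvent.CONTAINER_KILLED_ON_REQUEST)
          .installTopology();

  private final StateMachine<MyState, MyEvent, MyEvent> sm = FACTORY.make(this);

  public void handle(MyEvent event) throws Exception {
    sm.doTransition(event, event);
  }
}
{code}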
[jira] [Resolved] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1159. --- Resolution: Duplicate It's a bug, and was reported before: YARN-1070. Will fix there.

NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta

When running MR PI, which runs successfully, the NM log reports:
{code}
2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000
2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143
2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:722)
2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-609: --- Attachment: YARN-609.6.patch Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
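For context on the kind of fix these patches make, a minimal sketch of the pattern follows; the field and method names are illustrative assumptions, not taken from the patch. A setter that takes a list should copy it under the object's monitor instead of retaining the caller's mutable list.
{code}
// Illustrative sketch; names are assumptions, not from YARN-609 itself.
public synchronized void setResourceRequests(List<ResourceRequest> requests) {
  // Defensive copy under the lock, so readers never observe the caller's
  // mutable list and writes cannot interleave with reads.
  this.resourceRequests =
      (requests == null) ? null : new ArrayList<ResourceRequest>(requests);
}
{code}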
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760408#comment-13760408 ] Jian He commented on YARN-1153: --- Exactly, the single user limit is throttling it. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760431#comment-13760431 ] Hadoop QA commented on YARN-609: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601857/YARN-609.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1860//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1860//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1860//console This message is automatically generated. Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1152) Invalid key to HMAC computation error when getting application report for completed app attempt
[ https://issues.apache.org/jira/browse/YARN-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760281#comment-13760281 ] Jason Lowe commented on YARN-1152: -- I also manually tested this on a secure cluster. Proxy links and mapred job -list both worked after the job had completed, and the master key had been removed for the attempt. Invalid key to HMAC computation error when getting application report for completed app attempt --- Key: YARN-1152 URL: https://issues.apache.org/jira/browse/YARN-1152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.10, 2.1.1-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1152.txt On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760462#comment-13760462 ] Hadoop QA commented on YARN-978: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601859/YARN-978.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1861//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1861//console This message is automatically generated. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1001: -- Attachment: YARN-1001.2.patch Updated the patch against the latest trunk. In addition, polished the code, added tests for empty params and invalid state, and added documentation for the new REST API. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch, YARN-1001.2.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
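For readers of the new docs, a hedged sketch of how the endpoint is invoked; the path and parameter names follow the REST documentation added here, but treat the exact shape as an assumption rather than a reference:
{code}
GET http://<rm-http-address:port>/ws/v1/cluster/appstatistics?states=running,finished&applicationTypes=MAPREDUCE
{code}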
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760485#comment-13760485 ] Thomas Graves commented on YARN-1153: - Sorry, so why is this a bug? It's working as designed. If we want to change that, we should make this an enhancement request. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760487#comment-13760487 ] Hadoop QA commented on YARN-609: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601867/YARN-609.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1862//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1862//console This message is automatically generated. Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch, YARN-609.7.patch, YARN-609.8.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760485#comment-13760485 ] Thomas Graves edited comment on YARN-1153 at 9/6/13 6:26 PM: - Sorry, so why is this a bug? It's working as designed. If we want to change that, we should make this an enhancement request; the fix is to change your config. was (Author: tgraves): Sorry, so why is this a bug? It's working as designed. If we want to change that, we should make this an enhancement request. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
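For anyone hitting the same behavior, a hedged example of the config change being alluded to (the queue path is illustrative): the CapacityScheduler's per-user limit defaults to roughly the queue's configured capacity, so raising user-limit-factor lets a single user's application grow toward max-capacity.
{code}
<!-- capacity-scheduler.xml; queue path "root.a" is illustrative.
     A factor of 4 lets one user consume up to 4x the queue's configured
     capacity, still bounded by the queue's max-capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
  <value>4</value>
</property>
{code}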
[jira] [Resolved] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-957. -- Resolution: Fixed [~devaraj.k] opened YARN-1161. Closing this. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers: * one with 1024 MB memory (nm1) * a second with 2048 MB memory (nm2) I am submitting a simple MapReduce application with 1 mapper and 1 reducer of 1024 MB each. The steps to reproduce this are: * stop nm2 with 2048 MB memory. (This is to make sure that this node's heartbeat doesn't reach the RM first.) * now submit the application. As soon as the RM receives the first node's (nm1) heartbeat, it will try to reserve memory for the AM container (2048 MB). However, nm1 has only 1024 MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... This has two potential issues. * It should not try to reserve memory on a node manager which is never going to provide the requested memory, i.e., the node manager's max capability is 1024 MB but 2048 MB is reserved on it. But it still does that. * Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB available memory. In this case, if the original request was made without any locality, the scheduler should unreserve the memory on nm1 and allocate the requested 2048 MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760436#comment-13760436 ] Omkar Vinit Joshi commented on YARN-713: Rebasing the patch... I had missed one local commit. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
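To make the failure mode concrete, a hedged sketch (the variable name is hypothetical and this is not the patch itself): any hostname resolution performed inside an event handler can throw during a DNS outage, and an exception that escapes the handler kills the AsyncDispatcher thread and, with it, the RM.
{code}
// nodeHostName is a hypothetical variable standing in for any host the
// RM resolves while handling an event.
try {
  InetAddress resolved = InetAddress.getByName(nodeHostName); // may fail during a DNS outage
  // ... use the resolved address ...
} catch (UnknownHostException e) {
  // Handle locally (log, retry later) instead of letting the exception
  // propagate out of AsyncDispatcher and terminate the RM.
  LOG.warn("Could not resolve " + nodeHostName + ", will retry", e);
}
{code}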
[jira] [Updated] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-609: --- Attachment: YARN-609.8.patch Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch, YARN-609.7.patch, YARN-609.8.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760498#comment-13760498 ] Hudson commented on YARN-758: - SUCCESS: Integrated in Hadoop-trunk-Commit #4379 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4379/]) Fixing CHANGES.txt for YARN-758 as it is now merged into branch-2.1-beta. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1520659) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Augment MockNM to use multiple cores Key: YARN-758 URL: https://issues.apache.org/jira/browse/YARN-758 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Karthik Kambatla Priority: Minor Fix For: 2.1.1-beta Attachments: yarn-758-1.patch, yarn-758-2.patch YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760495#comment-13760495 ] Hadoop QA commented on YARN-713: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601865/YARN-713.09062013.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1863//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1863//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1863//console This message is automatically generated. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760530#comment-13760530 ] Hadoop QA commented on YARN-978: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601877/YARN-978.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1865//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1865//console This message is automatically generated. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-978: --- Attachment: YARN-978.5.patch [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch We don't have ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1098: --- Attachment: yarn-1098-4.patch Updated patch to fix javadoc and test failures. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-2.patch, yarn-1098-3.patch, yarn-1098-4.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760582#comment-13760582 ] Karthik Kambatla commented on YARN-1027: Thanks for the detailed review, [~bikas]. bq. What are the pros of making haState a member of ResourceManager instead of HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one place. In the future, when individual external-facing services need to behave based on the HAState, having it in the RM might be useful. However, I think we should move it to RMHAProtocolService now, and move it to the RM or RMContext later if needed. bq. Why is there a lock used in ResourceManager.startActive() etc.? Why are these methods protected? If testing, then let's add a @VisibleForTesting annotation. The lock is to protect against concurrent invocations of transitionToActive() and transitionToStandby() due to, say, user input. The methods are protected because they are being accessed from outside the RM - in this case, RMHAProtocolService. bq. Is there a way to confirm that the active service objects are all being GC'd? Not sure of a deterministic test. How about using Runtime.memory methods to measure memory usage before and after transitioning to Active and subsequently Standby? I can jmap a real RM on a pseudo-dist cluster and see if they are being cleaned up. bq. Didn't quite get this comment. Is this to do with the change being requested by user/admin/ZKFC? If automatic failover is enabled and a user issues a transition command, it should take effect only when it is forced. Agree with the remaining comments. Will fix them in the next version. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
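A minimal sketch of the locking being discussed, assuming the RMHAProtocolService shape from the patch; the method bodies here are assumptions, not the patch itself. Marking the transition methods synchronized serializes concurrent requests from user/admin/ZKFC.
{code}
// Hedged sketch; exact fields and behavior are assumptions.
public synchronized void transitionToActive(StateChangeRequestInfo reqInfo)
    throws IOException {
  if (haState == HAServiceState.ACTIVE) {
    return; // already active, nothing to do
  }
  rm.startActiveServices();
  haState = HAServiceState.ACTIVE;
}

public synchronized void transitionToStandby(StateChangeRequestInfo reqInfo)
    throws IOException {
  if (haState == HAServiceState.ACTIVE) {
    haState = HAServiceState.STANDBY; // flip first, so HA-aware services stop routing work
    rm.stopActiveServices();
  }
}
{code}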
[jira] [Commented] (YARN-1132) QueueMetrics.java has wrong comments
[ https://issues.apache.org/jira/browse/YARN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760512#comment-13760512 ] Akira AJISAKA commented on YARN-1132: - Thanks for your comment. I'll close this issue as a duplicate. QueueMetrics.java has wrong comments Key: YARN-1132 URL: https://issues.apache.org/jira/browse/YARN-1132 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Priority: Minor Labels: newbie I found o.a.h.yarn.server.resourcemanager.scheduler.QueueMetrics.java has wrong comments: {code} @Metric("# of reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active users") MutableGaugeInt activeApplications; {code} they should be fixed as follows: {code} @Metric("Reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active applications") MutableGaugeInt activeApplications; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760567#comment-13760567 ] Karthik Kambatla commented on YARN-1098: Haven't added any tests because the patch just reorganizes code and doesn't change functionality. Existing tests should expose any problems. In fact, not having to modify anything else shows the changes are transparent to the users of ResourceManager. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-2.patch, yarn-1098-3.patch, yarn-1098-4.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
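For readers following along, a hedged sketch of the reorganization described here (service names are illustrative assumptions, not the patch): the Always-On services are added directly to the RM, while the stateful ones are grouped under a single composite child so they can later be started and stopped on HA transitions. As the comment above notes, behavior is unchanged for now.
{code}
// Hedged sketch of the split; names are assumptions, not the patch.
public class ResourceManager extends CompositeService {
  private RMActiveServices activeServices; // scheduler, state store, token managers, ...

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Always-On services: run regardless of Active/Standby state.
    addService(createAndAddDispatcher()); // hypothetical helper
    // Active services grouped under one composite child; still inited and
    // started with the RM today, so behavior is unchanged.
    activeServices = new RMActiveServices();
    addService(activeServices);
    super.serviceInit(conf);
  }
}
{code}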
[jira] [Resolved] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1153. --- Resolution: Invalid CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1132) QueueMetrics.java has wrong comments
[ https://issues.apache.org/jira/browse/YARN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved YARN-1132. - Resolution: Duplicate QueueMetrics.java has wrong comments Key: YARN-1132 URL: https://issues.apache.org/jira/browse/YARN-1132 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Priority: Minor Labels: newbie I found o.a.h.yarn.server.resourcemanager.scheduler.QueueMetrics.java has wrong comments: {code} @Metric("# of reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active users") MutableGaugeInt activeApplications; {code} they should be fixed as follows: {code} @Metric("Reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active applications") MutableGaugeInt activeApplications; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760510#comment-13760510 ] Jian He commented on YARN-1153: --- Yeah, I'm closing it. Thanks for clarifying. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760507#comment-13760507 ] Hadoop QA commented on YARN-1001: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601872/YARN-1001.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1864//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1864//console This message is automatically generated. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch, YARN-1001.2.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760562#comment-13760562 ] Hadoop QA commented on YARN-1098: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601883/yarn-1098-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1866//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1866//console This message is automatically generated. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-2.patch, yarn-1098-3.patch, yarn-1098-4.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-978: --- Attachment: YARN-978.6.patch [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch We don't have ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760632#comment-13760632 ] Xuan Gong commented on YARN-1155: - Verified that we have a test case covering this logic: {code} // To test that IPs also work String ip = NetUtils.normalizeHostName("localhost"); writeToHostsFile("host1", ip); rm.getNodesListManager().refreshNodes(conf); nodeHeartbeat = nm1.nodeHeartbeat(true); Assert.assertTrue(NodeAction.NORMAL.equals(nodeHeartbeat.getNodeAction())); Assert.assertEquals(0, ClusterMetrics.getMetrics().getNumDecommisionedNMs()); {code} RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1162) NM auxiliary service invocations should be try/catch
[ https://issues.apache.org/jira/browse/YARN-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik reassigned YARN-1162: -- Assignee: Roman Shaposhnik NM auxiliary service invocations should be try/catch Key: YARN-1162 URL: https://issues.apache.org/jira/browse/YARN-1162 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Critical Fix For: 2.1.1-beta The {{AuxiliaryServices#handle()}} should try/catch all invocations of auxiliary services to isolate failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-1155. - Resolution: Invalid RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1162) NM auxiliary service invocations should be try/catch
[ https://issues.apache.org/jira/browse/YARN-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1162. --- Resolution: Duplicate Fix Version/s: (was: 2.1.1-beta) YARN-867 is already doing this, closing as duplicate. NM auxiliary service invocations should be try/catch Key: YARN-1162 URL: https://issues.apache.org/jira/browse/YARN-1162 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Critical The {{AuxiliaryServices#handle()}} should try/catch all invocations of auxiliary services to isolate failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
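Though resolved as a duplicate of YARN-867, the shape of the isolation being asked for is worth sketching. This is a hedged illustration assuming the AuxServices dispatch loop; the names are approximations, not the eventual patch.
{code}
// Hedged sketch: wrap each auxiliary-service callback so one faulty
// service cannot take down the NodeManager's dispatcher.
for (AuxiliaryService service : serviceMap.values()) {
  try {
    service.initializeApplication(initContext);
  } catch (Throwable t) {
    // Log and continue; the failure stays isolated to this service.
    LOG.error("Aux service " + service.getName()
        + " failed to handle APPLICATION_INIT", t);
  }
}
{code}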
[jira] [Commented] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760630#comment-13760630 ] Xuan Gong commented on YARN-1155: - verified that we already have such logic: {code} public boolean isValidNode(String hostName) { synchronized (hostsReader) { SetString hostsList = hostsReader.getHosts(); SetString excludeList = hostsReader.getExcludedHosts(); String ip = NetUtils.normalizeHostName(hostName); return (hostsList.isEmpty() || hostsList.contains(hostName) || hostsList .contains(ip)) !(excludeList.contains(hostName) || excludeList.contains(ip)); } } {code} RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
Karthik Kambatla created YARN-1165: -- Summary: Move init() of activeServices to ResourceManager#serviceStart() Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the context of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1165: --- Description: Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. was: Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the context of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
Srimanth Gunturi created YARN-1166: -- Summary: YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
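A hedged before/after sketch of the requested change (the annotation text is illustrative; QueueMetrics uses Hadoop's metrics2 annotations): a MutableCounterInt only ever increments, which is what lets consumers such as Ganglia derive deltas from the slope.
{code}
// current: reported as a gauge (exact value at each snapshot)
@Metric("# of failed apps") MutableGaugeInt appsFailed;

// proposed: reported as a counter (monotonic, so slopes/deltas work)
@Metric("# of failed apps") MutableCounterInt appsFailed;
{code}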
[jira] [Commented] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760715#comment-13760715 ] Tsuyoshi OZAWA commented on YARN-1159: -- OK, thanks. NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta When running MR PI, which runs successfully, the NM log reports: {code} 2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000 2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143 2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL 2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:722) 2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1165: --- Attachment: test-failures.pdf Moving creation, init() and start() of activeServices to RM#serviceStart(). Attaching the output of running tests in hadoop-yarn-server-resourcemanager - this alone is 149 tests. Changing these tests to use RM#start() in addition to RM#init() would lead to significantly longer test run times. Also, once HADOOP-9933 is fixed, we will have to undo this. The best way forward might be to bite the bullet and fix HADOOP-9933 first. The other alternative is to implement RM#initForTesting() that instantiates and initializes RMActiveServices - however, that is too much of a hack and I am not sure if we should do that. Thoughts? [~bikassaha], [~vinodkv], [~stevel]? Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: test-failures.pdf Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
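To make the alternative under discussion concrete, a sketch of what moving the active services into RM#serviceStart() might look like; this is a hedged sketch under the YARN-1098 structure, not the final patch.
{code}
// Hedged sketch; assumes the RMActiveServices composite from YARN-1098.
@Override
protected void serviceStart() throws Exception {
  // Created and inited here instead of in serviceInit(); tests that only
  // call rm.init() and then poke internals would now see nulls (NPEs).
  activeServices = new RMActiveServices();
  activeServices.init(getConfig());
  activeServices.start();
  super.serviceStart();
}
{code}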
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-713: --- Attachment: YARN-713.09062013.1.patch ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-609: --- Attachment: YARN-609.7.patch Fix the -1 findbugs warning Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch, YARN-609.7.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these.
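For context, the pattern being fixed: a setter that merely aliases the caller's list is neither synchronized with readers nor isolated from later mutation by the caller. A minimal sketch of the safe variant (class and field names hypothetical):
{code}
// Sketch of the pattern under discussion (names hypothetical): synchronize
// the setter and copy the list instead of aliasing the caller's reference.
public synchronized void setApplicationList(List<ApplicationReport> apps) {
  if (apps == null) {
    this.applicationList = null;
    return;
  }
  // defensive copy: later mutation by the caller can't race with readers
  this.applicationList = new ArrayList<ApplicationReport>(apps);
}
{code}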
[jira] [Reopened] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srimanth Gunturi reopened YARN-1166: Talked with [~jianhe] and [~vinodkv], and this metric should be cumulative. Hence the original request that this metric be a 'counter' is valid. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
[jira] [Commented] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760747#comment-13760747 ] Bikas Saha commented on YARN-1165: -- Are we getting the test failures even with the approach taken in the current patch uploaded to YARN-1027? That patch was adding activeServices to the RM when HA is not enabled, thus mimicking current RM behavior. Thus RM init would cause activeService init and RM start would cause activeService start. So it's not clear to me why the current patch would break the tests. How does HADOOP-9933 fix this? The problem happens before services are stopped. So the ability to restart them sounds unrelated. Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: test-failures.pdf Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases.
[jira] [Resolved] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srimanth Gunturi resolved YARN-1166. Resolution: Not A Problem Turns out 'AppsFailed' should be interpreted as 'AppsFailing'. Its value is decremented when a failed app is resubmitted in subsequent attempts. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
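The distinction this back-and-forth turns on: in Hadoop's metrics2 library a counter only increments (so Ganglia can derive rates from its slope), while a gauge reports the exact current value and can also be decremented, which is what the resubmission accounting above relies on. A minimal illustrative sketch (class and field names hypothetical, not the actual QueueMetrics source):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(context = "yarn")
class ExampleQueueMetrics {
  // counter: monotonically increasing, suitable for slope/rate graphs
  @Metric("Apps submitted") MutableCounterInt appsSubmitted;

  // gauge: exact current value, supports decrement, which the
  // "failing, may be resubmitted" accounting above requires
  @Metric("Apps failing") MutableGaugeInt appsFailed;

  void onAppFailed() { appsFailed.incr(); }
  void onAppResubmitted() { appsFailed.decr(); }
}
{code}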
[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760752#comment-13760752 ] Bikas Saha commented on YARN-1027: -- RMHAProtocolService can be made available via RMContext and thus accessible to everyone who has access to RMContext. In that case we probably mean package and not protected since there is no inheritance story here. I don't think we need a test (although that would be awesome). If we can manually verify then it should be sufficient for now I guess. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760460#comment-13760460 ] Zhijie Shen commented on YARN-1001: --- bq. we are expecting /ws/v1/cluster/appscount to provide all app-types/state-counts in 1 call To prevent the RM from being overwhelmed, when no params are specified, the API returns an empty response. Users must supply the states and the types (at least one) to get the counts. bq. Apart from that, we need /ws/v1/cluster/appscount information pushed to Ganglia. Through some discussion, we'd like to exclude this requirement from the scope of this jira due to performance concerns. We can open a ticket to track it separately. Another issue we need to clarify is that the counts depend on the current apps in RMContext. Old finished apps may be removed from the context if the total app number reaches the limit. Therefore, the count of completed apps may only reflect the count of those in RMContext. We should emphasize in the documentation that the counts cover only the apps in RMContext. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts.
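To make the proposed contract concrete, a hedged sketch of exercising the endpoint; the path comes from the discussion above, but the parameter names and response shape are assumptions, not the final API:
{code}
// Hedged sketch: the parameter names (states, applicationTypes) and the
// response shape are assumptions based on the discussion, not the patch.
URL url = new URL("http://rm-host:8088/ws/v1/cluster/appscount"
    + "?states=RUNNING,ACCEPTED&applicationTypes=MAPREDUCE");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Accept", "application/json");
BufferedReader in = new BufferedReader(
    new InputStreamReader(conn.getInputStream(), "UTF-8"));
System.out.println(in.readLine()); // e.g. {"appsCount":{"RUNNING":3,...}}
in.close();
{code}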
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760757#comment-13760757 ] Omkar Vinit Joshi commented on YARN-1107: - Thanks Vinod. bq. Put both RMDelegationTokenSecretManager and ClientRMService in RMContext. Then you don't need delegationTokenRenewer.setClientRMService() and ClientRMService.getDelegationTokenSecretManager(). bq. You can add an assert in DelegationTokenRenewer.serviceStart() to check for ClientRMService.start() after the comment. It'll be useful if tests enable assertions, can you check? done.. bq. RMDelegationTokenIdentifier.Renewer.setSecretManager is moved into ClientRMService, but not so in the test. Can we change that. fixed bq. Please also take care of the test-issue and the findbugs warning. Done.. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
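The assert being requested would look roughly like this, a sketch whose accessor names are assumptions based on the RMContext refactoring described above:
{code}
// Sketch of the requested guard in DelegationTokenRenewer.serviceStart()
// (accessor names assumed, not quoted from the patch). With tests run
// under -ea, a wrong service start order fails fast.
@Override
protected void serviceStart() throws Exception {
  // token renewal resolves the RM delegation token through
  // ClientRMService, so that service must already be started
  assert rmContext.getClientRMService() != null
      && rmContext.getClientRMService().isInState(STATE.STARTED)
      : "ClientRMService must be started before DelegationTokenRenewer";
  super.serviceStart();
}
{code}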
[jira] [Updated] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1027: --- Attachment: yarn-1027-4.patch Patch addressing comments from Bikas. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
[jira] [Updated] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-758: - Priority: Minor (was: Major) Target Version/s: 2.3.0, 2.1.1-beta (was: 2.3.0) Fix Version/s: (was: 2.3.0) 2.1.1-beta Issue Type: Improvement (was: Bug) Augment MockNM to use multiple cores Key: YARN-758 URL: https://issues.apache.org/jira/browse/YARN-758 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Karthik Kambatla Priority: Minor Fix For: 2.1.1-beta Attachments: yarn-758-1.patch, yarn-758-2.patch YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).
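A sketch of what the augmentation enables in scheduler tests; the exact overload is an assumption, so check MockNM/MockRM in the patch for the real signatures:
{code}
// Sketch (hedged): register a mock NM advertising multiple vcores so that
// CPU-aware schedulers like FairScheduler have capacity on both dimensions.
// The (memory, vcores) overload here is assumed, not quoted from the patch.
MockRM rm = new MockRM(conf);
rm.start();
MockNM nm = rm.registerNode("h1:1234", 8 * 1024 /* MB */, 4 /* vcores */);
{code}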
[jira] [Updated] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-758: - bq. This was reopened to ensure the patch was tested. Now, resolving this on the basis of tests via TestRMRestart on FairScheduler. Please reopen if needed. Thanks. I actually meant that we don't test this automatically: we don't have a replacement for TestRMRestart with FairScheduler in the test suite. Anyways, too late and too small a thing to worry about. OTOH, some patches started depending on this in 2.1-beta. So I just merged this into branch-2.1-beta. Augment MockNM to use multiple cores Key: YARN-758 URL: https://issues.apache.org/jira/browse/YARN-758 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Karthik Kambatla Fix For: 2.3.0 Attachments: yarn-758-1.patch, yarn-758-2.patch YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).
[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760774#comment-13760774 ] Karthik Kambatla commented on YARN-1027: In yarn-1027-4.patch, the RM always calls addService(HAServiceProtocol). HAServiceProtocol is the one that checks haEnabled in serviceStart(). If enabled, it transitions to standby and waits for the active signal. If not, it directly transitions to active. However, post RM#init(), RM fields are not instantiated (e.g. TokenManagers), leading to a bunch of test failures. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
[jira] [Commented] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760775#comment-13760775 ] Karthik Kambatla commented on YARN-1165: The current patch uploaded to YARN-1027 (yarn-1027-3.patch) doesn't have this issue because the RM does the following in serviceInit():
{code}
if (haEnabled) {
  haService = new RMHAProtocolService(this);
  addService(haService);
} else {
  activeServices = new RMActiveServices();
  addService(activeServices);
}
super.serviceInit(conf);
{code}
If we were to (1) move handling of RMActiveServices to RMHAProtocolService, and (2) have the RM always start RMHAProtocolService irrespective of whether HA is enabled, we will run into this. I just uploaded yarn-1027-4.patch that implements this. If HADOOP-9933 were fixed, we could initialize everything (RM, RMHAProtocolService, and RMActiveServices) in RM#init(). Transition to Active would be RMActiveServices#start(), and transition to Standby would be RMActiveServices#stop(). Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: test-failures.pdf Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases.
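Putting the thread together, the structure being converged on looks roughly like the following. This is a sketch of the intent only; the configuration reading, locking granularity, and error paths are assumptions, not the committed patch:
{code}
// Sketch of the converged design: the RM always adds this service, and it
// decides in serviceStart() whether to wait in standby or go active.
class RMHAProtocolService extends AbstractService {
  private final ResourceManager rm;
  private boolean haEnabled;          // read from conf in serviceInit()
  private HAServiceState haState = HAServiceState.INITIALIZING;

  RMHAProtocolService(ResourceManager rm) {
    super("RMHAProtocolService");
    this.rm = rm;
  }

  @Override
  protected synchronized void serviceStart() throws Exception {
    if (haEnabled) {
      transitionToStandby();  // wait for an explicit active signal
    } else {
      transitionToActive();   // non-HA setup: go active directly
    }
    super.serviceStart();
  }

  synchronized void transitionToActive() throws Exception {
    rm.startActiveServices(); // with HADOOP-9933: activeServices.start()
    haState = HAServiceState.ACTIVE;
  }

  synchronized void transitionToStandby() throws Exception {
    if (haState == HAServiceState.ACTIVE) {
      rm.stopActiveServices(); // with HADOOP-9933: activeServices.stop()
    }
    haState = HAServiceState.STANDBY;
  }
}
{code}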
[jira] [Updated] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1107: Attachment: YARN-1107.20130906.1.patch Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760826#comment-13760826 ] Hadoop QA commented on YARN-1107: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601937/YARN-1107.20130906.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1868//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1868//console This message is automatically generated. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Commented] (YARN-1152) Invalid key to HMAC computation error when getting application report for completed app attempt
[ https://issues.apache.org/jira/browse/YARN-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760842#comment-13760842 ] Vinod Kumar Vavilapalli commented on YARN-1152: --- BTW, forgot to mention that your test TestRMAppTransitions is good in that I was able to revert just the RMAppImpl changes and reproduce the issue. So thumbs up! Invalid key to HMAC computation error when getting application report for completed app attempt --- Key: YARN-1152 URL: https://issues.apache.org/jira/browse/YARN-1152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1152.txt On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered.
[jira] [Created] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
Tassapol Athiapinya created YARN-1167: - Summary: Submitted distributed shell application shows appMasterHost = empty Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Fix For: 2.1.1-beta Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
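appMasterHost is whatever the AM reports when it registers, so an empty value suggests the distributed shell AM registers with an empty hostname. A sketch of the likely fix on the AM side; the surrounding variable names are assumptions, not quoted from any patch:
{code}
// Sketch (hedged): register with the real hostname instead of "".
// appMasterRpcPort and appMasterTrackingUrl are assumed local variables.
String appMasterHostname = NetUtils.getHostname();
RegisterApplicationMasterResponse response =
    amRMClient.registerApplicationMaster(appMasterHostname,
        appMasterRpcPort, appMasterTrackingUrl);
{code}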
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760870#comment-13760870 ] Omkar Vinit Joshi commented on YARN-1107: - removing locking.. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Updated] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1107: Attachment: YARN-1107.20130906.2.patch Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.4.patch Uploaded a patch that changes FinishApplicationMasterResponse to contain a response-completed field; the MR AM and AMRMClient are changed to retry till it becomes true. Also fixed Bikas's last comments. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.patch, YARN-540.patch When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down with the dispatcher stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.
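The client-side shape of that retry would be roughly the following; the accessor for the new response-completed field is rendered here under a hypothetical name:
{code}
// Sketch of the retry the patch adds on the AM side (accessor name is
// hypothetical). The AM keeps re-sending finishApplicationMaster until
// the RM confirms the unregistration reached the state store.
FinishApplicationMasterRequest request =
    FinishApplicationMasterRequest.newInstance(
        FinalApplicationStatus.SUCCEEDED, "", "");
while (true) {
  FinishApplicationMasterResponse response =
      rmClient.finishApplicationMaster(request);
  if (response.getIsCompleted()) { // hypothetical name for the new field
    break;
  }
  Thread.sleep(100); // back off before retrying
}
{code}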
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760896#comment-13760896 ] Hadoop QA commented on YARN-540: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601956/YARN-540.4.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1870//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1870//console This message is automatically generated. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.patch, YARN-540.patch When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down with the dispatcher stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760925#comment-13760925 ] Vinod Kumar Vavilapalli commented on YARN-1107: --- +1 for the latest patch. I just made sure that without the core change, TestRMRestart fails. So we are good to go. Checking this in. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.5.patch New patch fixes the test case. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.patch, YARN-540.patch When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down with the dispatcher stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760930#comment-13760930 ] Hudson commented on YARN-1107: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4383 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4383/]) YARN-1107. Fixed a bug in ResourceManager because of which RM in secure mode fails to restart. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520726)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.