[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759973#comment-13759973 ] Bikas Saha commented on YARN-1027: -- The patch looks clean overall. I would suggest keeping the haEnabled concept within the HAServiceProtocol service instead of mixing it between the ResourceManager and HAServiceProtocol. Thus the RM always does addService(HAServiceProtocol). HAServiceProtocol is the one that checks haEnabled in serviceStart(). If enabled, it transitions to standby and waits for the active signal. If not, it directly transitions to active.

Shouldn't we simply call transitionToStandby() here? That would ensure getServiceStatus() returns a non-active status for anyone who cares to know.
{code}
+  public void serviceStop() throws Exception {
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
{code}
This is fine for now but we might have to invest in a better health check in a different jira. Any ideas?
{code}
  public synchronized void monitorHealth() throws HealthCheckFailedException {
+    if (rm.haState == HAServiceState.ACTIVE && !rm.areActiveServicesRunning()) {
{code}
We probably want the log before the if stmt. Should we change state to standby before we stop services? Assuming that HA-aware services would need to know about this earlier rather than later so that they can stop signaling Active services and allow them to be drained/stopped.
{code}
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
+    }
+
+    LOG.info("Transitioning to standby");
+    rm.haState = HAServiceState.STANDBY;
{code}
Didn't quite get this comment. Is this to do with the change being requested by user/admin/ZKFC?
{code}
+  public void transitionToActive(StateChangeRequestInfo reqInfo) {
+    // TODO: When automatic failover is enabled, check if transition should
+    // be allowed for this request
{code}
What are the pros of making haState a member of ResourceManager instead of HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one place.

Why is there a lock used in ResourceManager.startActive() etc.? Why are these methods protected? If for testing, then let's add a @VisibleForTesting annotation. Is there a way to confirm that the active service objects are all being GC'd?

testStartAndTransitions() - How about calling getServiceStatus() and monitorHealth(), in addition to checking the internal members, in all places where internal members are being checked? That way we can test and exercise those methods too. How about completing Active-Standby-Active-Standby-Active-RM.serviceStop()? This would fully simulate multiple full cycles of transitions and also verify the shutdown case. We can also issue some requests like createApplication() to the RM, when in the active state, and verify that the RM is really working.

TestRMHADisabled: It's confusing to read that the RM has started but its haState == INITIALIZING. Also, we can probably move this test into TestRMHA.java to keep related tests in one place.

Minor nits: LOG instead of print?
{code}
+    } catch (Exception e) {
+      e.printStackTrace();
{code}
RM_HA_PREFIX instead of HA_PREFIX

Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common.
This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
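To make the suggestion above concrete, here is a minimal sketch of an HA service that owns the haEnabled check in serviceStart(); the class and method names (RMHAProtocolService, transitionToActive/transitionToStandby) are assumptions for illustration, not the attached patch:
{code}
import org.apache.hadoop.service.AbstractService;

// Hypothetical sketch: names and structure are illustrative, not the patch.
public class RMHAProtocolService extends AbstractService {
  private boolean haEnabled; // would be read from configuration in serviceInit()

  public RMHAProtocolService() {
    super("RMHAProtocolService");
  }

  @Override
  protected void serviceStart() throws Exception {
    if (haEnabled) {
      // HA enabled: come up as standby and wait for an external
      // transitionToActive() from the admin/ZKFC.
      transitionToStandby();
    } else {
      // HA disabled: go straight to active.
      transitionToActive();
    }
    super.serviceStart();
  }

  private void transitionToStandby() { /* stop active services, set STANDBY */ }
  private void transitionToActive() { /* start active services, set ACTIVE */ }
}
{code}
With this shape the RM unconditionally does addService() on the HA service and never branches on haEnabled itself.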
[jira] [Commented] (YARN-1059) '\n' or ' ' or '\t' should be ignored for some configuration parameters
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760042#comment-13760042 ] Tsuyoshi OZAWA commented on YARN-1059: -- Attached a patch to HADOOP-9869.

'\n' or ' ' or '\t' should be ignored for some configuration parameters --- Key: YARN-1059 URL: https://issues.apache.org/jira/browse/YARN-1059 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Environment: Ubuntu 12.04, hadoop 2.0.5 Reporter: rvller Priority: Minor Labels: newbie

Here is the stack trace while starting the YARN resource manager:
{code}
2013-08-12 12:53:29,319 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 10.245.1.30:9030 (configuration property 'yarn.resourcemanager.resource-tracker.address')
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
	at org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
	at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
{code}
And here is the yarn-site.xml:
{code:xml}
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>10.245.1.30:9010</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>10.245.1.30:9020</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>10.245.1.30:9030</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>10.245.1.30:9040</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>10.245.1.30:9050</value>
    <description></description>
  </property>
  <!-- Site specific YARN configuration properties -->
</configuration>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
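For reference, HADOOP-9869 pushes the trimming into Configuration itself; Configuration already exposes getTrimmed(), which strips leading/trailing whitespace (including '\n' and '\t') before the value is parsed. A small sketch of the difference, with the key and polluted value below mimicking the report:
{code}
import org.apache.hadoop.conf.Configuration;

public class TrimmedConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Simulate a value polluted with whitespace, as in the reported yarn-site.xml.
    conf.set("yarn.resourcemanager.resource-tracker.address", "\n10.245.1.30:9030\t");

    // get() returns the raw value, whitespace included...
    String raw = conf.get("yarn.resourcemanager.resource-tracker.address");
    // ...while getTrimmed() strips it, yielding a parseable host:port.
    String trimmed = conf.getTrimmed("yarn.resourcemanager.resource-tracker.address");

    System.out.println("raw=[" + raw + "] trimmed=[" + trimmed + "]");
  }
}
{code}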
[jira] [Commented] (YARN-1132) QueueMetrics.java has wrong comments
[ https://issues.apache.org/jira/browse/YARN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760045#comment-13760045 ] Tsuyoshi OZAWA commented on YARN-1132: -- Then, should this be closed as a duplicate of YARN-1090?

QueueMetrics.java has wrong comments Key: YARN-1132 URL: https://issues.apache.org/jira/browse/YARN-1132 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Priority: Minor Labels: newbie

I found that o.a.h.yarn.server.resourcemanager.scheduler.QueueMetrics.java has wrong comments:
{code}
@Metric("# of reserved memory in MB") MutableGaugeInt reservedMB;
@Metric("# of active users") MutableGaugeInt activeApplications;
{code}
They should be fixed as follows:
{code}
@Metric("Reserved memory in MB") MutableGaugeInt reservedMB;
@Metric("# of active applications") MutableGaugeInt activeApplications;
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-910) Allow auxiliary services to listen for container starts and completions
[ https://issues.apache.org/jira/browse/YARN-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-910: Attachment: YARN-910.patch

Thanks Sandy, Vinod. In the latest patch I've taken care of all the changes except the following.

bq. Split AuxServicesEvent into a AuxServicesAppEvent and AuxServicesContainerEvent? Don't like nulls like that.

The patch is only adding a new property to the event, container, which is NULL for App events. All the other NULLs were already there. Regardless, I've tried refactoring AuxServicesEvent into an AuxServicesAppEvent and an AuxServicesContainerEvent, but the patch gets much bigger, as the necessary changes are not just different names but the way AuxiliaryServices handle() would take care of these 2 events. We should introduce a parent event class for those. If you still want to do this break-up, I'd prefer to do it as part of another JIRA that only does the refactoring, without adding new functionality.

Allow auxiliary services to listen for container starts and completions --- Key: YARN-910 URL: https://issues.apache.org/jira/browse/YARN-910 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Alejandro Abdelnur Attachments: YARN-910.patch, YARN-910.patch, YARN-910.patch Making container start and completion events available to auxiliary services would allow them to be resource-aware. The auxiliary service would be able to notify a co-located service that is opportunistically using free capacity of allocation changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
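For context, a sketch of the kind of consumer this event plumbing enables; the hook names and signatures below are hypothetical, since the exact API shape is what is being discussed on this JIRA:
{code}
// Hypothetical sketch: hook names/types are illustrative, not the committed API.
public class ResourceAwareAuxService {

  // Would be called by the NM when a container starts on this node.
  public void containerStarted(String containerId, long memoryMb) {
    // e.g. tell a co-located service that free capacity shrank.
    notifyCoLocatedService(-memoryMb);
  }

  // Would be called by the NM when a container completes.
  public void containerCompleted(String containerId, long memoryMb) {
    // Capacity freed up again.
    notifyCoLocatedService(memoryMb);
  }

  private void notifyCoLocatedService(long deltaMb) {
    System.out.println("capacity delta on this node: " + deltaMb + " MB");
  }
}
{code}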
[jira] [Created] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
Alejandro Abdelnur created YARN-1159: Summary: NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta

When running MR PI, which runs successfully, the NM log reports:
{code}
2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000
2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143
2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:722)
2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-910) Allow auxiliary services to listen for container starts and completions
[ https://issues.apache.org/jira/browse/YARN-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760107#comment-13760107 ] Hadoop QA commented on YARN-910: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601802/YARN-910.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1854//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1854//console This message is automatically generated. Allow auxiliary services to listen for container starts and completions --- Key: YARN-910 URL: https://issues.apache.org/jira/browse/YARN-910 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Alejandro Abdelnur Attachments: YARN-910.patch, YARN-910.patch, YARN-910.patch Making container start and completion events available to auxiliary services would allow them to be resource-aware. The auxiliary service would be able to notify a co-located service that is opportunistically using free capacity of allocation changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760128#comment-13760128 ] Steve Loughran commented on YARN-1155: -- This would need to work for clusters where DNS doesn't resolve the hostnames, but instead /etc/hosts does the work. This is a not unusual setup in virtualized clusters where the VMs are on a virtual subnet. RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-1160) allow admins to force app deployment on a specific host
[ https://issues.apache.org/jira/browse/YARN-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved MAPREDUCE-4277 to YARN-1160: - Component/s: (was: mrv2) resourcemanager Affects Version/s: (was: trunk) (was: 2.0.0-alpha) 3.0.0 Key: YARN-1160 (was: MAPREDUCE-4277) Project: Hadoop YARN (was: Hadoop Map/Reduce) allow admins to force app deployment on a specific host --- Key: YARN-1160 URL: https://issues.apache.org/jira/browse/YARN-1160 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Priority: Minor Currently you ask YARN to get slots on a host and it finds a slot on that machine -or, if unavailable or there is no room, on a host nearby as far as the topology is concerned. People with admin rights should have the option to deploy a process on a specific host and have it run there even if there are no free slots -and to fail if the machine is not available. This would let you deploy admin-specific process across a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files
[ https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760146#comment-13760146 ] Steve Loughran commented on YARN-1151: -- It gets more complex once you add in failure handling: how does the NM react to the aux service process failing? How does the NM shut it down? Where do the logs go? Who does it run as? We effectively do have a system for doing this: it is called YARN. What sounds needed here is a way to tell *something* that an NM has started, and then give it the option of creating and deploying a container on it. That something should, obviously, be a YARN app itself, since they are set up to build up command lines, copy in JARs, handle failures, etc. What we don't have is:
# anything that starts a specific long-lived YARN AM service on cluster startup.
# a way for an AM to list all the hosts and demand a container on every one, irrespective of what is already there. (You could probably do it by asking for 0 RAM and vcores, but the min resource config options are designed to stop users doing this.) YARN-1160 covers that problem.

Ability to configure auxiliary services from HDFS-based JAR files - Key: YARN-1151 URL: https://issues.apache.org/jira/browse/YARN-1151 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta Reporter: john lilley Priority: Minor Labels: auxiliary-service, yarn I would like to install an auxiliary service in Hadoop YARN without actually installing files/services on every node in the system. Discussions on the user@ list indicate that this is not easily done. The reason we want an auxiliary service is that our application has some persistent-data components that are not appropriate for HDFS. In fact, they are somewhat analogous to the mapper output of MapReduce's shuffle, which is what led me to auxiliary-services in the first place. It would be much easier if we could just place our service's JARs in HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1049) ContainerExistStatus should define a status for preempted containers
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1049: - Description: With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. was: ContainerExitStatus defines a few constants with special exit status values (0, -1000, -100, -101). This is incorrect; we should not define any special constants and should limit ourselves to returning the actual process exit status code. ContainerState should include PREEMPTED (when preempted by YARN) and LOST (when the NM crashes). With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Marking it as a blocker for 2.1.0 as this is an API/behavior change. Fix Version/s: (was: 2.3.0) 2.1.1-beta Assignee: Alejandro Abdelnur Summary: ContainerExistStatus should define a status for preempted containers (was: ContainerExistStatus and ContainerState are defined incorrectly) ContainerExistStatus should define a status for preempted containers Key: YARN-1049 URL: https://issues.apache.org/jira/browse/YARN-1049 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.1-beta With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
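A sketch of how an AM might consume the new constant once it lands, assuming it is exposed as ContainerExitStatus.PREEMPTED per the description (ABORTED already exists for framework-killed containers, e.g. a lost NM):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Sketch: classify a completed container using the proposed exit status.
public class CompletedContainerHandler {
  public boolean shouldRetry(ContainerStatus status) {
    switch (status.getExitStatus()) {
      case ContainerExitStatus.PREEMPTED: // -102, added by this JIRA
        // Preempted by the scheduler: not the task's fault, retry freely.
        return true;
      case ContainerExitStatus.ABORTED:   // framework-killed, e.g. NM lost
        return true;
      default:
        // A real process exit code: only retry on failure.
        return status.getExitStatus() != ContainerExitStatus.SUCCESS;
    }
  }
}
{code}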
[jira] [Updated] (YARN-1049) ContainerExistStatus should define a status for preempted containers
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1049: - Attachment: YARN-1049.patch ContainerExistStatus should define a status for preempted containers Key: YARN-1049 URL: https://issues.apache.org/jira/browse/YARN-1049 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1049.patch With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1049) ContainerExistStatus should define a status for preempted containers
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760172#comment-13760172 ] Hadoop QA commented on YARN-1049: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601813/YARN-1049.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1855//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1855//console This message is automatically generated. ContainerExistStatus should define a status for preempted containers Key: YARN-1049 URL: https://issues.apache.org/jira/browse/YARN-1049 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1049.patch With the current behavior it is impossible to determine if a container has been preempted or lost due to a NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1161) branch-2.1-beta compilation fails
Devaraj K created YARN-1161: --- Summary: branch-2.1-beta compilation fails Key: YARN-1161 URL: https://issues.apache.org/jira/browse/YARN-1161 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker
{code:xml}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
[ERROR] D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
[ERROR] symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
[ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1161) branch-2.1-beta compilation fails
[ https://issues.apache.org/jira/browse/YARN-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-1161: Attachment: YARN-1161.patch branch-2.1-beta compilation fails - Key: YARN-1161 URL: https://issues.apache.org/jira/browse/YARN-1161 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker Attachments: YARN-1161.patch
{code:xml}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
[ERROR] D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
[ERROR] symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
[ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1144) Unmanaged AMs registering a tracking URI should not be proxy-fied
[ https://issues.apache.org/jira/browse/YARN-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1144: - Attachment: YARN-1144.patch Unmanaged AMs registering a tracking URI should not be proxy-fied - Key: YARN-1144 URL: https://issues.apache.org/jira/browse/YARN-1144 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta Attachments: YARN-1144.patch Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1161) branch-2.1-beta compilation fails
[ https://issues.apache.org/jira/browse/YARN-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760202#comment-13760202 ] Hadoop QA commented on YARN-1161: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601822/YARN-1161.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1857//console This message is automatically generated. branch-2.1-beta compilation fails - Key: YARN-1161 URL: https://issues.apache.org/jira/browse/YARN-1161 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker Attachments: YARN-1161.patch
{code:xml}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
[ERROR] D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
[ERROR] symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
[ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-yarn-server-resourcemanager: Compilation failure
D:\svn\apache\branch-2.1-beta\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\test\java\org\apache\hadoop\yarn\server\resourcemanager\MockRM.java:[238,8] cannot find symbol
symbol  : constructor MockNM(java.lang.String,int,int,org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService)
location: class org.apache.hadoop.yarn.server.resourcemanager.MockNM
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760205#comment-13760205 ] Thomas Graves commented on YARN-1153: - What are the rest of your queue settings? With one user, the user limit factor comes into effect. http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single node cluster using the DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
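For reference, a hedged example of the relevant knob from the CapacityScheduler docs linked above; the queue path root.a is illustrative:
{code:xml}
<!-- capacity-scheduler.xml: let a single user in queue "a" use up to
     4x the queue's configured capacity (still bounded by max-capacity). -->
<property>
  <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
  <value>4</value>
</property>
{code}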
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff An invalid application state now throws a BadRequestException. I went for the message "Invalid application-state INVALID_test specified. It should be one of [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHING, FINISHED, FAILED, KILLED]". Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
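For illustration, the filter would then be exercised with a single call such as the following (assuming the parameter lands as a comma-separated states list, as in this patch series):
{code}
GET http://<rm http address:port>/ws/v1/cluster/apps?states=ACCEPTED,RUNNING
    -> only applications currently in ACCEPTED or RUNNING

GET http://<rm http address:port>/ws/v1/cluster/apps?states=INVALID_test
    -> 400 Bad Request (BadRequestException, per the comment above)
{code}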
[jira] [Commented] (YARN-1160) allow admins to force app deployment on a specific host
[ https://issues.apache.org/jira/browse/YARN-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760208#comment-13760208 ] Alejandro Abdelnur commented on YARN-1160: -- You can already ask for container exactly on a specific node setting relaxLocality to FALSE in the ResourceRequest. Though, this does not allow you to get a container if there is no capacity in the node. allow admins to force app deployment on a specific host --- Key: YARN-1160 URL: https://issues.apache.org/jira/browse/YARN-1160 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Priority: Minor Currently you ask YARN to get slots on a host and it finds a slot on that machine -or, if unavailable or there is no room, on a host nearby as far as the topology is concerned. People with admin rights should have the option to deploy a process on a specific host and have it run there even if there are no free slots -and to fail if the machine is not available. This would let you deploy admin-specific process across a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
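For reference, a sketch of such a node-specific request through AMRMClient, assuming the 2.1 ContainerRequest constructor that takes a relaxLocality flag; the resource size and priority here are arbitrary:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class NodeSpecificRequest {
  // Sketch: ask for a container strictly on one host; with relaxLocality
  // false the request is never widened to the rack or to ANY.
  public static ContainerRequest onHost(String host) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    capability.setVirtualCores(1);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(1);
    return new ContainerRequest(
        capability, new String[] { host }, null /* racks */, priority,
        false /* relaxLocality */);
  }
}
{code}
As noted above, though, such a request simply waits in the queue if the node has no capacity, which is exactly the gap this JIRA is about.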
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1162) NM auxiliary service invocations should be try/catch
Alejandro Abdelnur created YARN-1162: Summary: NM auxiliary service invocations should be try/catch Key: YARN-1162 URL: https://issues.apache.org/jira/browse/YARN-1162 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta The {{AuxiliaryServices#handle()}} should try/catch all invocations of auxiliary services to isolate failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
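A minimal sketch of the isolation being proposed; the callback interface below is illustrative and stands in for the real per-service invocations inside AuxServices#handle() that this JIRA would wrap:
{code}
import java.util.List;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch: isolate aux-service callbacks so one misbehaving
// service cannot take down the NM event dispatcher.
public class SafeAuxDispatch {
  private static final Log LOG = LogFactory.getLog(SafeAuxDispatch.class);

  interface AuxCallback { void invoke() throws Exception; }

  public void dispatchAll(List<AuxCallback> callbacks) {
    for (AuxCallback cb : callbacks) {
      try {
        cb.invoke();
      } catch (Throwable t) {
        // Log and continue: the remaining services still get the event.
        LOG.error("Auxiliary service invocation failed", t);
      }
    }
  }
}
{code}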
[jira] [Commented] (YARN-1144) Unmanaged AMs registering a tracking URI should not be proxy-fied
[ https://issues.apache.org/jira/browse/YARN-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760209#comment-13760209 ] Hadoop QA commented on YARN-1144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601823/YARN-1144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1856//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1856//console This message is automatically generated. Unmanaged AMs registering a tracking URI should not be proxy-fied - Key: YARN-1144 URL: https://issues.apache.org/jira/browse/YARN-1144 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta Attachments: YARN-1144.patch Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1160) allow admins to force app deployment on a specific host
[ https://issues.apache.org/jira/browse/YARN-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760210#comment-13760210 ] Steve Loughran commented on YARN-1160: -- Yes, and if you don't get that container it just stays in the queue - no notification to the AM. This is about being able to force things in without that wait and irrespective of space. allow admins to force app deployment on a specific host --- Key: YARN-1160 URL: https://issues.apache.org/jira/browse/YARN-1160 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Priority: Minor Currently you ask YARN to get slots on a host and it finds a slot on that machine -or, if unavailable or there is no room, on a host nearby as far as the topology is concerned. People with admin rights should have the option to deploy a process on a specific host and have it run there even if there are no free slots -and to fail if the machine is not available. This would let you deploy admin-specific process across a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760216#comment-13760216 ] Hadoop QA commented on YARN-696: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601828/YARN-696.diff against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1858//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1858//console This message is automatically generated. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed); if no state parameter is specified, all states are returned. However, if a subset of states is required, then multiple REST calls are needed (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1163) Cleanup code for AssignMapsWithLocality() in RMContainerAllocator
Junping Du created YARN-1163: Summary: Cleanup code for AssignMapsWithLocality() in RMContainerAllocator Key: YARN-1163 URL: https://issues.apache.org/jira/browse/YARN-1163 Project: Hadoop YARN Issue Type: Improvement Components: applications Reporter: Junping Du Assignee: Junping Du Priority: Minor In RMContainerAllocator, AssignMapsWithLocality() is a very important method that assigns map tasks to allocated containers while conforming to different levels of locality (dataLocal, rackLocal, etc.). However, this method mixes separate code paths to handle the different locality types, even though their behaviours are largely similar. This is hard to maintain as well as to extend with other locality types, so we need clearer code here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1164) maven Junit dependency should be test only
Steve Loughran created YARN-1164: Summary: maven Junit dependency should be test only Key: YARN-1164 URL: https://issues.apache.org/jira/browse/YARN-1164 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Steve Loughran Priority: Minor The maven dependencies for the YARN artifacts don't restrict JUnit to test scope, so it gets picked up by all downstream users. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1164) maven Junit dependency should be test only
[ https://issues.apache.org/jira/browse/YARN-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-1164: - Attachment: HADOOP-9935-001.patch Patch from André Kelpe for HADOOP-9935; this JIRA is to test the YARN section. maven Junit dependency should be test only -- Key: YARN-1164 URL: https://issues.apache.org/jira/browse/YARN-1164 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.1.0-beta Reporter: Steve Loughran Priority: Minor Attachments: HADOOP-9935-001.patch The maven dependencies for the YARN artifacts don't restrict JUnit to test scope, so it gets picked up by all downstream users. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
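The fix pattern is the standard Maven one: mark the dependency with test scope so it stops leaking into downstream compile/runtime classpaths. A generic illustration, not the exact hunk from HADOOP-9935-001.patch:
{code:xml}
<!-- junit is only needed to compile and run this module's own tests,
     so it must not appear on consumers' classpaths. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <scope>test</scope>
</dependency>
{code}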
[jira] [Updated] (YARN-1152) Invalid key to HMAC computation error when getting application report for completed app attempt
[ https://issues.apache.org/jira/browse/YARN-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1152: - Target Version/s: 2.1.1-beta (was: 0.23.10, 2.1.1-beta) Affects Version/s: (was: 0.23.10) Turns out this does not affect 0.23 because master keys are created per app instead of app-attempt and not removed. Invalid key to HMAC computation error when getting application report for completed app attempt --- Key: YARN-1152 URL: https://issues.apache.org/jira/browse/YARN-1152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1152.txt On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760338#comment-13760338 ] Tsuyoshi OZAWA commented on YARN-1159: -- Should we change the state machine to accept the CONTAINER_KILLED_ON_REQUEST event in the CONTAINER_CLEANEDUP_AFTER_KILL state?

NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta

When running MR PI, which runs successfully, the NM log reports:
{code}
2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000
2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143
2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:722)
2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
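That suggestion would amount to registering the event as a valid (typically no-op) arc in ContainerImpl's state machine. A hedged sketch of the generic StateMachineFactory pattern, using a reduced state/event set rather than the real ContainerImpl transition table:
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

// Sketch: a self-loop arc makes a late CONTAINER_KILLED_ON_REQUEST legal
// in CONTAINER_CLEANEDUP_AFTER_KILL instead of throwing.
public class ContainerStateSketch {
  enum MyState { KILLING, CONTAINER_CLEANEDUP_AFTER_KILL }
  enum MyEvent { CONTAINER_RESOURCES_CLEANEDUP, CONTAINER_KILLED_ON_REQUEST }

  private static final StateMachineFactory<ContainerStateSketch, MyState, MyEvent, MyEvent>
      FACTORY =
      new StateMachineFactory<ContainerStateSketch, MyState, MyEvent, MyEvent>(MyState.KILLING)
          .addTransition(MyState.KILLING, MyState.CONTAINER_CLEANEDUP_AFTER_KILL,
              MyEvent.CONTAINER_RESOURCES_CLEANEDUP)
          // The proposed fix: accept the late kill notification as a no-op
          // self-loop instead of raising InvalidStateTransitonException.
          .addTransition(MyState.CONTAINER_CLEANEDUP_AFTER_KILL,
              MyState.CONTAINER_CLEANEDUP_AFTER_KILL,
              MyEvent.CONTAINER_KILLED_ON_REQUEST)
          .installTopology();

  private final StateMachine<MyState, MyEvent, MyEvent> sm = FACTORY.make(this);

  public void handle(MyEvent event) throws Exception {
    sm.doTransition(event, event);
  }
}
{code}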
[jira] [Resolved] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1159. --- Resolution: Duplicate It's a bug, and was reported before: YARN-1070. Will fix there.

NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta

When running MR PI, which runs successfully, the NM log reports:
{code}
2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000
2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143
2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:722)
2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-609: --- Attachment: YARN-609.6.patch Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
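For context on the kind of fix these patches make, a minimal sketch of the pattern follows; the field and method names are illustrative assumptions, not taken from the patch. A setter that takes a list should copy it under the object's monitor instead of retaining the caller's mutable list.
{code}
// Illustrative sketch; names are assumptions, not from YARN-609 itself.
public synchronized void setResourceRequests(List<ResourceRequest> requests) {
  // Defensive copy under the lock, so readers never observe the caller's
  // mutable list and writes cannot interleave with reads.
  this.resourceRequests =
      (requests == null) ? null : new ArrayList<ResourceRequest>(requests);
}
{code}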
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760408#comment-13760408 ] Jian He commented on YARN-1153: --- Exactly, the single user limit is throttling it. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760431#comment-13760431 ] Hadoop QA commented on YARN-609: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601857/YARN-609.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1860//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1860//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1860//console This message is automatically generated. Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1152) Invalid key to HMAC computation error when getting application report for completed app attempt
[ https://issues.apache.org/jira/browse/YARN-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760281#comment-13760281 ] Jason Lowe commented on YARN-1152: -- I also manually tested this on a secure cluster. Proxy links and mapred job -list both worked after the job had completed, and the master key had been removed for the attempt. Invalid key to HMAC computation error when getting application report for completed app attempt --- Key: YARN-1152 URL: https://issues.apache.org/jira/browse/YARN-1152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.10, 2.1.1-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1152.txt On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760462#comment-13760462 ] Hadoop QA commented on YARN-978: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601859/YARN-978.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1861//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1861//console This message is automatically generated. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1001: -- Attachment: YARN-1001.2.patch Updated the patch against the latest trunk. In addition, polished the code, added tests for empty params and invalid state, and added documentation for the new REST API. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch, YARN-1001.2.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
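For readers of the new docs, a hedged sketch of how the endpoint is invoked; the path and parameter names follow the REST documentation added here, but treat the exact shape as an assumption rather than a reference:
{code}
GET http://<rm-http-address:port>/ws/v1/cluster/appstatistics?states=running,finished&applicationTypes=MAPREDUCE
{code}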
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760485#comment-13760485 ] Thomas Graves commented on YARN-1153: - Sorry, so why is this a bug? It's working as designed. If we want to change that, we should make this an enhancement request. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760487#comment-13760487 ] Hadoop QA commented on YARN-609: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601867/YARN-609.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1862//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1862//console This message is automatically generated. Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch, YARN-609.7.patch, YARN-609.8.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760485#comment-13760485 ] Thomas Graves edited comment on YARN-1153 at 9/6/13 6:26 PM: - Sorry, so why is this a bug? It's working as designed. If we want to change that, we should make this an enhancement request; the fix is to change your config. was (Author: tgraves): Sorry, so why is this a bug? It's working as designed. If we want to change that, we should make this an enhancement request. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
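For anyone hitting the same behavior, a hedged example of the config change being alluded to (the queue path is illustrative): the CapacityScheduler's per-user limit defaults to roughly the queue's configured capacity, so raising user-limit-factor lets a single user's application grow toward max-capacity.
{code}
<!-- capacity-scheduler.xml; queue path "root.a" is illustrative.
     A factor of 4 lets one user consume up to 4x the queue's configured
     capacity, still bounded by the queue's max-capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
  <value>4</value>
</property>
{code}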
[jira] [Resolved] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-957. -- Resolution: Fixed [~devaraj.k] opened YARN-1161. Closing this. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers: * one with 1024 MB memory (nm1) * a second with 2048 MB memory (nm2) I am submitting a simple MapReduce application with 1 mapper and 1 reducer of 1024 MB each. The steps to reproduce this are: * stop nm2 with 2048 MB memory. (This is to make sure that this node's heartbeat doesn't reach the RM first.) * now submit the application. As soon as the RM receives the first node's (nm1) heartbeat, it will try to reserve memory for the AM container (2048 MB). However, nm1 has only 1024 MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... This has two potential issues. * It should not try to reserve memory on a node manager which is never going to provide the requested memory, i.e., the node manager's max capability is 1024 MB but 2048 MB is reserved on it. But it still does that. * Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB available memory. In this case, if the original request was made without any locality, the scheduler should unreserve the memory on nm1 and allocate the requested 2048 MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760436#comment-13760436 ] Omkar Vinit Joshi commented on YARN-713: Rebasing the patch... I had missed one local commit. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
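To make the failure mode concrete, a hedged sketch (the variable name is hypothetical and this is not the patch itself): any hostname resolution performed inside an event handler can throw during a DNS outage, and an exception that escapes the handler kills the AsyncDispatcher thread and, with it, the RM.
{code}
// nodeHostName is a hypothetical variable standing in for any host the
// RM resolves while handling an event.
try {
  InetAddress resolved = InetAddress.getByName(nodeHostName); // may fail during a DNS outage
  // ... use the resolved address ...
} catch (UnknownHostException e) {
  // Handle locally (log, retry later) instead of letting the exception
  // propagate out of AsyncDispatcher and terminate the RM.
  LOG.warn("Could not resolve " + nodeHostName + ", will retry", e);
}
{code}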
[jira] [Updated] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-609: --- Attachment: YARN-609.8.patch Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch, YARN-609.7.patch, YARN-609.8.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760498#comment-13760498 ] Hudson commented on YARN-758: - SUCCESS: Integrated in Hadoop-trunk-Commit #4379 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4379/]) Fixing CHANGES.txt for YARN-758 as it is now merged into branch-2.1-beta. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1520659) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Augment MockNM to use multiple cores Key: YARN-758 URL: https://issues.apache.org/jira/browse/YARN-758 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Karthik Kambatla Priority: Minor Fix For: 2.1.1-beta Attachments: yarn-758-1.patch, yarn-758-2.patch YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760495#comment-13760495 ] Hadoop QA commented on YARN-713: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601865/YARN-713.09062013.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1863//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1863//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1863//console This message is automatically generated. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760530#comment-13760530 ] Hadoop QA commented on YARN-978: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601877/YARN-978.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1865//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1865//console This message is automatically generated. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-978: --- Attachment: YARN-978.5.patch [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch We don't have ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1098: --- Attachment: yarn-1098-4.patch Updated patch to fix javadoc and test failures. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-2.patch, yarn-1098-3.patch, yarn-1098-4.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760582#comment-13760582 ] Karthik Kambatla commented on YARN-1027: Thanks for the detailed review, [~bikas]. bq. What are the pros of making haState a member of ResourceManager instead of HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one place. In the future, when individual external-facing services need to behave based on the HAState, having it in the RM might be useful. However, I think we should move it to RMHAProtocolService now, and move it to the RM or RMContext later if needed. bq. Why is there a lock used in ResourceManager.startActive() etc.? Why are these methods protected? If testing, then let's add a @VisibleForTesting annotation. The lock is to protect against concurrent invocations of transitionToActive() and transitionToStandby() due to, say, user input. The methods are protected because they are being accessed from outside the RM - in this case, RMHAProtocolService. bq. Is there a way to confirm that the active service objects are all being GC'd? Not sure of a deterministic test. How about using Runtime.memory methods to measure memory usage before and after transitioning to Active and subsequently Standby? I can jmap a real RM on a pseudo-dist cluster and see if they are being cleaned up. bq. Didn't quite get this comment. Is this to do with the change being requested by user/admin/ZKFC? If automatic failover is enabled and a user issues a transition command, it should take effect only when it is forced. Agree with the remaining comments. Will fix them in the next version. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
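A minimal sketch of the locking being discussed, assuming the RMHAProtocolService shape from the patch; the method bodies here are assumptions, not the patch itself. Marking the transition methods synchronized serializes concurrent requests from user/admin/ZKFC.
{code}
// Hedged sketch; exact fields and behavior are assumptions.
public synchronized void transitionToActive(StateChangeRequestInfo reqInfo)
    throws IOException {
  if (haState == HAServiceState.ACTIVE) {
    return; // already active, nothing to do
  }
  rm.startActiveServices();
  haState = HAServiceState.ACTIVE;
}

public synchronized void transitionToStandby(StateChangeRequestInfo reqInfo)
    throws IOException {
  if (haState == HAServiceState.ACTIVE) {
    haState = HAServiceState.STANDBY; // flip first, so HA-aware services stop routing work
    rm.stopActiveServices();
  }
}
{code}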
[jira] [Commented] (YARN-1132) QueueMetrics.java has wrong comments
[ https://issues.apache.org/jira/browse/YARN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760512#comment-13760512 ] Akira AJISAKA commented on YARN-1132: - Thanks for your comment. I'll close this issue as a duplicate. QueueMetrics.java has wrong comments Key: YARN-1132 URL: https://issues.apache.org/jira/browse/YARN-1132 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Priority: Minor Labels: newbie I found o.a.h.yarn.server.resourcemanager.scheduler.QueueMetrics.java has wrong comments: {code} @Metric("# of reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active users") MutableGaugeInt activeApplications; {code} they should be fixed as follows: {code} @Metric("Reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active applications") MutableGaugeInt activeApplications; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760567#comment-13760567 ] Karthik Kambatla commented on YARN-1098: Haven't added any tests because the patch just reorganizes code and doesn't change functionality. Existing tests should expose any problems. In fact, not having to modify anything else shows the changes are transparent to the users of ResourceManager. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-2.patch, yarn-1098-3.patch, yarn-1098-4.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
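For readers following along, a hedged sketch of the reorganization described here (service names are illustrative assumptions, not the patch): the Always-On services are added directly to the RM, while the stateful ones are grouped under a single composite child so they can later be started and stopped on HA transitions. As the comment above notes, behavior is unchanged for now.
{code}
// Hedged sketch of the split; names are assumptions, not the patch.
public class ResourceManager extends CompositeService {
  private RMActiveServices activeServices; // scheduler, state store, token managers, ...

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Always-On services: run regardless of Active/Standby state.
    addService(createAndAddDispatcher()); // hypothetical helper
    // Active services grouped under one composite child; still inited and
    // started with the RM today, so behavior is unchanged.
    activeServices = new RMActiveServices();
    addService(activeServices);
    super.serviceInit(conf);
  }
}
{code}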
[jira] [Resolved] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1153. --- Resolution: Invalid CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1132) QueueMetrics.java has wrong comments
[ https://issues.apache.org/jira/browse/YARN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved YARN-1132. - Resolution: Duplicate QueueMetrics.java has wrong comments Key: YARN-1132 URL: https://issues.apache.org/jira/browse/YARN-1132 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Priority: Minor Labels: newbie I found o.a.h.yarn.server.resourcemanager.scheduler.QueueMetrics.java has wrong comments: {code} @Metric("# of reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active users") MutableGaugeInt activeApplications; {code} they should be fixed as follows: {code} @Metric("Reserved memory in MB") MutableGaugeInt reservedMB; @Metric("# of active applications") MutableGaugeInt activeApplications; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1153) CapacityScheduler queue elasticity is not working
[ https://issues.apache.org/jira/browse/YARN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760510#comment-13760510 ] Jian He commented on YARN-1153: --- Yeah, I'm closing it. Thanks for clarifying. CapacityScheduler queue elasticity is not working - Key: YARN-1153 URL: https://issues.apache.org/jira/browse/YARN-1153 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Configured 2 queues, one with 25% capacity and the other with 75% capacity, and both have 100% max-capacity. Submit only 1 application to either queue. Ideally, it should make use of 100% of the cluster's resources, but it does not. Tested this on a single-node cluster using DefaultResourceCalculator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760507#comment-13760507 ] Hadoop QA commented on YARN-1001: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601872/YARN-1001.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1864//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1864//console This message is automatically generated. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch, YARN-1001.2.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760562#comment-13760562 ] Hadoop QA commented on YARN-1098: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601883/yarn-1098-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1866//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1866//console This message is automatically generated. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-2.patch, yarn-1098-3.patch, yarn-1098-4.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-978: --- Attachment: YARN-978.6.patch [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch We don't have ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760632#comment-13760632 ] Xuan Gong commented on YARN-1155: - Verified that we have a test case covering this logic: {code} // To test that IPs also work String ip = NetUtils.normalizeHostName("localhost"); writeToHostsFile("host1", ip); rm.getNodesListManager().refreshNodes(conf); nodeHeartbeat = nm1.nodeHeartbeat(true); Assert.assertTrue(NodeAction.NORMAL.equals(nodeHeartbeat.getNodeAction())); Assert.assertEquals(0, ClusterMetrics.getMetrics().getNumDecommisionedNMs()); {code} RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1162) NM auxiliary service invocations should be try/catch
[ https://issues.apache.org/jira/browse/YARN-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik reassigned YARN-1162: -- Assignee: Roman Shaposhnik NM auxiliary service invocations should be try/catch Key: YARN-1162 URL: https://issues.apache.org/jira/browse/YARN-1162 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Critical Fix For: 2.1.1-beta The {{AuxiliaryServices#handle()}} should try/catch all invocations of auxiliary services to isolate failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-1155. - Resolution: Invalid RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1162) NM auxiliary service invocations should be try/catch
[ https://issues.apache.org/jira/browse/YARN-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1162. --- Resolution: Duplicate Fix Version/s: (was: 2.1.1-beta) YARN-867 is already doing this, closing as duplicate. NM auxiliary service invocations should be try/catch Key: YARN-1162 URL: https://issues.apache.org/jira/browse/YARN-1162 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Critical The {{AuxiliaryServices#handle()}} should try/catch all invocations of auxiliary services to isolate failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
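Though resolved as a duplicate of YARN-867, the shape of the isolation being asked for is worth sketching. This is a hedged illustration assuming the AuxServices dispatch loop; the names are approximations, not the eventual patch.
{code}
// Hedged sketch: wrap each auxiliary-service callback so one faulty
// service cannot take down the NodeManager's dispatcher.
for (AuxiliaryService service : serviceMap.values()) {
  try {
    service.initializeApplication(initContext);
  } catch (Throwable t) {
    // Log and continue; the failure stays isolated to this service.
    LOG.error("Aux service " + service.getName()
        + " failed to handle APPLICATION_INIT", t);
  }
}
{code}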
[jira] [Commented] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips
[ https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760630#comment-13760630 ] Xuan Gong commented on YARN-1155: - verified that we already have such logic: {code} public boolean isValidNode(String hostName) { synchronized (hostsReader) { SetString hostsList = hostsReader.getHosts(); SetString excludeList = hostsReader.getExcludedHosts(); String ip = NetUtils.normalizeHostName(hostName); return (hostsList.isEmpty() || hostsList.contains(hostName) || hostsList .contains(ip)) !(excludeList.contains(hostName) || excludeList.contains(ip)); } } {code} RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips --- Key: YARN-1155 URL: https://issues.apache.org/jira/browse/YARN-1155 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora Assignee: Xuan Gong RM should be able to resolve both ips and host names from include and exclude files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
Karthik Kambatla created YARN-1165: -- Summary: Move init() of activeServices to ResourceManager#serviceStart() Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the context of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1165: --- Description: Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. was: Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the context of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
Srimanth Gunturi created YARN-1166: -- Summary: YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
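A hedged before/after sketch of the requested change (the annotation text is illustrative; QueueMetrics uses Hadoop's metrics2 annotations): a MutableCounterInt only ever increments, which is what lets consumers such as Ganglia derive deltas from the slope.
{code}
// current: reported as a gauge (exact value at each snapshot)
@Metric("# of failed apps") MutableGaugeInt appsFailed;

// proposed: reported as a counter (monotonic, so slopes/deltas work)
@Metric("# of failed apps") MutableCounterInt appsFailed;
{code}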
[jira] [Commented] (YARN-1159) NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760715#comment-13760715 ] Tsuyoshi OZAWA commented on YARN-1159: -- OK, thanks. NodeManager reports Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1159 URL: https://issues.apache.org/jira/browse/YARN-1159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta When running MR PI, which runs successfully, the NM log reports: {code} 2013-09-06 11:45:29,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 5 cluster_timestamp: 1378450335207 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: Container killed by the ApplicationMaster.\n exit_status: -1000 2013-09-06 11:45:29,390 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1378450335207_0005_01_04 is : 143 2013-09-06 11:45:29,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL 2013-09-06 11:45:29,426 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:684) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:677) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:722) 2013-09-06 11:45:29,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1378450335207_0005_01_04 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1165: --- Attachment: test-failures.pdf Moving creation, init() and start() of activeServices to RM#serviceStart(). Attaching the output of running tests in hadoop-yarn-server-resourcemanager - this alone is 149 tests. Changing these tests to use RM#start() in addition to RM#init() would lead to significantly longer test run times. Also, once HADOOP-9933 is fixed, we will have to undo this. The best way forward might be to bite the bullet and fix HADOOP-9933 first. The other alternative is to implement RM#initForTesting() that instantiates and initializes RMActiveServices - however, that is too much of a hack and I am not sure if we should do that. Thoughts? [~bikassaha], [~vinodkv], [~stevel]? Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: test-failures.pdf Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
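To make the alternative under discussion concrete, a sketch of what moving the active services into RM#serviceStart() might look like; this is a hedged sketch under the YARN-1098 structure, not the final patch.
{code}
// Hedged sketch; assumes the RMActiveServices composite from YARN-1098.
@Override
protected void serviceStart() throws Exception {
  // Created and inited here instead of in serviceInit(); tests that only
  // call rm.init() and then poke internals would now see nulls (NPEs).
  activeServices = new RMActiveServices();
  activeServices.init(getConfig());
  activeServices.start();
  super.serviceStart();
}
{code}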
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-713: --- Attachment: YARN-713.09062013.1.patch ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-609) Fix synchronization issues in APIs which take in lists
[ https://issues.apache.org/jira/browse/YARN-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-609: --- Attachment: YARN-609.7.patch Fix the -1 findbugs warning Fix synchronization issues in APIs which take in lists -- Key: YARN-609 URL: https://issues.apache.org/jira/browse/YARN-609 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-609.1.patch, YARN-609.2.patch, YARN-609.3.patch, YARN-609.4.patch, YARN-609.5.patch, YARN-609.6.patch, YARN-609.7.patch Some of the APIs take in lists and the setter-APIs don't always do proper synchronization. We need to fix these.
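For context, the pattern being fixed: a setter that merely aliases the caller's list is neither synchronized with readers nor isolated from later mutation by the caller. A minimal sketch of the safe variant (class and field names hypothetical):
{code}
// Sketch of the pattern under discussion (names hypothetical): synchronize
// the setter and copy the list instead of aliasing the caller's reference.
public synchronized void setApplicationList(List<ApplicationReport> apps) {
  if (apps == null) {
    this.applicationList = null;
    return;
  }
  // defensive copy: later mutation by the caller can't race with readers
  this.applicationList = new ArrayList<ApplicationReport>(apps);
}
{code}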
[jira] [Reopened] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srimanth Gunturi reopened YARN-1166: Talked with [~jianhe] and [~vinodkv], and this metric should be cumulative. Hence the original request that this metric be a 'counter' is valid. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
[jira] [Commented] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760747#comment-13760747 ] Bikas Saha commented on YARN-1165: -- Are we getting the test failures even with the approach taken in the current patch uploaded to YARN-1027? That patch was adding activeServices to the RM when HA is not enabled, thus mimicking current RM behavior. Thus RM init would cause activeService init and RM start would cause activeService start. So it's not clear to me why the current patch would break the tests. How does HADOOP-9933 fix this? The problem happens before services are stopped. So the ability to restart them sounds unrelated. Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: test-failures.pdf Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases.
[jira] [Resolved] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srimanth Gunturi resolved YARN-1166. Resolution: Not A Problem Turns out 'AppsFailed' should be interpreted as 'AppsFailing'. Its value is decremented when a failed app is resubmitted in subsequent attempts. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
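The distinction this back-and-forth turns on: in Hadoop's metrics2 library a counter only increments (so Ganglia can derive rates from its slope), while a gauge reports the exact current value and can also be decremented, which is what the resubmission accounting above relies on. A minimal illustrative sketch (class and field names hypothetical, not the actual QueueMetrics source):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(context = "yarn")
class ExampleQueueMetrics {
  // counter: monotonically increasing, suitable for slope/rate graphs
  @Metric("Apps submitted") MutableCounterInt appsSubmitted;

  // gauge: exact current value, supports decrement, which the
  // "failing, may be resubmitted" accounting above requires
  @Metric("Apps failing") MutableGaugeInt appsFailed;

  void onAppFailed() { appsFailed.incr(); }
  void onAppResubmitted() { appsFailed.decr(); }
}
{code}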
[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760752#comment-13760752 ] Bikas Saha commented on YARN-1027: -- RMHAProtocolService can be made available via RMContext and thus accessible to everyone who has access to RMContext. In that case we probably mean package and not protected since there is no inheritance story here. I don't think we need a test (although that would be awesome). If we can manually verify then it should be sufficient for now I guess. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760460#comment-13760460 ] Zhijie Shen commented on YARN-1001: --- bq. we are expecting /ws/v1/cluster/appscount to provide all app-types/state-counts in 1 call To prevent the RM from being overwhelmed, when no params are specified, the API returns an empty response. Users must supply the states and the types (at least one) to get the counts. bq. Apart from that, we need /ws/v1/cluster/appscount information pushed to Ganglia. Through some discussion, we'd like to exclude this requirement from the scope of this jira due to performance concerns. We can open a ticket to track it separately. Another issue we need to clarify is that the counts depend on the current apps in RMContext. Old finished apps may be removed from the context if the total app number reaches the limit. Therefore, the count of completed apps may only reflect the count of those in RMContext. We should emphasize in the documentation that the counts cover only the apps in RMContext. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts.
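To make the proposed contract concrete, a hedged sketch of exercising the endpoint; the path comes from the discussion above, but the parameter names and response shape are assumptions, not the final API:
{code}
// Hedged sketch: the parameter names (states, applicationTypes) and the
// response shape are assumptions based on the discussion, not the patch.
URL url = new URL("http://rm-host:8088/ws/v1/cluster/appscount"
    + "?states=RUNNING,ACCEPTED&applicationTypes=MAPREDUCE");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Accept", "application/json");
BufferedReader in = new BufferedReader(
    new InputStreamReader(conn.getInputStream(), "UTF-8"));
System.out.println(in.readLine()); // e.g. {"appsCount":{"RUNNING":3,...}}
in.close();
{code}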
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760757#comment-13760757 ] Omkar Vinit Joshi commented on YARN-1107: - Thanks Vinod. bq. Put both RMDelegationTokenSecretManager and ClientRMService in RMContext. Then you don't need delegationTokenRenewer.setClientRMService() and ClientRMService.getDelegationTokenSecretManager(). bq. You can add an assert in DelegationTokenRenewer.serviceStart() to check for ClientRMService.start() after the comment. It'll be useful if tests enable assertions, can you check? done.. bq. RMDelegationTokenIdentifier.Renewer.setSecretManager is moved into ClientRMService, but not so in the test. Can we change that. fixed bq. Please also take care of the test-issue and the findbugs warning. Done.. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
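The assert being requested would look roughly like this, a sketch whose accessor names are assumptions based on the RMContext refactoring described above:
{code}
// Sketch of the requested guard in DelegationTokenRenewer.serviceStart()
// (accessor names assumed, not quoted from the patch). With tests run
// under -ea, a wrong service start order fails fast.
@Override
protected void serviceStart() throws Exception {
  // token renewal resolves the RM delegation token through
  // ClientRMService, so that service must already be started
  assert rmContext.getClientRMService() != null
      && rmContext.getClientRMService().isInState(STATE.STARTED)
      : "ClientRMService must be started before DelegationTokenRenewer";
  super.serviceStart();
}
{code}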
[jira] [Updated] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1027: --- Attachment: yarn-1027-4.patch Patch addressing comments from Bikas. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
[jira] [Updated] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-758: - Priority: Minor (was: Major) Target Version/s: 2.3.0, 2.1.1-beta (was: 2.3.0) Fix Version/s: (was: 2.3.0) 2.1.1-beta Issue Type: Improvement (was: Bug) Augment MockNM to use multiple cores Key: YARN-758 URL: https://issues.apache.org/jira/browse/YARN-758 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Karthik Kambatla Priority: Minor Fix For: 2.1.1-beta Attachments: yarn-758-1.patch, yarn-758-2.patch YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).
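A sketch of what the augmentation enables in scheduler tests; the exact overload is an assumption, so check MockNM/MockRM in the patch for the real signatures:
{code}
// Sketch (hedged): register a mock NM advertising multiple vcores so that
// CPU-aware schedulers like FairScheduler have capacity on both dimensions.
// The (memory, vcores) overload here is assumed, not quoted from the patch.
MockRM rm = new MockRM(conf);
rm.start();
MockNM nm = rm.registerNode("h1:1234", 8 * 1024 /* MB */, 4 /* vcores */);
{code}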
[jira] [Updated] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-758: - bq. This was reopened to ensure the patch was tested. Now, resolving this on the basis of tests via TestRMRestart on FairScheduler. Please reopen if needed. Thanks. I actually meant that we don't test this automatically: we don't have a replacement for TestRMRestart with FairScheduler in the test suite. Anyways, too late and too small a thing to worry about. OTOH, some patches started depending on this in 2.1-beta. So I just merged this into branch-2.1-beta. Augment MockNM to use multiple cores Key: YARN-758 URL: https://issues.apache.org/jira/browse/YARN-758 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Karthik Kambatla Fix For: 2.3.0 Attachments: yarn-758-1.patch, yarn-758-2.patch YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).
[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760774#comment-13760774 ] Karthik Kambatla commented on YARN-1027: In yarn-1027-4.patch, the RM always calls addService(HAServiceProtocol). HAServiceProtocol is the one that checks haEnabled in serviceStart(). If enabled, it transitions to standby and waits for the active signal. If not, it directly transitions to active. However, post RM#init(), RM fields are not instantiated (e.g. TokenManagers), leading to a bunch of test failures. Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
[jira] [Commented] (YARN-1165) Move init() of activeServices to ResourceManager#serviceStart()
[ https://issues.apache.org/jira/browse/YARN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760775#comment-13760775 ] Karthik Kambatla commented on YARN-1165: The current patch uploaded to YARN-1027 (yarn-1027-3.patch) doesn't have this issue because the RM does the following in serviceInit():
{code}
if (haEnabled) {
  haService = new RMHAProtocolService(this);
  addService(haService);
} else {
  activeServices = new RMActiveServices();
  addService(activeServices);
}
super.serviceInit(conf);
{code}
If we were to (1) move handling of RMActiveServices to RMHAProtocolService, and (2) have the RM always start RMHAProtocolService irrespective of whether HA is enabled, we will run into this. I just uploaded yarn-1027-4.patch that implements this. If HADOOP-9933 were fixed, we could initialize everything (RM, RMHAProtocolService, and RMActiveServices) in RM#init(). Transition to Active would be RMActiveServices#start(), and transition to Standby would be RMActiveServices#stop(). Move init() of activeServices to ResourceManager#serviceStart() --- Key: YARN-1165 URL: https://issues.apache.org/jira/browse/YARN-1165 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: test-failures.pdf Background: # YARN-1098 separates out RM services into Always-On and Active services, but doesn't change the behavior in any way. # For YARN-1027, we would want to create, initialize, and start RMActiveServices in the scope of RM#serviceStart(). This requires updating test cases that check for certain behavior post RM#serviceInit() - otherwise, most of these tests NPE. Creating a JIRA different from YARN-1027 to address all these test cases.
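Putting the thread together, the structure being converged on looks roughly like the following. This is a sketch of the intent only; the configuration reading, locking granularity, and error paths are assumptions, not the committed patch:
{code}
// Sketch of the converged design: the RM always adds this service, and it
// decides in serviceStart() whether to wait in standby or go active.
class RMHAProtocolService extends AbstractService {
  private final ResourceManager rm;
  private boolean haEnabled;          // read from conf in serviceInit()
  private HAServiceState haState = HAServiceState.INITIALIZING;

  RMHAProtocolService(ResourceManager rm) {
    super("RMHAProtocolService");
    this.rm = rm;
  }

  @Override
  protected synchronized void serviceStart() throws Exception {
    if (haEnabled) {
      transitionToStandby();  // wait for an explicit active signal
    } else {
      transitionToActive();   // non-HA setup: go active directly
    }
    super.serviceStart();
  }

  synchronized void transitionToActive() throws Exception {
    rm.startActiveServices(); // with HADOOP-9933: activeServices.start()
    haState = HAServiceState.ACTIVE;
  }

  synchronized void transitionToStandby() throws Exception {
    if (haState == HAServiceState.ACTIVE) {
      rm.stopActiveServices(); // with HADOOP-9933: activeServices.stop()
    }
    haState = HAServiceState.STANDBY;
  }
}
{code}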
[jira] [Updated] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1107: Attachment: YARN-1107.20130906.1.patch Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760826#comment-13760826 ] Hadoop QA commented on YARN-1107: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601937/YARN-1107.20130906.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1868//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1868//console This message is automatically generated. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Commented] (YARN-1152) Invalid key to HMAC computation error when getting application report for completed app attempt
[ https://issues.apache.org/jira/browse/YARN-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760842#comment-13760842 ] Vinod Kumar Vavilapalli commented on YARN-1152: --- BTW, forgot to mention that your test TestRMAppTransitions is good in that I was able to revert just the RMAppImpl changes and reproduce the issue. So thumbs up! Invalid key to HMAC computation error when getting application report for completed app attempt --- Key: YARN-1152 URL: https://issues.apache.org/jira/browse/YARN-1152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1152.txt On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered.
[jira] [Created] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
Tassapol Athiapinya created YARN-1167: - Summary: Submitted distributed shell application shows appMasterHost = empty Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Fix For: 2.1.1-beta Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
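appMasterHost is whatever the AM reports when it registers, so an empty value suggests the distributed shell AM registers with an empty hostname. A sketch of the likely fix on the AM side; the surrounding variable names are assumptions, not quoted from any patch:
{code}
// Sketch (hedged): register with the real hostname instead of "".
// appMasterRpcPort and appMasterTrackingUrl are assumed local variables.
String appMasterHostname = NetUtils.getHostname();
RegisterApplicationMasterResponse response =
    amRMClient.registerApplicationMaster(appMasterHostname,
        appMasterRpcPort, appMasterTrackingUrl);
{code}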
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760870#comment-13760870 ] Omkar Vinit Joshi commented on YARN-1107: - removing locking.. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Updated] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1107: Attachment: YARN-1107.20130906.2.patch Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.4.patch Uploaded a patch that changes FinishApplicationMasterResponse to contain a response-completed field; the MR AM and AMRMClient are changed to retry till it becomes true. Also fixed Bikas's last comments. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.patch, YARN-540.patch When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down with the dispatcher stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.
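The client-side shape of that retry would be roughly the following; the accessor for the new response-completed field is rendered here under a hypothetical name:
{code}
// Sketch of the retry the patch adds on the AM side (accessor name is
// hypothetical). The AM keeps re-sending finishApplicationMaster until
// the RM confirms the unregistration reached the state store.
FinishApplicationMasterRequest request =
    FinishApplicationMasterRequest.newInstance(
        FinalApplicationStatus.SUCCEEDED, "", "");
while (true) {
  FinishApplicationMasterResponse response =
      rmClient.finishApplicationMaster(request);
  if (response.getIsCompleted()) { // hypothetical name for the new field
    break;
  }
  Thread.sleep(100); // back off before retrying
}
{code}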
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760896#comment-13760896 ] Hadoop QA commented on YARN-540: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601956/YARN-540.4.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1870//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1870//console This message is automatically generated. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.patch, YARN-540.patch When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down with the dispatcher stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760925#comment-13760925 ] Vinod Kumar Vavilapalli commented on YARN-1107: --- +1 for the latest patch. I just made sure that without the core change, TestRMRestart fails. So we are good to go. Checking this in. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.5.patch New patch fixes the test case. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.patch, YARN-540.patch When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down with the dispatcher stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760930#comment-13760930 ] Hudson commented on YARN-1107: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4383 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4383/]) YARN-1107. Fixed a bug in ResourceManager because of which RM in secure mode fails to restart. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520726)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch, YARN-1107.20130906.1.patch, YARN-1107.20130906.2.patch If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.