[jira] [Updated] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue without cluster labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2918: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 RM startup fails if accessible-node-labels are configured for a queue without cluster labels --- Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the below exception. The current steps to configure node labels are to first add them via rmadmin and later configure them for queues, but it would be good if cluster and queue node labels were configured consistently.
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203)
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
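For reference, the ordering that avoids this failure on current releases is to register the label with the cluster before referencing it from a queue. A minimal sketch (the queue name queue1 is illustrative; the label x is taken from the exception above):
{noformat}
# 1. Add the label to the cluster-level NodeLabelManager first
yarn rmadmin -addToClusterNodeLabels "x"

# 2. Only then reference it from capacity-scheduler.xml, e.g.
#    yarn.scheduler.capacity.root.queue1.accessible-node-labels = x
#    and refresh the queues
yarn rmadmin -refreshQueues
{noformat}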
[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possibly exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2410: - Summary: Nodemanager ShuffleHandler can possibly exhaust file descriptors (was: Nodemanager ShuffleHandler can easily exhaust file descriptors) Nodemanager ShuffleHandler can possibly exhaust file descriptors Key: YARN-2410 URL: https://issues.apache.org/jira/browse/YARN-2410 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Nathan Roberts Assignee: Chen He The async nature of the ShuffleHandler can cause it to open a huge number of file descriptors, and when it runs out, it crashes. Scenario: a job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. Let's say all 6K reduces hit a node at about the same time asking for their outputs. Each reducer will ask for all 40 map outputs over a single socket in a single request (not necessarily all 40 at once, but with coalescing it is likely to be a large number). sendMapOutput() will open the file for random reading and then perform an async transfer of the particular portion of this file. This will theoretically happen 6000*40=240,000 times, which will run the NM out of file descriptors and cause it to crash. The algorithm should be refactored a little to not open the fds until they're actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
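A minimal sketch of the refactoring direction the description asks for (class and method names here are illustrative, not the actual ShuffleHandler API): record the path and offsets of each map output in a lightweight descriptor, and open the file only when the transfer is actually about to start.
{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;

/**
 * Illustrative sketch: defer opening the map-output file until the channel
 * is ready to transfer it, instead of opening all files up front when the
 * shuffle request arrives.
 */
class LazyMapOutput {
  private final File file;
  private final long offset;
  private final long length;

  LazyMapOutput(File file, long offset, long length) {
    this.file = file;
    this.offset = offset;
    this.length = length;
  }

  /** Open the fd only at transfer time; the caller closes it when done. */
  RandomAccessFile open() throws FileNotFoundException {
    return new RandomAccessFile(file, "r");
  }

  long getOffset() { return offset; }
  long getLength() { return length; }
}
{code}
With descriptors like this queued instead of open streams, the number of simultaneously open fds is bounded by the number of in-flight transfers rather than by the number of requested map outputs.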
[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources
[ https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524153#comment-14524153 ] Zhijie Shen commented on YARN-2266: --- Are we still interested in this enhancement? Otherwise, we can close this jira as won't fix. Add an application timeout service in RM to kill applications which are not getting resources - Key: YARN-2266 URL: https://issues.apache.org/jira/browse/YARN-2266 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ashutosh Jindal Currently, if an application is submitted to the RM, the app keeps waiting until resources are allocated for the AM. Such an application may be stuck until a resource is allocated for the AM, possibly due to over-utilization of queue or user limits etc. In a production cluster, some periodically running applications may have a smaller cluster share, so if resources are not available after waiting for some time, such applications can be marked as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
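A hedged sketch of what the proposed timeout check could look like (every name here is hypothetical; no such RM service exists): periodically scan apps that have been waiting for their AM container longer than a configured limit and fail them.
{code}
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of an application-timeout check in the RM; all names are illustrative. */
class AppTimeoutMonitor implements Runnable {

  /** Hypothetical callback into the RM to fail an application. */
  interface AppKiller {
    void failApp(String appId, String diagnostics);
  }

  private final Map<String, Long> amWaitingSince; // appId -> time (ms) it started waiting for its AM
  private final long maxWaitMs;
  private final AppKiller killer;

  AppTimeoutMonitor(Map<String, Long> amWaitingSince, long maxWaitMinutes, AppKiller killer) {
    this.amWaitingSince = amWaitingSince;
    this.maxWaitMs = TimeUnit.MINUTES.toMillis(maxWaitMinutes);
    this.killer = killer;
  }

  @Override
  public void run() {
    // Periodically scan apps still waiting for their AM container and
    // fail the ones that exceeded the configured limit.
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : amWaitingSince.entrySet()) {
      if (now - e.getValue() > maxWaitMs) {
        killer.failApp(e.getKey(), "AM container not allocated within timeout");
      }
    }
  }
}
{code}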
[jira] [Commented] (YARN-886) make APPLICATION_STOP consistent with APPLICATION_INIT
[ https://issues.apache.org/jira/browse/YARN-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524210#comment-14524210 ] Siddharth Seth commented on YARN-886: - [~djp] - this looks like it's still valid. START is sent to the service that the app specified. STOP is sent to all AuxServices. make APPLICATION_STOP consistent with APPLICATION_INIT -- Key: YARN-886 URL: https://issues.apache.org/jira/browse/YARN-886 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch Currently, there is an inconsistency between the start/stop behaviour. See Siddharth's comment in MAPREDUCE-5329: the start/stop behaviour should be consistent; we shouldn't send the stop to all services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
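To illustrate the asymmetry being described, a simplified sketch (not the actual AuxServices code; the maps and Runnable handles are stand-ins): INIT is dispatched only to the services the app named, so a consistent STOP should consult the same per-app set instead of broadcasting.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified sketch of the INIT/STOP dispatch asymmetry (not the real AuxServices). */
class AuxDispatchSketch {
  private final Map<String, Runnable> services = new HashMap<String, Runnable>();          // name -> service
  private final Map<String, Set<String>> appServices = new HashMap<String, Set<String>>(); // appId -> names

  void onAppInit(String appId, String serviceName) {
    services.get(serviceName).run();  // INIT goes only to the service the app named
    Set<String> names = appServices.get(appId);
    if (names == null) {
      names = new HashSet<String>();
      appServices.put(appId, names);
    }
    names.add(serviceName);           // remember it so STOP can be symmetric
  }

  void onAppStop(String appId) {
    // Consistent behaviour: STOP only the services this app initialized,
    // instead of broadcasting to every aux service.
    Set<String> names = appServices.remove(appId);
    if (names != null) {
      for (String name : names) {
        services.get(name).run();
      }
    }
  }
}
{code}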
[jira] [Updated] (YARN-2419) RM applications page doesn't sort application id properly
[ https://issues.apache.org/jira/browse/YARN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2419: - Target Version/s: (was: 2.6.0) RM applications page doesn't sort application id properly - Key: YARN-2419 URL: https://issues.apache.org/jira/browse/YARN-2419 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Thomas Graves The ResourceManager apps page doesn't sort the application ids properly when the app id sequence rolls over to an additional digit (e.g. from 9999 to 10000). When it rolls over, the longer application ids end up many pages down, below the 0XXX numbers. I assume we just sort alphabetically, so we would need a special sorter that knows about application ids. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
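A hedged sketch of such a sorter (assuming the standard id form application_<clusterTimestamp>_<sequence>; wiring it into the web UI's table sorter is omitted): compare the numeric fields instead of the raw string.
{code}
import java.util.Comparator;

/**
 * Sketch of an application-id-aware comparator: parses
 * "application_<clusterTimestamp>_<sequence>" and compares numerically,
 * so application_..._10000 sorts after application_..._9999.
 */
class AppIdComparator implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    String[] pa = a.split("_");
    String[] pb = b.split("_");
    int byTimestamp = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
    if (byTimestamp != 0) {
      return byTimestamp;
    }
    return Long.compare(Long.parseLong(pa[2]), Long.parseLong(pb[2]));
  }
}
{code}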
[jira] [Commented] (YARN-2415) Expose MiniYARNCluster for use outside of YARN
[ https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524255#comment-14524255 ] Junping Du commented on YARN-2415: -- Hi [~ka...@cloudera.com] and [~ywskycn], do we have a plan for it? Expose MiniYARNCluster for use outside of YARN -- Key: YARN-2415 URL: https://issues.apache.org/jira/browse/YARN-2415 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.5.0 Reporter: Hari Shreedharan Assignee: Wei Yan The MR/HDFS equivalents are available for applications to use in tests, but the YARN Mini cluster is not. It would be really useful to test applications that are written to run on YARN (like Spark) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2318) hadoop configuration checker
[ https://issues.apache.org/jira/browse/YARN-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524286#comment-14524286 ] Zhijie Shen commented on YARN-2318: --- Do we still need this feature? Or we can close the jira as won't fix. hadoop configuration checker --- Key: YARN-2318 URL: https://issues.apache.org/jira/browse/YARN-2318 Project: Hadoop YARN Issue Type: New Feature Reporter: tangjunjie Hadoop has a lot of config properties, and people make mistakes when modifying configuration files, so Hadoop could provide a config check tool. This tool could find mistakes such as the following, where the property <property> <name>mapreduce.tasktracker.reduce.tasks.maximu</name> <value>9</value> <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description> </property> should have the name mapreduce.tasktracker.reduce.tasks.maximum. The tool could also warn about deprecated property names and correct them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
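A minimal sketch of the kind of check being requested (the valid-key list and distance threshold are illustrative; a real tool would load keys from the *-default.xml files): flag property names that are close to, but not equal to, a known key.
{code}
import java.util.Arrays;
import java.util.List;

/** Sketch of a near-miss check for configuration property names. */
class ConfigNameChecker {
  // Illustrative subset of known keys.
  private static final List<String> KNOWN = Arrays.asList(
      "mapreduce.tasktracker.reduce.tasks.maximum",
      "mapreduce.tasktracker.map.tasks.maximum");

  /** Returns the likely intended key, or null if no near miss is found. */
  static String suggest(String name) {
    for (String known : KNOWN) {
      int d = editDistance(name, known);
      if (d > 0 && d <= 2) {   // close but not exact -> likely a typo
        return known;
      }
    }
    return null;
  }

  /** Standard Levenshtein distance. */
  private static int editDistance(String s, String t) {
    int[][] dp = new int[s.length() + 1][t.length() + 1];
    for (int i = 0; i <= s.length(); i++) dp[i][0] = i;
    for (int j = 0; j <= t.length(); j++) dp[0][j] = j;
    for (int i = 1; i <= s.length(); i++) {
      for (int j = 1; j <= t.length(); j++) {
        int sub = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
        dp[i][j] = Math.min(dp[i - 1][j - 1] + sub,
            Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1));
      }
    }
    return dp[s.length()][t.length()];
  }
}
{code}
With the illustrative list above, suggest("mapreduce.tasktracker.reduce.tasks.maximu") returns the corrected key from the example in the description.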
[jira] [Updated] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3375: - Priority: Critical (was: Major) NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner -- Key: YARN-3375 URL: https://issues.apache.org/jira/browse/YARN-3375 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Devaraj K Assignee: Devaraj K Priority: Critical Attachments: YARN-3375.patch 1. The NodeHealthScriptRunner.shouldRun() check happens 3 times when starting the NodeHealthScriptRunner.
{code:title=NodeManager.java|borderStyle=solid}
if (!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
  LOG.info("Abey khali");
  return null;
}
{code}
{code:title=NodeHealthCheckerService.java|borderStyle=solid}
if (NodeHealthScriptRunner.shouldRun(
    conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
  addService(nodeHealthScriptRunner);
}
{code}
{code:title=NodeHealthScriptRunner.java|borderStyle=solid}
if (!shouldRun(nodeHealthScript)) {
  LOG.info("Not starting node health monitor");
  return;
}
{code}
2. If we don't configure a node health script, or the configured health script doesn't have execute permission, the NM logs the below (uninformative) message.
{code:xml}
2015-03-19 19:55:45,713 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
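A sketch of one way to collapse the triple check (a hypothetical refactoring for illustration, not the attached patch): perform shouldRun() once where the service is wired up and log something meaningful in the failure branch. conf, LOG, and addService are assumed to come from the surrounding service class.
{code:title=NodeHealthCheckerService.java (sketch)|borderStyle=solid}
// Do the check once, at service-wiring time; NodeManager and
// NodeHealthScriptRunner then no longer need to repeat shouldRun().
String script = conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH);
if (NodeHealthScriptRunner.shouldRun(script)) {
  addService(nodeHealthScriptRunner);
} else {
  LOG.info("Not starting node health monitor: health script '" + script
      + "' is not configured, missing, or not executable");
}
{code}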
[jira] [Updated] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3375: - Target Version/s: 2.8.0 (was: 3.0.0) NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner -- Key: YARN-3375 URL: https://issues.apache.org/jira/browse/YARN-3375 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Devaraj K Assignee: Devaraj K Attachments: YARN-3375.patch 1. The NodeHealthScriptRunner.shouldRun() check happens 3 times when starting the NodeHealthScriptRunner.
{code:title=NodeManager.java|borderStyle=solid}
if (!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
  LOG.info("Abey khali");
  return null;
}
{code}
{code:title=NodeHealthCheckerService.java|borderStyle=solid}
if (NodeHealthScriptRunner.shouldRun(
    conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
  addService(nodeHealthScriptRunner);
}
{code}
{code:title=NodeHealthScriptRunner.java|borderStyle=solid}
if (!shouldRun(nodeHealthScript)) {
  LOG.info("Not starting node health monitor");
  return;
}
{code}
2. If we don't configure a node health script, or the configured health script doesn't have execute permission, the NM logs the below (uninformative) message.
{code:xml}
2015-03-19 19:55:45,713 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524289#comment-14524289 ] Wangda Tan commented on YARN-3375: -- +1; also rekicked Jenkins. NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner -- Key: YARN-3375 URL: https://issues.apache.org/jira/browse/YARN-3375 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Devaraj K Assignee: Devaraj K Priority: Critical Attachments: YARN-3375.patch 1. The NodeHealthScriptRunner.shouldRun() check happens 3 times when starting the NodeHealthScriptRunner.
{code:title=NodeManager.java|borderStyle=solid}
if (!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
  LOG.info("Abey khali");
  return null;
}
{code}
{code:title=NodeHealthCheckerService.java|borderStyle=solid}
if (NodeHealthScriptRunner.shouldRun(
    conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
  addService(nodeHealthScriptRunner);
}
{code}
{code:title=NodeHealthScriptRunner.java|borderStyle=solid}
if (!shouldRun(nodeHealthScript)) {
  LOG.info("Not starting node health monitor");
  return;
}
{code}
2. If we don't configure a node health script, or the configured health script doesn't have execute permission, the NM logs the below (uninformative) message.
{code:xml}
2015-03-19 19:55:45,713 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2289) ApplicationHistoryStore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2289. --- Resolution: Won't Fix We won't do further improvements for the GHS. ApplicationHistoryStore should be versioned --- Key: YARN-2289 URL: https://issues.apache.org/jira/browse/YARN-2289 Project: Hadoop YARN Issue Type: Sub-task Components: applications Reporter: Junping Du Assignee: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2454) The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is defined incorrectly.
[ https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524488#comment-14524488 ] Hadoop QA commented on YARN-2454: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 24s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 38m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12664364/YARN-2454%20-v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7592/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7592/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7592/console | This message was automatically generated. The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is defined incorrectly. -- Key: YARN-2454 URL: https://issues.apache.org/jira/browse/YARN-2454 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.4.1 Reporter: Xu Yang Assignee: Xu Yang Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, YARN-2454.patch The variable UNBOUNDED implements the abstract class Resource and overrides the function compareTo, but there is something wrong in this function: we should not compare resources against zero the same way the variable NONE does. We should change 0 to Integer.MAX_VALUE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
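For context, a simplified sketch of the fix the description asks for (a sketch of the idea, not necessarily the committed patch; the real anonymous class lives in org.apache.hadoop.yarn.util.resource.Resources):
{code}
// Sketch: UNBOUNDED should compare as the maximum resource, not as zero
// (zero is what the NONE singleton correctly uses).
@Override
public int compareTo(Resource other) {
  int diff = Integer.MAX_VALUE - other.getMemory();      // was: 0 - other.getMemory()
  if (diff == 0) {
    diff = Integer.MAX_VALUE - other.getVirtualCores();  // was: 0 - other.getVirtualCores()
  }
  return diff;
}
{code}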
[jira] [Commented] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524486#comment-14524486 ] zhihai xu commented on YARN-2893: - Thanks [~adhoot] for the review, and thanks [~jira.shegalov] for the review and for committing the patch! Greatly appreciated. AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1772) Fair Scheduler documentation should indicate that admin ACLs also give submit permissions
[ https://issues.apache.org/jira/browse/YARN-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524523#comment-14524523 ] Naren Koneru commented on YARN-1772: Hi Jian, I won't be able to.. Feel free to take it.. Fair Scheduler documentation should indicate that admin ACLs also give submit permissions - Key: YARN-1772 URL: https://issues.apache.org/jira/browse/YARN-1772 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Naren Koneru I can submit to a Fair Scheduler queue if I'm in the submit ACL OR if I'm in the administer ACL. The Fair Scheduler docs seem to leave out the second part. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues
[ https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524855#comment-14524855 ] Hadoop QA commented on YARN-2151: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12649887/YARN-2151.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7614/console | This message was automatically generated. FairScheduler option for global preemption within hierarchical queues - Key: YARN-2151 URL: https://issues.apache.org/jira/browse/YARN-2151 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Andrey Stepachev Attachments: YARN-2151.patch FairScheduler has hierarchical queues, but fair share calculation and preemption still work within a limited range and are effectively still non-hierarchical. This patch addresses this incompleteness in two aspects: 1. Currently MinShare is not propagated to the upper queue, which leads the fair share calculation to ignore all min shares in deeper queues. Let's take an example (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues):
{code}
<?xml version="1.0"?>
<allocations>
  <queue name="queue1">
    <maxResources>10240mb, 10vcores</maxResources>
    <queue name="big"/>
    <queue name="sub1">
      <schedulingPolicy>fair</schedulingPolicy>
      <queue name="sub11">
        <minResources>6192mb, 6vcores</minResources>
      </queue>
    </queue>
    <queue name="sub2">
    </queue>
  </queue>
</allocations>
{code}
Then bigApp is started within queue1.big with 10x1GB containers, which effectively eats all the maximum allowed resources for queue1. Subsequent requests for app1 (queue1.sub1.sub11) and app2 (queue1.sub2) (5x1GB each) will wait for free resources. Note that sub11 has a min share requirement of 6x1GB. Without this patch, fair share is calculated with no knowledge of min share requirements, and app1 and app2 get an equal number of containers. With the patch, resources are split according to min share (in the test it will be 5 for app1 and 1 for app2). That behaviour is controlled by the same parameter as 'globalPreemption', but that can be changed easily. The implementation is a bit awkward, but it seems the method for min share recalculation could be exposed as a public or protected API and called by the FSQueue constructor before using the minShare getter. But right now the current implementation with nulls should work too. 2. Preemption doesn't work between queues on different levels of the queue hierarchy. Moreover, it is not possible to override various parameters for child queues. This patch adds the parameter 'globalPreemption', which enables the global preemption algorithm modifications. In a nutshell, the patch adds a function shouldAttemptPreemption(queue), which can calculate usage for nested queues; if a queue with usage above the specified threshold is found, preemption can be triggered. The aggregated minShare does the rest of the work, and preemption works as expected within a hierarchy of queues with different MinShare/MaxShare specifications on different levels. Test case TestFairScheduler#testGlobalPreemption depicts how it works: one big app gets resources above its fair share, and app1 has a declared min share. On submission, the code detects that starvation and preempts enough containers to give enough room for app1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524861#comment-14524861 ] Hadoop QA commented on YARN-2142: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12654924/final.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7616/console | This message was automatically generated. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: features Attachments: final.patch, trust.patch Original Estimate: 1m Remaining Estimate: 1m Because of the critical computing environment, we must test every node's TRUST status in the cluster (we can get the TRUST status via the API of the OAT server), so I added this feature into Hadoop's scheduler. With the TRUST check service, a node can get its own TRUST status and then send it to the resource manager through the heartbeat for scheduling. In the scheduling step, if a node's TRUST status is 'false', it will be abandoned until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node health check service. ***Only in branch-2.2.0, not in trunk*** OAT wiki link: https://github.com/OpenAttestation/OpenAttestation/wiki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2325) need to check whether node is null in nodeUpdate for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524869#comment-14524869 ] Hadoop QA commented on YARN-2325: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12656795/YARN-2325.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7618/console | This message was automatically generated. need to check whether node is null in nodeUpdate for FairScheduler Key: YARN-2325 URL: https://issues.apache.org/jira/browse/YARN-2325 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2325.000.patch Need to check whether the node is null in nodeUpdate for FairScheduler. If nodeUpdate is called after removeNode, getFSSchedulerNode will return null. If the node is null, we should return with an error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
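The guard being proposed is small; a sketch of its shape inside FairScheduler.nodeUpdate (following the description above, not necessarily the attached patch; nm is the RMNode passed into nodeUpdate):
{code:title=FairScheduler.java (sketch)|borderStyle=solid}
// Sketch: bail out early if the node was already removed, instead of
// letting a null FSSchedulerNode cause an NPE further down.
FSSchedulerNode node = getFSSchedulerNode(nm.getNodeID());
if (node == null) {
  LOG.error("Node not found while processing nodeUpdate for " + nm.getNodeID()
      + "; it was probably removed already.");
  return;
}
{code}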
[jira] [Commented] (YARN-1515) Provide ContainerManagementProtocol#signalContainer processing a batch of signals
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524856#comment-14524856 ] Hadoop QA commented on YARN-1515: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12645519/YARN-1515.v08.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7615/console | This message was automatically generated. Provide ContainerManagementProtocol#signalContainer processing a batch of signals -- Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch, YARN-1515.v06.patch, YARN-1515.v07.patch, YARN-1515.v08.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-641) Make AMLauncher in RM Use NMClient
[ https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524867#comment-14524867 ] Hadoop QA commented on YARN-641: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12587395/YARN-641.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7617/console | This message was automatically generated. Make AMLauncher in RM Use NMClient -- Key: YARN-641 URL: https://issues.apache.org/jira/browse/YARN-641 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-641.1.patch, YARN-641.2.patch, YARN-641.3.patch YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions with an application's AM container. AMLauncher should also replace the raw ContainerManager proxy with NMClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524847#comment-14524847 ] Hadoop QA commented on YARN-126: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12580129/YARN-126.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7613/console | This message was automatically generated. yarn rmadmin help message contains reference to hadoop cli and JT - Key: YARN-126 URL: https://issues.apache.org/jira/browse/YARN-126 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Rémy SAISSY Labels: usability Attachments: YARN-126.patch The help has an option to specify a job tracker, and the last line for general command line syntax reads "bin/hadoop command [genericOptions] [commandOptions]". Ran yarn rmadmin to get usage:
RMAdmin Usage: java RMAdmin
           [-refreshQueues]
           [-refreshNodes]
           [-refreshUserToGroupsMappings]
           [-refreshSuperUserGroupsConfiguration]
           [-refreshAdminAcls]
           [-refreshServiceAcl]
           [-help [cmd]]
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1287) Consolidate MockClocks
[ https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524835#comment-14524835 ] Hadoop QA commented on YARN-1287: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12621781/YARN-1287-3.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7611/console | This message was automatically generated. Consolidate MockClocks -- Key: YARN-1287 URL: https://issues.apache.org/jira/browse/YARN-1287 Project: Hadoop YARN Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sebastian Wong Labels: newbie Attachments: YARN-1287-3.patch A bunch of different tests have near-identical implementations of MockClock. TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for example. They should be consolidated into a single MockClock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-670) Add an Exception to indicate 'Maintenance' for NMs and add this to the JavaDoc for appropriate protocols
[ https://issues.apache.org/jira/browse/YARN-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524141#comment-14524141 ] Junping Du commented on YARN-670: - Resolving this as Won't Fix, since rolling upgrade won't need a Maintenance model: containers can still be running when the NM goes down. For graceful decommission, we have YARN-3212 to make sure no new containers get assigned to a decommissioning node. Add an Exception to indicate 'Maintenance' for NMs and add this to the JavaDoc for appropriate protocols Key: YARN-670 URL: https://issues.apache.org/jira/browse/YARN-670 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-1688) Rethinking about POJO Classes
[ https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reopened YARN-1688: --- Rethinking about POJO Classes - Key: YARN-1688 URL: https://issues.apache.org/jira/browse/YARN-1688 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen We need to think about how the POJO classes evolve. Should we back them up with proto and others? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes
[ https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1688. --- Resolution: Fixed YARN-3539 will declare the timeline v1 APIs stable. We won't change the v1 POJO classes. Rethinking about POJO Classes - Key: YARN-1688 URL: https://issues.apache.org/jira/browse/YARN-1688 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen We need to think about how the POJO classes evolve. Should we back them up with proto and others? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes
[ https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1688. --- Resolution: Won't Fix Rethinking about POJO Classes - Key: YARN-1688 URL: https://issues.apache.org/jira/browse/YARN-1688 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen We need to think about how the POJO classes evolve. Should we back them up with proto and others? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1733) Intermittent failure for TestRMWebServicesApps
[ https://issues.apache.org/jira/browse/YARN-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1733. --- Resolution: Cannot Reproduce Intermittent failure for TestRMWebServicesApps - Key: YARN-1733 URL: https://issues.apache.org/jira/browse/YARN-1733 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du In some Jenkins tests (like: YARN-1506, YARN-1641), TestRMWebServicesApps failed with a log like:
java.lang.AssertionError: incorrect number of elements expected:<20> but was:<18>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.verifyAppInfo(TestRMWebServicesApps.java:1321)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleAppsHelper(TestRMWebServicesApps.java:1261)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleApp(TestRMWebServicesApps.java:1153)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1971) WindowsLocalWrapperScriptBuilder does not check for errors in generated script
[ https://issues.apache.org/jira/browse/YARN-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524190#comment-14524190 ] Xuan Gong commented on YARN-1971: - [~rusanu] bq. These can fail due to access permissions, disc out of space, bad hardware, cosmic rays etc etc. There should be proper error checking to ease troubleshooting. I agree that the script can fail due to those issues. But, for example, disk out of space and bad hardware are NM issues, and we have already handled them on the NM side. So do we really need a pre-check for those issues? It might not be easy. WindowsLocalWrapperScriptBuilder does not check for errors in generated script -- Key: YARN-1971 URL: https://issues.apache.org/jira/browse/YARN-1971 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Similar to YARN-1865. The DefaultContainerExecutor.WindowsLocalWrapperScriptBuilder builds a shell script that contains commands that potentially may fail:
{code}
pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
{code}
These can fail due to access permissions, disc out of space, bad hardware, cosmic rays etc etc. There should be proper error checking to ease troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
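Since PrintWriter swallows I/O errors, one inexpensive check that would surface such failures (a sketch of the idea only, not the committed fix) is to test the writer's error flag after emitting the script:
{code}
// Sketch: PrintWriter never throws on write failure, so explicitly test
// its error flag after writing the wrapper script and fail fast.
pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
if (pout.checkError()) {
  throw new IOException("Failed to write wrapper script for container "
      + containerIdStr);
}
{code}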
[jira] [Updated] (YARN-3518) default rm/am expire interval should not be less than default resourcemanager connect wait time
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3518: Labels: configuration newbie (was: newbie) default rm/am expire interval should not be less than default resourcemanager connect wait time Key: YARN-3518 URL: https://issues.apache.org/jira/browse/YARN-3518 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: sandflee Labels: configuration, newbie Attachments: YARN-3518.001.patch Take the AM for example: if the AM can't connect to the RM, then after the AM expires (600s) the RM relaunches the AM, and there will be two AMs at the same time until the resourcemanager connect max wait time (900s) has passed. DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000; DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 600000; DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 600000; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3518) default rm/am expire interval should not be less than default resourcemanager connect wait time
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3518: Labels: newbie (was: ) default rm/am expire interval should not be less than default resourcemanager connect wait time Key: YARN-3518 URL: https://issues.apache.org/jira/browse/YARN-3518 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: sandflee Labels: configuration, newbie Attachments: YARN-3518.001.patch Take the AM for example: if the AM can't connect to the RM, then after the AM expires (600s) the RM relaunches the AM, and there will be two AMs at the same time until the resourcemanager connect max wait time (900s) has passed. DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000; DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 600000; DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 600000; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2415) Expose MiniYARNCluster for use outside of YARN
[ https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2415: - Target Version/s: (was: 2.6.0) Expose MiniYARNCluster for use outside of YARN -- Key: YARN-2415 URL: https://issues.apache.org/jira/browse/YARN-2415 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.5.0 Reporter: Hari Shreedharan Assignee: Wei Yan The MR/HDFS equivalents are available for applications to use in tests, but the YARN Mini cluster is not. It would be really useful to test applications that are written to run on YARN (like Spark) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2425) When an application is submitted via the YARN RM WS, log aggregation does not happen
[ https://issues.apache.org/jira/browse/YARN-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524268#comment-14524268 ] Junping Du commented on YARN-2425: -- Is this still the issue? When an application is submitted via the YARN RM WS, log aggregation does not happen -- Key: YARN-2425 URL: https://issues.apache.org/jira/browse/YARN-2425 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.0, 2.6.0 Environment: Secure (Kerberos enabled) hadoop cluster. With SPNEGO for Yarn RM enabled Reporter: Karam Singh Assignee: Varun Vasudev When submitting an app to the YARN RM using the web service, we need to pass credentials/tokens in a JSON/XML object to the web service. As the HDFS NameNode does not provide any delegation token over WS (base64 encoded) like webhdfs/the timeline server do (the HDFS fetchdt command fetches a Java Writable object and writes it to a target file, so we cannot forward it via application-submission WS objects), it looks like there is no way to pass the HDFS token to the NodeManager. While starting the application, the container also tries to create the application log aggregation dir and fails with the following type of exception: {code} java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: hostname/ip; destination host is: NameNodeHost:FSPort; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1415) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:725) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1781) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1069) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1065) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1065) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:240) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:64) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:253) at
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:344) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:310) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:421) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:64) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524267#comment-14524267 ] Sidharta Seethana commented on YARN-3375: - +1 to the patch - the changes seem good to me. NodeHealthScriptRunner.shouldRun() check is performed 3 times when starting NodeHealthScriptRunner -- Key: YARN-3375 URL: https://issues.apache.org/jira/browse/YARN-3375 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Devaraj K Assignee: Devaraj K Attachments: YARN-3375.patch 1. The NodeHealthScriptRunner.shouldRun() check happens 3 times when starting the NodeHealthScriptRunner.
{code:title=NodeManager.java|borderStyle=solid}
if (!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
  LOG.info("Abey khali");
  return null;
}
{code}
{code:title=NodeHealthCheckerService.java|borderStyle=solid}
if (NodeHealthScriptRunner.shouldRun(
    conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
  addService(nodeHealthScriptRunner);
}
{code}
{code:title=NodeHealthScriptRunner.java|borderStyle=solid}
if (!shouldRun(nodeHealthScript)) {
  LOG.info("Not starting node health monitor");
  return;
}
{code}
2. If we don't configure a node health script, or the configured health script doesn't have execute permission, the NM logs the below (uninformative) message.
{code:xml}
2015-03-19 19:55:45,713 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524356#comment-14524356 ] Junping Du commented on YARN-2470: -- Agree with [~chris.douglas]. This shouldn't be a problem, as this is the expected common behavior for other int values. Closing it as Won't Fix. A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524357#comment-14524357 ] Hadoop QA commented on YARN-2892: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 6s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 26s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler | | | hadoop.yarn.server.resourcemanager.TestAppManager | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12684732/YARN-2892.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d3d019c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7584/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7584/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7584/console | This message was automatically generated. Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl).This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. 
The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (see submitApplication in ClientRmService). Afterwards, when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos, this check fails because the realm is stripped from the principal when we request a short username. So, for example, the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception an
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-2470. -- Resolution: Won't Fix A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-8.patch Fixing code style issues (now the output of the checker is meaningful, so life is much easier). The broken tests weren't related to my changes. Any proposal for unit tests? The ones in ContainersMonitorImpl don't really apply. Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524493#comment-14524493 ] Hadoop QA commented on YARN-1917: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 29s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | mapreduce tests | 73m 26s | Tests failed in hadoop-mapreduce-client-jobclient. | | {color:red}-1{color} | yarn tests | 6m 57s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 6m 8s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 125m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.mapred.TestMapRed | | | hadoop.yarn.client.api.impl.TestYarnClient | | Timed out tests | org.apache.hadoop.mapred.TestMiniMRClasspath | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729871/YARN-1917.20150501.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6f541ed | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7590/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7590/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7590/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7590/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7590/console | This message was automatically generated. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.20150501.1.patch, YARN-1917.patch, YARN-1917.patch, YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1638) Add an integration test validating post, storage and retrieval of entities+events
[ https://issues.apache.org/jira/browse/YARN-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1638. --- Resolution: Fixed We already have an integration test of sorts, such as in TestDistributedShell Add an integration test validating post, storage and retrieval of entities+events --- Key: YARN-1638 URL: https://issues.apache.org/jira/browse/YARN-1638 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3552) RM Web UI shows -1 running containers for completed apps
[ https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3552: Labels: newbie (was: ) RM Web UI shows -1 running containers for completed apps Key: YARN-3552 URL: https://issues.apache.org/jira/browse/YARN-3552 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohith Assignee: Rohith Priority: Trivial Labels: newbie Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG In RMServerUtils, the default values are negative numbers, which results in the RM web UI also displaying negative numbers. {code} public static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT = BuilderUtils.newApplicationResourceUsageReport(-1, -1, Resources.createResource(-1, -1), Resources.createResource(-1, -1), Resources.createResource(-1, -1), 0, 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
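Given the DUMMY_APPLICATION_RESOURCE_USAGE_REPORT above (YARN-3552), one possible direction for a fix is simply to clamp the -1 sentinel values before rendering; a minimal sketch under that assumption, with a hypothetical helper name:
{code}
// Hypothetical web-UI helper (not the actual patch): the dummy usage report
// uses -1 as "unknown", so clamp it to 0 instead of rendering a negative count.
public final class UsageDisplayUtils {
  private UsageDisplayUtils() {}

  public static int sanitizeForDisplay(int reportedValue) {
    return reportedValue < 0 ? 0 : reportedValue;
  }
}
{code}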
[jira] [Commented] (YARN-3513) Remove unused variables in ContainersMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524215#comment-14524215 ] Li Lu commented on YARN-3513: - Hi [~Naganarasimha], thanks for the patch. +1 for removing {{vmemStillInUsage}} and {{pmemStillInUsage}}. However, I noticed that we're using {{context}} in the 2928 branch. Since we're planning for a branch merge, potentially soon, maybe it's fine to leave it there? Remove unused variables in ContainersMonitorImpl Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3513.20150421-1.patch The class member {{private final Context context;}} and some local variables in MonitoringThread.run() ({{vmemStillInUsage}} and {{pmemStillInUsage}}) are never read, only updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1530. --- Resolution: Fixed Timeline service v1 is almost done. Most functionality has been committed across multiple releases, largely before 2.6. There are still a few outstanding issues, which are kept open for further discussion. [Umbrella] Store, manage and serve per-framework application-timeline data -- Key: YARN-1530 URL: https://issues.apache.org/jira/browse/YARN-1530 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf This is a sibling JIRA for YARN-321. Today, each application/framework has to store and serve its per-framework data all by itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner, with plugin points for frameworks to do their own thing w.r.t. interpretation and serving. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524219#comment-14524219 ] Sidharta Seethana commented on YARN-3381: - Patch seems to apply. I'll re-submit to Jenkins. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Attachments: YARN-3381-002.patch, YARN-3381.patch It appears that InvalidStateTransitonException should be InvalidStateTransitionException; "Transition" was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
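For context on YARN-3381: the usual way to fix a misspelled public class without breaking existing callers is to introduce the correctly spelled class and keep the old name as a deprecated subclass. A sketch under that assumption (not necessarily what the attached patches do):
{code}
// In InvalidStateTransitionException.java
public class InvalidStateTransitionException extends Exception {
  public InvalidStateTransitionException(String message) {
    super(message);
  }
}

// In InvalidStateTransitonException.java, kept only for compatibility
/** @deprecated Use {@link InvalidStateTransitionException} instead. */
@Deprecated
public class InvalidStateTransitonException
    extends InvalidStateTransitionException {
  public InvalidStateTransitonException(String message) {
    super(message);
  }
}
{code}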
[jira] [Comment Edited] (YARN-1935) Security for timeline server
[ https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523964#comment-14523964 ] Zhijie Shen edited comment on YARN-1935 at 5/1/15 10:53 PM: Closing the umbrella jira, as the security work was largely completed during 2.5 and 2.6. The only remaining issue is putting generic history data in a non-default domain in the secure scenario. Since we are not going to develop new features for ATS v1, we can leave that jira (YARN-2622) open and see whether a supporting requirement for it emerges. was (Author: zjshen): Close the umbrella jira. The only left issue is to put generic history data in a non-default domain in secure scenario. Since we don't go on to develop new feature for ATS v1, we can leave that jira (YARN-2622) open and see if we have the supporting requirement for it. Security for timeline server Key: YARN-1935 URL: https://issues.apache.org/jira/browse/YARN-1935 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Zhijie Shen Attachments: Timeline Security Diagram.pdf, Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch Jira to track work to secure the ATS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3509) CollectorNodemanagerProtocol's authorization doesn't work
[ https://issues.apache.org/jira/browse/YARN-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524247#comment-14524247 ] Li Lu commented on YARN-3509: - Hi [~zjshen], thanks for working on this. I'm wondering whether this problem will block any testing work for the YARN-2928 branch. If so, we may want a quick fix now; otherwise, I agree with [~djp] that we can wait until the security design is ready. CollectorNodemanagerProtocol's authorization doesn't work - Key: YARN-3509 URL: https://issues.apache.org/jira/browse/YARN-3509 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, security, timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3509.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time
[ https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524246#comment-14524246 ] Junping Du commented on YARN-2416: -- Thanks for identifying and reporting the issue, [~john.jian.fang]! We should add a state transition from ALLOCATED to RUNNING. Mind delivering a fix for it? InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time Key: YARN-2416 URL: https://issues.apache.org/jira/browse/YARN-2416 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jian Fang AMLauncher calls startContainers(allRequests) to launch a container for the application master. Normally, the call comes back immediately, so that the RMAppAttempt changes its state from ALLOCATED to LAUNCHED. However, we did observe that in some cases the RPC call came back very late even though the AM container had already started. Because the RMAppAttempt was stuck in the ALLOCATED state, once the resource manager received the REGISTERED event from the application master, it threw InvalidStateTransitonException as follows. 2014-07-05 08:59:05,021 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: REGISTERED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, the resource manager kept throwing InvalidStateTransitonException.
2014-07-05 08:59:06,152 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) 2014-07-05 08:59:07,779 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_02_02 Container Transitioned from NEW to ALLOCATED 2014-07-05 08:59:07,779 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at
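The ALLOCATED -> RUNNING arc suggested in the YARN-2416 comment above would be declared roughly as follows with YARN's StateMachineFactory; this is a self-contained toy, not the real RMAppAttemptImpl, and all class and enum names here are illustrative:
{code}
import org.apache.hadoop.yarn.state.SingleArcTransition;
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class ToyAttempt {
  enum State { ALLOCATED, LAUNCHED, RUNNING }
  enum EventType { LAUNCHED, REGISTERED }
  static class Event {
    final EventType type;
    Event(EventType type) { this.type = type; }
  }

  private static final StateMachineFactory<ToyAttempt, State, EventType, Event>
      FACTORY = new StateMachineFactory<ToyAttempt, State, EventType, Event>(
          State.ALLOCATED)
      .addTransition(State.ALLOCATED, State.LAUNCHED, EventType.LAUNCHED)
      // The suggested missing arc: tolerate a REGISTERED that arrives while
      // the attempt is still ALLOCATED (late startContainers RPC), instead of
      // throwing InvalidStateTransitonException.
      .addTransition(State.ALLOCATED, State.RUNNING, EventType.REGISTERED,
          new SingleArcTransition<ToyAttempt, Event>() {
            @Override
            public void transition(ToyAttempt attempt, Event event) {
              // record AM registration here
            }
          })
      .installTopology();

  private final StateMachine<State, EventType, Event> stateMachine =
      FACTORY.make(this);
}
{code}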
[jira] [Commented] (YARN-2256) Too many nodemanager and resourcemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524274#comment-14524274 ] Hadoop QA commented on YARN-2256: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 4s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 5m 50s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 42m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.security.TestNMTokenSecretManagerInNM | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12655753/YARN-2256.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d3d019c | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7581/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7581/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7581/console | This message was automatically generated. Too many nodemanager and resourcemanager audit logs are generated - Key: YARN-2256 URL: https://issues.apache.org/jira/browse/YARN-2256 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.4.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-2256.patch The following audit logs are generated too many times (due to the possibility of a large number of containers): 1. In NM - audit logs corresponding to starting, stopping and finishing a container 2. In RM - audit logs corresponding to the AM allocating a container and the AM releasing a container We can have different log levels even for NM and RM audit logs and move these successful, container-related logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
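To make the YARN-2256 proposal concrete, a minimal sketch of the DEBUG-gating idea; the class, logger name, and message format are illustrative, not the actual NMAuditLogger change:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public final class ContainerAuditExample {
  private static final Log AUDIT =
      LogFactory.getLog(ContainerAuditExample.class);

  private ContainerAuditExample() {}

  /** Per-container success entries go to DEBUG; failures stay at WARN/INFO. */
  public static void logContainerSuccess(String user, String operation,
      String containerId) {
    if (AUDIT.isDebugEnabled()) {
      AUDIT.debug("USER=" + user + "\tOPERATION=" + operation
          + "\tRESULT=SUCCESS\tCONTAINERID=" + containerId);
    }
  }
}
{code}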
[jira] [Resolved] (YARN-2307) Capacity scheduler user with only ADMINISTER_QUEUE can also submit app
[ https://issues.apache.org/jira/browse/YARN-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2307. --- Resolution: Invalid You probably missed setting {{yarn.acl.enable=true}} in yarn-site.xml. Closing it for now; feel free to reopen if that's not your case. Capacity scheduler user with only ADMINISTER_QUEUE can also submit app -- Key: YARN-2307 URL: https://issues.apache.org/jira/browse/YARN-2307 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.3.0 Environment: hadoop 2.3.0 centos6.5 jdk1.7 Reporter: tangjunjie Priority: Minor Queue acls for user : root Queue Operations = root default china ADMINISTER_QUEUE unfunded User root only has ADMINISTER_QUEUE, but user root can submit apps to the china queue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
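For reference on YARN-2307: queue ACLs are only enforced once ACLs are switched on, which in yarn-site.xml looks like this:
{code}
<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
{code}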
[jira] [Updated] (YARN-1770) Excessive logging for app and attempts on RM recovery
[ https://issues.apache.org/jira/browse/YARN-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1770: -- Summary: Excessive logging for app and attempts on RM recovery (was: Too much logging for app and attempts on RM recovery) Excessive logging for app and attempts on RM recovery -- Key: YARN-1770 URL: https://issues.apache.org/jira/browse/YARN-1770 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Priority: Minor There's too much logging for apps and attempts when the RM is recovering, and some of it is duplicated. We should consolidate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1990) Track time-to-allocation for different size containers
[ https://issues.apache.org/jira/browse/YARN-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524290#comment-14524290 ] Xuan Gong commented on YARN-1990: - Closing this ticket based on [~curino]'s comment. Track time-to-allocation for different size containers --- Key: YARN-1990 URL: https://issues.apache.org/jira/browse/YARN-1990 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Allocation of large containers is notoriously problematic, as smaller containers can more easily grab resources. The proposal for this JIRA is to maintain a map of container sizes and time-to-allocation that can be used: * for general insight into cluster behavior, * to inform the reservation system, allowing us to account for delays in allocation so that the user reservation is respected regardless of the size of the containers requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1990) Track time-to-allocation for different size containers
[ https://issues.apache.org/jira/browse/YARN-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-1990. - Resolution: Invalid Track time-to-allocation for different size containers --- Key: YARN-1990 URL: https://issues.apache.org/jira/browse/YARN-1990 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Allocation of large containers is notoriously problematic, as smaller containers can more easily grab resources. The proposal for this JIRA is to maintain a map of container sizes and time-to-allocation that can be used: * for general insight into cluster behavior, * to inform the reservation system, allowing us to account for delays in allocation so that the user reservation is respected regardless of the size of the containers requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
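As a rough illustration of the bookkeeping YARN-1990 proposes, a toy tracker that buckets requests by container size and records allocation delay; all names here are hypothetical, not RM code:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimeToAllocationTracker {
  // power-of-two memory bucket (MB) -> cumulative allocation delay and samples
  private final Map<Integer, Long> totalDelayMs = new ConcurrentHashMap<Integer, Long>();
  private final Map<Integer, Long> samples = new ConcurrentHashMap<Integer, Long>();

  private static int bucket(int memoryMb) {
    return Integer.highestOneBit(Math.max(1, memoryMb));
  }

  public synchronized void record(int memoryMb, long delayMs) {
    int b = bucket(memoryMb);
    Long total = totalDelayMs.get(b);
    totalDelayMs.put(b, total == null ? delayMs : total + delayMs);
    Long n = samples.get(b);
    samples.put(b, n == null ? 1L : n + 1);
  }

  /** Returns the average time-to-allocation for this size, or -1 if unknown. */
  public synchronized long averageDelayMs(int memoryMb) {
    Long total = totalDelayMs.get(bucket(memoryMb));
    Long n = samples.get(bucket(memoryMb));
    return (total == null || n == null || n == 0) ? -1 : total / n;
  }
}
{code}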
[jira] [Commented] (YARN-2137) Add support for log aggregation to a path on a non-default filecontext
[ https://issues.apache.org/jira/browse/YARN-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524303#comment-14524303 ] Xuan Gong commented on YARN-2137: - [~ksumit] Any update on this one? Add support for log aggregation to a path on a non-default filecontext --- Key: YARN-2137 URL: https://issues.apache.org/jira/browse/YARN-2137 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation Affects Versions: 2.4.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Attachments: YARN-2137.patch The current log-aggregation implementation supports log aggregation to the default filecontext only. This patch adds support for log aggregation to any of the supported filesystems within the Hadoop ecosystem (hdfs, s3, swiftfs, etc.). So, for example, a customer could use hdfs as the default filesystem but use s3 or swiftfs for log aggregation. The current implementation makes mixed use of the FileContext+AbstractFileSystem APIs as well as the FileSystem APIs, which is confusing. This patch does three things: # moves the log-aggregation implementation to use only FileContext APIs # adds support for doing log aggregation on a non-default filesystem as well # changes TestLogAggregationService to use the local filesystem itself instead of mocking the behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332)
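A minimal sketch of the core idea in YARN-2137, assuming the remote log dir may live on a filesystem other than fs.defaultFS; the path and class name are illustrative:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RemoteLogDirExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Illustrative path only: any scheme with a FileContext binding works.
    Path remoteLogDir = new Path("s3a://logs-bucket/app-logs");
    // Resolve the FileContext from the path's own scheme rather than
    // assuming the default filesystem.
    FileContext fc = FileContext.getFileContext(remoteLogDir.toUri(), conf);
    fc.mkdir(remoteLogDir, FsPermission.getDirDefault(), true);
  }
}
{code}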
[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524329#comment-14524329 ] Wangda Tan commented on YARN-3017: -- I can see this behavior in the latest trunk as well: {code} 2015-05-01 00:53:44,575 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(793)) - appattempt_1430441527236_0001_01 State change from SUBMITTED to SCHEDULED 2015-05-01 00:53:44,928 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(394)) - container_1430441527236_0001_01_01 Container Transitioned from NEW to ALLOCATED {code} It's better to make them consistent. ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID -- Key: YARN-3017 URL: https://issues.apache.org/jira/browse/YARN-3017 Project: Hadoop YARN Issue Type: Improvement Reporter: MUFEED USMAN Priority: Minor Not sure if this should be filed as a bug or not. In the ResourceManager log, in the events surrounding the creation of a new application attempt, ... ... 2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02 ... ... The application attempt has the ID format _1412150883650_0001_02, whereas the associated ContainerID goes by _1412150883650_0001_02_. ... ... 2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, vCores:1, disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02 ... ... Curious to know whether this is kept like that for a reason. If not, then when using filtering tools to, say, grep events surrounding a specific attempt by the numeric ID part, information may slip out during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3401) [Security] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524328#comment-14524328 ] Li Lu commented on YARN-3401: - I just changed the title of this JIRA to security so that we decouple it from data-model-related changes. This JIRA is part of the (not-yet-)proposed security design for timeline v2. I'm not sure what the role of this JIRA will be once we have a comprehensive design, so I'm just linking it to the security JIRA so that we remember this use case. [Security] users should not be able to create a generic TimelineEntity and associate arbitrary type --- Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC, it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating the base TimelineEntity class and setting the application type on it. This presents a problem in handling these YARN system entities in the storage layer, for example. We need to ensure that the API allows only the right type of the class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
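To make the YARN-3401 requirement concrete, a hedged sketch of the kind of check a writer path could perform; the names follow the timeline v2 records in the YARN-2928 branch, but the guard class itself is hypothetical:
{code}
import org.apache.hadoop.yarn.api.records.timelineservice.ApplicationEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntityType;

public final class EntityTypeGuard {
  private EntityTypeGuard() {}

  /** Rejects a raw TimelineEntity that claims a reserved YARN entity type. */
  public static void checkEntityType(TimelineEntity entity) {
    if (TimelineEntityType.YARN_APPLICATION.toString().equals(entity.getType())
        && !(entity instanceof ApplicationEntity)) {
      throw new IllegalArgumentException("Entities of type "
          + entity.getType() + " must be created as ApplicationEntity");
    }
  }
}
{code}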
[jira] [Commented] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.
[ https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524352#comment-14524352 ] zhihai xu commented on YARN-3385: - Thanks [~sidharta-s], I uploaded a new patch YARN-3385.001.patch based on the latest code base. Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion. --- Key: YARN-3385 URL: https://issues.apache.org/jira/browse/YARN-3385 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3385.000.patch, YARN-3385.001.patch Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion (Op.delete). The race condition is similar to YARN-3023: since the race condition exists for ZK node creation, it should also exist for ZK node deletion. We saw this issue with the following stack trace: {code} 2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.
[ https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3385: -- Target Version/s: 2.8.0 Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion. --- Key: YARN-3385 URL: https://issues.apache.org/jira/browse/YARN-3385 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3385.000.patch, YARN-3385.001.patch Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion (Op.delete). The race condition is similar to YARN-3023: since the race condition exists for ZK node creation, it should also exist for ZK node deletion. We saw this issue with the following stack trace: {code} 2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.
[ https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524399#comment-14524399 ] zhihai xu commented on YARN-3385: - Agreed; if we have YARN-2716, this problem may be solved with it. Thanks [~jianhe]! It may take some time to stabilize YARN-2716; in the interim, it will be useful to fix this issue. Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion. --- Key: YARN-3385 URL: https://issues.apache.org/jira/browse/YARN-3385 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3385.000.patch, YARN-3385.001.patch Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion (Op.delete). The race condition is similar to YARN-3023: since the race condition exists for ZK node creation, it should also exist for ZK node deletion. We saw this issue with the following stack trace: {code} 2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
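For reference on YARN-3385, the usual shape of the fix for this class of race is to treat NoNode on delete as success, since the node is already gone; a minimal sketch (not the attached patch):
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public final class SafeZkDelete {
  private SafeZkDelete() {}

  public static void deleteIfExists(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    try {
      zk.delete(path, -1); // -1 matches any node version
    } catch (KeeperException.NoNodeException e) {
      // The node is already gone, e.g. a retried operation deleted it first;
      // safe to treat as success rather than shutting down the RM.
    }
  }
}
{code}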
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524400#comment-14524400 ] Hadoop QA commented on YARN-1743: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 49s | The applied patch generated 66 additional warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 14s | The applied patch generated 6 new checkstyle issues (total was , now 6). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 25s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 5m 51s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 46m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12694668/YARN-1743-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6f541ed | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7588/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7588/console | This message was automatically generated. Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Labels: documentation Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, YARN-1743-3.patch, YARN-1743.patch It helps to annotate the transitions with a (start-state, end-state) pair and the events with a (source, destination) pair. Beyond readability, we may also use them to generate event diagrams across components. Not a blocker for 0.23, but let's see.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1805) Signal container request delivery from resourcemanager to nodemanager
[ https://issues.apache.org/jira/browse/YARN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524810#comment-14524810 ] Hadoop QA commented on YARN-1805: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12643371/YARN-1805.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7609/console | This message was automatically generated. Signal container request delivery from resourcemanager to nodemanager - Key: YARN-1805 URL: https://issues.apache.org/jira/browse/YARN-1805 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1805.patch 1. Update ResourceTracker's HeartbeatResponse to include the list of SignalContainerRequest. 2. Upon receiving the request, NM's NodeStatusUpdater will deliver the request to ContainerManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524802#comment-14524802 ] Hadoop QA commented on YARN-1427: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12646616/YARN-1427.1.patch | | Optional Tests | | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7607/console | This message was automatically generated. yarn-env.cmd should have the analog comments that are in yarn-env.sh Key: YARN-1427 URL: https://issues.apache.org/jira/browse/YARN-1427 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Labels: newbie, windows Attachments: YARN-1427.1.patch There are paragraphs about RM/NM env vars (probably AHS as well soon) in yarn-env.sh. Should the Windows version of the script provide similar comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1803) Signal container support in nodemanager
[ https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524800#comment-14524800 ] Hadoop QA commented on YARN-1803: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12643173/YARN-1803.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7606/console | This message was automatically generated. Signal container support in nodemanager --- Key: YARN-1803 URL: https://issues.apache.org/jira/browse/YARN-1803 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1803.patch It could include the following. 1. ContainerManager is able to process a new event type, ContainerManagerEventType.SIGNAL_CONTAINERS, coming from NodeStatusUpdater, and deliver the request to ContainerExecutor. 2. Translate the platform-independent signal command to Linux-specific signals. Windows support will be tracked by another task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
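A self-contained sketch of item 2 in YARN-1803, mapping a platform-independent command to a Linux signal number; the enum and class here are illustrative stand-ins, not the actual SignalContainerCommand API:
{code}
public class SignalTranslator {
  enum Command { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

  static int toLinuxSignal(Command cmd) {
    switch (cmd) {
      case OUTPUT_THREAD_DUMP: return 3;   // SIGQUIT -> JVM thread dump
      case GRACEFUL_SHUTDOWN:  return 15;  // SIGTERM
      case FORCEFUL_SHUTDOWN:  return 9;   // SIGKILL
      default: throw new IllegalArgumentException("Unknown command " + cmd);
    }
  }
}
{code}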
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524805#comment-14524805 ] Hadoop QA commented on YARN-445: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7608/console | This message was automatically generated. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445-signal-container-via-rm.patch, YARN-445.patch, YARNContainers.png It would be nice if an ApplicationMaster could send signals to containers, such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features, like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524817#comment-14524817 ] Hadoop QA commented on YARN-2120: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12648681/YARN-2120.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7610/console | This message was automatically generated. Coloring queues running over minShare on RM Scheduler page -- Key: YARN-2120 URL: https://issues.apache.org/jira/browse/YARN-2120 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: 76AD6A72-9A0D-4F3A-A7B8-6EC1DCBD543A.png, YARN-2120.v1.patch, YARN-2120.v2.patch Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. Since fair share is displayed with a dotted line, I think we can stop displaying orange when a queue is over its fair share. It would be better to show a queue running over minShare in orange, so that we know the queue is running at more than its min share. Also, we can display a queue running at maxShare in red. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1714) Per user and per queue view in YARN RM
[ https://issues.apache.org/jira/browse/YARN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524103#comment-14524103 ] Jian He commented on YARN-1714: --- [~l201514], I think the RM REST API now supports the filtering. Is this jira to add web UI support? Canceling the patch, as it doesn't apply now. Per user and per queue view in YARN RM -- Key: YARN-1714 URL: https://issues.apache.org/jira/browse/YARN-1714 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-1714.v1.patch, YARN-1714.v2.patch, YARN-1714.v3.patch The ResourceManager exposes either one job or all jobs via the web UI. It would be good to have a filter for user so that users see only their jobs. Provide a REST-style URL to access only the user-specified queue or the user's apps. For instance, http://hadoop-example.com:50030/cluster/user/toto displays apps owned by toto http://hadoop-example.com:50030/cluster/user/toto,glinda displays apps owned by toto and glinda http://hadoop-example.com:50030/cluster/queue/root.queue1 displays apps in root.queue1 http://hadoop-example.com:50030/cluster/queue/root.queue1,root.queue2 displays apps in root.queue1 and root.queue2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3554) Default value for maximum nodemanager connect wait time is too high
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3554: Labels: newbie (was: ) Default value for maximum nodemanager connect wait time is too high --- Key: YARN-3554 URL: https://issues.apache.org/jira/browse/YARN-3554 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R Labels: newbie Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 msec, or 15 minutes, which is way too high. The default container expiry time from the RM and the default task timeout in MapReduce are both only 10 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
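Until the YARN-3554 default changes, clusters can override it in yarn-site.xml; the value below is illustrative only:
{code}
<property>
  <!-- Default is 900000 ms (15 min); cap NM connect retries at 3 minutes. -->
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>180000</value>
</property>
{code}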
[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524170#comment-14524170 ] Wangda Tan commented on YARN-3388: -- [~nroberts], any updates on this? Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit - Key: YARN-3388 URL: https://issues.apache.org/jira/browse/YARN-3388 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch When there are multiple active users in a queue, it should be possible for those users to make use of capacity up to max_capacity (or close to it). The resources should be fairly distributed among the active users in the queue. This works pretty well when there is a single resource being scheduled. However, when there are multiple resources the situation gets more complex, and the current algorithm tends to get stuck at capacity. An example is illustrated in a subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1664) Add a utility to retrieve the RM Principal (renewer for tokens)
[ https://issues.apache.org/jira/browse/YARN-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524166#comment-14524166 ] Jian He commented on YARN-1664: --- [~sseth], is this still needed? How is Tez doing it now? Add a utility to retrieve the RM Principal (renewer for tokens) --- Key: YARN-1664 URL: https://issues.apache.org/jira/browse/YARN-1664 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Currently the logic to retrieve the renewer to be used while retrieving HDFS tokens resides in MapReduce. This should ideally be a utility in YARN, since it's likely to be required by other applications as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2043) Rename internal names to being Timeline Service instead of application history
[ https://issues.apache.org/jira/browse/YARN-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2043. --- Resolution: Won't Fix We won't refactor ATS v1 any more Rename internal names to being Timeline Service instead of application history -- Key: YARN-2043 URL: https://issues.apache.org/jira/browse/YARN-2043 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Naganarasimha G R Like package and class names. In line with YARN-2033, YARN-1982 etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2060) Add an admin module for the timeline server
[ https://issues.apache.org/jira/browse/YARN-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2060. --- Resolution: Won't Fix We won't add new features to ATS v1 Add an admin module for the timeline server --- Key: YARN-2060 URL: https://issues.apache.org/jira/browse/YARN-2060 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Like the job history server, it would be good to have an admin module for the timeline server to allow the admin to manage the server on the fly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-321) [Umbrella] Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524113#comment-14524113 ] Zhijie Shen edited comment on YARN-321 at 5/1/15 10:47 PM: --- Closing this umbrella jira with a few subtasks open. The generic history service has been implemented and rides on the timeline server, but is not production-ready. The subtasks have gone into multiple releases, mostly before 2.6. YARN-2271 is left open to track one possible performance issue in fetching all the applications stored in the timeline store. was (Author: zjshen): Close this umbrella jira with few sub tasks open. Generic history service has been implemented and rides on timeline server. YARN-2271 is left open to track one possible performance issue to fetch all the applications stored in the timeline store. [Umbrella] Generic application history service -- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) trusted servers (where T is the number of application types and V is the number of application versions) is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as JSON (or binary Avro). I propose that we create only one trusted application history server, which can have a generic UI (display JSON as a tree of strings) as well. Specific applications/versions can deploy untrusted webapps (a la AMs) to query the application history server and interpret the JSON for their specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2955) mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector
[ https://issues.apache.org/jira/browse/YARN-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-2955. -- Resolution: Invalid It's a warning, and this JIRA doesn't contain the necessary information. Closing as invalid; [~jyf2100], please reopen if you have more information on this. mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector - Key: YARN-2955 URL: https://issues.apache.org/jira/browse/YARN-2955 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Environment: cdh5.1.0 Reporter: Rocju 2014-12-12 02:26:55,047 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@dcnn2:23188 2014-12-12 02:26:55,052 WARN mortbay.log (Slf4jLog.java:warn(89)) - EXCEPTION java.lang.InterruptedException at java.lang.Object.wait(Native Method) at org.mortbay.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:279) at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:545) at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:639) at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580) at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:154) at org.mortbay.jetty.AbstractGenerator$OutputWriter.write(AbstractGenerator.java:904) at org.mortbay.jetty.AbstractGenerator$OutputWriter.write(AbstractGenerator.java:755) at java.io.Writer.write(Writer.java:157) at java.io.PrintWriter.newLine(PrintWriter.java:480) at java.io.PrintWriter.println(PrintWriter.java:629) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._p(HamletImpl.java:110) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$SCRIPT._(Hamlet.java:454) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlock.render(AppsBlock.java:119) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:40) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(Hamlet.java:30347) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlockWithMetrics.render(AppsBlockWithMetrics.java:29) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMDispatcher.service(RMDispatcher.java:77) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at
[jira] [Resolved] (YARN-1756) Capture one more timestamp for an application when ApplicationClientProtocol#getNewApplication is executed
[ https://issues.apache.org/jira/browse/YARN-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1756. --- Resolution: Won't Fix Capture one more timestamp for an application when ApplicationClientProtocol#getNewApplication is executed -- Key: YARN-1756 URL: https://issues.apache.org/jira/browse/YARN-1756 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma The application submission time (when submitApplication is called) is collected by the RM and the application history server, but nothing captures when the client calls the newApplication method. The delta between newApplication and submitApplication could be useful if the client submits large jar files. This metric will be useful for https://issues.apache.org/jira/browse/YARN-1492. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1756) Capture one more timestamp for an application when ApplicationClientProtocol#getNewApplication is executed
[ https://issues.apache.org/jira/browse/YARN-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524239#comment-14524239 ] Jian He commented on YARN-1756: --- This sounds like a customized YARN feature request for MR only; I think MR can do this itself. I don't expect more progress here in reality, so I'm closing this. Please re-open if this requirement is still needed. Capture one more timestamp for an application when ApplicationClientProtocol#getNewApplication is executed -- Key: YARN-1756 URL: https://issues.apache.org/jira/browse/YARN-1756 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma The application submission time (when submitApplication is called) is collected by the RM and the application history server, but nothing captures when the client calls the newApplication method. The delta between newApplication and submitApplication could be useful if the client submits large jar files. This metric will be useful for https://issues.apache.org/jira/browse/YARN-1492. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-2305. -- Resolution: Duplicate This JIRA should be resolved already; CSQueueUtils now uses QueueResourceUsage instead of QueueMetrics, which is updated on every container allocation/resource update. When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Sunil G Attachments: Capture.jpg
ENV Details:
3 queues: a(50%), b(25%), c(25%) --- all max utilization is set to 100
2-node cluster with total memory of 16GB
Test Steps:
Execute the following 3 jobs with different memory configurations for the Map, Reducer and AM tasks:
./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023)
./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025)
./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62
Issue:
When 2GB of memory is in reserved state, total memory is shown as 15GB and used as 15GB (while total memory is 16GB).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1772) Fair Scheduler documentation should indicate that admin ACLs also give submit permissions
[ https://issues.apache.org/jira/browse/YARN-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524304#comment-14524304 ] Jian He commented on YARN-1772: --- [~naren.koneru], would you still like to work on this? cc/ [~kasha] Fair Scheduler documentation should indicate that admin ACLs also give submit permissions - Key: YARN-1772 URL: https://issues.apache.org/jira/browse/YARN-1772 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Naren Koneru I can submit to a Fair Scheduler queue if I'm in the submit ACL OR if I'm in the administer ACL. The Fair Scheduler docs seem to leave out the second part. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3422: -- Labels: (was: newbie) relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ATS entities with a primary filter, the relatedentities fields always return an empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3407) HttpServer2 Max threads in TimelineCollectorManager should be more than 10
[ https://issues.apache.org/jira/browse/YARN-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3407: Issue Type: Sub-task (was: Bug) Parent: YARN-2928 HttpServer2 Max threads in TimelineCollectorManager should be more than 10 -- Key: YARN-3407 URL: https://issues.apache.org/jira/browse/YARN-3407 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently TimelineCollectorManager sets HttpServer2.HTTP_MAX_THREADS to just 10. This value might be too low to serve put requests. By default HttpServer2 has a max threads value of 250. We can probably make it configurable too, so that an optimum value can be configured based on the number of requests coming to the server. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
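For illustration, a minimal sketch of what making this configurable could look like; the property name and the 250 default below are assumptions for the sketch, not part of any committed patch (only HttpServer2.HTTP_MAX_THREADS is named in the issue):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpServer2;

public final class CollectorThreadConfig {
  // Hypothetical: read the collector's HTTP thread pool size from configuration
  // instead of hard-coding it to 10; the key name here is invented.
  public static void apply(Configuration conf) {
    int maxThreads = conf.getInt(
        "yarn.timeline-service.collector.http-max-threads", 250);
    conf.setInt(HttpServer2.HTTP_MAX_THREADS, maxThreads);
  }
}
{code}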
[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524330#comment-14524330 ] Zhijie Shen commented on YARN-3422: --- I think it's a valid bug. I took a look at the patch, and it seems that you need to take care of relatedEntitiesWithoutStartTimes too. In addition, would you please add a test case to cover this? /cc [~billie.rina...@gmail.com] relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Labels: newbie Fix For: 2.6.1 Attachments: YARN-3422.1.patch When you curl for ATS entities with a primary filter, the relatedentities fields always return an empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3422: -- Component/s: timelineserver Target Version/s: 2.7.1 relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ATS entities with a primary filter, the relatedentities fields always return an empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3422: -- Fix Version/s: (was: 2.6.1) relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ATS entities with a primary filter, the relatedentities fields always return an empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2469) Merge duplicated tests in Fifo/Capacity/Fair Scheduler into some common test
[ https://issues.apache.org/jira/browse/YARN-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2469: - Priority: Minor (was: Major) Merge duplicated tests in Fifo/Capacity/Fair Scheduler into some common test Key: YARN-2469 URL: https://issues.apache.org/jira/browse/YARN-2469 Project: Hadoop YARN Issue Type: Test Components: scheduler Reporter: Junping Du Priority: Minor From discussions in YARN-1506, there are duplicated test cases, such as testBlackListNode and testResourceOverCommit, for the different schedulers. We need some common test code to cover the same test cases across the schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524387#comment-14524387 ] Hadoop QA commented on YARN-2921: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 5m 11s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. |
| {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests | 50m 51s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 68m 1s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12696827/YARN-2921.004.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / d3d019c |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7586/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7586/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7586/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7586/console |
This message was automatically generated. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi Ozawa Attachments: YARN-2921.001.patch, YARN-2921.002.patch, YARN-2921.003.patch, YARN-2921.004.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2482) DockerContainerExecutor configuration
[ https://issues.apache.org/jira/browse/YARN-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2482: - Issue Type: Sub-task (was: New Feature) Parent: YARN-2466 DockerContainerExecutor configuration - Key: YARN-2482 URL: https://issues.apache.org/jira/browse/YARN-2482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abin Shahab Labels: security Currently DockerContainerExecutor can be configured from yarn-site.xml, and users can add arbitrary arguments to the container launch command. This should be fixed so that the cluster and other jobs are protected from malicious string injections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
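As a rough sketch of the kind of hardening the description asks for (whitelisting arguments rather than passing arbitrary strings through), assuming a hypothetical validator class — this is an illustration, not the fix that eventually landed:
{code}
import java.util.regex.Pattern;

public final class DockerArgValidator {
  // Hypothetical conservative whitelist for launch-command arguments.
  private static final Pattern SAFE_ARG = Pattern.compile("[A-Za-z0-9_./=:,-]+");

  // Reject anything containing shell metacharacters, whitespace, quotes, etc.
  public static void validate(String arg) {
    if (arg == null || !SAFE_ARG.matcher(arg).matches()) {
      throw new IllegalArgumentException("Rejected unsafe docker argument: " + arg);
    }
  }
}
{code}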
[jira] [Commented] (YARN-2482) DockerContainerExecutor configuration
[ https://issues.apache.org/jira/browse/YARN-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524386#comment-14524386 ] Junping Du commented on YARN-2482: -- Moved it under YARN-2466. DockerContainerExecutor configuration - Key: YARN-2482 URL: https://issues.apache.org/jira/browse/YARN-2482 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Labels: security Currently DockerContainerExecutor can be configured from yarn-site.xml, and users can add arbitrary arguments to the container launch command. This should be fixed so that the cluster and other jobs are protected from malicious string injections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2137) Add support for logaggregation to a path on non-default filecontext
[ https://issues.apache.org/jira/browse/YARN-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524511#comment-14524511 ] Sumit Kumar commented on YARN-2137: --- Apologies for the delay; I will rebase this patch and look into the required testing that [~vinodkv] recommended. Add support for logaggregation to a path on non-default filecontext --- Key: YARN-2137 URL: https://issues.apache.org/jira/browse/YARN-2137 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation Affects Versions: 2.4.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Attachments: YARN-2137.patch The current log-aggregation implementation supports log aggregation to the default filecontext only. This patch is to support log aggregation to any of the supported filesystems within the hadoop eco-system (hdfs, s3, swiftfs etc). So, for example, a customer could use hdfs as the default filesystem but use s3 or swiftfs for log aggregation. The current implementation makes mixed usage of FileContext+AbstractFileSystem apis as well as FileSystem apis, which is confusing. This patch does three things:
# moves the logaggregation implementation to use only FileContext apis
# adds support for doing log aggregation on a non-default filesystem as well
# changes TestLogAggregationService to use the local filesystem itself instead of mocking the behavior
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1329) yarn-config.sh overwrites YARN_CONF_DIR indiscriminately
[ https://issues.apache.org/jira/browse/YARN-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524730#comment-14524730 ] Hadoop QA commented on YARN-1329: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12613087/YARN-1329.patch |
| Optional Tests | shellcheck |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7595/console |
This message was automatically generated. yarn-config.sh overwrites YARN_CONF_DIR indiscriminately - Key: YARN-1329 URL: https://issues.apache.org/jira/browse/YARN-1329 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: Aaron Gottlieb Assignee: haosdent Labels: easyfix Attachments: YARN-1329.patch The script yarn-daemons.sh calls {code}${HADOOP_LIBEXEC_DIR}/yarn-config.sh{code} yarn-config.sh overwrites any previously set value of the environment variable YARN_CONF_DIR starting at line 40:
{code:title=yarn-config.sh|borderStyle=solid}
#check to see if the conf dir is given as an optional argument
if [ $# -gt 1 ]
then
  if [ "--config" = "$1" ]
  then
    shift
    confdir=$1
    shift
    YARN_CONF_DIR=$confdir
  fi
fi

# Allow alternate conf dir location.
export YARN_CONF_DIR=${HADOOP_CONF_DIR:-$HADOOP_YARN_HOME/conf}
{code}
The last line should check for the existence of YARN_CONF_DIR first:
{code}
DEFAULT_CONF_DIR=${HADOOP_CONF_DIR:-$YARN_HOME/conf}
export YARN_CONF_DIR=${YARN_CONF_DIR:-$DEFAULT_CONF_DIR}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524736#comment-14524736 ] Hadoop QA commented on YARN-1019: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12616524/YARN-1019.0.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7596/console |
This message was automatically generated. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Attachments: YARN-1019.0.patch Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler creates the directories. 2) Likewise for all the parameters using hostname:port, unless we are OK with the default port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
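For the first check, a minimal sketch of what fail-fast validation could look like; the class and method names are invented for illustration and do not reflect the attached patch:
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;

public final class YarnConfSanity {
  // Hypothetical startup check: reject any relative path in a dir-list property
  // before the directory handler tries to create the directories.
  public static void checkAbsolute(Configuration conf, String key) {
    for (String dir : conf.getTrimmedStrings(key)) {
      if (!new File(dir).isAbsolute()) {
        throw new IllegalArgumentException(
            key + " contains a relative path: " + dir);
      }
    }
  }
}
{code}
Usage would be something like {code}YarnConfSanity.checkAbsolute(conf, "yarn.nodemanager.local-dirs");{code} run during NM service init, before directories are created.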
[jira] [Updated] (YARN-1725) RM should provide an easier way for the app to reject a bad allocation
[ https://issues.apache.org/jira/browse/YARN-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1725: -- Component/s: resourcemanager api Issue Type: Improvement (was: Bug) RM should provide an easier way for the app to reject a bad allocation -- Key: YARN-1725 URL: https://issues.apache.org/jira/browse/YARN-1725 Project: Hadoop YARN Issue Type: Improvement Components: api, resourcemanager Reporter: Bikas Saha Currently, if the app gets a bad allocation then it can release the container. However, the app then needs to request those resources again, or else the RM will not give it a new container in lieu of the one just rejected. This makes the app writer's life hard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1969) Fair Scheduler: Add policy for Earliest Endtime First
[ https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1969: Component/s: fairscheduler Fair Scheduler: Add policy for Earliest Endtime First - Key: YARN-1969 URL: https://issues.apache.org/jira/browse/YARN-1969 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh What we are observing is that some big jobs with many allocated containers are waiting for a few containers to finish. Under *fair-share scheduling*, however, they have a low priority, since there are other jobs (usually much smaller newcomers) that are using resources way below their fair share, hence newly released containers are not offered to the big, yet close-to-be-finished job. Nevertheless, everybody would benefit from an unfair scheduling that offers the resource to the big job, since the sooner the big job finishes, the sooner it releases its many allocated resources to be used by other jobs. In other words, we need a relaxed version of *Earliest Endtime First scheduling* that takes into account the number of already-allocated resources and the estimated time to finish. For example, if a job is using MEM GB of memory and is expected to finish in TIME minutes, the priority in scheduling would be a function p of (MEM, TIME). The expected time to finish can be estimated by the AppMaster using TaskRuntimeEstimator#estimatedRuntime and supplied to the RM in the resource request messages. To be less susceptible to apps gaming the system, we can limit this scheduling to leaf queues which have applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
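To make the p(MEM, TIME) idea concrete, here is one possible shape of such a priority function — purely a sketch of the proposal, with an invented weighting; nothing here reflects an actual Fair Scheduler policy:
{code}
public final class EarliestEndtimePriority {
  // Hypothetical p(MEM, TIME): lower score = schedule sooner. An app holding a
  // lot of memory with little estimated time left gets a small score, so its
  // outstanding requests are served first and it can drain quickly.
  public static double priority(long allocatedMB, long estimatedMinutesToFinish) {
    return (double) estimatedMinutesToFinish / Math.max(1, allocatedMB);
  }
}
{code}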
[jira] [Updated] (YARN-1968) YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations.
[ https://issues.apache.org/jira/browse/YARN-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1968: Component/s: resourcemanager YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations. Key: YARN-1968 URL: https://issues.apache.org/jira/browse/YARN-1968 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Junping Du AdminService's operations today have different dimensions of management: some are about user management while others are about cluster management, etc. Today, we only check whether a user belongs to some authorized group to decide whether he can execute operations in the admin service. The result is that a user can either execute all operations or none, which is a simple strategy but not very precise, so we cannot separate different management roles across several admins. We may need more fine-grained ACLs which can authorize a user for a subset of the operations in AdminService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2626) Document of timeline server needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2626. --- Resolution: Duplicate YARN-3539 is updating it; closing this one. Document of timeline server needs to be updated --- Key: YARN-2626 URL: https://issues.apache.org/jira/browse/YARN-2626 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen After YARN-2033, the document is no longer accurate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524176#comment-14524176 ] Jian He commented on YARN-1735: --- Canceling the patch since it no longer applies. cc/ [~kasha] For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch In monitoring graphs, the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct, since AvailableMB should never be more than the queue's max allocation. The spikes are quite confusing, since availableMB is set to the fair share of each queue, and the fair share of each queue is bounded by its allowed max resource. Other than the spiking, availableMB is always equal to allocatedMB. I think this is not very useful; availableMB for each queue should be its allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
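As a one-line illustration of the reporter's proposal (names invented; this is not the attached patch), the metric would be derived from the queue's allowed maximum rather than its fair share:
{code}
public final class QueueAvailableMB {
  // Hypothetical: available memory = allowed max resource minus what is already
  // allocated, clamped at zero; the fair-share-based value produces the spikes
  // described above.
  public static long availableMB(long queueMaxMB, long allocatedMB) {
    return Math.max(0, queueMaxMB - allocatedMB);
  }
}
{code}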
[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524174#comment-14524174 ] Hadoop QA commented on YARN-2892: -
| (/) *{color:green}+1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 52m 19s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 89m 30s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12684732/YARN-2892.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3393461 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7577/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7577/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7577/console |
This message was automatically generated. Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client, it makes a simple security check whether it should include the AMRMToken in the report (see createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (see submitApplication in ClientRmService). Afterwards, when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username.
In a secure cluster using Kerberos this check fails because the realm is stripped from the Kerberos principal when we request a short username. So, for example, the short username might be Foo whereas the full username is f...@company.com. Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
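A minimal illustration of the mismatch using Hadoop's UserGroupInformation (the class and comparison below are a sketch of the issue, not the patch itself):
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public final class NameMismatchDemo {
  // The stored name is the short form ("foo"); the requester presents the full
  // Kerberos form ("foo@COMPANY.COM"). Comparing short name against short name
  // is the consistent check; comparing the stored short name against
  // caller.getUserName() (the full name) is what fails on a Kerberos cluster.
  public static boolean sameUser(String storedShortName) throws IOException {
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    return storedShortName.equals(caller.getShortUserName());
  }
}
{code}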
[jira] [Commented] (YARN-2345) yarn rmadmin -report
[ https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524178#comment-14524178 ] Hadoop QA commented on YARN-2345: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12663619/YARN-2345.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d3d019c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7580/console |
This message was automatically generated. yarn rmadmin -report Key: YARN-2345 URL: https://issues.apache.org/jira/browse/YARN-2345 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Allen Wittenauer Assignee: Hao Gao Labels: newbie Attachments: YARN-2345.1.patch It would be good to have an equivalent of hdfs dfsadmin -report in YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience
[ https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3523: Labels: newbie (was: ) Cleanup ResourceManagerAdministrationProtocol interface audience Key: YARN-3523 URL: https://issues.apache.org/jira/browse/YARN-3523 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Labels: newbie Fix For: 2.8.0 Attachments: YARN-3523.20150422-1.patch I noticed ResourceManagerAdministrationProtocol has @Private audience for the class and @Public audience for methods. That doesn't make sense to me; we should make the class and method audiences consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1971) WindowsLocalWrapperScriptBuilder does not check for errors in generated script
[ https://issues.apache.org/jira/browse/YARN-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1971: Component/s: nodemanager WindowsLocalWrapperScriptBuilder does not check for errors in generated script -- Key: YARN-1971 URL: https://issues.apache.org/jira/browse/YARN-1971 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Similar to YARN-1865. The DefaultContainerExecutor.WindowsLocalWrapperScriptBuilder builds a shell script that contains commands that potentially may fail:
{code}
pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
{code}
These can fail due to access permissions, disk out of space, bad hardware, cosmic rays, etc. There should be proper error checking to ease troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
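A minimal sketch of the kind of check being asked for: after each generated command, abort the wrapper script when the previous command failed. The emitted lines are standard cmd.exe batch syntax; pout and the variables mirror the snippet above, and the exact check is an illustration, not the committed fix:
{code}
// pout is the PrintWriter writing the wrapper script, as in the snippet above.
pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
// Abort with a non-zero exit code if the previous command failed.
pout.println("@if errorlevel 1 exit /b 1");
pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
pout.println("@if errorlevel 1 exit /b 1");
{code}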
[jira] [Commented] (YARN-3380) Add protobuf compatibility checker to jenkins test runs
[ https://issues.apache.org/jira/browse/YARN-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524232#comment-14524232 ] Li Lu commented on YARN-3380: - Hi [~sidharta-s], thanks for poking. Right now my bandwidth is limited, so if anyone would like to get this done soon and happens to have bandwidth for it, please feel free to let me know. Thanks! Add protobuf compatibility checker to jenkins test runs --- Key: YARN-3380 URL: https://issues.apache.org/jira/browse/YARN-3380 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Labels: jenkins, scripting We may want to run the protobuf compatibility checker for each incoming patch, to prevent incompatible changes for rolling upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524300#comment-14524300 ] Wangda Tan commented on YARN-3006: -- +1 for latest patch, committing. Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, an UnsupportedOperationException is thrown:
{code}
# yarn rmadmin -failover rm1 rm2
Exception in thread "main" java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address
	at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
	at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
	at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
	at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
	at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
	at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
	at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
{code}
I think the above message is confusing to users. (Users may wonder whether ZKFC is configured correctly...) The command should output an error message to stderr instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
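A rough sketch of the suggested behavior — runFailover/doFailover are invented stand-ins for the real RMAdminCLI/HAAdmin code path, and the message text is illustrative only:
{code}
// Hypothetical: catch the unsupported-operation case and print a readable
// message to stderr instead of surfacing a stack trace to the console.
int runFailover(String[] args) {
  try {
    return doFailover(args);
  } catch (UnsupportedOperationException e) {
    System.err.println("failover: manual failover is not supported because"
        + " automatic failover is enabled for this ResourceManager.");
    return -1;
  }
}
{code}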
[jira] [Commented] (YARN-2454) The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is definited wrong.
[ https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524327#comment-14524327 ] Junping Du commented on YARN-2454: -- Patch LGTM. Kicked off Jenkins again; +1 based on Jenkins' results. The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is definited wrong. -- Key: YARN-2454 URL: https://issues.apache.org/jira/browse/YARN-2454 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.4.1 Reporter: Xu Yang Assignee: Xu Yang Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, YARN-2454.patch The variable UNBOUNDED implements the abstract class Resource and overrides the function compareTo, but there is something wrong in this function: UNBOUNDED should not compare resources against zero the same way the variable NONE does. We should change 0 to Integer.MAX_VALUE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
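A sketch of the fix the description proposes (the exact body is an illustration of the idea, not necessarily the attached patch): UNBOUNDED should compare as Integer.MAX_VALUE in each dimension, so it orders above any real resource, while NONE remains the one that compares against 0.
{code}
// Hypothetical compareTo for Resources.UNBOUNDED: positive when UNBOUNDED is
// larger than the other resource, which it always is for real resources.
@Override
public int compareTo(Resource other) {
  int diff = Integer.MAX_VALUE - other.getMemory();
  if (diff == 0) {
    diff = Integer.MAX_VALUE - other.getVirtualCores();
  }
  return diff;
}
{code}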
[jira] [Commented] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524373#comment-14524373 ] Hadoop QA commented on YARN-20: ---
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 1s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. |
| | | 36m 13s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12725385/YARN-20.1.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / 6f541ed |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7587/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7587/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7587/console |
This message was automatically generated. More information for yarn.resourcemanager.webapp.address in yarn-default.xml -- Key: YARN-20 URL: https://issues.apache.org/jira/browse/YARN-20 Project: Hadoop YARN Issue Type: Improvement Components: documentation, resourcemanager Affects Versions: 2.0.0-alpha Reporter: Nemon Lou Priority: Trivial Attachments: YARN-20.1.patch, YARN-20.patch Original Estimate: 1h Remaining Estimate: 1h The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is in host:port format, which is noted in the cluster set-up guide (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html). When I read through the code, I found that the host-only format is also supported; in that format, the port will be random. So we may add more documentation in yarn-default.xml to make this easier to understand. I will submit a patch if it's helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state
[ https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-2483. -- Resolution: Duplicate Target Version/s: (was: 2.6.0) Resolving this JIRA as a duplicate. TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state Key: YARN-2483 URL: https://issues.apache.org/jira/browse/YARN-2483 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console :
{code}
testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)  Time elapsed: 49.686 sec  <<< FAILURE!
java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182)
	at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402)
{code}
TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with a similar cause. These tests failed in build #664 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)