[jira] [Commented] (YARN-4795) ContainerMetrics drops records
[ https://issues.apache.org/jira/browse/YARN-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202571#comment-15202571 ] Hadoop QA commented on YARN-4795:
-

(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 47s | trunk passed |
| +1 | compile | 0m 25s | trunk passed with JDK v1.8.0_74 |
| +1 | compile | 0m 26s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvnsite | 0m 28s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 0m 51s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 22s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 24s | the patch passed |
| +1 | compile | 0m 21s | the patch passed with JDK v1.8.0_74 |
| +1 | javac | 0m 21s | the patch passed |
| +1 | compile | 0m 24s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 24s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 3s | the patch passed |
| +1 | javadoc | 0m 17s | the patch passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 19s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 9m 23s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. |
| +1 | unit | 10m 9s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 20s | Patch does not generate ASF License warnings. |
| | | 34m 50s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793202/YARN-4795.001.patch |
| JIRA Issue | YARN-4795 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 6bf5bc037e48 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33239c9 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202592#comment-15202592 ] Wangda Tan commented on YARN-4002:
--
[~rohithsharma], [~zhiguohong], Thanks for working on this patch, generally looks good. Do you think we need to acquire the read lock in printConfiguredHosts and setDecomissionedNMsMetrics?

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Hong Zhiguo
> Assignee: Hong Zhiguo
> Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only updated on "refresh nodes". All RPC threads handling node heartbeats are only readers, so an RWLock could be used to allow concurrent access by the RPC threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are always updated by "reference assignment", which is atomic in Java, the reader-side lock could just be skipped.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
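The second (lockless-read) option described above - publishing the include/exclude lists with a single atomic reference assignment so that heartbeat threads never take a lock - can be sketched as follows. This is an illustrative stand-in, not the actual HostsFileReader/NodesListManager code; the class and method names are hypothetical.

```java
import java.util.Set;

// Sketch of the lockless-read idea: readers resolve node validity against
// an immutable snapshot that "refresh nodes" swaps in atomically, so
// heartbeat RPC threads never block on a shared lock.
public class HostLists {
    // One immutable snapshot holding both lists; a volatile write/read
    // gives safe publication without a reader-side lock.
    static final class Snapshot {
        final Set<String> includes;
        final Set<String> excludes;
        Snapshot(Set<String> includes, Set<String> excludes) {
            this.includes = Set.copyOf(includes);
            this.excludes = Set.copyOf(excludes);
        }
    }

    private volatile Snapshot current = new Snapshot(Set.of(), Set.of());

    // Writer path ("refresh nodes"): build a fresh snapshot off to the
    // side, then publish it with one atomic reference assignment.
    public void refresh(Set<String> includes, Set<String> excludes) {
        current = new Snapshot(includes, excludes);
    }

    // Reader path (heartbeat handling): read the reference once so both
    // lists are taken from the same consistent snapshot.
    public boolean isValidNode(String host) {
        Snapshot s = current;
        boolean included = s.includes.isEmpty() || s.includes.contains(host);
        return included && !s.excludes.contains(host);
    }

    // Small self-check used below.
    static boolean demo() {
        HostLists h = new HostLists();
        boolean before = h.isValidNode("badnode"); // empty lists: all valid
        h.refresh(Set.of(), Set.of("badnode"));
        return before && !h.isValidNode("badnode") && h.isValidNode("goodnode");
    }
}
```

The key property is that a reader sees either the whole old snapshot or the whole new one, never a mix of the two lists mid-refresh.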
[jira] [Commented] (YARN-4815) ATS 1.5 timeline client impl tries to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198733#comment-15198733 ] Li Lu commented on YARN-4815:
-
Fine... My concern is that we do not need separate caches for each use case if they can be modeled by Guava. I'm fine with either way.

> ATS 1.5 timeline client impl tries to create attempt directory for every event call
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch
>
> The ATS 1.5 timeline client impl tries to create the attempt directory for every event call. Since one directory-creation call per attempt is enough, this is causing a perf issue.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
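The once-per-attempt behavior the issue asks for is essentially memoization. Below is a hedged sketch, not the actual timeline client code, using only the JDK rather than the Guava cache discussed above (a Guava LoadingCache would give the same pattern plus eviction); all names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Memoize the "attempt directory created" step so repeated event calls
// for the same attempt perform the filesystem work only once.
public class AttemptDirCache {
    private final ConcurrentMap<String, Boolean> created = new ConcurrentHashMap<>();
    final AtomicInteger mkdirCalls = new AtomicInteger(); // counter for the demo

    // Stand-in for the real filesystem call; in the actual client this
    // would be an HDFS mkdirs() for the attempt's directory.
    private boolean mkdirs(String attemptId) {
        mkdirCalls.incrementAndGet();
        return true;
    }

    // computeIfAbsent guarantees at most one mkdirs() invocation per
    // attempt id, even under concurrent event calls.
    public void ensureAttemptDir(String attemptId) {
        created.computeIfAbsent(attemptId, this::mkdirs);
    }

    // Self-check: five events for one attempt plus one for another
    // should cost exactly two directory creations.
    static boolean demo() {
        AttemptDirCache c = new AttemptDirCache();
        for (int i = 0; i < 5; i++) {
            c.ensureAttemptDir("appattempt_1");
        }
        c.ensureAttemptDir("appattempt_2");
        return c.mkdirCalls.get() == 2;
    }
}
```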
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200536#comment-15200536 ] Karthik Kambatla commented on YARN-4686:
bq. Currently it is in the RM start method, but if you feel that it is better to put it in the MiniYARNCluster start method then we can add the check there.
I was just wondering if we needed to transition to active. I am fine with the check being at either place. We can stick to this if it keeps the code clean.

> MiniYARNCluster.start() returns before cluster is completely started
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Rohith Sharma K S
> Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, YARN-4686.005.patch, YARN-4686.006.patch
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 sec <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but was:<3>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400
[ https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199400#comment-15199400 ] Steve Loughran commented on YARN-4746:
--
looks reasonable. Given all uses of the conversion in these pages now look like:
{code}
try {
  id = ConverterUtils.toApplicationId(recordFactory, appId);
} catch (IllegalArgumentException e) {
  throw new BadRequestException(e);
}
{code}
could we factor this out into something which always does this for the web UIs?

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch
>
> I'm seeing somewhere in the WS API tests of mine an error with exception conversion of a bad app ID sent in as an argument to a GET. I know it's in ATS, but a scan of the core RM web services implies the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} to convert an argument; this throws IllegalArgumentException, which is then handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. This can be done by catching the raised exception and explicitly converting it
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
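One hedged sketch of the factoring-out suggested above: a generic helper that any web-service parse site can call, so the IllegalArgumentException-to-400 translation lives in one place. The class name and the nested BadRequestException here are hypothetical stand-ins, not the real org.apache.hadoop.yarn.webapp types.

```java
import java.util.function.Function;

// Centralize "parse a path/query argument or answer 400": run any
// String-consuming parser and rethrow IllegalArgumentException as a
// bad-request error, which the web layer would map to HTTP 400.
public class WebArgs {
    // Stand-in for org.apache.hadoop.yarn.webapp.BadRequestException.
    static class BadRequestException extends RuntimeException {
        BadRequestException(Throwable cause) { super(cause); }
    }

    // Every parse site calls through this instead of repeating the
    // try/catch shown in the comment above.
    public static <T> T parseOr400(Function<String, T> parser, String raw) {
        try {
            return parser.apply(raw);
        } catch (IllegalArgumentException e) {
            throw new BadRequestException(e);
        }
    }

    // Self-check: a good argument parses; a bad one surfaces as
    // BadRequestException (NumberFormatException extends
    // IllegalArgumentException, so it is caught here too).
    static boolean demo() {
        long ok = parseOr400(Long::parseLong, "42");
        try {
            parseOr400(Long::parseLong, "not-a-number");
            return false;
        } catch (BadRequestException expected) {
            return ok == 42L;
        }
    }
}
```

In the real code the parser argument would be something like `s -> ConverterUtils.toApplicationId(recordFactory, s)`.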
[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197218#comment-15197218 ] Steve Loughran commented on YARN-4820: -- What about 307 redirects? Even if they aren't generated in any RM REST API yet, they might ... this filter should be ready for them > ResourceManager web redirects in HA mode drops query parameters > --- > > Key: YARN-4820 > URL: https://issues.apache.org/jira/browse/YARN-4820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4820.001.patch > > > The RMWebAppFilter redirects http requests from the standby to the active. > However it drops all the query parameters when it does the redirect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4835) [YARN-3368] REST API related changes for new Web UI
Varun Saxena created YARN-4835: -- Summary: [YARN-3368] REST API related changes for new Web UI Key: YARN-4835 URL: https://issues.apache.org/jira/browse/YARN-4835 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4833) For Queue AccessControlException client retries multiple times on both RM
[ https://issues.apache.org/jira/browse/YARN-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4833:
---
Description: Submit an application to a queue where ACLs are enabled and the submitting user does not have access. The client retries until the max fail attempts (10 times).
{noformat}
16/03/18 10:01:06 INFO retry.RetryInvocationHandler: Exception while invoking submitApplication of class ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over immediately.
org.apache.hadoop.security.AccessControlException: User hdfs does not have permission to submit application_1458273884145_0001 to queue default
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:380)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:291)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:618)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:252)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2360)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2356)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2356)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:272)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:257)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy23.submitApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:261)
at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:295)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
[jira] [Commented] (YARN-4825) Remove redundant code in ClientRMService::listReservations
[ https://issues.apache.org/jira/browse/YARN-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200854#comment-15200854 ] Subru Krishnan commented on YARN-4825: -- The test case failures are consistent and unrelated and are covered in YARN-4478 > Remove redundant code in ClientRMService::listReservations > -- > > Key: YARN-4825 > URL: https://issues.apache.org/jira/browse/YARN-4825 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Minor > Attachments: YARN-4825-v1.patch > > > We do the null check and parsing of ReservationId twice currently in > ClientRMService::listReservations. This happened due to parallel changes as > part of YARN-4340 and YARN-2575. This JIRA proposes cleaning up the redundant > code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201868#comment-15201868 ] Varun Saxena commented on YARN-4712:
I have committed this to branch YARN-2928. Thanks [~Naganarasimha] for your contribution. And thanks [~sjlee0], [~djp] and [~sunilg] for reviews.

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
> Attachments: YARN-4712-YARN-2928.v1.001.patch, YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch, YARN-4712-YARN-2928.v1.006.patch
>
> There are 2 issues with CPU usage collection
> * I was able to observe that many times the CPU usage got from {{pTree.getCpuUsagePercent()}} is ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor does the calculation, i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not encountered. So proper checks need to be handled.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter but ContainerMonitor is publishing decimal values for the CPU usage.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4822) Refactor existing Preemption Policy of CS for easier adding new approach to select preemption candidates
[ https://issues.apache.org/jira/browse/YARN-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200526#comment-15200526 ] Hadoop QA commented on YARN-4822:
-

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 36s | trunk passed |
| +1 | compile | 0m 25s | trunk passed with JDK v1.8.0_74 |
| +1 | compile | 0m 28s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 18s | trunk passed |
| +1 | mvnsite | 0m 33s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 5s | trunk passed |
| +1 | javadoc | 0m 20s | trunk passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 28s | the patch passed |
| +1 | compile | 0m 23s | the patch passed with JDK v1.8.0_74 |
| -1 | javac | 2m 54s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74 with JDK v1.8.0_74 generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) |
| +1 | javac | 0m 23s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 26s | the patch passed |
| -1 | checkstyle | 0m 15s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 39 new + 30 unchanged - 29 fixed = 69 total (was 59) |
| +1 | mvnsite | 0m 31s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 23s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 68m 9s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74. |
| -1 | unit | 69m 12s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 153m 25s | |

|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests |
[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199449#comment-15199449 ] Junping Du commented on YARN-4820:
--
Hi [~ste...@apache.org], I think the redirect logic now returns 307:
{code}
response.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
{code}
Do I miss something here?

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch
>
> The RMWebAppFilter redirects http requests from the standby to the active. However it drops all the query parameters when it does the redirect.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
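Whatever the status code, the fix for the dropped parameters is to carry the original query string into the redirect's Location URL. A minimal, hypothetical sketch (not the actual RMWebAppFilter code; the method and parameter names are illustrative):

```java
// Build the Location value for the standby -> active redirect from the
// active RM's base URL, the request path, and the incoming request's
// query string (null when the request had none), so parameters survive
// the hop instead of being dropped.
public class RedirectUrls {
    public static String redirectLocation(String activeBase, String path,
                                          String queryString) {
        StringBuilder sb = new StringBuilder(activeBase).append(path);
        if (queryString != null && !queryString.isEmpty()) {
            // The servlet container hands back the query string still
            // URL-encoded, so it can be appended verbatim after '?'.
            sb.append('?').append(queryString);
        }
        return sb.toString();
    }
}
```

In a servlet filter the query string would come from `HttpServletRequest.getQueryString()` and the result would be passed to `response.setHeader("Location", ...)` alongside the 307 status.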
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to only kill containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198600#comment-15198600 ] Hudson commented on YARN-4108:
--
FAILURE: Integrated in Hadoop-trunk-Commit #9470 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9470/])
YARN-4108. CapacityScheduler: Improve preemption to only kill containers (wangda: rev ae14e5d07f1b6702a5160637438028bb03d9387e)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/KillableContainer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/PreemptableQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/PreemptionManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPreemption.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
*
[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching
[ https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202664#comment-15202664 ] Varun Vasudev commented on YARN-4576: - bq. No. The DISKS_FAILED mark on bad disks is a transient status for the node. For example, if "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" is set to 90% (the default) and another job (YARN or not) keeps writing files and deleting them afterwards, and disk usage on the node happens to hover around 90%, the NM's health status reported to the RM will flip back and forth between healthy and unhealthy. Blacklisting for AM launching can evaluate the history to decide on a better place to launch the AM. The bar for launching normal containers could be different, or we could end up with too few choices. Couple of points here - # Specifically to the disks-failed issue, there is now support for a watermark to avoid the issue you described - YARN-3943. # To the more general point of nodes switching back and forth from good to bad - the better solution would be to have the RM detect bouncing nodes and then not allocate new containers to them until they stabilize. > Enhancement for tracking Blacklist in AM Launching > -- > > Key: YARN-4576 > URL: https://issues.apache.org/jira/browse/YARN-4576 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: EnhancementAMLaunchingBlacklist.pdf > > > Before YARN-2005, YARN's blacklist mechanism tracked bad nodes per AM: > if an AM's attempts to launch containers on a specific node failed several > times, the AM would blacklist that node in future resource requests. This mechanism > works fine for normal containers.
However, from our observation of the behavior > of several clusters: if launching an AM on a problematic node fails, the RM can > pick the same problematic node for the next AM attempts again and again, causing > application failure when the other, functional nodes are busy. In the normal > case, a customized health-checker script cannot be sensitive enough to mark a > node as unhealthy when just one or two container launches fail. > After YARN-2005, we have a BlacklistManager in each RMApp, so nodes > on which AM attempts for a specific application failed before get > blacklisted. To avoid the risk of all nodes being blacklisted > by the BlacklistManager, a disable-failure-threshold is used to stop adding > more nodes to the blacklist once a certain ratio is already hit. > There are already some enhancements to this AM blacklist mechanism: > YARN-4284 addresses the wider case of AM container launch > failures, and YARN-4389 makes the configuration settings changeable > per app to meet app-specific requirements. However, there are still > several gaps in addressing more scenarios: > 1. We may need a global blacklist instead of each app maintaining a separate > one. The reason is: an AM has a greater chance of failing on a node if other AMs failed > there before. A quick example: in a busy cluster, all nodes are busy except two > problematic nodes, node a and node b; app1 was already submitted and failed in > two AM attempts on a and b. app2 and other apps should wait for the other busy > nodes rather than waste attempts on these two problematic nodes. > 2. If an AM container failure is recognized as a global event instead of the app's own > issue, we should consider making the blacklist not permanent but scoped to a > specific time window. > 3. We could have user-defined blacklist policies to address more possible > cases and scenarios, so it is reasonable to make the blacklist policy pluggable. > 4.
For some test scenarios, we could have a whitelist mechanism for AM launching. > 5. Some minor issues: it sounds like an NM reconnect won't refresh the blacklist so > far. > Will try to address all issues here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
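The disable-failure-threshold behavior described above can be sketched as follows. This is a hypothetical simplification, not the actual YARN-2005 BlacklistManager code: the class and method names (`SimpleBlacklistManager`, `addNode`) are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a per-app blacklist with a disable-failure-threshold,
// loosely modeled on the behavior described above (not the actual YARN code).
public class SimpleBlacklistManager {
    private final Set<String> blacklist = new HashSet<>();
    private final int totalNodes;
    private final double disableThreshold; // e.g. 0.34 = stop at ~1/3 of nodes

    public SimpleBlacklistManager(int totalNodes, double disableThreshold) {
        this.totalNodes = totalNodes;
        this.disableThreshold = disableThreshold;
    }

    // Record an AM launch failure on a node; refuse to grow the blacklist
    // past the threshold so the app is never starved of candidate nodes.
    public void addNode(String node) {
        if (blacklist.size() + 1 <= totalNodes * disableThreshold) {
            blacklist.add(node);
        }
    }

    public boolean isBlacklisted(String node) {
        return blacklist.contains(node);
    }

    public int size() {
        return blacklist.size();
    }
}
```

With 3 nodes and a 0.34 threshold, only one node can ever be blacklisted; further failures are ignored rather than blacklisting the whole cluster.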
[jira] [Created] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes
Nathan Roberts created YARN-4834: Summary: ProcfsBasedProcessTree doesn't track daemonized processes Key: YARN-4834 URL: https://issues.apache.org/jira/browse/YARN-4834 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.2, 3.0.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Currently the algorithm uses the ppid from /proc/&lt;pid&gt;/stat, which can be 1 if a child process has daemonized itself. This can cause potentially large processes to go unmonitored. The session id might be a better choice, since that's what we use to signal the container during teardown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
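The ppid-vs-session-id distinction can be illustrated with a small parser for a /proc/&lt;pid&gt;/stat line. This is a hedged sketch, not the actual ProcfsBasedProcessTree code; note that the comm field can itself contain spaces and parentheses, so fields must be taken after the last `)`.

```java
// Illustrative sketch: extracting ppid and session id from a /proc/<pid>/stat
// line. A daemonized child shows ppid 1 but keeps its original session id,
// which is why the session id is the more reliable grouping key.
public class ProcStat {
    private static String[] fieldsAfterComm(String statLine) {
        // comm may contain spaces/parens, so split only after the last ')'.
        int close = statLine.lastIndexOf(')');
        return statLine.substring(close + 1).trim().split("\\s+");
    }

    // Field 4 of /proc/<pid>/stat: parent pid.
    public static int ppid(String statLine) {
        return Integer.parseInt(fieldsAfterComm(statLine)[1]);
    }

    // Field 6 of /proc/<pid>/stat: session id.
    public static int sessionId(String statLine) {
        return Integer.parseInt(fieldsAfterComm(statLine)[3]);
    }
}
```

For a daemonized process the parsed ppid is 1 (so a ppid-based tree loses it), while the session id still ties it back to the container's session.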
[jira] [Assigned] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-4743: -- Assignee: Yufei Gu > ResourceManager crash because TimSort > - > > Key: YARN-4743 > URL: https://issues.apache.org/jira/browse/YARN-4743 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.4 >Reporter: Zephyr Guo >Assignee: Yufei Gu > > {code} > 2016-02-26 14:08:50,821 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeCollapse(TimSort.java:410) > at java.util.TimSort.sort(TimSort.java:214) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2016-02-26 14:08:50,822 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} > Actually, this issue was found in 2.6.0-cdh5.4.7. > I think the cause is that we modify {{Resource}} while we are sorting > {{runnableApps}}. > {code:title=FSLeafQueue.java} > Comparator comparator = policy.getComparator(); > writeLock.lock(); > try { > Collections.sort(runnableApps, comparator); > } finally { > writeLock.unlock(); > } > readLock.lock(); > {code} > {code:title=FairShareComparator} > public int compare(Schedulable s1, Schedulable s2) { > .. > s1.getResourceUsage(), minShare1); > boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null, > s2.getResourceUsage(), minShare2); > minShareRatio1 = (double) s1.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare1, > ONE).getMemory(); > minShareRatio2 = (double) s2.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare2, > ONE).getMemory(); > .. > {code} > {{getResourceUsage}} will return the current Resource. The current Resource is > unstable. > {code:title=FSAppAttempt.java} > @Override > public Resource getResourceUsage() { > // Here the getPreemptedResources() always return zero, except in > // a preemption round > return Resources.subtract(getCurrentConsumption(), > getPreemptedResources()); > } > {code} > {code:title=SchedulerApplicationAttempt} > public Resource getCurrentConsumption() { > return currentConsumption; > } > // This method may modify current Resource. > public synchronized void recoverContainer(RMContainer rmContainer) { > .. > Resources.addTo(currentConsumption, rmContainer.getContainer() > .getResource()); > .. > } > {code} > I suggest using a stable Resource in the comparator. > Is there something I'm getting wrong? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
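The "use a stable Resource in the comparator" suggestion amounts to sorting on an immutable snapshot of each schedulable's usage (decorate-sort-undecorate), so a concurrent mutation cannot make the comparator inconsistent mid-sort and trigger TimSort's "Comparison method violates its general contract!". A minimal sketch, with hypothetical `App`/`Snapshot` types standing in for `Schedulable`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of sorting on stable snapshots rather than live, mutable values.
// Type and field names here are hypothetical, not the actual FSLeafQueue code.
public class StableSort {
    static class App {
        final String name;
        volatile long usage; // mutated concurrently in the real scheduler
        App(String name, long usage) { this.name = name; this.usage = usage; }
    }

    static class Snapshot {
        final App app;
        final long usage; // immutable copy taken once, before sorting
        Snapshot(App app) { this.app = app; this.usage = app.usage; }
    }

    // Decorate with snapshots, sort the snapshots, then undecorate.
    public static List<App> sortByUsage(List<App> apps) {
        List<Snapshot> snaps = new ArrayList<>();
        for (App a : apps) {
            snaps.add(new Snapshot(a));
        }
        snaps.sort(Comparator.comparingLong(s -> s.usage));
        List<App> sorted = new ArrayList<>();
        for (Snapshot s : snaps) {
            sorted.add(s.app);
        }
        return sorted;
    }
}
```

Because every comparison reads the frozen `Snapshot.usage`, the comparator stays transitive even if `App.usage` changes while the sort is running.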
[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles
[ https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197925#comment-15197925 ] Lei Guo commented on YARN-3926: --- Another topic related to the RM-NM protocol is constraint labels. It doesn't have to be considered in this JIRA, but I'd like to raise it since the design here may affect the constraint-label work. A constraint label could be a server attribute reported by the NM. It could be required to be predefined in the RM, but it would be great if we could allow the NM to report something not defined in the RM and have the RM automatically add it to the label repository. For example, for the OS version or JDK version, customers may prefer labels to be added automatically instead of having to define them before use. > Extend the YARN resource model for easier resource-type management and > profiles > --- > > Key: YARN-3926 > URL: https://issues.apache.org/jira/browse/YARN-3926 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Proposal for modifying resource model and profiles.pdf > > > Currently, there are efforts to add support for various resource-types such > as disk(YARN-2139), network(YARN-2140), and HDFS bandwidth(YARN-2681). These > efforts all aim to add support for a new resource type and are fairly > involved efforts. In addition, once support is added, it becomes harder for > users to specify the resources they need. All existing jobs have to be > modified, or have to use the minimum allocation. > This ticket is a proposal to extend the YARN resource model to a more > flexible model which makes it easier to support additional resource-types. It > also considers the related aspect of “resource profiles” which allow users to > easily specify the various resources they need for any given container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4825) Remove redundant code in ClientRMService::listReservations
[ https://issues.apache.org/jira/browse/YARN-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-4825: - Attachment: YARN-4825-v1.patch Attaching a patch that removes redundant code. Note: the patch does NOT have any test case modifications as it only removes duplicate code > Remove redundant code in ClientRMService::listReservations > -- > > Key: YARN-4825 > URL: https://issues.apache.org/jira/browse/YARN-4825 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Minor > Attachments: YARN-4825-v1.patch > > > We do the null check and parsing of ReservationId twice currently in > ClientRMService::listReservations. This happened due to parallel changes as > part of YARN-4340 and YARN-2575. This JIRA proposes cleaning up the redundant > code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed
Haibo Chen created YARN-4838: Summary: TestLogAggregationService. testLocalFileDeletionOnDiskFull failed Key: YARN-4838 URL: https://issues.apache.org/jira/browse/YARN-4838 Project: Hadoop YARN Issue Type: Test Components: log-aggregation Reporter: Haibo Chen org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService testLocalFileDeletionOnDiskFull failed java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288) The failure is caused by a timing issue with DeletionService. DeletionService runs its own thread pool to delete files. When the verifyLocalFileDeletion() method checks file existence, it is possible that the FileDeletionTask has not yet been executed by the thread pool in DeletionService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
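Races like this one are usually fixed by polling for the asynchronous condition with a timeout rather than asserting immediately after triggering the work. Hadoop has `GenericTestUtils.waitFor` for this purpose; the sketch below is a hypothetical minimal equivalent, not that actual API.

```java
import java.util.function.BooleanSupplier;

// Minimal waitFor helper in the spirit of Hadoop's GenericTestUtils.waitFor
// (a sketch, not the real API): poll the condition until it holds or the
// timeout expires, so asynchronous work such as a DeletionService thread
// pool gets a chance to run before the test checks file existence.
public class WaitFor {
    public static boolean waitFor(BooleanSupplier condition,
                                  long intervalMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break; // give up politely on interrupt
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

In the test above, the immediate `assertFalse(file.exists())` would become something like `assertTrue(waitFor(() -> !file.exists(), 100, 10000))`.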
[jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's
[ https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197956#comment-15197956 ] Vinod Kumar Vavilapalli commented on YARN-2915: --- One thing that occurred to me in an offline conversation with [~subru] and [~leftnoteasy] is about the modeling of queues and their shares in different sub-clusters. As seems to be already proposed, it is very desirable to have unified *logical queues* that are applicable across all sub-clusters. With unified logical queues, it looks like there are some proposals for how resources can get sub-divided amongst different sub-clusters. But to me, they already map to an existing concept in YARN - *Node Partitions* / node-labels! Essentially you have *one YARN cluster* -> *multiple sub-clusters* -> *each sub-cluster with multiple node-partitions*. This can further be extended to more levels. For e.g. we can unify rack also under the same concept. The advantage of unifying this with node-partitions is that we can have - one single administrative view philosophy of sub-clusters, node-partitions, racks etc - unified configuration mechanisms: Today we support centralized and distributed node-partition mechanisms, exclusive / non-exclusive access etc. - unified queue-sharing models - today we already can assign X% of a node-partition to a queue. This way we can, again, reuse existing concepts, mental models and allocation policies - instead of creating specific policies for sub-cluster sharing like the user-based share that is proposed. We will have to dig deeper into the details, but it seems to me that node-partition and sub-cluster are equivalence classes except for the fact that two sub-clusters report to two different RMs (physically / implementation wise) which isn't the case today with node-partitions. Thoughts?
/cc [~curino] [~chris.douglas] > Enable YARN RM scale out via federation using multiple RM's > --- > > Key: YARN-2915 > URL: https://issues.apache.org/jira/browse/YARN-2915 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao >Assignee: Subru Krishnan > Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, > Federation-BoF.pdf, Yarn_federation_design_v1.pdf, federation-prototype.patch > > > This is an umbrella JIRA that proposes to scale out YARN to support large > clusters comprising of tens of thousands of nodes. That is, rather than > limiting a YARN managed cluster to about 4k in size, the proposal is to > enable the YARN managed cluster to be elastically scalable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-4595: - Attachment: YARN-4595.2.patch Rebased patch. > Add support for configurable read-only mounts > - > > Key: YARN-4595 > URL: https://issues.apache.org/jira/browse/YARN-4595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-4595.1.patch, YARN-4595.2.patch > > > Mounting files or directories from the host is one way of passing > configuration and other information into a docker container. We could allow > the user to set a list of mounts in the environment of ContainerLaunchContext > (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only > to the specified target locations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
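The mount-list environment value proposed in this issue ("/dir1:/targetdir1,/dir2:/targetdir2") could be parsed into source/target pairs roughly as below. This is an illustrative sketch only; the actual patch's parsing and validation may differ, and `MountList` is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parsing the proposed read-only mount list
// ("/src1:/target1,/src2:/target2") from the ContainerLaunchContext
// environment into an ordered source -> target map.
public class MountList {
    public static Map<String, String> parse(String value) {
        Map<String, String> mounts = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return mounts;
        }
        for (String entry : value.split(",")) {
            String[] parts = entry.split(":");
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Bad mount entry: " + entry);
            }
            mounts.put(parts[0], parts[1]); // to be mounted read-only at target
        }
        return mounts;
    }
}
```

A real implementation would additionally validate that sources exist on the host and reject sensitive paths before handing the list to the container runtime.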
[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt
[ https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201680#comment-15201680 ] Jason Lowe commented on YARN-4839: -- This appears to have been fixed as a side-effect of YARN-3361 which wasn't in the build that reproduced this issue. That change updated getMasterContainer to avoid locking the RMAppAttemptImpl. > ResourceManager deadlock between RMAppAttemptImpl and > SchedulerApplicationAttempt > - > > Key: YARN-4839 > URL: https://issues.apache.org/jira/browse/YARN-4839 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe >Priority: Blocker > > Hit a deadlock in the ResourceManager as one thread was holding the > SchedulerApplicationAttempt lock and trying to call > RMAppAttemptImpl.getMasterContainer while another thread had the > RMAppAttemptImpl lock and was trying to call > SchedulerApplicationAttempt.getResourceUsageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200490#comment-15200490 ] Vinod Kumar Vavilapalli commented on YARN-4837: --- Here are my concerns - First up the feature isn't 'AM blacklisting' - we are not blacklisting AMs. The goal is for the system to not schedule AMs on faulty nodes. The right solution is to identify why we keep launching on bad-nodes instead of marking them unhealthy - but I can see why a blacklist threshold is useful when we *simply don't know*. - The configurations are all named yarn.am.blacklisting even though they should be under a yarn.resourcemanager hierarchy - We just blindly add a node to the app's blacklist even if we just hit *one* AM failure. And the error / exit-code doesn't matter at all. - Irrespective of all that, I actually don't see why we should already expose this to end-users i.e the whole premise of YARN-4389. Why should an app specifically care "the number of nodes YARN blacklists for my AM container launch"? I'm digging into the feature more for a careful look. /cc - [~adhoot], [~jlowe], [~kasha] who were involved with YARN-2005 for the naming changes - [~sunilg] / [~djp] who worked on YARN-4389. While we discuss this, I think we should take the private feature before 2.8.0 goes out. > User facing aspects of 'AM blacklisting' feature need fixing > > > Key: YARN-4837 > URL: https://issues.apache.org/jira/browse/YARN-4837 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > Was reviewing the user-facing aspects that we are releasing as part of 2.8.0. > Looking at the 'AM blacklisting feature', I see several things to be fixed > before we release it in 2.8.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4829: Attachment: YARN-4829-YARN-3926.003.patch Good catch. Updated both of them. > Add support for binary units > > > Key: YARN-4829 > URL: https://issues.apache.org/jira/browse/YARN-4829 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4829-YARN-3926.001.patch, > YARN-4829-YARN-3926.002.patch, YARN-4829-YARN-3926.003.patch > > > The units conversion util should have support for binary units. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
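A units-conversion helper supporting binary (IEC) units might look like the sketch below. This is illustrative only: the real conversion util in the YARN-4829 patch covers more units and overflow handling, and `BinaryUnits` is a hypothetical name, not the patch's API.

```java
// Illustrative sketch of converting values between binary (IEC) units.
// Not the actual YARN-4829 code; the real util handles more units
// (decimal prefixes too) and guards against long overflow.
public class BinaryUnits {
    private static long multiplier(String unit) {
        switch (unit) {
            case "":   return 1L;         // base unit
            case "Ki": return 1L << 10;
            case "Mi": return 1L << 20;
            case "Gi": return 1L << 30;
            case "Ti": return 1L << 40;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    // Convert a value between two binary units, e.g. 2 "Gi" -> 2048 "Mi".
    public static long convert(long value, String from, String to) {
        return value * multiplier(from) / multiplier(to);
    }
}
```

Converting through a common base like this keeps the table of units in one place; only downscaling conversions (e.g. "Ki" to "Mi") can truncate.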
[jira] [Commented] (YARN-4790) Per user blacklist node for user specific error for container launch failure.
[ https://issues.apache.org/jira/browse/YARN-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201156#comment-15201156 ] Vinod Kumar Vavilapalli commented on YARN-4790: --- bq. when enabling LinuxContainerExecutor, but some node doesn't have such user exists As for the root-cause reported on this JIRA, this invalidates our fundamental assumptions of LinuxContainerExecutor. We assume that user-accounts corresponding to all job-submitters are present on all the machines. If not it is a gross misconfiguration of the system, and should be handled by lower layers like installers / management systems. If it is really deemed that we should support different user-accounts on different hosts (for whatever reason), then the right way to look at solving that problem is by recognizing user-accounts as a resource on each host - kind of like node-constraints. Blacklisting that node for an app is absolutely the wrong way to go about it. > Per user blacklist node for user specific error for container launch failure. 
> - > > Key: YARN-4790 > URL: https://issues.apache.org/jira/browse/YARN-4790 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Junping Du > > There are some user specific error for container launch failure, like: > when enabling LinuxContainerExecutor, but some node doesn't have such user > exists, so container launch should get failed with following information: > {noformat} > 2016-02-14 15:37:03,111 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED > 2016-02-14 15:37:03,111 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application > application_1434045496283_0036 failed 2 times due to AM Container for > appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: > Application application_1434045496283_0036 initialization failed > (exitCode=255) with output: User jdu not found > {noformat} > Obviously, this node is not suitable for launching container for this user's > other applications. We need a per user blacklist track mechanism rather than > per application now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197300#comment-15197300 ] Yi Zhou commented on YARN-796: -- Thanks [~Naganarasimha] > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, > Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.14.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197388#comment-15197388 ] Junping Du commented on YARN-4785: -- Thanks [~vvasudev] for the explanation. I verified in branch-2.7 that the test, without the main code change, fails as below: {noformat} testClusterScheduler(org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched) Time elapsed: 0.463 sec <<< FAILURE! org.junit.ComparisonFailure: "type" field is incorrect expected:<[capacitySchedulerLeafQueueInfo]> but was:<[["capacitySchedulerLeafQueueInfo"]]> at org.junit.Assert.assertEquals(Assert.java:115) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.verifySubQueue(TestRMWebServicesCapacitySched.java:378) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.verifySubQueue(TestRMWebServicesCapacitySched.java:375) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.verifyClusterScheduler(TestRMWebServicesCapacitySched.java:331) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.testClusterScheduler(TestRMWebServicesCapacitySched.java:179) {noformat} The test failure reported by Jenkins is not related; we have already seen it many times (e.g. in YARN-998). So +1. Patch LGTM. Will wait a while to commit in case someone else wants to review it as well.
> inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh >Assignee: Varun Vasudev > Labels: REST_API > Attachments: YARN-4785.001.patch > > > I see inconsistent value type ( String and Array ) of the "type" field for > LeafQueueInfo in response of RM REST API - cluster/scheduler > as per the spec it should be always String. > here is the sample output ( removed non-relevant fields ) > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... > "queueName": "root", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 0.1, > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 0.1, > "queueName": "test-queue", > "state": "RUNNING", > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 2.5, > > }, > { > "capacity": 25, > > "state": "RUNNING", > "queues": { > "queue": [ > { > "capacity": 6, > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > > }, > { > "capacity": 6, > ... > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > ... > }, > ... > ] > }, > ... > } > ] > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
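Until the server-side fix lands, a client can defend against the inconsistency described above by accepting the "type" field in either shape. The sketch below assumes the generic `Object`/`List` representation a JSON library would hand back after parsing; `TypeField` is a hypothetical helper, not part of YARN.

```java
import java.util.List;

// Client-side workaround sketch: normalize the "type" field whether it was
// serialized as a plain string or as a single-element array. Hypothetical
// helper, not part of the YARN-4785 patch.
public class TypeField {
    public static String normalizeType(Object type) {
        if (type instanceof String) {
            return (String) type;           // spec-conformant shape
        }
        if (type instanceof List
                && ((List<?>) type).size() == 1
                && ((List<?>) type).get(0) instanceof String) {
            return (String) ((List<?>) type).get(0); // buggy array shape
        }
        throw new IllegalArgumentException("Unexpected 'type' value: " + type);
    }
}
```

Both `"capacitySchedulerLeafQueueInfo"` and `["capacitySchedulerLeafQueueInfo"]` then normalize to the same string.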
[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table
[ https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-4062: - Attachment: YARN-4062-YARN-2928.08.patch Attaching patch v8 to address Sangjin’s review comments. bq. (FlowRunCoprocessor.java) l.268: Logging of request.isMajor() seems a bit cryptic. It would print strings like "Compactionrequest= ... true for ObserverContext ...". Should we do something like request.isMajor() ? "major compaction" : "minor compaction" instead? And did you mean it to be an INFO logging statement? Updated the log message. I would like to log it at the INFO level. Compactions have traditionally taken up cycles on region servers such that they have been blocked from other operations, hence it would be good to know the details of this request that has come in. bq. (FlowRunColumnPrefix.java) l.43: if we're removing the aggregation operation from the only enum entry, should we remove the aggregation operation (aggOp) completely from this class? As we chatted offline, the aggOp operation should be part of any ColumnPrefix for this table. I have filed YARN-4786 for enhancements to add in a FINAL attribute (among other things). bq. l.381-382: should we throw an exception as this is not possible (currently)? Hmm. Not sure if throwing an exception is the right thing here. Thinking out loud, in the case that we add in another cell agg operation but have a different scanner invoked, throwing an exception won’t be correct. thanks Vrushali bq. ▪ l.220: could you elaborate on why this change is needed? I'm generally not too clear on the difference between cellLimit and limit. There was a warning generated by checkstyle, I think, which is why I modified the code. cellLimit and limit are differently named variables since previously some functions had an argument called limit, and checkstyle flagged that in the Maven Jenkins build.
> Add the flush and compaction functionality via coprocessors and scanners for > flow run table > --- > > Key: YARN-4062 > URL: https://issues.apache.org/jira/browse/YARN-4062 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-2928-1st-milestone > Attachments: YARN-4062-YARN-2928.04.patch, > YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, > YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.08.patch, > YARN-4062-YARN-2928.1.patch, YARN-4062-feature-YARN-2928.01.patch, > YARN-4062-feature-YARN-2928.02.patch, YARN-4062-feature-YARN-2928.03.patch > > > As part of YARN-3901, coprocessor and scanner is being added for storing into > the flow_run table. It also needs a flush & compaction processing in the > coprocessor and perhaps a new scanner to deal with the data during flushing > and compaction stages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4831) Recovered containers will be killed after NM stateful restart
[ https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198512#comment-15198512 ] Hadoop QA commented on YARN-4831: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 91 unchanged - 0 fixed = 92 total (was 91) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 11s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 50s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 35s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793824/YARN-4831.v1.patch | | JIRA Issue | YARN-4831 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e771231775da 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3
[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1518#comment-1518 ] Vinod Kumar Vavilapalli commented on YARN-4595: --- Can this also be the general solution for accessing all (or at least the read-only) distributed-cache files/dirs inside a docker container? If so, this will enable us to mix and match artifacts inside the image and outside the image. > Add support for configurable read-only mounts > - > > Key: YARN-4595 > URL: https://issues.apache.org/jira/browse/YARN-4595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-4595.1.patch, YARN-4595.2.patch > > > Mounting files or directories from the host is one way of passing > configuration and other information into a docker container. We could allow > the user to set a list of mounts in the environment of ContainerLaunchContext > (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only > to the specified target locations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
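The mount-list format proposed in the description can be sketched as a small parser. This is an illustrative helper, not the actual YARN-4595 patch code: the class and method names are assumptions, and only the `/src:/dst,...` format and the read-only semantics come from the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parse a mount spec such as
// "/dir1:/targetdir1,/dir2:/targetdir2" (as passed through the
// ContainerLaunchContext environment) into host-path -> container-path pairs,
// and render each pair as a read-only docker bind-mount argument.
class ReadOnlyMountParser {
    static Map<String, String> parse(String spec) {
        Map<String, String> mounts = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) {
            return mounts;
        }
        for (String entry : spec.split(",")) {
            String[] parts = entry.split(":");
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Bad mount entry: " + entry);
            }
            mounts.put(parts[0], parts[1]); // host path -> container path
        }
        return mounts;
    }

    // Docker's documented "-v src:dst:ro" syntax for a read-only bind mount.
    static String toDockerArg(String src, String dst) {
        return "-v " + src + ":" + dst + ":ro";
    }
}
```

Validating each entry up front keeps a malformed user-supplied mount from silently becoming a writable or wrong-target mount at launch time.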
[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt
[ https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201884#comment-15201884 ] Sunil G commented on YARN-4839: --- I feel that calling {{RMAppAttemptImpl.getMasterContainer}} from the Scheduler side is not very clean. I mentioned this point once in YARN-2005. It would be better to notify the Scheduler that the AM container has been allocated via an event or a direct API, though that comes with a few lines of extra code. There were a few discussions on YARN-4143 about this. Would that be a clean solution here, so that we can remove the dependency? > ResourceManager deadlock between RMAppAttemptImpl and > SchedulerApplicationAttempt > - > > Key: YARN-4839 > URL: https://issues.apache.org/jira/browse/YARN-4839 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe >Priority: Blocker > > Hit a deadlock in the ResourceManager as one thread was holding the > SchedulerApplicationAttempt lock and trying to call > RMAppAttemptImpl.getMasterContainer while another thread had the > RMAppAttemptImpl lock and was trying to call > SchedulerApplicationAttempt.getResourceUsageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
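The deadlock described above is a classic lock-ordering cycle: one thread holds the scheduler attempt's lock and reaches for the app attempt's lock, while another holds the app attempt's lock and reaches for the scheduler attempt's. A minimal sketch of one way to break the cycle, assuming (this is not the actual RM fix) that the value read across locks is published through a lock-free reference so one side of the cycle disappears:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: publish the AM ("master") container through an AtomicReference
// set once at allocation time. Readers need no lock, so a thread that already
// holds the scheduler attempt's monitor can read it without ever touching the
// app attempt's ReentrantReadWriteLock, and no lock-ordering cycle can form
// through this accessor.
class SchedulerAttemptSketch {
    private final AtomicReference<String> masterContainer = new AtomicReference<>();

    void setMasterContainer(String containerId) {
        masterContainer.set(containerId);
    }

    // Safe to call from any locking context: lock-free read.
    String getMasterContainer() {
        return masterContainer.get();
    }
}
```

The alternative remedy is to impose a single global acquisition order on the two locks, but that is harder to enforce across two independently evolving classes.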
[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage
[ https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200138#comment-15200138 ] Hadoop QA commented on YARN-4767: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: patch generated 6 new + 58 unchanged - 9 fixed = 64 total (was 67) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-server-web-proxy in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 9s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s
[jira] [Commented] (YARN-4815) ATS 1.5 timelineclient impl tries to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199611#comment-15199611 ] Junping Du commented on YARN-4815: -- The patch seems out of sync with trunk. [~xgong], can you rebase the patch against the latest trunk? Thanks! > ATS 1.5 timelineclient impl tries to create attempt directory for every event > call > > > Key: YARN-4815 > URL: https://issues.apache.org/jira/browse/YARN-4815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4815.1.patch > > > The ATS 1.5 timelineclient impl tries to create the attempt directory for every event > call. Since one directory-creation call per attempt is enough, this is > causing a perf issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
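The fix direction implied by the description ("one call to create the directory per attempt is enough") can be sketched as a create-once cache in front of the expensive filesystem call. This is illustrative, not the actual YARN-4815 patch; the class name and the counter standing in for the filesystem operation are assumptions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remember which attempt directories have already been created, so
// posting an event costs a set lookup instead of a filesystem round trip.
class AttemptDirCache {
    private final Set<String> created = ConcurrentHashMap.newKeySet();
    private int mkdirCalls = 0; // stands in for the expensive mkdir on the FS

    void ensureAttemptDir(String attemptId) {
        // Set.add() returns false when the id was already present, so the
        // expensive creation runs at most once per attempt.
        if (created.add(attemptId)) {
            mkdirCalls++;
        }
    }

    int mkdirCallCount() {
        return mkdirCalls;
    }
}
```

With per-event calls, every timeline event for an attempt would have triggered the mkdir path; with the cache, only the first event per attempt does.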
[jira] [Created] (YARN-4836) [YARN-3368] Add AM related pages
Varun Saxena created YARN-4836: -- Summary: [YARN-3368] Add AM related pages Key: YARN-4836 URL: https://issues.apache.org/jira/browse/YARN-4836 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
[ https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197668#comment-15197668 ] Sunil G commented on YARN-4751: --- Yes, I agree with the point that we have to aggregate and peek into multiple patches to get the functionality. If 2.7 doesn't need new features/enhancements for labels, we can bring in patches on a use-case basis. cc/[~wangda.tan] > In 2.7, Labeled queue usage not shown properly in capacity scheduler UI > --- > > Key: YARN-4751 > URL: https://issues.apache.org/jira/browse/YARN-4751 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: 2.7 CS UI No BarGraph.jpg, > YARH-4752-branch-2.7.001.patch, YARH-4752-branch-2.7.002.patch > > > In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs > separated by partition. When applications are running on a labeled queue, no > color is shown in the bar graph, and several of the "Used" metrics are zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt
[ https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201578#comment-15201578 ] Jason Lowe commented on YARN-4839: -- Stack trace of the relevant threads: {noformat} "IPC Server handler 32 on 8030" #153 daemon prio=5 os_prio=0 tid=0x7fb649603800 nid=0x20b1 waiting on condition [0x7fb5888d2000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00036de978f0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getMasterContainer(RMAppAttemptImpl.java:779) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:467) - locked <0x00032a106f00> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:278) - locked <0x00032a106f00> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1008) - locked <0x00032a106f00> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp) at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:534) - locked <0x000383ce08b0> (a org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server.call(Server.java:2267) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217) [...] 
"1413615244@qtp-1677286081-37" #40337 daemon prio=5 os_prio=0 tid=0x7fb62c089800 nid=0x1b8d waiting for monitor entry [0x7fb5ca40e000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:580) - waiting to lock <0x00032a106f00> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:267) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:826) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:580) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:815) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:457) at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197781#comment-15197781 ] Jayesh commented on YARN-4785: -- +1 ( thanks for explaining the solution in code comment ) > inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh >Assignee: Varun Vasudev > Labels: REST_API > Attachments: YARN-4785.001.patch > > > I see inconsistent value type ( String and Array ) of the "type" field for > LeafQueueInfo in response of RM REST API - cluster/scheduler > as per the spec it should be always String. > here is the sample output ( removed non-relevant fields ) > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... > "queueName": "root", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 0.1, > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 0.1, > "queueName": "test-queue", > "state": "RUNNING", > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 2.5, > > }, > { > "capacity": 25, > > "state": "RUNNING", > "queues": { > "queue": [ > { > "capacity": 6, > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > > }, > { > "capacity": 6, > ... > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > ... > }, > ... > ] > }, > ... > } > ] > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
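Until the server-side fix lands, a client has to tolerate both shapes of the "type" field shown in the sample. A minimal sketch of that client-side normalization, under the assumption that the JSON has already been parsed into plain Java objects (a `String` for a JSON string, a `List` for a JSON array):

```java
import java.util.List;

// Sketch: accept either the spec-compliant String form of "type" or the
// buggy single-element array form, and normalize to the String the spec
// promises. Anything else is rejected loudly.
class QueueTypeNormalizer {
    static String normalize(Object typeField) {
        if (typeField instanceof String) {
            return (String) typeField;
        }
        if (typeField instanceof List) {
            List<?> values = (List<?>) typeField;
            if (values.size() == 1 && values.get(0) instanceof String) {
                return (String) values.get(0);
            }
        }
        throw new IllegalArgumentException("Unexpected type field: " + typeField);
    }
}
```

Failing fast on any third shape is deliberate: silently guessing would hide exactly the kind of schema drift this issue is about.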
[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4686: -- Attachment: YARN-4686-branch-2.7.006.patch [~eepayne] Attaching the branch-2.7 patch. It passed all of the tests locally on my machine. > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, > YARN-4686-branch-2.7.006.patch, YARN-4686.001.patch, YARN-4686.002.patch, > YARN-4686.003.patch, YARN-4686.004.patch, YARN-4686.005.patch, > YARN-4686.006.patch > > > TestRMNMInfo fails intermittently. Below is trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
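The underlying bug, start() returning before all NMs have registered, is the kind of race a bounded readiness poll addresses. A generic sketch of that pattern (names and the condition are illustrative, not the actual YARN-4686 patch):

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a readiness condition (e.g. "expected number of NMs have
// registered with the RM") until it holds or a deadline passes, instead of
// returning from start() immediately.
class ClusterReadyWaiter {
    static boolean waitFor(BooleanSupplier ready, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!ready.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out; caller decides whether to fail the test
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

Tests like TestRMNMInfo then see a fully registered cluster ("live nodes: 4") rather than racing the last NM's registration.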
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197845#comment-15197845 ] Hadoop QA commented on YARN-4686: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 0s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 20s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 18s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 33s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 30s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} |
[jira] [Commented] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200326#comment-15200326 ] Hadoop QA commented on YARN-4829: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 42s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s {color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 19s {color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} |
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198218#comment-15198218 ] Daniel Templeton commented on YARN-4311: I don't have any further comments. LGTM. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, > YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, > YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4831) Recovered containers will be killed after NM stateful restart
[ https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-4831: -- Description: {code} 2016-03-04 19:43:48,130 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1456335621285_0040_01_66 transitioned from NEW to DONE 2016-03-04 19:43:48,130 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1456335621285_0040 {code} > Recovered containers will be killed after NM stateful restart > -- > > Key: YARN-4831 > URL: https://issues.apache.org/jira/browse/YARN-4831 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li > > {code} > 2016-03-04 19:43:48,130 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1456335621285_0040_01_66 transitioned from NEW to > DONE > 2016-03-04 19:43:48,130 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service >OPERATION=Container Finished - Killed TARGET=ContainerImpl > RESULT=SUCCESS APPID=application_1456335621285_0040 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page
[ https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199784#comment-15199784 ] Sunil G commented on YARN-4517: --- [~varun_saxena] bq.Regarding node labels, we can add it in REST response indicating if labels are enabled or not. We can do this later because this would require another JIRA for REST changes For this, I have already made some changes, so we can immediately know whether labels are present in the cluster (I am finishing up the node-label page now and will upload a patch soon). I will try to see whether we can make one unified patch for all REST changes needed by the UI. I will sync up with you offline and share a summary here. bq.Maybe something like: "You have to ssh to the missing nodes' /xxx/ dir to look for the logs" I am +1 for giving more information. From the RM we can get the node IP/hostname. At least we can give a relative path for the log dir (perhaps from the default available in yarn-default.xml). Since users can change it, the message should be worded as a possible suggestion. > [YARN-3368] Add nodes page > -- > > Key: YARN-4517 > URL: https://issues.apache.org/jira/browse/YARN-4517 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Wangda Tan >Assignee: Varun Saxena > Labels: webui > Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, > Screenshot_after_4709.png, Screenshot_after_4709_1.png, > YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch > > > We need a nodes page added to the next generation web UI, similar to the existing > RM/nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart
[ https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197345#comment-15197345 ] Junping Du commented on YARN-998: - Thanks Jian for the review and comments. bq. DynamicResourceConfiguration(configuration, true), the second parameter is not needed because it’s always passing ‘true’; Nice catch! Will remove it in the next patch. bq. instead of reload the config again, looks like we can just call resourceTrackerServce.set(newConf) to replace the config? newConfig is reloaded earlier in the same call path. I thought of this before, but my original concern was that it is a bit risky to have an API that replaces the config with whatever comes in. Will update it if this is not a valid concern. > Persistent resource change during NM/RM restart > --- > > Key: YARN-998 > URL: https://issues.apache.org/jira/browse/YARN-998 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-998-sample.patch, YARN-998-v1.patch > > > When NM is restarted by plan or after a failure, the previous dynamic resource > setting should be kept for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching
[ https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201139#comment-15201139 ] Vinod Kumar Vavilapalli commented on YARN-4576: --- Please read my comments on YARN-4837; this whole "AM blacklisting" feature has been unnecessarily blown way out of proportion - we just don't need this amount of complexity. Adding more functionality like global lists (YARN-4635), per-user lists (YARN-4790), pluggable blacklisting ((!)) (YARN-4636) etc. will make things far worse. Containers are marked DISKS_FAILED only if all the disks have become bad, in which case the node itself becomes unhealthy. So there is no need for blacklisting per app at all !! If an AM is killed due to memory overflow, blacklisting the node will not help at all! Overall, like I commented on [the JIRA YARN-4790|https://issues.apache.org/jira/browse/YARN-4790?focusedCommentId=15191217=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15191217], what we need is to not penalize applications for system-related issues. When YARN finds a node with configuration / permission issues, it should itself take action to (a) avoid scheduling on that node, (b) alert administrators etc. Implementing heuristics for app / user level blacklisting to work around platform problems should be a last-ditch effort. We did that in Hadoop 1 MapReduce because we didn't have a clear demarcation between app vs. system failures. But that isn't the case with YARN - part of the reason why we never implemented heuristics-based per-app blacklisting in YARN - we left that completely up to applications. 
> Enhancement for tracking Blacklist in AM Launching > -- > > Key: YARN-4576 > URL: https://issues.apache.org/jira/browse/YARN-4576 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: EnhancementAMLaunchingBlacklist.pdf > > > Before YARN-2005, YARN's blacklist mechanism tracked bad nodes per AM: > if an AM's attempts to launch containers on a specific node failed several > times, the AM would blacklist that node in future resource requests. This mechanism > works fine for normal containers. However, from our observation of the behavior > of several clusters: if launching an AM on a problematic node fails, the RM can > pick the same problematic node for the next AM attempts again and again, > causing application failure when the other, functional nodes are busy. In the normal > case, a customized health-checker script cannot be sensitive enough to > mark a node as unhealthy after one or two failed container launches. > After YARN-2005, we can have a BlacklistManager in each RMApp, so nodes > where AM attempts for a specific application failed before will get blacklisted. > To avoid the risk of all nodes being blacklisted > by the BlacklistManager, a disable-failure-threshold stops adding > nodes to the blacklist once a certain ratio is already blacklisted. > There are already some enhancements to this AM blacklist mechanism: > YARN-4284 addresses the wider case of AM container launch > failures, and YARN-4389 makes the configuration settings changeable per > app to meet app-specific requirements. However, there are still > several gaps: > 1. We may need a global blacklist instead of a separate one per app. > The reason: an AM is more likely to fail on a node where other AMs failed > before. A quick example: in a busy cluster, all nodes are busy except two > problematic nodes, node a and node b; app1 has already submitted and failed > two AM attempts on a and b. app2 and other apps should wait for the busy > nodes rather than waste attempts on these two problematic nodes. > 2. If AM container failure is recognized as a global event instead of an app's own > issue, the blacklist should not be permanent but should have a > specific time window. > 3. We could have user-defined blacklist policies to address more possible > cases and scenarios, so it is reasonable to make the blacklist policy pluggable. > 4. For some test scenarios, we could have a whitelist mechanism for AM launching. > 5. Some minor issues: it sounds like NM reconnect won't refresh the blacklist so > far. > Will try to address all of these here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
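The disable-failure-threshold described in the issue can be sketched with a toy class. This is an illustration of the mechanism only, not YARN's actual BlacklistManager; the class and method names (`SimpleAmBlacklist`, `onAmLaunchFailure`) are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration of the disable-failure-threshold: nodes where AM launches
// fail get blacklisted, but the blacklist stops growing once a configured
// fraction of the cluster is on it, so a cluster-wide problem cannot end up
// blacklisting every node.
public class SimpleAmBlacklist {
    private final Set<String> blacklisted = new HashSet<>();
    private final double disableFailureThreshold; // e.g. 0.2 = at most 20% of nodes

    public SimpleAmBlacklist(double disableFailureThreshold) {
        this.disableFailureThreshold = disableFailureThreshold;
    }

    public void onAmLaunchFailure(String node, int totalNodes) {
        // Only add the node if doing so keeps the blacklisted ratio at or
        // below the threshold.
        if ((blacklisted.size() + 1.0) / totalNodes <= disableFailureThreshold) {
            blacklisted.add(node);
        }
    }

    public boolean isBlacklisted(String node) {
        return blacklisted.contains(node);
    }
}
```

With a threshold of 0.2 on a 10-node cluster, the first two failing nodes are blacklisted and the third is refused, which matches the "stop adding more nodes into blacklist if hit certain ratio" behavior the description outlines.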
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200116#comment-15200116 ] Hudson commented on YARN-4785: -- FAILURE: Integrated in Hadoop-trunk-Commit #9473 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9473/]) YARN-4785. inconsistent value type of the type field for LeafQueueInfo (junping_du: rev ca8106d2dd03458944303d93679daa03b1d82ad5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java > inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh >Assignee: Varun Vasudev > Labels: REST_API > Fix For: 2.8.0, 2.7.3, 2.6.5 > > Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, > YARN-4785.branch-2.7.001.patch > > > I see an inconsistent value type (String vs. Array) for the "type" field for > LeafQueueInfo in the response of the RM REST API - cluster/scheduler. > As per the spec it should always be a String. > Here is the sample output (non-relevant fields removed): > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... 
> "queueName": "root", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 0.1, > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 0.1, > "queueName": "test-queue", > "state": "RUNNING", > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 2.5, > > }, > { > "capacity": 25, > > "state": "RUNNING", > "queues": { > "queue": [ > { > "capacity": 6, > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > > }, > { > "capacity": 6, > ... > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > ... > }, > ... > ] > }, > ... > } > ] > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
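Until the server-side fix is picked up, a client has to tolerate both shapes of the "type" field shown in the sample above. A minimal defensive sketch (a hypothetical helper, not part of any YARN client library) that normalizes an already-parsed JSON value, whether it arrived as a plain string or as a one-element array:

```java
import java.util.List;

// Client-side workaround sketch: the "type" field may arrive either as a
// plain string (spec-conforming) or as a single-element array (the bug this
// JIRA fixes). Normalize both forms to the plain string.
public class TypeFieldNormalizer {

    public static String normalizeType(Object typeField) {
        if (typeField instanceof String) {
            return (String) typeField;          // spec-conforming form
        }
        if (typeField instanceof List) {
            List<?> values = (List<?>) typeField;
            if (values.size() == 1 && values.get(0) instanceof String) {
                return (String) values.get(0);  // the buggy array form
            }
        }
        throw new IllegalArgumentException("unexpected 'type' value: " + typeField);
    }
}
```

The helper assumes the JSON has already been parsed into plain Java objects (String or List); any real JSON library's tree model could feed it the equivalent values.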
[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400
[ https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4746: --- Attachment: 0003-YARN-4746.patch [~ste...@apache.org] Thank you for the review. Uploading a patch with the changes below: # testInvalidAppAttempts corrected to check an invalid attempt (earlier it was checking an invalid appId) # Moved parse validation for the application ID to WebAppUtil and did related refactoring Please review. > yarn web services should convert parse failures of appId to 400 > --- > > Key: YARN-4746 > URL: https://issues.apache.org/jira/browse/YARN-4746 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Priority: Minor > Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, > 0003-YARN-4746.patch > > > I'm seeing somewhere in my WS API tests an error with exception > conversion of a bad app ID sent in as an argument to a GET. I know it's in > ATS, but a scan of the core RM web services implies the same problem. > {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} > to convert an argument; this throws IllegalArgumentException, which is then > handled somewhere by jetty as a 500 error. > In fact, it's a bad argument, which should be handled by returning a 400. > This can be done by catching the raised exception and explicitly converting it -- This message was sent by Atlassian JIRA (v6.3.4#6332)
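The shape of the fix the issue describes — catch the parse failure and surface it as a 400 rather than letting the IllegalArgumentException bubble up as a 500 — can be sketched roughly as follows. The validation logic and names here (`AppIdValidator`, `parseOrBadRequest`) are simplified stand-ins for the example, not the actual patch.

```java
// Hypothetical sketch: validate an appId argument and convert parse failures
// into a 400-style error instead of an uncaught exception that the container
// turns into a 500.
public class AppIdValidator {

    // Minimal stand-in for YARN's "application_<clusterTimestamp>_<sequenceNumber>" format.
    public static boolean isValidAppId(String appId) {
        if (appId == null) {
            return false;
        }
        String[] parts = appId.split("_");
        if (parts.length != 3 || !parts[0].equals("application")) {
            return false;
        }
        try {
            Long.parseLong(parts[1]);    // cluster timestamp
            Integer.parseInt(parts[2]);  // sequence number
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Catch the bad argument explicitly; a real JAX-RS web service would throw
    // its bad-request exception type here so the client sees HTTP 400.
    public static String parseOrBadRequest(String appId) {
        if (!isValidAppId(appId)) {
            throw new IllegalArgumentException(
                "400 Bad Request: invalid application id '" + appId + "'");
        }
        return appId;
    }
}
```

The key design point is the one from the description: the failure is the caller's bad argument, so it must be classified before it escapes the handler, not left for the servlet container to map to a generic 500.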
[jira] [Reopened] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reopened YARN-4390: -- Assignee: Wangda Tan (was: Eric Payne) {quote} YARN-4108 doesn't solve all the issues. (I planned to solve this together with YARN-4108, but YARN-4108 only tackled half of the problem: once containers are selected, only preempt useful containers.) However, we need to select containers more cleverly based on the request. I have been thinking about this recently and plan to make some progress as soon as possible. May I reopen this JIRA and take it over from you? {quote} [~leftnoteasy], I had forgotten that we had closed this JIRA in favor of YARN-4108. Yes, I had noticed that the selection of containers to preempt in YARN-4108 does not actually consider the properties of the needed resources, like size or locality. Even so, YARN-4108 is a big improvement and does prevent unnecessary preemption. However, you are correct: implementing this JIRA would eliminate some extra event passing and processing when killable containers are rejected over and over. I am reopening and assigning to you. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
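The scenario in the description — eight 1-GB victims being preempted to satisfy a single 8-GB request — suggests a size-aware selection rule. The following is a concept sketch only (not the capacity scheduler's preemption monitor; sizes in MB, names invented for the example): prefer the smallest single killable container that covers the request, and only fall back to accumulating smaller ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy illustration of size-aware victim selection for preemption: one
// container of at least the requested size beats many small containers the
// requesting AM would reject.
public class SizeAwarePreemption {

    public static List<Integer> selectForRequest(List<Integer> killableMb, int requestMb) {
        List<Integer> sorted = new ArrayList<>(killableMb);
        Collections.sort(sorted);
        for (int size : sorted) {
            if (size >= requestMb) {
                // Smallest single container that covers the request.
                return Collections.singletonList(size);
            }
        }
        // No single container is big enough: accumulate largest-first until
        // the request is covered.
        Collections.reverse(sorted);
        List<Integer> picked = new ArrayList<>();
        int total = 0;
        for (int size : sorted) {
            if (total >= requestMb) {
                break;
            }
            picked.add(size);
            total += size;
        }
        return picked;
    }
}
```

A real implementation would also weigh locality and which apps own the candidates, as the JIRA discussion notes; this sketch isolates only the size dimension.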
[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page
[ https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200211#comment-15200211 ] Varun Saxena commented on YARN-4517: Filed YARN-4835 > [YARN-3368] Add nodes page > -- > > Key: YARN-4517 > URL: https://issues.apache.org/jira/browse/YARN-4517 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Wangda Tan >Assignee: Varun Saxena > Labels: webui > Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, > Screenshot_after_4709.png, Screenshot_after_4709_1.png, > YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch > > > We need nodes page added to next generation web UI, similar to existing > RM/nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198862#comment-15198862 ] Anoop Sam John commented on YARN-4736: -- When one RS is down, the regions it was serving get moved to other RS(s). And if it was the RS serving META, META gets moved to a new RS. So, in any case, subsequent retries should get things moving again. The issue here was that the entire HBase cluster went down and no one brought it back. There are certain must-fix things in HBase; will talk about those in an HBase JIRA. > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, > threaddump.log > > > Faced some issues while running ATSv2 in a single-node Hadoop cluster, with > HBase and its embedded ZooKeeper launched on the same node. > # Due to some NPE issues, the NM tried to shut down, but the NM daemon process > could not complete shutdown due to locks. > # Got some exceptions related to HBase after the application finished execution > successfully. > Will attach logs and the trace for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198456#comment-15198456 ] Hadoop QA commented on YARN-4829: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 46s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 36s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s {color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s {color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 42s {color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 56s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 13s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}
[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
[ https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-4336: Description: Hi folks, after performing some debugging with our Unix Engineering and Active Directory teams, it was discovered that on YARN container initialization a call is made via Hadoop Common AccessControlList.java: for(String group: ugi.getGroupNames()) { if (groups.contains(group)) { return true; } } Unfortunately, the security call to check access on "appattempt_X_X_X" will always return false, but makes unnecessary calls to the NameSwitch service on Linux, which calls things like SSSD/Quest VASD, which then initiate LDAP calls looking for non-existent userids, causing excessive load on LDAP. For now our tactical workaround is as follows: {code} /** * Checks if a user represented by the provided {@link UserGroupInformation} * is a member of the Access Control List * @param ugi UserGroupInformation to check if contained in the ACL * @return true if ugi is member of the list */ public final boolean isUserInList(UserGroupInformation ugi) { if (allAllowed || users.contains(ugi.getShortUserName())) { return true; } else { String patternString = "^appattempt_\\d+_\\d+_\\d+$"; Pattern pattern = Pattern.compile(patternString); Matcher matcher = pattern.matcher(ugi.getShortUserName()); boolean matches = matcher.matches(); if (matches) { LOG.debug("Bailing !! AppAttempt matches - DO NOT call UGI for groups!!"); return false; } for(String group: ugi.getGroupNames()) { if (groups.contains(group)) { return true; } } } return false; } public boolean isUserAllowed(UserGroupInformation ugi) { return isUserInList(ugi); } {code} Example of VASD debug log showing the lookups for one task attempt (32 of them). One task: Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:57:18 xhadoopm5d
[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201747#comment-15201747 ] Zephyr Guo commented on YARN-4743: -- I am trying to solve this issue, but so far without success. In my opinion, the issue is caused by concurrent modification of {{FSAppAttempt}}: while {{FSLeafQueue}} is sorting the FSAppAttempt list, the inner {{Resource}} of an FSAppAttempt is modified, in which case {{FairShareComparator}} cannot work correctly. Based on this idea, I wrote YARN-4743-cdh5.4.7.patch (attached). The patch uses a snapshot to protect the elements during the sorting. Sadly, the patch does not resolve the problem: I got the same exception during sorting, and crashes became even more frequent. I have begun to doubt whether the comparator really has a problem. I reviewed the {{FairShareComparator}} code and simulated all the cases, but did not find any bugs. I need some ideas. I'd like to verify two things: 1) Can the inner Resource be modified during the sorting? Who could review this for me? 2) Does the comparator really have a mistake, or is my patch incorrect? I suspect floating-point precision in the comparator, but it is hard to reproduce in the test cluster (it never reproduced there); it happens with low probability in a larger cluster. > ResourceManager crash because TimSort > - > > Key: YARN-4743 > URL: https://issues.apache.org/jira/browse/YARN-4743 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.4 >Reporter: Zephyr Guo >Assignee: Yufei Gu > > {code} > 2016-02-26 14:08:50,821 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeCollapse(TimSort.java:410) > at java.util.TimSort.sort(TimSort.java:214) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2016-02-26 14:08:50,822 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} > Actually, this issue was found in 2.6.0-cdh5.4.7. > I think the cause is that we modify {{Resource}} while we are sorting > {{runnableApps}}. > {code:title=FSLeafQueue.java} > Comparator comparator = policy.getComparator(); > writeLock.lock(); > try { > Collections.sort(runnableApps, comparator); > } finally { > writeLock.unlock(); > } > readLock.lock(); > {code} > {code:title=FairShareComparator} > public int compare(Schedulable s1, Schedulable s2) { > .. 
> s1.getResourceUsage(), minShare1); > boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null, > s2.getResourceUsage(), minShare2); > minShareRatio1 = (double) s1.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare1, > ONE).getMemory(); > minShareRatio2 = (double) s2.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare2, > ONE).getMemory(); > .. > {code} > {{getResourceUsage}} will return current Resource. The current Resource is > unstable. > {code:title=FSAppAttempt.java} > @Override > public Resource getResourceUsage() { > // Here the getPreemptedResources() always return zero, except in > // a preemption round > return Resources.subtract(getCurrentConsumption(), > getPreemptedResources()); > } > {code} > {code:title=SchedulerApplicationAttempt} > public Resource getCurrentConsumption() { > return currentConsumption; > } > // This method may modify current Resource. > public synchronized void
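The "snapshot" approach described in the comment above — freezing each app's resource usage once before the sort, so the comparator's ordering stays consistent for the entire TimSort run — can be sketched as follows. The types are simplified stand-ins, not the actual FSAppAttempt or FairShareComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "snapshot before sorting": TimSort throws
// "Comparison method violates its general contract!" if the ordering changes
// mid-sort, so the comparator reads usage values captured once up front
// instead of the live, concurrently-mutated field.
public class SnapshotSort {

    public static class App {
        public final String id;
        public volatile long usageMb; // mutated concurrently by the scheduler

        public App(String id, long usageMb) {
            this.id = id;
            this.usageMb = usageMb;
        }
    }

    public static List<App> sortByUsageSnapshot(List<App> apps) {
        // Freeze each app's current usage; the comparator only ever sees
        // these immutable keys, so its ordering cannot drift during the sort.
        final Map<App, Long> snapshot = new IdentityHashMap<>();
        for (App a : apps) {
            snapshot.put(a, a.usageMb);
        }
        List<App> copy = new ArrayList<>(apps);
        copy.sort(Comparator.comparingLong(a -> snapshot.get(a)));
        return copy;
    }
}
```

Note that, per the comment, a snapshot fixes only the mid-sort mutation failure mode; if the comparator itself is inconsistent (e.g. due to floating-point rounding in the ratio computation), the contract violation can still occur, which is exactly the uncertainty the commenter asks reviewers to help resolve.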
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199987#comment-15199987 ] Wangda Tan commented on YARN-4390: -- Thanks [~eepayne]/[~sunilg]. Will add implementation soon. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
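The failure mode described in this issue is that the preemption monitor ignores the size of the pending request when picking victims, so several small containers can be preempted for one large request the AM will then reject. A hypothetical sketch of a request-size-aware selector (class and method names are ours, not the actual CapacityScheduler preemption code; sizes in MB):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical selector illustrating the idea in this issue: when a pending
// request is large (say 8 GB), prefer preempting one container big enough to
// host it over preempting many 1 GB containers the AM would reject anyway.
public class RequestAwarePreemption {
    // Returns container sizes (in MB) to preempt for a single request, or an
    // empty list if no single candidate can satisfy it on its own.
    public static List<Integer> selectForRequest(List<Integer> candidatesMb,
                                                 int requestMb) {
        List<Integer> fits = new ArrayList<>();
        for (int size : candidatesMb) {
            if (size >= requestMb) {
                fits.add(size);
            }
        }
        if (fits.isEmpty()) {
            return fits; // no single container can host the request
        }
        // Preempt the smallest container that still fits the request,
        // minimizing the resources taken from the victim app.
        fits.sort(Comparator.naturalOrder());
        return List.of(fits.get(0));
    }
}
```

Under this policy, an 8 GB request against a node holding only 1 GB containers yields no preemption at all, avoiding the give-back churn described above.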
[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
[ https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-4336: Description: Hi folks, after some debugging with our Unix Engineering and Active Directory teams, we discovered that during YARN container initialization the following loop in Hadoop Common's AccessControlList.java is executed: for(String group: ugi.getGroupNames()) { if (groups.contains(group)) { return true; } } Unfortunately, the security check for access by "appattempt_X_X_X" always returns false, but it still makes unnecessary calls to the Name Service Switch (NSS) on Linux, which invokes backends such as SSSD/Quest VASD, which in turn issue LDAP lookups for non-existent user IDs, causing excessive load on LDAP. For now, our tactical workaround is as follows (a VASD debug log excerpt, showing the 32 lookups made for one task attempt, follows the code):
{code}
/**
 * Checks if a user represented by the provided {@link UserGroupInformation}
 * is a member of the Access Control List.
 * @param ugi UserGroupInformation to check if contained in the ACL
 * @return true if ugi is a member of the list
 */
public final boolean isUserInList(UserGroupInformation ugi) {
  if (allAllowed || users.contains(ugi.getShortUserName())) {
    return true;
  } else {
    String patternString = "^appattempt_\\d+_\\d+_\\d+$";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(ugi.getShortUserName());
    if (matcher.matches()) {
      // App-attempt principals can never resolve to real users: bail out
      // before triggering an NSS/LDAP group lookup.
      LOG.debug("Bailing: AppAttempt matches, do NOT call UGI for groups");
      return false;
    }
    for (String group : ugi.getGroupNames()) {
      if (groups.contains(group)) {
        return true;
      }
    }
  }
  return false;
}

public boolean isUserAllowed(UserGroupInformation ugi) {
  return isUserInList(ugi);
}
{code}
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope= Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01)) Oct 30 22:57:18 xhadoopm5d
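The workaround above hinges on recognizing app-attempt principals by name before any NSS/group lookup happens. A standalone sketch of just that check (the class and method names here are illustrative, not Hadoop's; the pattern is compiled once rather than per call):

```java
import java.util.regex.Pattern;

// Illustrative sketch of the short-circuit in the workaround above:
// principals named like appattempt_<clusterTs>_<appId>_<attemptId> can
// never resolve to a real user, so group lookup should be skipped for them.
public class AppAttemptAclCheck {
    // Same regex as in the workaround, precompiled for reuse.
    private static final Pattern APP_ATTEMPT =
        Pattern.compile("^appattempt_\\d+_\\d+_\\d+$");

    public static boolean isAppAttemptPrincipal(String shortUserName) {
        return APP_ATTEMPT.matcher(shortUserName).matches();
    }

    public static void main(String[] args) {
        // The principal seen in the VASD log above matches; a real user does not.
        System.out.println(isAppAttemptPrincipal("appattempt_1446145939879_0022_01")); // true
        System.out.println(isAppAttemptPrincipal("alice")); // false
    }
}
```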
[jira] [Commented] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198171#comment-15198171 ] Hadoop QA commented on YARN-4829: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 33s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s {color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 18s {color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 54s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 12s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 42m
[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()
[ https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197504#comment-15197504 ] Hudson commented on YARN-4593: -- FAILURE: Integrated in Hadoop-trunk-Commit #9469 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9469/]) YARN-4593 Deadlock in AbstractService.getConfig() (stevel) (stevel: rev 605fdcbb81687c73ba91a3bd0d607cabd3dc5a67) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/service/AbstractService.java > Deadlock in AbstractService.getConfig() > --- > > Key: YARN-4593 > URL: https://issues.apache.org/jira/browse/YARN-4593 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 > Environment: AM restarting on kerberized cluster >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.9.0 > > Attachments: YARN-4593-001.patch > > > SLIDER-1052 has found a deadlock which can arise in it during AM restart. > Looking at the thread trace, one of the blockages is actually > {{AbstractService.getConfig()}} —this is synchronized and so blocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
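The deadlock pattern here is a classic one: a synchronized getter on a service object blocks whenever another thread holds that service's monitor while waiting on a lock the getter's caller already owns. A minimal sketch of the remedy, publishing the field via volatile so the read needs no monitor at all (this illustrates the general pattern, not the exact Hadoop patch; a String stands in for the Configuration object):

```java
// Sketch of the lock-avoidance fix pattern discussed in this issue:
// a getter that only reads a reference does not need the service monitor,
// and a volatile field gives safe publication without blocking.
public class ServiceConfigHolder {
    private volatile String config; // stands in for a Configuration object

    // Writes still synchronize so state transitions remain atomic with
    // respect to the rest of the service lifecycle.
    public synchronized void setConfig(String conf) {
        this.config = conf;
    }

    // Non-blocking read: it cannot participate in a lock cycle, so the
    // deadlock seen in SLIDER-1052 cannot pass through this method.
    public String getConfig() {
        return config;
    }
}
```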
[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200374#comment-15200374 ] Hadoop QA commented on YARN-4820: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 7s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 34s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 41s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 44s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 302m 49s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK
[jira] [Updated] (YARN-1508) Document Dynamic Resource Configuration feature
[ https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1508: - Summary: Document Dynamic Resource Configuration feature (was: Rename ResourceOption and document resource over-commitment cases) > Document Dynamic Resource Configuration feature > --- > > Key: YARN-1508 > URL: https://issues.apache.org/jira/browse/YARN-1508 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > > Per Vinod's comment in > YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087) > and Bikas' comment in > YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615), > the name of ResourceOption is not good enough for being understood. Also, we > need to document more on resource overcommitment time and use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200385#comment-15200385 ] Hadoop QA commented on YARN-4595: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 40s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 20s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794016/YARN-4595.2.patch | | JIRA Issue | YARN-4595 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b1bf64ea3a7a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dc951e6 | | Default Java | 1.7.0_95 | | Multi-JDK versions |
[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201546#comment-15201546 ] Sunil G commented on YARN-4837: --- Thanks [~vinodkv] for pitching in. YARN-2005 blacklists nodes when an AM container launch fails due to DISK_FAILED, and after YARN-4284, blacklisting on AM container failure applies to all container failures except PREEMPTED. There has been some discussion of the use-case aspects of this change: if the blacklisting (AM container failure) feature is enabled at the cluster level, all applications are forced to comply with the blacklisting rule. YARN-4389 also added an option to disable this feature from the application side, as well as control over the threshold if it is too strict (or too lenient). Yes, I agree with your point that it is too early for users to make blacklisting decisions without the needed information. But given the current aggressive behavior, this change was helpful for opting out of the feature. I agree this has to be a controllable feature that does not cause problems in a busy cluster. A time-based purging solution may be ideal to allow the same app to use the node again. > User facing aspects of 'AM blacklisting' feature need fixing > > > Key: YARN-4837 > URL: https://issues.apache.org/jira/browse/YARN-4837 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > Was reviewing the user-facing aspects that we are releasing as part of 2.8.0. > Looking at the 'AM blacklisting feature', I see several things to be fixed > before we release it in 2.8.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
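The time-based purging idea at the end of the comment can be sketched as a blacklist whose entries expire, so a node blacklisted after an AM container failure becomes schedulable again after a cooldown (the class name and cooldown value here are illustrative, not from YARN):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of time-based blacklist purging: an entry only
// suppresses a node while it is younger than the configured cooldown.
public class ExpiringBlacklist {
    private final long cooldownMs;
    private final Map<String, Long> blacklistedAt = new HashMap<>();

    public ExpiringBlacklist(long cooldownMs) {
        this.cooldownMs = cooldownMs;
    }

    public void blacklist(String node, long nowMs) {
        blacklistedAt.put(node, nowMs);
    }

    // Stale entries are purged lazily on read, so the same app can use the
    // node again once the cooldown has elapsed.
    public boolean isBlacklisted(String node, long nowMs) {
        Long at = blacklistedAt.get(node);
        if (at == null) {
            return false;
        }
        if (nowMs - at >= cooldownMs) {
            blacklistedAt.remove(node);
            return false;
        }
        return true;
    }
}
```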
[jira] [Commented] (YARN-4502) Fix two AM containers get allocated when AM restart
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200423#comment-15200423 ] Vinod Kumar Vavilapalli commented on YARN-4502: --- [~zxu] / [~djp] bq. It looks like the implementation for AbstractYarnScheduler#getApplicationAttempt(ApplicationAttemptId applicationAttemptId) is also confusing. This is by design - see YARN-1041 - we want to route all the events destined for AppAttempt *only* to the current attempt. We should just document this and move on. > Fix two AM containers get allocated when AM restart > --- > > Key: YARN-4502 > URL: https://issues.apache.org/jira/browse/YARN-4502 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt > > > Scenario : > * set yarn.resourcemanager.am.max-attempts = 2 > * start dshell application > {code} > yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar > hadoop-yarn-applications-distributedshell-*.jar > -attempt_failures_validity_interval 6 -shell_command "sleep 150" > -num_containers 16 > {code} > * Kill AM pid > * Print container list for 2nd attempt > {code} > yarn container -list appattempt_1450825622869_0001_02 > INFO impl.TimelineClientImpl: Timeline service address: > http://xxx:port/ws/v1/timeline/ > INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10: > Total number of containers :2 > Container-Id Start Time Finish Time > StateHost Node Http Address >LOG-URL > container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa > container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa > {code} > * look for new AM pid > 
Here, the 2nd AM container was supposed to be started on > container_e12_1450825622869_0001_02_01. But the AM was not launched on > container_e12_1450825622869_0001_02_01; it remained in the ACQUIRED state. > On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. > Expected behavior: the RM should not start 2 containers for starting the AM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
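The by-design routing referenced above (YARN-1041: events addressed to *any* attempt of an app are delivered to the app's *current* attempt) can be sketched as follows. The class and method names are illustrative stand-ins, not the actual AbstractYarnScheduler implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class AttemptRouter {
  // appId -> current attempt id
  private final Map<String, String> currentAttempt = new HashMap<>();

  public void setCurrentAttempt(String appId, String attemptId) {
    currentAttempt.put(appId, attemptId);
  }

  // Regardless of which attempt id an event names, route to the app's
  // *current* attempt -- the confusing-but-intentional behavior discussed above.
  public String getApplicationAttempt(String attemptId) {
    return currentAttempt.get(appOf(attemptId));
  }

  // appattempt_<cluster ts>_<app seq>_<attempt seq> -> application_<cluster ts>_<app seq>
  static String appOf(String attemptId) {
    String noSuffix = attemptId.substring(0, attemptId.lastIndexOf('_'));
    return noSuffix.replaceFirst("^appattempt", "application");
  }
}
```

Documenting this contract, as suggested in the comment, would make it clear to callers that the attempt id argument is effectively only used to locate the application.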
[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400
[ https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201265#comment-15201265 ] Hadoop QA commented on YARN-4746: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 23s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 44s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 8s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 53s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 20s {color} | {color:red}
[jira] [Created] (YARN-4828) Create a pull request template for github
Steve Loughran created YARN-4828: Summary: Create a pull request template for github Key: YARN-4828 URL: https://issues.apache.org/jira/browse/YARN-4828 Project: Hadoop YARN Issue Type: Improvement Components: build Affects Versions: 3.0.0 Environment: github Reporter: Steve Loughran Priority: Minor We're starting to see PRs appear without any JIRA, explanation, etc.; these are going to be ignored without them. It's possible to [create a PR text template](https://help.github.com/articles/creating-a-pull-request-template-for-your-repository/) under {{.github/PULL_REQUEST_TEMPLATE}}. We can add such a template, which provides template summary points, such as: * which JIRA * if against an object store, how did you test it? * if it's a shell script, how did you test it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
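A minimal `.github/PULL_REQUEST_TEMPLATE` along the lines sketched above might look like this; the wording and headings here are illustrative, not a committed template:

```markdown
### Which JIRA does this PR address?

YARN-XXXX (link the JIRA; PRs without one are likely to be ignored)

### What does this change do?

One or two sentences summarizing the change.

### How was this patch tested?

- If it touches an object store, how did you test against it?
- If it is a shell script, how did you test it?
```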
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201146#comment-15201146 ] Vinod Kumar Vavilapalli commented on YARN-2005: --- -1 for backporting this, while I understand that the original feature-ask is useful for avoiding AM scheduling getting blocked, there are far too many issues with the feature as it is. Please see my comments on YARN-4576 and YARN-4837. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch, YARN-2005.009.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4832) NM side resource value should get updated if change applied in RM side
Junping Du created YARN-4832: Summary: NM side resource value should get updated if change applied in RM side Key: YARN-4832 URL: https://issues.apache.org/jira/browse/YARN-4832 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Junping Du Priority: Critical Now, if we execute the CLI to update node resources (single or multiple) on the RM side, the NM will not receive any notification. This doesn't affect resource scheduling, but it makes the resource usage metrics reported by the NM a bit weird. We should sync up the new resource values between the RM and NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197561#comment-15197561 ] Hadoop QA commented on YARN-4820: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 35s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 45s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} |
[jira] [Updated] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4390: - Attachment: YARN-4390-design.1.pdf Uploaded ver.1 design doc for review. You can jump to "Proposal" part directly if you are already familiar with preemption implementation. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles
[ https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197440#comment-15197440 ] Varun Vasudev commented on YARN-3926: - [~asuresh] - I was speaking with [~leftnoteasy] offline. I suspect we'll have to support 3 modes for the RM-NM handshake which admins can configure - # Strict - the RM and NM resource types must match # RM subset - as long as the NM resource types are a superset of the RM's, the handshake proceeds - I believe this will address your concerns. Correct? # Allow mismatch - the handshake will not fail due to missing resource types - missing resource types are presumed to be of value 0 by the RM. Does that make sense? > Extend the YARN resource model for easier resource-type management and > profiles > --- > > Key: YARN-3926 > URL: https://issues.apache.org/jira/browse/YARN-3926 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Proposal for modifying resource model and profiles.pdf > > > Currently, there are efforts to add support for various resource-types such > as disk(YARN-2139), network(YARN-2140), and HDFS bandwidth(YARN-2681). These > efforts all aim to add support for a new resource type and are fairly > involved efforts. In addition, once support is added, it becomes harder for > users to specify the resources they need. All existing jobs have to be > modified, or have to use the minimum allocation. > This ticket is a proposal to extend the YARN resource model to a more > flexible model which makes it easier to support additional resource-types. It > also considers the related aspect of “resource profiles” which allow users to > easily specify the various resources they need for any given container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
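The three proposed handshake modes above can be sketched as a small decision function. This is an illustrative model only, not the actual RM-NM registration code; the enum and method names are assumptions:

```java
import java.util.Set;

public class HandshakePolicy {
  public enum Mode { STRICT, RM_SUBSET, ALLOW_MISMATCH }

  // Decide whether the NM's registration is accepted given the resource
  // types known to the RM and those reported by the NM.
  public static boolean accept(Mode mode, Set<String> rmTypes, Set<String> nmTypes) {
    switch (mode) {
      case STRICT:
        return rmTypes.equals(nmTypes);        // types must match exactly
      case RM_SUBSET:
        return nmTypes.containsAll(rmTypes);   // NM types are a superset of the RM's
      case ALLOW_MISMATCH:
        return true;                           // missing types presumed 0 by the RM
      default:
        return false;
    }
  }
}
```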
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200170#comment-15200170 ] Varun Saxena commented on YARN-2962: bq. Could you also explain in the parameter description why one would want to change it from the default of 0 and how to know what a good split value would be? Ok... bq. I'm not sure this constant adds anything. I found it made the code harder to read than just hard-coding in 0. Hmm... If it's harder to read, I can put 0 everywhere. This value should not change in the future. bq. This violates the Principle of Least Astonishment. At least log a warning that you're not doing what the user said to. Correct, a warning log should be added. bq. I don't think the accessors are needed. Yes, they are not required. bq. Might want to swap those method names. Agree. safeDeleteIfExists makes more sense in the other case. bq. should be HashMap attempts = new HashMap<>(); Yeah, <> can be used to reduce clutter. Regarding other comments, will add more comments to make the tests and main code easier to read, and fix missing javadocs. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.04.patch, > YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
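The split-index idea under review above could be sketched like this: with a split index of N, the last N digits of the application id become the leaf znode name and the remaining prefix becomes an intermediate parent, so no parent has more than 10^N children; 0 (the default) means no splitting. The exact path layout here is an assumption for illustration, not the actual ZKRMStateStore scheme:

```java
public class ZnodeSplit {
  // Build the znode path for an app id under the given root, splitting the
  // trailing splitIndex digits into a child znode. splitIndex == 0 disables
  // splitting and keeps the flat layout.
  public static String path(String root, String appId, int splitIndex) {
    if (splitIndex == 0) {
      return root + "/" + appId;
    }
    int cut = appId.length() - splitIndex;
    return root + "/" + appId.substring(0, cut) + "/" + appId.substring(cut);
  }
}
```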
[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
[ https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-4336: Description: Hi folks, after performing some debugging with our Unix Engineering and Active Directory teams, it was discovered that on YARN container initialization a call is made via Hadoop Common's AccessControlList.java:
{code}
for (String group : ugi.getGroupNames()) {
  if (groups.contains(group)) {
    return true;
  }
}
{code}
Unfortunately, the security call to check access on "appattempt_X_X_X" will always return false, but it makes unnecessary calls to the name-service switch (NSS) on Linux, which will call things like SSSD/Quest VASD, which will then initiate LDAP lookups for non-existent userids, causing excessive load on LDAP. For now our tactical workaround is as follows:
{code}
/**
 * Checks if a user represented by the provided {@link UserGroupInformation}
 * is a member of the Access Control List.
 * @param ugi UserGroupInformation to check if contained in the ACL
 * @return true if ugi is a member of the list
 */
public final boolean isUserInList(UserGroupInformation ugi) {
  if (allAllowed || users.contains(ugi.getShortUserName())) {
    return true;
  } else {
    String patternString = "^appattempt_\\d+_\\d+_\\d+$";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(ugi.getShortUserName());
    boolean matches = matcher.matches();
    if (matches) {
      LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR GROUPS!!");
      return false;
    }
    for (String group : ugi.getGroupNames()) {
      if (groups.contains(group)) {
        return true;
      }
    }
  }
  return false;
}

public boolean isUserAllowed(UserGroupInformation ugi) {
  return isUserInList(ugi);
}
{code}
Example of VASD debug log showing the lookups for one task attempt (32 of them):
{code}
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching with filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>, base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]:
[jira] [Updated] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4829: Attachment: YARN-4829-YARN-3926.002.patch Thanks for the review [~asuresh]! Uploaded a new patch with new test cases and some re-factoring of code. With regards to the Ki, Mi, etc - the binary prefix symbols used are the ones in the IEC standard. It's also the format used by Kubernetes. > Add support for binary units > > > Key: YARN-4829 > URL: https://issues.apache.org/jira/browse/YARN-4829 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4829-YARN-3926.001.patch, > YARN-4829-YARN-3926.002.patch > > > The units conversion util should have support for binary units. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
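For reference, the IEC binary prefixes mentioned above (Ki, Mi, Gi) denote powers of 1024 rather than 1000. A minimal conversion sketch follows; the class and method names are illustrative, not the actual units-conversion util API in the patch:

```java
import java.util.Map;

public class BinaryUnits {
  // IEC binary-prefix factors: "" = no prefix, Ki = 2^10, Mi = 2^20, Gi = 2^30.
  private static final Map<String, Long> FACTORS = Map.of(
      "", 1L,
      "Ki", 1024L,
      "Mi", 1024L * 1024L,
      "Gi", 1024L * 1024L * 1024L);

  // Convert a value expressed in fromUnit into toUnit (truncating division).
  public static long convert(String fromUnit, String toUnit, long value) {
    return value * FACTORS.get(fromUnit) / FACTORS.get(toUnit);
  }
}
```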
[jira] [Updated] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4829: Attachment: YARN-4829-YARN-3926.004.patch Uploaded a new patch fixing the test failure. > Add support for binary units > > > Key: YARN-4829 > URL: https://issues.apache.org/jira/browse/YARN-4829 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4829-YARN-3926.001.patch, > YARN-4829-YARN-3926.002.patch, YARN-4829-YARN-3926.003.patch, > YARN-4829-YARN-3926.004.patch > > > The units conversion util should have support for binary units. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-4783. -- Resolution: Won't Fix Resolving as Won't Fix per the above discussion since we don't want to keep an application's security tokens valid for an arbitrary time after the application completes. > Log aggregation failure for application when Nodemanager is restarted > -- > > Key: YARN-4783 > URL: https://issues.apache.org/jira/browse/YARN-4783 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore > > Scenario : > = > 1.Start NM with user dsperf:hadoop > 2.Configure linux-execute user as dsperf > 3.Submit application with yarn user > 4.Once a few containers are allocated to NM 1 > 5.Nodemanager 1 is stopped (wait for expiry ) > 6.Start node manager after application is completed > 7.Check the log aggregation is happening for the containers log in NMLocal > directory > Expected Output : > === > Log aggregation should be successful > Actual Output : > === > Log aggregation not successful -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4820: Attachment: YARN-4820.002.patch Uploaded a new patch to address the findbugs warnings. > ResourceManager web redirects in HA mode drops query parameters > --- > > Key: YARN-4820 > URL: https://issues.apache.org/jira/browse/YARN-4820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4820.001.patch, YARN-4820.002.patch > > > The RMWebAppFilter redirects http requests from the standby to the active. > However it drops all the query parameters when it does the redirect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
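The fix described above amounts to carrying the query string over when building the redirect URL from the standby RM to the active one. A hedged sketch of that idea, not the actual RMWebAppFilter code (names are illustrative):

```java
public class RedirectUtil {
  // Build the redirect target, preserving the request's query string.
  // Dropping the query parameters here is exactly the bug being fixed.
  public static String redirectUrl(String activeRmBase, String path, String query) {
    String url = activeRmBase + path;
    if (query != null && !query.isEmpty()) {
      url += "?" + query;
    }
    return url;
  }
}
```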
[jira] [Created] (YARN-4830) Add support for resource types in the nodemanager
Varun Vasudev created YARN-4830: --- Summary: Add support for resource types in the nodemanager Key: YARN-4830 URL: https://issues.apache.org/jira/browse/YARN-4830 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198410#comment-15198410 ] Karthik Kambatla commented on YARN-4686: On a cursory look, the patch looks reasonable. One thing that caught my eye: are we explicitly transitioning the RM to active even when HA is not enabled? Is that required? > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, > YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, > YARN-4686.005.patch, YARN-4686.006.patch > > > TestRMNMInfo fails intermittently. Below is trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart
[ https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199584#comment-15199584 ] Junping Du commented on YARN-998: - Hi [~jianhe], would you kindly review the patch again? Thanks! > Persistent resource change during NM/RM restart > --- > > Key: YARN-998 > URL: https://issues.apache.org/jira/browse/YARN-998 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-998-sample.patch, YARN-998-v1.patch, > YARN-998-v2.patch > > > When NM is restarted by plan or from a failure, previous dynamic resource > setting should be kept for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199865#comment-15199865 ] Hadoop QA commented on YARN-4785: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-4785 does not apply to branch-2.6. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793996/YARN-4785.branch-2.6.001.patch | | JIRA Issue | YARN-4785 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10806/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh >Assignee: Varun Vasudev > Labels: REST_API > Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, > YARN-4785.branch-2.7.001.patch > > > I see inconsistent value type ( String and Array ) of the "type" field for > LeafQueueInfo in response of RM REST API - cluster/scheduler > as per the spec it should be always String. > here is the sample output ( removed non-relevant fields ) > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... 
> "queueName": "root",
> "queues": {
> "queue": [
> {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
>
> },
> {
> "type": [
> "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
>
> },
> {
> "type": [
> "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
>
> },
> {
> "capacity": 25,
>
> "state": "RUNNING",
> "queues": {
> "queue": [
> {
> "capacity": 6,
> "state": "RUNNING",
> "queues": {
> "queue": [
> {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
> }
> ]
> },
>
> },
> {
> "capacity": 6,
> ...
> "state": "RUNNING",
> "queues": {
> "queue": [
> {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
> }
> ]
> },
> ...
> },
> ...
> ]
> },
> ...
> }
> ]
> }
> }
> }
> }
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field
[ https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200852#comment-15200852 ] Subru Krishnan commented on YARN-4823: -- The test case failures are consistent and unrelated and are covered in YARN-4478 > Refactor the nested reservation id field in listReservation to simple string > field > -- > > Key: YARN-4823 > URL: https://issues.apache.org/jira/browse/YARN-4823 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-4823-v1.patch > > > The listReservation REST API returns a ReservationId field which has a nested > id field which is also called ReservationId. This JIRA proposes to rename the > nested field to a string as it's easier to read and moreover what the > update/delete APIs take in as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
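The refactoring proposed above can be pictured as collapsing the nested object into the string form the update/delete APIs already accept. The sketch below assumes illustrative field names ({{reservation-id}}, {{cluster-timestamp}}), not the exact REST schema:

```python
def flatten_reservation_id(entry):
    # Replace the nested reservation-id object with one plain string, as
    # the JIRA proposes. Field names are illustrative, not the real schema.
    rid = entry.get("reservation-id")
    if isinstance(rid, dict):
        entry["reservation-id"] = "reservation_{}_{}".format(
            rid["cluster-timestamp"], rid["reservation-id"])
    return entry

r = flatten_reservation_id(
    {"reservation-id": {"cluster-timestamp": 1458000000000, "reservation-id": 5}})
```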
[jira] [Updated] (YARN-4835) [YARN-3368] REST API related changes for new Web UI
[ https://issues.apache.org/jira/browse/YARN-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4835: --- Description: Following things need to be added for AM related web pages. 1. Support task state query param in REST URL for fetching tasks. 2. Support task attempt state query param in REST URL for fetching task attempts. 3. A new REST endpoint to fetch counters for each task belonging to a job. Also have a query param for counter name. i.e. something like : {{/jobs/\{jobid\}/taskCounters}} 4. A REST endpoint in NM for fetching all log files associated with a container. Useful if logs served by NM. > [YARN-3368] REST API related changes for new Web UI > --- > > Key: YARN-4835 > URL: https://issues.apache.org/jira/browse/YARN-4835 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Reporter: Varun Saxena >Assignee: Varun Saxena > > Following things need to be added for AM related web pages. > 1. Support task state query param in REST URL for fetching tasks. > 2. Support task attempt state query param in REST URL for fetching task > attempts. > 3. A new REST endpoint to fetch counters for each task belonging to a job. > Also have a query param for counter name. >i.e. something like : > {{/jobs/\{jobid\}/taskCounters}} > 4. A REST endpoint in NM for fetching all log files associated with a > container. Useful if logs served by NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
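Item 3 above can be sketched as URL construction on the client side. The path shape {{/jobs/\{jobid\}/taskCounters}} comes from the description, but the query-parameter name ({{counter}}) and host are assumptions, since the endpoint is only proposed here:

```python
from urllib.parse import urlencode

def task_counters_url(base, job_id, counter_name=None):
    # Sketch of the proposed per-job taskCounters endpoint with an optional
    # counter-name filter. Parameter name and base URL are assumptions.
    url = "{}/jobs/{}/taskCounters".format(base.rstrip("/"), job_id)
    if counter_name:
        url += "?" + urlencode({"counter": counter_name})
    return url

u = task_counters_url("http://am-host:8088/ws/v1", "job_1458000000000_0001",
                      counter_name="MAP_OUTPUT_RECORDS")
```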
[jira] [Updated] (YARN-4815) ATS 1.5 timeline client impl tries to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4815: Attachment: YARN-4815.2.patch rebased the patch > ATS 1.5 timeline client impl tries to create attempt directory for every event > call > > > Key: YARN-4815 > URL: https://issues.apache.org/jira/browse/YARN-4815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4815.1.patch, YARN-4815.2.patch > > > The ATS 1.5 timeline client impl tries to create the attempt directory for every > event call. Since one directory-creation call per attempt is enough, this is > causing a perf issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
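The fix amounts to remembering which attempt directories have already been created so the per-event calls become no-ops. The sketch below is illustrative only: it uses the local filesystem, whereas the real timeline client writes to HDFS, and the class and method names are invented:

```python
import os
import tempfile

class AttemptDirCache:
    # Remember which attempt directories already exist so each event call
    # skips the repeated creation. Illustrative sketch, not the real client.
    def __init__(self, root):
        self.root = root
        self._created = set()
        self.mkdir_calls = 0  # instrumentation for this example

    def ensure_attempt_dir(self, attempt_id):
        path = os.path.join(self.root, attempt_id)
        if attempt_id not in self._created:
            os.makedirs(path, exist_ok=True)
            self._created.add(attempt_id)
            self.mkdir_calls += 1
        return path

cache = AttemptDirCache(tempfile.mkdtemp())
for _ in range(100):  # 100 events for the same attempt, one mkdir
    cache.ensure_attempt_dir("appattempt_1458000000000_0001_000001")
```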
[jira] [Commented] (YARN-4808) SchedulerNode can use a few more cosmetic changes
[ https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202938#comment-15202938 ] Karthik Kambatla commented on YARN-4808: [~leftnoteasy] - will you be able to review this? > SchedulerNode can use a few more cosmetic changes > - > > Key: YARN-4808 > URL: https://issues.apache.org/jira/browse/YARN-4808 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4808-1.patch > > > We have made some cosmetic changes to SchedulerNode recently. While working > on YARN-4511, realized we could improve it a little more: > # Remove volatile variables - don't see the need for them being volatile > # Some methods end up doing very similar things, so consolidating them > # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity > to include the un-utilized resources, and having two totals can be a little > confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table
[ https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200736#comment-15200736 ] Hadoop QA commented on YARN-4062: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 12s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 14s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_74 {color} 
| | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 23s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_74 with JDK v1.8.0_74 generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 45s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: patch generated 0 new + 212 unchanged - 1 fixed = 212 total (was 213) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | 
{color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s {color} | {color:green}
[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page
[ https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199761#comment-15199761 ] Varun Saxena commented on YARN-4517: [~gtCarrera9], as per the discussion with Wangda, unification of app/container information can be done after this branch goes into trunk. I think we can definitely unify and have a single container page; I will do this later as part of another JIRA. For the NM single app page, we will have to see. Regarding this, bq. However, in the new UI, when the node is shutdown, I just could not hold myself to try to find a link to the NM logs to figure out why. I think the workflow here changed slightly, hence the user experience. Some other projects like Apache Ambari may want to maintain those information as well, but in YARN, it will be great if we could provide our users a way out. Maybe something like: "You have to ssh to the missing nodes' /xxx/ dir to look for the logs" would even be helpful. In the old UI, we did not show anything if a node was shutdown. So the change here is that we are showing even the nodes which have been SHUTDOWN; I thought this might be useful info for the admin. Some information about the path where the user can check the logs may be useful, but currently this information is not available in the RM, and I am not sure how the NM can report it to the RM. It could be reported during node registration, but do we need to? Ambari may have this information because I guess it knows exactly where the installation was done through it. Regarding node labels, we can add a field in the REST response indicating whether labels are enabled or not. We can do this later because it would require another JIRA for REST changes. The 500 error I think you are encountering is on the app page; I will fix it while doing the AM pages or as a separate JIRA. 
> [YARN-3368] Add nodes page > -- > > Key: YARN-4517 > URL: https://issues.apache.org/jira/browse/YARN-4517 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Wangda Tan >Assignee: Varun Saxena > Labels: webui > Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, > Screenshot_after_4709.png, Screenshot_after_4709_1.png, > YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch > > > We need nodes page added to next generation web UI, similar to existing > RM/nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines
[ https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198204#comment-15198204 ] Srikanth Kandula commented on YARN-2965: Go for it :-) We have some dummy code that was good enough to get numbers and experiments but are not actively working on pushing that in. Inigo, i will share that code with you offline so you can pick any useful pieces if you like from that. > Enhance Node Managers to monitor and report the resource usage on machines > -- > > Key: YARN-2965 > URL: https://issues.apache.org/jira/browse/YARN-2965 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: ddoc_RT.docx > > > This JIRA is about augmenting Node Managers to monitor the resource usage on > the machine, aggregates these reports and exposes them to the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197235#comment-15197235 ] Varun Vasudev commented on YARN-4820: - [~steve_l] - sorry I didn't understand the case you mentioned. You're talking about a scenario where the active RM web services redirect you to a standby RM? > ResourceManager web redirects in HA mode drops query parameters > --- > > Key: YARN-4820 > URL: https://issues.apache.org/jira/browse/YARN-4820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4820.001.patch > > > The RMWebAppFilter redirects http requests from the standby to the active. > However it drops all the query parameters when it does the redirect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
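The intended behavior is to rebuild the redirect URL against the active RM while carrying the original path and query string along. A minimal sketch of that rewrite, assuming plain HTTP and not the actual RMWebAppFilter code:

```python
from urllib.parse import urlsplit, urlunsplit

def redirect_target(active_host, request_url):
    # Swap in the active RM's host but keep the requested path and the
    # query string, which the current filter drops. Sketch only.
    parts = urlsplit(request_url)
    return urlunsplit((parts.scheme, active_host, parts.path, parts.query, ""))

t = redirect_target("rm2:8088",
                    "http://rm1:8088/ws/v1/cluster/apps?state=RUNNING&limit=10")
```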
[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table
[ https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200414#comment-15200414 ] Sangjin Lee commented on YARN-4062: --- LGTM pending jenkins. I'll commit it once the jenkins comes back clean. > Add the flush and compaction functionality via coprocessors and scanners for > flow run table > --- > > Key: YARN-4062 > URL: https://issues.apache.org/jira/browse/YARN-4062 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-2928-1st-milestone > Attachments: YARN-4062-YARN-2928.04.patch, > YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, > YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.08.patch, > YARN-4062-YARN-2928.09.patch, YARN-4062-YARN-2928.1.patch, > YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch, > YARN-4062-feature-YARN-2928.03.patch > > > As part of YARN-3901, coprocessor and scanner is being added for storing into > the flow_run table. It also needs a flush & compaction processing in the > coprocessor and perhaps a new scanner to deal with the data during flushing > and compaction stages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
Vinod Kumar Vavilapalli created YARN-4837: - Summary: User facing aspects of 'AM blacklisting' feature need fixing Key: YARN-4837 URL: https://issues.apache.org/jira/browse/YARN-4837 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Was reviewing the user-facing aspects that we are releasing as part of 2.8.0. Looking at the 'AM blacklisting feature', I see several things to be fixed before we release it in 2.8.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field
[ https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-4823: - Attachment: YARN-4823-v1.patch Attaching a patch that refactors the nested reservation id field in listReservation to simple string field > Refactor the nested reservation id field in listReservation to simple string > field > -- > > Key: YARN-4823 > URL: https://issues.apache.org/jira/browse/YARN-4823 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-4823-v1.patch > > > The listReservation REST API returns a ReservationId field which has a nested > id field which is also called ReservationId. This JIRA proposes to rename the > nested field to a string as it's easier to read and moreover what the > update/delete APIs take in as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters
[ https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200204#comment-15200204 ] Steve Loughran commented on YARN-4820: -- ok, I'm wrong, this is a 307, not a 302 ... ignore everything I was complaining about > ResourceManager web redirects in HA mode drops query parameters > --- > > Key: YARN-4820 > URL: https://issues.apache.org/jira/browse/YARN-4820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4820.001.patch, YARN-4820.002.patch > > > The RMWebAppFilter redirects http requests from the standby to the active. > However it drops all the query parameters when it does the redirect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197795#comment-15197795 ] Arun Suresh commented on YARN-4829: --- The patch looks mostly good. Thanks [~vvasudev] A couple of minor nits: * Maybe rename *Mi, Ti, Pi* to *Me, Te, Pe*, or maybe replace everything with *b (Kb, Mb..) to signify binary? * Can we have a test case that converts between a binary and a non-binary unit (K to Ki), for example? > Add support for binary units > > > Key: YARN-4829 > URL: https://issues.apache.org/jira/browse/YARN-4829 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4829-YARN-3926.001.patch > > > The units conversion util should have support for binary units. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
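The binary-to-decimal case the review asks a test for boils down to scaling by powers of 2 versus powers of 10. The sketch below shows the conversion arithmetic; it is not the actual Hadoop UnitsConversionUtil API:

```python
# Binary units scale by 2**10 per step, decimal units by 10**3 per step.
BINARY = {"Ki": 2 ** 10, "Mi": 2 ** 20, "Gi": 2 ** 30, "Ti": 2 ** 40, "Pi": 2 ** 50}
DECIMAL = {"K": 10 ** 3, "M": 10 ** 6, "G": 10 ** 9, "T": 10 ** 12, "P": 10 ** 15}

def convert(value, from_unit, to_unit):
    # Convert between decimal (K, M, ...) and binary (Ki, Mi, ...) units by
    # going through the base unit. Sketch only, not the Hadoop util.
    scale = dict(BINARY, **DECIMAL)
    scale[""] = 1
    return value * scale[from_unit] / scale[to_unit]

in_decimal_k = convert(1, "Ki", "K")   # 1 Ki = 1.024 K
in_ki = convert(1, "Mi", "Ki")         # 1 Mi = 1024 Ki
```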
[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt
[ https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201669#comment-15201669 ] Sangjin Lee commented on YARN-4839: --- Could this be the same issue as pointed out by YARN-4247? We did see this issue in our environment (which is 2.6 + patches), but that was because we backported YARN-2005 without YARN-3361. Not sure if there has been a more recent regression. > ResourceManager deadlock between RMAppAttemptImpl and > SchedulerApplicationAttempt > - > > Key: YARN-4839 > URL: https://issues.apache.org/jira/browse/YARN-4839 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe >Priority: Blocker > > Hit a deadlock in the ResourceManager as one thread was holding the > SchedulerApplicationAttempt lock and trying to call > RMAppAttemptImpl.getMasterContainer while another thread had the > RMAppAttemptImpl lock and was trying to call > SchedulerApplicationAttempt.getResourceUsageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
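The report describes the classic two-lock cycle: one thread holds lock A and wants lock B while another holds B and wants A. The standard remedy is a single global acquisition order (or moving the cross-object call outside the locked section). A sketch with illustrative names, not the actual RM classes:

```python
import threading

# Stand-ins for the two monitors involved in the reported deadlock.
attempt_lock = threading.RLock()     # plays the RMAppAttemptImpl lock
scheduler_lock = threading.RLock()   # plays the SchedulerApplicationAttempt lock

def get_resource_usage_report():
    # Fixed ordering: attempt lock first, then scheduler lock.
    with attempt_lock:
        with scheduler_lock:
            return "report"

def get_master_container():
    # Same ordering on the other call path, so no hold-and-wait cycle
    # can form between the two locks.
    with attempt_lock:
        with scheduler_lock:
            return "container"

results = [get_resource_usage_report(), get_master_container()]
```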
[jira] [Updated] (YARN-4830) Add support for resource types in the nodemanager
[ https://issues.apache.org/jira/browse/YARN-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4830: Component/s: (was: resourcemanager) > Add support for resource types in the nodemanager > - > > Key: YARN-4830 > URL: https://issues.apache.org/jira/browse/YARN-4830 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197237#comment-15197237 ] Naganarasimha G R commented on YARN-796: Hi [~jameszhouyi], In 2.6.0 label exclusivity is not supported and hope you are also aware that labels are supported only in CS > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, > Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.14.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199809#comment-15199809 ] Junping Du commented on YARN-4785: -- Thanks [~vvasudev], the patch for branch-2.6 and branch-2.7 LGTM. Will commit them shortly. > inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh >Assignee: Varun Vasudev > Labels: REST_API > Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, > YARN-4785.branch-2.7.001.patch > > > I see inconsistent value type ( String and Array ) of the "type" field for > LeafQueueInfo in response of RM REST API - cluster/scheduler > as per the spec it should be always String. > here is the sample output ( removed non-relevant fields ) > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... > "queueName": "root", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 0.1, > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 0.1, > "queueName": "test-queue", > "state": "RUNNING", > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 2.5, > > }, > { > "capacity": 25, > > "state": "RUNNING", > "queues": { > "queue": [ > { > "capacity": 6, > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > > }, > { > "capacity": 6, > ... > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > ... > }, > ... > ] > }, > ... > } > ] > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400
[ https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4746: --- Attachment: 0003-YARN-4746.patch Attaching patch after testcase fix > yarn web services should convert parse failures of appId to 400 > --- > > Key: YARN-4746 > URL: https://issues.apache.org/jira/browse/YARN-4746 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Priority: Minor > Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, > 0003-YARN-4746.patch, 0003-YARN-4746.patch > > > I'm seeing somewhere in the WS API tests of mine an error with exception > conversion of a bad app ID sent in as an argument to a GET. I know it's in > ATS, but a scan of the core RM web services implies a same problem > {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} > to convert an argument; this throws IllegalArgumentException, which is then > handled somewhere by jetty as a 500 error. > In fact, it's a bad argument, which should be handled by returning a 400. > This can be done by catching the raised argument and explicitly converting it -- This message was sent by Atlassian JIRA (v6.3.4#6332)
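The fix described above is to treat a malformed appId as a client error (400) rather than letting the parse exception surface as a 500. A sketch of that mapping, assuming the standard {{application_<timestamp>_<id>}} format; the real patch catches IllegalArgumentException around ConverterUtils.toApplicationId:

```python
import re

APP_ID_RE = re.compile(r"^application_\d+_\d+$")

def parse_application_id(app_id):
    # Validate the path argument up front and map a bad value to HTTP 400
    # (Bad Request) instead of a 500. Illustrative sketch of the behavior.
    if not APP_ID_RE.match(app_id):
        return 400, "Invalid ApplicationId: " + app_id
    return 200, app_id

ok = parse_application_id("application_1458000000000_0001")
bad = parse_application_id("not-an-app-id")
```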
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198795#comment-15198795 ] Wangda Tan commented on YARN-4390: -- Hi [~eepayne], YARN-4108 doesn't solve all the issues: I planned to solve this together with YARN-4108, but it only tackled half of the problem (once containers are selected, preempt only the useful ones). We still need to select containers more cleverly based on the pending requirement. I have been thinking about this recently and plan to make progress as soon as possible. May I reopen this JIRA and take it over from you? > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
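The 8-GB scenario in the description suggests a selection policy that prefers one container satisfying the pending request over many small ones. The sketch below is an illustrative policy only, not the CS preemption monitor's actual algorithm:

```python
def pick_containers(containers, needed_mb):
    # Prefer the smallest single container that satisfies the pending
    # request; fall back to accumulating large containers. Containers are
    # (id, size_mb) pairs. Illustrative policy sketch only.
    by_size = sorted(containers, key=lambda c: c[1])
    fits = [c for c in by_size if c[1] >= needed_mb]
    if fits:
        return [fits[0]]
    chosen, total = [], 0
    for c in reversed(by_size):      # largest first when none fits alone
        chosen.append(c)
        total += c[1]
        if total >= needed_mb:
            break
    return chosen

# Eight 1-GB containers plus one 8-GB container; the pending request is 8 GB.
# Preempting the single 8-GB container avoids killing eight small ones.
cluster = [("c%d" % i, 1024) for i in range(8)] + [("big", 8192)]
picked = pick_containers(cluster, 8192)
```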