[jira] [Updated] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4108:
-----------------------------
    Attachment: YARN-4108.11.patch

Removed unrelated changes. (ver.11)

> CapacityScheduler: Improve preemption to preempt only those containers that
> would satisfy the incoming request
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4108-design-doc-V3.pdf, YARN-4108-design-doc-v1.pdf,
> YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, YARN-4108.10.patch,
> YARN-4108.11.patch, YARN-4108.2.patch, YARN-4108.3.patch, YARN-4108.4.patch,
> YARN-4108.5.patch, YARN-4108.6.patch, YARN-4108.7.patch, YARN-4108.8.patch,
> YARN-4108.9.patch, YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch,
> YARN-4108.poc.3-WIP.patch, YARN-4108.poc.4-WIP.patch
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle resource placement requirements, such as: hard locality (I only
> want to use rack-1) / node constraints (YARN-3409) / black-list (I don't want
> to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113),
> cross-application preemption (such as priority-based (YARN-1963) /
> fairness-based (YARN-3319)).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4108:
-----------------------------
    Attachment: YARN-4108.10.patch

> CapacityScheduler: Improve preemption to preempt only those containers that
> would satisfy the incoming request
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4108-design-doc-V3.pdf, YARN-4108-design-doc-v1.pdf,
> YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, YARN-4108.10.patch,
> YARN-4108.2.patch, YARN-4108.3.patch, YARN-4108.4.patch, YARN-4108.5.patch,
> YARN-4108.6.patch, YARN-4108.7.patch, YARN-4108.8.patch, YARN-4108.9.patch,
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch,
> YARN-4108.poc.4-WIP.patch
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle resource placement requirements, such as: hard locality (I only
> want to use rack-1) / node constraints (YARN-3409) / black-list (I don't want
> to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113),
> cross-application preemption (such as priority-based (YARN-1963) /
> fairness-based (YARN-3319)).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4108:
-----------------------------
    Attachment: (was: YARN-4108.10.patch)

> CapacityScheduler: Improve preemption to preempt only those containers that
> would satisfy the incoming request
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4108-design-doc-V3.pdf, YARN-4108-design-doc-v1.pdf,
> YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, YARN-4108.2.patch,
> YARN-4108.3.patch, YARN-4108.4.patch, YARN-4108.5.patch, YARN-4108.6.patch,
> YARN-4108.7.patch, YARN-4108.8.patch, YARN-4108.9.patch, YARN-4108.poc.1.patch,
> YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch, YARN-4108.poc.4-WIP.patch
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle resource placement requirements, such as: hard locality (I only
> want to use rack-1) / node constraints (YARN-3409) / black-list (I don't want
> to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113),
> cross-application preemption (such as priority-based (YARN-1963) /
> fairness-based (YARN-3319)).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4108:
-----------------------------
    Attachment: YARN-4108.10.patch

Thanks [~jianhe] for such detailed reviews. The attached ver.10 patch addresses all your comments except:

bq. maybe check the queue resource here directly instead of this isAllowPreemption flag?

I deliberately avoided this because we want to draw a border between LeafQueue and ContainerAllocator. With this approach we only need to check queue capacity once per LeafQueue allocation; otherwise we would have to do it in each application.

> CapacityScheduler: Improve preemption to preempt only those containers that
> would satisfy the incoming request
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4108-design-doc-V3.pdf, YARN-4108-design-doc-v1.pdf,
> YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, YARN-4108.10.patch,
> YARN-4108.2.patch, YARN-4108.3.patch, YARN-4108.4.patch, YARN-4108.5.patch,
> YARN-4108.6.patch, YARN-4108.7.patch, YARN-4108.8.patch, YARN-4108.9.patch,
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch,
> YARN-4108.poc.4-WIP.patch
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle resource placement requirements, such as: hard locality (I only
> want to use rack-1) / node constraints (YARN-3409) / black-list (I don't want
> to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113),
> cross-application preemption (such as priority-based (YARN-1963) /
> fairness-based (YARN-3319)).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194726#comment-15194726 ] Hudson commented on YARN-4816: -- FAILURE: Integrated in Hadoop-trunk-Commit #9462 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9462/]) YARN-4816. Fix incompatible change in SystemClock. (sseth: rev eba66a64d28b50a660d6f537c767677f5fa0f7ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/SystemClock.java > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.9.0 > > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
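[Editor's note] The incompatibility above came from removing SystemClock's public constructor in favor of a singleton accessor. A common way to restore source and binary compatibility is to keep the singleton while re-adding the constructor as deprecated. The sketch below is an illustrative reconstruction of that pattern, not the actual YARN-4816 patch:

```java
// Sketch of a backward-compatible singleton. The class name and the
// getInstance()/getTime() shape follow the YARN discussion; the exact
// patch may differ.
public class SystemClock {
    private static final SystemClock INSTANCE = new SystemClock();

    public static SystemClock getInstance() {
        return INSTANCE;
    }

    /**
     * Kept public (and deprecated) so code compiled against the old API
     * keeps working; new callers should use getInstance() instead.
     */
    @Deprecated
    public SystemClock() {
    }

    /** Current wall-clock time in milliseconds. */
    public long getTime() {
        return System.currentTimeMillis();
    }
}
```

Deprecating rather than deleting the constructor is what turns an incompatible change back into a compatible one: old call sites still compile and run, and the deprecation warning steers new code toward the singleton.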
[jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-4712:
------------------------------------
    Attachment: YARN-4712-YARN-2928.v1.006.patch

Hi [~sjlee0], I have fixed your comments; please review.

> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>
>                 Key: YARN-4712
>                 URL: https://issues.apache.org/jira/browse/YARN-4712
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-4712-YARN-2928.v1.001.patch,
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch,
> YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch,
> YARN-4712-YARN-2928.v1.006.patch
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that, many times, the CPU usage obtained from
> {{pTree.getCpuUsagePercent()}} is ResourceCalculatorProcessTree.UNAVAILABLE
> (i.e. -1), but ContainersMonitor still does the calculation, i.e.
> {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}},
> because of which the UNAVAILABLE check in
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never triggered.
> Proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but
> ContainerMonitor publishes decimal values for the CPU usage.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
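[Editor's note] The first issue above is that the UNAVAILABLE sentinel (-1) gets divided by the processor count, producing a small negative value that no longer matches the later equality check. A hedged sketch of the guard is below; the constant and parameter names follow the issue description, but the helper method itself is hypothetical:

```java
// Illustrative guard for the UNAVAILABLE CPU metric; not the actual
// ContainersMonitor code.
public class CpuUsage {
    // Mirrors ResourceCalculatorProcessTree.UNAVAILABLE from the description.
    public static final float UNAVAILABLE = -1.0f;

    /**
     * Propagate UNAVAILABLE instead of dividing it: -1 / numProcessors
     * would become a small negative float (e.g. -0.125 for 8 cores) that
     * the downstream "== UNAVAILABLE" check would never catch.
     */
    public static float totalCoresPercentage(float cpuUsagePercentPerCore,
                                             int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }
}
```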
[jira] [Commented] (YARN-4815) ATS 1.5 timeline client impl tries to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194708#comment-15194708 ] Hadoop QA commented on YARN-4815: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 28s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 8s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 58s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 29s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 50s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 50s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 215 unchanged - 0 fixed = 218 total (was 215) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 41s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 56s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 42s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch
[jira] [Updated] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-4711:
------------------------------------
    Attachment: 4711Analysis.txt

Hi [~sjlee0], sorry for the long delay!

From the analysis I was able to identify that it happens on any exception from the web server while publishing the entity. Earlier I suspected it might be due to the time taken to publish, but another important cause could be that we retry publishing the entity irrespective of the exception type.

So basically {{ContainerManagerImpl.ContainerEventDispatcher.handle(ContainerEvent)}} -> {{nmMetricsPublisher.publishContainerEvent}} -> {{NMTimelinePublisher.ContainerEventHandler.handle(ContainerEvent)}}. Synchronous container metric event dispatching in {{NMTimelinePublisher.dispatcher}} is getting slowed down because {{TimelineClientImpl.putObjects}} retries on exception.

> NM is going down with NPE's due to single thread processing of events by
> Timeline client
> ------------------------------------------------------------------------
>
>                 Key: YARN-4711
>                 URL: https://issues.apache.org/jira/browse/YARN-4711
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>         Attachments: 4711Analysis.txt
>
> After YARN-3367, while testing the latest 2928 branch, we came across a few
> NPEs due to which the NM is shutting down.
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> On analysis, we found that there was a delay in processing events, because
> after YARN-3367 all events are processed by a single thread inside the
> timeline client.
> Additionally, we found one scenario where there is a possibility of an NPE:
> * TimelineEntity.toString() when {{real}} is not null

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
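[Editor's note] Since a single dispatcher thread services all timeline events, an unbounded retry on every exception type can stall the whole queue, as the analysis above describes. One mitigation is to cap attempts and retry only exceptions that are plausibly transient. The helper below is a sketch under assumed names; TimelineClientImpl's real retry policy may differ:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative bounded, exception-selective retry; not the actual
// TimelineClientImpl code.
public class BoundedRetry {
    /**
     * Run the task, retrying only on IOException (assumed transient),
     * up to maxAttempts times. Any other exception propagates
     * immediately, so a bad entity cannot wedge the single dispatcher
     * thread behind endless retries.
     */
    public static <T> T call(Callable<T> task, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;   // transient failure: retry until the cap
            }
        }
        throw last;   // cap exhausted: surface the final IOException
    }
}
```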
[jira] [Assigned] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned YARN-4816: Assignee: Siddharth Seth > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194673#comment-15194673 ] Siddharth Seth commented on YARN-4816: -- Thanks for the review [~kasha] - committing to master and branch-2. > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194670#comment-15194670 ] Hadoop QA commented on YARN-4766: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 59s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 56s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 5s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 32s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 54s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 54s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 2 new + 121 unchanged - 2 fixed = 123 total (was 123) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 54s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 34s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 36s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch
[jira] [Commented] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194658#comment-15194658 ] Hadoop QA commented on YARN-4812: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 11s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 160m 4s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793398/yarn-4812-1.patch | | JIRA Issue | YARN-4812 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | |
[jira] [Updated] (YARN-4818) AggregatedLogFormat.LogValue.write() incorrectly truncates files
[ https://issues.apache.org/jira/browse/YARN-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brook Zhou updated YARN-4818: - Summary: AggregatedLogFormat.LogValue.write() incorrectly truncates files (was: AggregatedLogFormat.LogValue writes only in blocks of buffer size) > AggregatedLogFormat.LogValue.write() incorrectly truncates files > > > Key: YARN-4818 > URL: https://issues.apache.org/jira/browse/YARN-4818 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Brook Zhou >Assignee: Brook Zhou > Fix For: 2.8.0 > > > AggregatedLogFormat.LogValue.write() currently has a bug where it only writes > in blocks of the buffer size (65535). This is because > FileInputStream.read(byte[] buf) returns -1 if there are less than buf.length > bytes remaining. In cases where the file size is not an exact multiple of > 65535 bytes, the remaining bytes are truncated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4818) AggregatedLogFormat.LogValue writes only in blocks of buffer size
[ https://issues.apache.org/jira/browse/YARN-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brook Zhou updated YARN-4818: - Description: AggregatedLogFormat.LogValue.write() currently has a bug where it only writes in blocks of the buffer size (65535). This is because FileInputStream.read(byte[] buf) returns -1 if there are less than buf.length bytes remaining. In cases where the file size is not an exact multiple of 65535 bytes, the remaining bytes are truncated. (was: AggregatedLogFormat.LogValue.write() currently has a bug where it only writes in blocks of the buffer size (65535). This is because FileInputStream.read(byte[] buf) returns -1 if there are less than 65535 bytes remaining. In cases where the file is less than 65535 bytes, 0 bytes are written.) > AggregatedLogFormat.LogValue writes only in blocks of buffer size > - > > Key: YARN-4818 > URL: https://issues.apache.org/jira/browse/YARN-4818 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Brook Zhou >Assignee: Brook Zhou > Fix For: 2.8.0 > > > AggregatedLogFormat.LogValue.write() currently has a bug where it only writes > in blocks of the buffer size (65535). This is because > FileInputStream.read(byte[] buf) returns -1 if there are less than buf.length > bytes remaining. In cases where the file size is not an exact multiple of > 65535 bytes, the remaining bytes are truncated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4818) AggregatedLogFormat.LogValue writes only in blocks of buffer size
Brook Zhou created YARN-4818: Summary: AggregatedLogFormat.LogValue writes only in blocks of buffer size Key: YARN-4818 URL: https://issues.apache.org/jira/browse/YARN-4818 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Brook Zhou Assignee: Brook Zhou Fix For: 2.8.0 AggregatedLogFormat.LogValue.write() currently has a bug where it only writes in blocks of the buffer size (65535). This is because FileInputStream.read(byte[] buf) returns -1 if there are less than 65535 bytes remaining. In cases where the file is less than 65535 bytes, 0 bytes are written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
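The read semantics behind this bug can be illustrated with a minimal copy loop (a generic sketch, not the actual YARN-4818 patch). In the JDK, {{FileInputStream.read(byte[])}} returns the number of bytes actually read, which may be fewer than the buffer length, and returns -1 only at end of stream, so a correct loop must write exactly the number of bytes each call returned:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoop {
    // Sketch of a correct copy loop: read(byte[]) returns the number of bytes
    // actually read (possibly fewer than buf.length), or -1 only at end of
    // stream, so each chunk must be written using that per-call length.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[65535];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 70000 bytes is not a multiple of 65535; the loop still copies everything.
        byte[] data = new byte[70000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        System.out.println(copied);
        System.out.println(out.size() == 70000);
    }
}
```

A loop that writes {{buf.length}} bytes per iteration, or stops early on a short read, exhibits exactly the truncation described above.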
[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194571#comment-15194571 ] Li Lu commented on YARN-3150: - Thanks [~sjlee0]! Currently both [~xgong] and I are still waiting for some free cycles to work on the documentations of ATS v1.5. YARN-4694 will be the JIRA to track that work. > [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4817) Change Log Level to DEBUG for putDomain call in ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194568#comment-15194568 ] Hadoop QA commented on YARN-4817: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 37s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793422/YARN-4817.1.patch | | JIRA Issue | YARN-4817 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 73cd428530aa 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
[jira] [Commented] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194569#comment-15194569 ] Li Lu commented on YARN-4815: - OK, one question, why we're not using the Guava LRU cache here? Thanks! > ATS 1.5 timelineclinet impl try to create attempt directory for every event > call > > > Key: YARN-4815 > URL: https://issues.apache.org/jira/browse/YARN-4815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4815.1.patch > > > ATS 1.5 timelineclinet impl, try to create attempt directory for every event > call. Since per attempt only one call to create directory is enough, this is > causing perf issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
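The once-per-attempt behavior asked about above can be sketched with an LRU set of already-created attempt directories. This is a JDK-only illustration using {{LinkedHashMap}} in access order (Guava's {{CacheBuilder.maximumSize(...)}} would give equivalent eviction); the class and method names are hypothetical, not from the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AttemptDirCache {
    // Hypothetical sketch: remember which attempt ids already had their
    // directory created, so mkdir is issued at most once per attempt.
    private final Map<String, Boolean> created;

    public AttemptDirCache(final int maxEntries) {
        // accessOrder=true makes this an LRU map; removeEldestEntry caps size.
        this.created = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns true only the first time an attempt id is seen (mkdir needed). */
    public synchronized boolean markCreated(String attemptId) {
        return created.put(attemptId, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        AttemptDirCache c = new AttemptDirCache(2);
        System.out.println(c.markCreated("attempt_1")); // first sighting: create dir
        System.out.println(c.markCreated("attempt_1")); // cached: skip mkdir
        c.markCreated("attempt_2");
        c.markCreated("attempt_3");                     // evicts attempt_1 (capacity 2)
        System.out.println(c.markCreated("attempt_1")); // evicted, so create again
    }
}
```

After eviction a redundant mkdir is merely repeated, not incorrect, which is why a bounded LRU is safe here.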
[jira] [Commented] (YARN-4817) Change Log Level to DEBUG for putDomain call in ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194564#comment-15194564 ] Li Lu commented on YARN-4817: - LGTM. +1. > Change Log Level to DEBUG for putDomain call in ATS 1.5 > --- > > Key: YARN-4817 > URL: https://issues.apache.org/jira/browse/YARN-4817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Trivial > Attachments: YARN-4817.1.patch > > > We have already changed the log level to DEBUG for putEntity call. Let us > make it consistence for the putDomain call -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194563#comment-15194563 ] Li Lu commented on YARN-4814: - Oops I didn't look at the patch when posting the last comment. What a big patch. LGTM. +1. Will commit in half a day if there's no objections. > ATS 1.5 timelineclient impl call flush after every event write > -- > > Key: YARN-4814 > URL: https://issues.apache.org/jira/browse/YARN-4814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4814.1.patch > > > ATS 1.5 timelineclient impl call flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194562#comment-15194562 ] Li Lu commented on YARN-4814: - Sure. Will take a look at it soon. > ATS 1.5 timelineclient impl call flush after every event write > -- > > Key: YARN-4814 > URL: https://issues.apache.org/jira/browse/YARN-4814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4814.1.patch > > > ATS 1.5 timelineclient impl call flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
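The general shape of the fix discussed above, flushing on a threshold rather than after every event, can be sketched as follows (an illustrative buffer wrapper, not the actual timeline client code):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class BatchedWriter {
    // Hypothetical sketch: buffer event writes and flush only every N events
    // (and on close), instead of flushing after each event write.
    private final Writer out;
    private final int flushEvery;
    private int pending = 0;

    public BatchedWriter(Writer sink, int flushEvery) {
        this.out = new BufferedWriter(sink);
        this.flushEvery = flushEvery;
    }

    public void writeEvent(String json) throws IOException {
        out.write(json);
        out.write('\n');
        if (++pending >= flushEvery) { // amortize the flush cost
            out.flush();
            pending = 0;
        }
    }

    public void close() throws IOException {
        out.flush(); // nothing pending is lost on shutdown
        out.close();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        BatchedWriter w = new BatchedWriter(sink, 10);
        w.writeEvent("{\"event\":1}");
        w.writeEvent("{\"event\":2}");
        w.close();
        System.out.println(sink.toString().split("\n").length);
    }
}
```

The trade-off is durability: events buffered since the last flush can be lost on a crash, which is acceptable for timeline data but worth noting.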
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194535#comment-15194535 ] Jayesh commented on YARN-4785: -- prod env: linux 2.6.32-431.el6.x86_64 ( jdk 1.7.0_79) more info: I have following libs in classpath {code} jackson-annotations-2.2.3.jar jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.8.jar jackson-xc-1.8.8.jar jackson-core-2.2.3.jar jackson-databind-2.2.3.jar jackson-mapper-asl-1.8.8.jar jersey-client-1.8.jar jersey-core-1.8.jar jersey-json-1.8.jar jersey-server-1.8.jar jersey-servlet-1.14.jar {code} dev env ( where I can reproduce this issue) : mac os Yosemite (10.11.3 (15D21)) - jdk 1.7.0_79 more info : this is bare hadoop code (hdp though - HDP-2.2.9.0-tag ) on which I am running the test cases to reproduce. Thanks for looking into this.. did you add test for type assessment in verifySubQueue() ? > inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh > Labels: REST_API > > I see inconsistent value type ( String and Array ) of the "type" field for > LeafQueueInfo in response of RM REST API - cluster/scheduler > as per the spec it should be always String. > here is the sample output ( removed non-relevant fields ) > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... 
> "queueName": "root", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 0.1, > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 0.1, > "queueName": "test-queue", > "state": "RUNNING", > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 2.5, > > }, > { > "capacity": 25, > > "state": "RUNNING", > "queues": { > "queue": [ > { > "capacity": 6, > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > > }, > { > "capacity": 6, > ... > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > ... > }, > ... > ] > }, > ... > } > ] > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
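Until the server-side serialization is made consistent, a client can normalize the "type" field defensively. The helper below is a hypothetical sketch (not part of any YARN API) that accepts the value whether the JSON parser surfaced it as a string or a single-element array:

```java
import java.util.Arrays;
import java.util.List;

public class TypeField {
    // Hypothetical client-side normalization: some responses emit "type" as a
    // plain string, others as a one-element array, so accept both forms.
    @SuppressWarnings("unchecked")
    public static String normalizeType(Object type) {
        if (type instanceof List) {
            List<Object> l = (List<Object>) type;
            return l.isEmpty() ? null : String.valueOf(l.get(0));
        }
        return type == null ? null : type.toString();
    }

    public static void main(String[] args) {
        // Both shapes observed in the response above normalize identically.
        System.out.println(normalizeType("capacitySchedulerLeafQueueInfo"));
        System.out.println(normalizeType(Arrays.asList("capacitySchedulerLeafQueueInfo")));
    }
}
```

This kind of workaround is typical when a JAXB/Jackson configuration difference changes how polymorphic fields are rendered between environments.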
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194534#comment-15194534 ] Jian He commented on YARN-4108: --- - revert RMAppAttemptMetrics changes. - remove setIsAlive method - remove RMContainer#setLeafQueue too as RMContainer#getQueueName is not used any where - usedConsideredKillable -> usedExceptKillable - remove CapacityScheduler#liveContainers - PreemptableEntity -> PreemptableQueue - markContainerForKillableInternal does not need to be a separate method, it can be merged into markContainerForKillable. - parentMaxAvailableResource is no_label resource ? {code} // Deduct killable from used Resources.addTo(parentMaxAvailableResource, getTotalKillableResource(nodePartition)); {code} - availableConsidersKillable -> availableAndKillable - add comments for why killContainersToEnforceMaxQueueCapacity is needed. - may be check the queue resource here directly instead of this isAllowPreemption flag? {{if (availableContainers == 0 && currentResoureLimits.isAllowPreemption()) {}} - add a test case that container will not be preempted if user limit is hit ? > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, > YARN-4108.2.patch, YARN-4108.3.patch, YARN-4108.4.patch, YARN-4108.5.patch, > YARN-4108.6.patch, YARN-4108.7.patch, YARN-4108.8.patch, YARN-4108.9.patch, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch, > YARN-4108.poc.4-WIP.patch > > > This is sibling JIRA for YARN-2154. We should make sure container preemption > is more effective. 

> *Requirements:* > 1) Can handle case of user-limit preemption > 2) Can handle case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross user preemption (YARN-2113), > cross application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4809) De-duplicate container completion across schedulers
[ https://issues.apache.org/jira/browse/YARN-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4809: - Assignee: Sunil G > De-duplicate container completion across schedulers > --- > > Key: YARN-4809 > URL: https://issues.apache.org/jira/browse/YARN-4809 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Karthik Kambatla >Assignee: Sunil G > > CapacityScheduler and FairScheduler implement containerCompleted the exact > same way. Duplication across the schedulers can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194487#comment-15194487 ] Sangjin Lee commented on YARN-3150: --- This is to start the discussion. The main documentation for Timeline Service is here: http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/TimelineServer.html This contains a fairly significant amount of information. I think we have some options. We could either (1) add v.2-related information within the same doc, or (2) create a separate doc that's linked off of this doc. I think a separate doc might be easier for users to consume. Otherwise, v.2-related info would be sprinkled throughout the existing document. Thoughts? Also, I'm noticing it has not been updated with v.1.5. Is that planned? If so, where will it be done? cc [~gtCarrera9] > [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4815: Attachment: YARN-4815.1.patch > ATS 1.5 timelineclinet impl try to create attempt directory for every event > call > > > Key: YARN-4815 > URL: https://issues.apache.org/jira/browse/YARN-4815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4815.1.patch > > > ATS 1.5 timelineclinet impl, try to create attempt directory for every event > call. Since per attempt only one call to create directory is enough, this is > causing perf issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194480#comment-15194480 ] Hadoop QA commented on YARN-4814: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 10s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793412/YARN-4814.1.patch | | JIRA Issue | YARN-4814 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 3e2880c2af76 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
[jira] [Commented] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194475#comment-15194475 ] Hadoop QA commented on YARN-4816: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 55s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s {color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | 
{color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 4s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 12s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793414/YARN-4816.1.txt | | JIRA Issue | YARN-4816 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit
[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-4766: - Attachment: yarn4766.002.patch Addressed checkstyle issues > NM should not aggregate logs older than the retention policy > > > Key: YARN-4766 > URL: https://issues.apache.org/jira/browse/YARN-4766 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4766.001.patch, yarn4766.002.patch > > > When a log aggregation fails on the NM the information is for the attempt is > kept in the recovery DB. Log aggregation can fail for multiple reasons which > are often related to HDFS space or permissions. > On restart the recovery DB is read and if an application attempt needs its > logs aggregated, the files are scheduled for aggregation without any checks. > The log files could be older than the retention limit in which case we should > not aggregate them but immediately mark them for deletion from the local file > system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
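The retention guard described above reduces to a simple age check before scheduling a file for aggregation. This is an illustrative sketch with hypothetical names, not the patch itself:

```java
public class RetentionCheck {
    // Hypothetical sketch: a log file older than the retention window would be
    // deleted from HDFS anyway, so skip aggregation and delete it locally.
    public static boolean shouldAggregate(long fileModTimeMs, long nowMs,
            long retentionMs) {
        return nowMs - fileModTimeMs <= retentionMs;
    }

    public static void main(String[] args) {
        long day = 24L * 3600 * 1000;
        long now = 100 * day;
        // 2 days old with a 7-day retention: still worth aggregating.
        System.out.println(shouldAggregate(now - 2 * day, now, 7 * day));
        // 10 days old with a 7-day retention: mark for local deletion instead.
        System.out.println(shouldAggregate(now - 10 * day, now, 7 * day));
    }
}
```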
[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194418#comment-15194418 ] Naganarasimha G R commented on YARN-4736: - Also seems like [~anoop.hbase] has got the cause for this issue in HBASE-15436, so i think no handling from YARN side right ? > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, > threaddump.log > > > Faced some issues while running ATSv2 in single node Hadoop cluster and in > the same node had launched Hbase with embedded zookeeper. > # Due to some NPE issues i was able to see NM was trying to shutdown, but the > NM daemon process was not completed due to the locks. > # Got some exception related to Hbase after application finished execution > successfully. > will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4736: Attachment: NM_Hang_hbase1.0.3.tar.gz I was able to reproduce Issue 2 with Hbase 1.0.3 too, well its not as frequent as i was able to reproduce it with Hbase 1.0.2. Also one more thing i was able to note was that i was able to reproduce when {{hbase.zookeeper.property.dataDir}} was not configured i.e. when zookeeper's datadir is in */tmp/hbase-*. Well if anything more is required will be ready to share > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, > threaddump.log > > > Faced some issues while running ATSv2 in single node Hadoop cluster and in > the same node had launched Hbase with embedded zookeeper. > # Due to some NPE issues i was able to see NM was trying to shutdown, but the > NM daemon process was not completed due to the locks. > # Got some exception related to Hbase after application finished execution > successfully. > will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call
[ https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-4815: --- Assignee: Xuan Gong > ATS 1.5 timelineclinet impl try to create attempt directory for every event > call > > > Key: YARN-4815 > URL: https://issues.apache.org/jira/browse/YARN-4815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > ATS 1.5 timelineclinet impl, try to create attempt directory for every event > call. Since per attempt only one call to create directory is enough, this is > causing perf issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
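The fix direction implied by YARN-4815 is to remember which attempt directories have already been created so that only the first event for an attempt pays the filesystem cost. A minimal sketch of that idea, with hypothetical class and method names (not the actual patch, and using a counter as a stand-in for the real `FileSystem.mkdirs` call):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: cache attempt IDs whose directory was already created,
// so the expensive directory-creation call runs once per attempt instead of
// once per event.
class AttemptDirCache {
    private final Set<String> createdDirs = new HashSet<>();
    private int mkdirCalls = 0; // counts simulated FileSystem.mkdirs calls

    // Returns true only when this call actually created the directory.
    synchronized boolean ensureAttemptDir(String attemptId) {
        if (!createdDirs.add(attemptId)) {
            return false; // already created: skip the filesystem round trip
        }
        mkdirCalls++; // stand-in for the real mkdirs(attemptPath) call
        return true;
    }

    synchronized int getMkdirCalls() { return mkdirCalls; }

    public static void main(String[] args) {
        AttemptDirCache cache = new AttemptDirCache();
        // Three events for the same attempt: only the first triggers mkdirs.
        for (int i = 0; i < 3; i++) {
            cache.ensureAttemptDir("appattempt_1457000000000_0001_000001");
        }
        if (cache.getMkdirCalls() != 1) {
            throw new AssertionError("expected exactly one mkdirs call");
        }
    }
}
```

The synchronization mirrors the fact that multiple event publishers may race to create the same attempt directory; only one of them should win.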
[jira] [Updated] (YARN-4245) Clean up container-executor binary invocation interface
[ https://issues.apache.org/jira/browse/YARN-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4245: -- Summary: Clean up container-executor binary invocation interface (was: Clean up container-executor invocation interface) > Clean up container-executor binary invocation interface > --- > > Key: YARN-4245 > URL: https://issues.apache.org/jira/browse/YARN-4245 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.0, 2.8.0 >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > > The current container-executor invocation interface (especially for launching > containers) is cumbersome to use. Launching a container now requires 13-15 > arguments. This becomes especially problematic when additional, potentially > optional, arguments are required. We need a better mechanism to deal with > this. One such mechanism could be to use a file containing key/value pairs > (similar to container-executor.cfg) corresponding to the arguments each > invocation needs. Such a mechanism would make it easier to add new optional > arguments to container-executor and better manage existing ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4245) Clean up container-executor invocation interface
[ https://issues.apache.org/jira/browse/YARN-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194345#comment-15194345 ] Vinod Kumar Vavilapalli commented on YARN-4245: --- bq. We need a better mechanism to deal with this. One such mechanism could be to use a file containing key/value pairs (similar to container-executor.cfg) corresponding to the arguments each invocation needs. A little late, but +10! > Clean up container-executor invocation interface > > > Key: YARN-4245 > URL: https://issues.apache.org/jira/browse/YARN-4245 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.0, 2.8.0 >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > > The current container-executor invocation interface (especially for launching > containers) is cumbersome to use. Launching a container now requires 13-15 > arguments. This becomes especially problematic when additional, potentially > optional, arguments are required. We need a better mechanism to deal with > this. One such mechanism could be to use a file containing key/value pairs > (similar to container-executor.cfg) corresponding to the arguments each > invocation needs. Such a mechanism would make it easier to add new optional > arguments to container-executor and better manage existing ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
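The key/value-file mechanism proposed for YARN-4245 can be sketched from the Java side using the standard `Properties` file format, which is close in spirit to container-executor.cfg. All names here are hypothetical illustrations, not the eventual interface: the point is simply that a reader can ignore keys it does not know, so new optional arguments no longer change a 13-15-slot positional calling convention.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Properties;

// Hypothetical sketch: serialize launch arguments to a key/value file that
// the container-executor binary would read back, instead of passing them as
// a long list of positional command-line arguments.
class LaunchArgsFile {
    static File write(Properties args) {
        try {
            File f = File.createTempFile("container-launch", ".cfg");
            try (Writer w = new FileWriter(f)) {
                args.store(w, "container-executor launch arguments");
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Properties read(File f) {
        Properties p = new Properties();
        try (Reader r = new FileReader(f)) {
            p.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] argv) {
        Properties args = new Properties();
        args.setProperty("user", "alice");
        args.setProperty("container-id", "container_1457000000000_0001_01_000002");
        // An optional key: readers that don't understand it simply ignore it.
        args.setProperty("docker-image", "centos:7");
        File f = write(args);
        Properties back = read(f);
        if (!"alice".equals(back.getProperty("user"))) {
            throw new AssertionError("round trip failed");
        }
        f.delete();
    }
}
```

In the real binary the reader side would be C, but the round-trip shape is the same: write once per invocation, parse by key, treat unknown keys as optional.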
[jira] [Commented] (YARN-3854) Add localization support for docker images
[ https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194342#comment-15194342 ] Vinod Kumar Vavilapalli commented on YARN-3854: --- [~sidharta-s], this one's old but is it a dup of YARN-3289? If the goal is the same, we should close the newer of the two and change JIRA hierarchy etc of the older one as needed. > Add localization support for docker images > -- > > Key: YARN-3854 > URL: https://issues.apache.org/jira/browse/YARN-3854 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > > We need the ability to localize images from HDFS and load them for use when > launching docker containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4817) Change Log Level to DEBUG for putDomain call in ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194329#comment-15194329 ] Xuan Gong commented on YARN-4817: - trivial patch > Change Log Level to DEBUG for putDomain call in ATS 1.5 > --- > > Key: YARN-4817 > URL: https://issues.apache.org/jira/browse/YARN-4817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Trivial > Attachments: YARN-4817.1.patch > > > We have already changed the log level to DEBUG for putEntity call. Let us > make it consistence for the putDomain call -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4817) Change Log Level to DEBUG for putDomain call in ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4817: Attachment: YARN-4817.1.patch > Change Log Level to DEBUG for putDomain call in ATS 1.5 > --- > > Key: YARN-4817 > URL: https://issues.apache.org/jira/browse/YARN-4817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Trivial > Attachments: YARN-4817.1.patch > > > We have already changed the log level to DEBUG for putEntity call. Let us > make it consistence for the putDomain call -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4817) Change Log Level to DEBUG for putDomain call in ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4817: Priority: Trivial (was: Major) > Change Log Level to DEBUG for putDomain call in ATS 1.5 > --- > > Key: YARN-4817 > URL: https://issues.apache.org/jira/browse/YARN-4817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Trivial > Attachments: YARN-4817.1.patch > > > We have already changed the log level to DEBUG for putEntity call. Let us > make it consistence for the putDomain call -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4817) Change Log Level to DEBUG for putDomain call in ATS 1.5
Xuan Gong created YARN-4817: --- Summary: Change Log Level to DEBUG for putDomain call in ATS 1.5 Key: YARN-4817 URL: https://issues.apache.org/jira/browse/YARN-4817 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong We have already changed the log level to DEBUG for the putEntity call. Let us make it consistent for the putDomain call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4805) Don't go through all schedulers in ParameterizedTestBase
[ https://issues.apache.org/jira/browse/YARN-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194326#comment-15194326 ] Karthik Kambatla commented on YARN-4805: Thanks Robert. Will check this in tomorrow if I don't hear any objections. > Don't go through all schedulers in ParameterizedTestBase > > > Key: YARN-4805 > URL: https://issues.apache.org/jira/browse/YARN-4805 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4805-1.patch > > > ParameterizedSchedulerTestBase was created to make sure tests that were > written with CapacityScheduler in mind don't fail when run against > FairScheduler. Before this was introduced, tests would fail because > FairScheduler requires an allocation file. > However, the tests that extend it take about 10 minutes per scheduler. So, > instead of running against both schedulers, we could setup the scheduler > appropriately so the tests pass against both schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194325#comment-15194325 ] Karthik Kambatla commented on YARN-4816: My bad. Thanks for catching and fixing this, Sid. +1. > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries
[ https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194316#comment-15194316 ] Hudson commented on YARN-4719: -- FAILURE: Integrated in Hadoop-trunk-Commit #9460 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9460/]) YARN-4719. Add a helper library to maintain node state and allows common (kasha: rev 20d389ce61eaacb5ddfb329015f50e96ad894f8d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ClusterNodeTracker.java > Add a helper library to maintain node state and allows common queries > - > > Key: YARN-4719 > URL: https://issues.apache.org/jira/browse/YARN-4719 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Fix For: 2.9.0 > > Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch, > yarn-4719-4.patch, yarn-4719-5.patch, yarn-4719-6.patch, yarn-4719-7.patch > > > The scheduler could use a helper library to maintain node state and allowing > matching/sorting queries. Several reasons for this: > # Today, a lot of the node state management is done separately in each > scheduler. Having a single library will take us that much closer to reducing > duplication among schedulers. > # Adding a filtering/matching API would simplify node labels and locality > significantly. > # An API that returns a sorted list for a custom comparator would help > YARN-1011 where we want to sort by allocation and utilization for > continuous/asynchronous and opportunistic scheduling respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
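The filtering and custom-comparator queries described for YARN-4719 can be sketched as follows. The types and method names below are illustrative stand-ins, not the actual `ClusterNodeTracker` API: the idea is that node state lives in one helper and each scheduler asks for a filtered or sorted view rather than maintaining its own copy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of a node-state helper library: one place to hold node
// state, with filtering (e.g. hard locality) and caller-supplied sorting
// (e.g. most available memory first) exposed as queries.
class NodeTrackerSketch {
    static class Node {
        final String host;
        final String rack;
        final int availableMB;
        Node(String host, String rack, int availableMB) {
            this.host = host;
            this.rack = rack;
            this.availableMB = availableMB;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    void addNode(Node n) { nodes.add(n); }

    // Filtering query, e.g. hard locality: only nodes on a given rack.
    List<Node> getNodes(Predicate<Node> filter) {
        return nodes.stream().filter(filter).collect(Collectors.toList());
    }

    // Sorted snapshot for a caller-supplied comparator (YARN-1011 style).
    List<Node> getSortedNodes(Comparator<Node> cmp) {
        List<Node> copy = new ArrayList<>(nodes);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        NodeTrackerSketch t = new NodeTrackerSketch();
        t.addNode(new Node("h1", "rack-1", 4096));
        t.addNode(new Node("h2", "rack-2", 8192));
        t.addNode(new Node("h3", "rack-1", 2048));
        // Locality query: nodes on rack-1 only.
        if (t.getNodes(n -> n.rack.equals("rack-1")).size() != 2) {
            throw new AssertionError();
        }
        // Sorted query: most free memory first.
        List<Node> byFree = t.getSortedNodes(
            Comparator.comparingInt((Node n) -> n.availableMB).reversed());
        if (!byFree.get(0).host.equals("h2")) {
            throw new AssertionError();
        }
    }
}
```

Returning a sorted copy rather than sorting in place keeps the tracker's internal state stable while different schedulers apply different orderings.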
[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194303#comment-15194303 ] Hadoop QA commented on YARN-4766: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 11 new + 123 unchanged - 2 fixed = 134 total (was 125) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 36s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch
[jira] [Updated] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-4816: - Attachment: YARN-4816.1.txt Trivial patch. Re-introduces the public constructor and marks it as deprecated. [~kasha] - please review. > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4805) Don't go through all schedulers in ParameterizedTestBase
[ https://issues.apache.org/jira/browse/YARN-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194274#comment-15194274 ] Robert Kanter commented on YARN-4805: - +1 > Don't go through all schedulers in ParameterizedTestBase > > > Key: YARN-4805 > URL: https://issues.apache.org/jira/browse/YARN-4805 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4805-1.patch > > > ParameterizedSchedulerTestBase was created to make sure tests that were > written with CapacityScheduler in mind don't fail when run against > FairScheduler. Before this was introduced, tests would fail because > FairScheduler requires an allocation file. > However, the tests that extend it take about 10 minutes per scheduler. So, > instead of running against both schedulers, we could setup the scheduler > appropriately so the tests pass against both schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4816) SystemClock API broken in 2.9.0
Siddharth Seth created YARN-4816: Summary: SystemClock API broken in 2.9.0 Key: YARN-4816 URL: https://issues.apache.org/jira/browse/YARN-4816 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.9.0 Reporter: Siddharth Seth https://issues.apache.org/jira/browse/YARN-4526 removed the public constructor on SystemClock - making it an incompatible change. cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4814: Attachment: YARN-4814.1.patch > ATS 1.5 timelineclient impl call flush after every event write > -- > > Key: YARN-4814 > URL: https://issues.apache.org/jira/browse/YARN-4814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4814.1.patch > > > ATS 1.5 timelineclient impl call flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194270#comment-15194270 ] Xuan Gong commented on YARN-4814: - [~gtCarrera9] Could you review it, please ? > ATS 1.5 timelineclient impl call flush after every event write > -- > > Key: YARN-4814 > URL: https://issues.apache.org/jira/browse/YARN-4814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4814.1.patch > > > ATS 1.5 timelineclient impl call flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194235#comment-15194235 ] Eric Badger commented on YARN-4686: --- As per my comment above: TestYarnCLI, TestAMRMClient, TestYarnClient, TestNMClient, and TestGetGroups are failing in multiple recent precommit builds (YARN-4117, YARN-4630, YARN-4676). TestMiniYarnClusterNodeUtilization is tracked by YARN-4566. TestContainerManagerSecurity is failing in other recent precommit builds (YARN-4117, YARN-4566). Those are the only tests that have failed, and all are unrelated to the patch. [~jlowe] [~kasha] [~eepayne] please review the patch and give me your thoughts. Thanks! > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, > YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, > YARN-4686.005.patch > > > TestRMNMInfo fails intermittently. Below is trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
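The race in YARN-4686 (tests observing 3 live nodes instead of 4) comes from `start()` returning before every NodeManager has registered. A common remedy is to block until the live-node count reaches the expected value or a deadline passes. The helper below is a hypothetical illustration of that polling pattern, not the actual patch:

```java
import java.util.function.IntSupplier;

// Hypothetical sketch: poll the cluster until the expected number of nodes
// has registered with the RM, instead of returning from start() immediately.
class ClusterStartWaiter {
    // Waits until liveNodeCount reports at least expectedNodes, or times out.
    static boolean waitForNodes(IntSupplier liveNodeCount, int expectedNodes,
                                long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (liveNodeCount.getAsInt() >= expectedNodes) {
                return true;
            }
            try {
                Thread.sleep(10); // back off briefly between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return liveNodeCount.getAsInt() >= expectedNodes;
    }

    public static void main(String[] args) {
        // Simulate nodes registering over successive polls.
        int[] registered = {0};
        IntSupplier count = () -> {
            if (registered[0] < 4) {
                registered[0]++; // one more NM registers per poll
            }
            return registered[0];
        };
        if (!waitForNodes(count, 4, 5000)) {
            throw new AssertionError("nodes never registered");
        }
    }
}
```

In the real MiniYARNCluster the supplier would query the RM's cluster metrics for the number of active NodeManagers; the timeout keeps a hung NM from blocking the test forever.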
[jira] [Created] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call
Xuan Gong created YARN-4815: --- Summary: ATS 1.5 timelineclinet impl try to create attempt directory for every event call Key: YARN-4815 URL: https://issues.apache.org/jira/browse/YARN-4815 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong The ATS 1.5 timeline client implementation tries to create the attempt directory for every event call. Since only one directory-creation call per attempt is needed, this causes a performance issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194170#comment-15194170 ] Karthik Kambatla commented on YARN-4812: I ran it a few thousand times, and didn't see it fail. Before this patch, it would hardly take 10 runs to see this fail. > TestFairScheduler#testContinuousScheduling fails intermittently > --- > > Key: YARN-4812 > URL: https://issues.apache.org/jira/browse/YARN-4812 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4812-1.patch > > > This test has failed in the past, and there seem to be more issues. > {noformat} > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3816) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194161#comment-15194161 ] Xuan Gong commented on YARN-4814: - It looks like the flush happens in ObjectMapper#writeValue. {code} @Override public void writeValue(JsonGenerator jgen, Object value) throws IOException, JsonGenerationException, JsonMappingException { SerializationConfig config = copySerializationConfig(); if (config.isEnabled(SerializationConfig.Feature.CLOSE_CLOSEABLE) && (value instanceof Closeable)) { _writeCloseableValue(jgen, value, config); } else { _serializerProvider.serializeValue(config, jgen, value, _serializerFactory); if (config.isEnabled(SerializationConfig.Feature.FLUSH_AFTER_WRITE_VALUE)) { jgen.flush(); } } } {code} For performance, we already have a flush timer, so we do not need to flush on every write. > ATS 1.5 timelineclient impl call flush after every event write > -- > > Key: YARN-4814 > URL: https://issues.apache.org/jira/browse/YARN-4814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > ATS 1.5 timelineclient impl call flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
[ https://issues.apache.org/jira/browse/YARN-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194162#comment-15194162 ] Xuan Gong commented on YARN-4814: - A simple fix could be to configure Feature.FLUSH_AFTER_WRITE_VALUE to false when we create the ObjectMapper object. > ATS 1.5 timelineclient impl call flush after every event write > -- > > Key: YARN-4814 > URL: https://issues.apache.org/jira/browse/YARN-4814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > ATS 1.5 timelineclient impl call flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
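The effect of disabling the per-write flush can be illustrated without Jackson: buffer event writes and flush only when a periodic timer (or shutdown) says so. The class below is a stdlib-only sketch of that batching pattern, with hypothetical names; it is not the actual YARN-4814 patch, which toggles Jackson's FLUSH_AFTER_WRITE_VALUE feature on the entity stream.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch: batch event writes in a buffer and flush on demand
// (e.g. from a flush timer), instead of flushing after every single write.
class BatchedEventWriter implements Closeable {
    private final Writer out;
    private int flushes = 0; // counts explicit flushes for illustration

    BatchedEventWriter(Writer sink) {
        this.out = new BufferedWriter(sink);
    }

    // Write an event WITHOUT flushing; a periodic flush timer elsewhere
    // calls flush() at a configured interval.
    void writeEvent(String json) {
        try {
            out.write(json);
            out.write('\n');
            // no out.flush() here - that was the per-event cost being removed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void flush() {
        try {
            out.flush();
            flushes++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int getFlushCount() { return flushes; }

    @Override
    public void close() {
        try {
            out.close(); // closing also drains any remaining buffered data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        BatchedEventWriter w = new BatchedEventWriter(sink);
        for (int i = 0; i < 100; i++) {
            w.writeEvent("{\"event\":" + i + "}");
        }
        w.flush(); // one flush for 100 events instead of 100 flushes
        if (w.getFlushCount() != 1) {
            throw new AssertionError();
        }
        w.close();
    }
}
```

The trade-off is the usual one for buffered logging: fewer syscalls per event, at the cost of losing up to one flush interval of events on a crash.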
[jira] [Created] (YARN-4814) ATS 1.5 timelineclient impl call flush after every event write
Xuan Gong created YARN-4814: --- Summary: ATS 1.5 timelineclient impl call flush after every event write Key: YARN-4814 URL: https://issues.apache.org/jira/browse/YARN-4814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong The ATS 1.5 timeline client implementation calls flush after every event write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194145#comment-15194145 ] Hadoop QA commented on YARN-4686: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 57s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 4s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 38s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 8s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 4s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} |
[jira] [Created] (YARN-4813) TestRMWebServicesDelegationTokenAuthentication.testDoAs fails intermittently
Daniel Templeton created YARN-4813: -- Summary: TestRMWebServicesDelegationTokenAuthentication.testDoAs fails intermittently Key: YARN-4813 URL: https://issues.apache.org/jira/browse/YARN-4813 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.9.0 Reporter: Daniel Templeton {noformat} --- T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 11.627 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication testDoAs[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication) Time elapsed: 0.208 sec <<< ERROR! java.io.IOException: Server returned HTTP response code: 403 for URL: http://localhost:8088/ws/v1/cluster/delegation-token at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication$3.call(TestRMWebServicesDelegationTokenAuthentication.java:407) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication$3.call(TestRMWebServicesDelegationTokenAuthentication.java:398) at org.apache.hadoop.security.authentication.KerberosTestUtils$1.run(KerberosTestUtils.java:120) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.authentication.KerberosTestUtils.doAs(KerberosTestUtils.java:117) at org.apache.hadoop.security.authentication.KerberosTestUtils.doAsClient(KerberosTestUtils.java:133) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.getDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:398) at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testDoAs(TestRMWebServicesDelegationTokenAuthentication.java:357) Results : Tests in error: TestRMWebServicesDelegationTokenAuthentication.testDoAs:357->getDelegationToken:398 » IO Tests run: 8, Failures: 0, Errors: 1, Skipped: 0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4809) De-duplicate container completion across schedulers
[ https://issues.apache.org/jira/browse/YARN-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194119#comment-15194119 ] Karthik Kambatla commented on YARN-4809: Please feel free to take it up. I might be able to review. > De-duplicate container completion across schedulers > --- > > Key: YARN-4809 > URL: https://issues.apache.org/jira/browse/YARN-4809 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Karthik Kambatla > > CapacityScheduler and FairScheduler implement containerCompleted the exact > same way. Duplication across the schedulers can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4812: --- Attachment: yarn-4812-1.patch Moved testContinuousScheduling to a separate class that uses mock clocks instead of depending on system clocks. > TestFairScheduler#testContinuousScheduling fails intermittently > --- > > Key: YARN-4812 > URL: https://issues.apache.org/jira/browse/YARN-4812 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4812-1.patch > > > This test has failed in the past, and there seem to be more issues. > {noformat} > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3816) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
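The fix above replaces dependence on the system clock with a mock clock the test can advance itself. A minimal standalone sketch of such a controllable clock (Hadoop ships a similar test utility, but the class and method names here are illustrative, not the project's actual code):

```java
// Illustrative controllable clock in the spirit of the mock clocks the
// patch switches to. The test advances time explicitly instead of sleeping
// and hoping the wall clock cooperates, which is what makes the continuous
// scheduling test deterministic.
interface Clock {
    long getTime();
}

class MockClock implements Clock {
    private long timeMs;

    MockClock(long startMs) {
        this.timeMs = startMs;
    }

    @Override
    public synchronized long getTime() {
        return timeMs;
    }

    // Advance the clock deterministically by deltaMs milliseconds.
    synchronized void tick(long deltaMs) {
        timeMs += deltaMs;
    }
}
```

A test then drives scheduling ticks by calling {{tick()}} rather than waiting on real time, eliminating the intermittent failures.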
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194047#comment-15194047 ] Jonathan Maron commented on YARN-4757: -- I'm trying to address all of these issues/concerns in the document I reference above - it'll probably be a good way to structure the discussion. I hope to have it posted to this JIRA this week. Some quick points: - I'm trying to address security by leveraging the existing DNS security extensions (DNSSEC). The exposed DNS facility will have to accommodate both Java and non-Java clients, and as such should probably not provide proprietary or non-compliant security mechanisms. In addition, the DNS facility will more than likely need to interoperate with existing DNS resources (e.g. a corporate BIND server). DNS security is structured more around the idea of validating the authenticity of returned information rather than authenticating identities. In addition, I believe the approach I'm proposing will address the authentication concerns. - As Allen mentioned - there are existing approaches for interacting with DNS name servers. I have been utilizing dnsjava to prototype some approaches. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. 
For e.g, dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-4766: - Attachment: yarn4766.001.patch > NM should not aggregate logs older than the retention policy > > > Key: YARN-4766 > URL: https://issues.apache.org/jira/browse/YARN-4766 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4766.001.patch > > > When a log aggregation fails on the NM the information for the attempt is > kept in the recovery DB. Log aggregation can fail for multiple reasons which > are often related to HDFS space or permissions. > On restart the recovery DB is read and if an application attempt needs its > logs aggregated, the files are scheduled for aggregation without any checks. > The log files could be older than the retention limit in which case we should > not aggregate them but immediately mark them for deletion from the local file > system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
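The behavior the JIRA proposes amounts to an age check at recovery time: only ship a recovered log file to HDFS if it is younger than the retention limit, otherwise delete it locally. A hedged sketch (the class, method, and parameter names are illustrative, not the NodeManager's actual API):

```java
// Illustrative sketch of the retention check described above. On NM restart,
// a recovered container log should only be scheduled for aggregation if it
// is younger than the retention limit; older files should instead be marked
// for deletion from the local file system.
class LogRetentionCheck {
    static boolean shouldAggregate(long fileModTimeMs, long nowMs, long retentionMs) {
        // In this sketch, a non-positive retention means "keep forever".
        if (retentionMs <= 0) {
            return true;
        }
        return nowMs - fileModTimeMs <= retentionMs;
    }
}
```

Files failing the check would be routed to the local deletion service instead of the aggregation queue.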
[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-4766: - Attachment: (was: yarn4766.001.patch) > NM should not aggregate logs older than the retention policy > > > Key: YARN-4766 > URL: https://issues.apache.org/jira/browse/YARN-4766 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > > When a log aggregation fails on the NM the information is for the attempt is > kept in the recovery DB. Log aggregation can fail for multiple reasons which > are often related to HDFS space or permissions. > On restart the recovery DB is read and if an application attempt needs its > logs aggregated, the files are scheduled for aggregation without any checks. > The log files could be older than the retention limit in which case we should > not aggregate them but immediately mark them for deletion from the local file > system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4686: -- Target Version/s: 3.0.0, 2.8.0, 2.7.3 (was: 2.7.3) > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, > YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, > YARN-4686.005.patch > > > TestRMNMInfo fails intermittently. Below is trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193930#comment-15193930 ] Allen Wittenauer commented on YARN-4757: bq. I am not expert on DNS so it is good to hear that you have thought through this and done your homework. I'm (probably) not doing the work either, but I've been working with DNS for an extremely long time... (as in "before Java existed" long time) bq. It still does not change the need for 2 way authentication and making sure that we can restrict who registers for a service Yup. I share this concern. This is a security hole waiting to happen. bq. I can tell java does not come with built in support, not the end of the world, but also likely non-trivial. I'm assuming by built-in, you mean a specific method for querying SRV records since the Java libs clearly allow one to query for records, even if it is through things like the "fun" JNDI. But fret not, others are already working in this space with lots of example code to look at. See https://github.com/spotify/dns-java, https://github.com/couchbase/couchbase-java-client, http://www.dnsjava.org/, and several others. This isn't new ground being covered here at all and all of the above referenced code should be in a compatible license. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. 
In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
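The SRV lookups discussed above can be reached from plain Java even without dnsjava, via JNDI; a hedged sketch (the JNDI classes are real, but the hostname is an example, and the network-dependent lookup is left in comments while the RDATA parsing is shown concretely):

```java
// Consuming DNS SRV records from plain Java. The JNDI route looks roughly
// like this (requires a reachable DNS server, so it is shown as a comment):
//
//   Hashtable<String, String> env = new Hashtable<>();
//   env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
//   DirContext ctx = new InitialDirContext(env);
//   Attributes attrs = ctx.getAttributes("_http._tcp.example.com", new String[] {"SRV"});
//
// Each SRV answer's RDATA is "priority weight port target"; parsing that
// string is what client code ultimately needs:
class SrvRecord {
    final int priority;
    final int weight;
    final int port;
    final String target;

    SrvRecord(int priority, int weight, int port, String target) {
        this.priority = priority;
        this.weight = weight;
        this.port = port;
        this.target = target;
    }

    static SrvRecord parse(String rdata) {
        String[] f = rdata.trim().split("\\s+");
        return new SrvRecord(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                Integer.parseInt(f[2]), f[3]);
    }
}
```

Libraries such as dnsjava wrap both the query and the parsing, but the record format itself is this simple.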
[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries
[ https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193857#comment-15193857 ] Wangda Tan commented on YARN-4719: -- +1, thanks [~kasha]. > Add a helper library to maintain node state and allows common queries > - > > Key: YARN-4719 > URL: https://issues.apache.org/jira/browse/YARN-4719 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch, > yarn-4719-4.patch, yarn-4719-5.patch, yarn-4719-6.patch, yarn-4719-7.patch > > > The scheduler could use a helper library to maintain node state and allowing > matching/sorting queries. Several reasons for this: > # Today, a lot of the node state management is done separately in each > scheduler. Having a single library will take us that much closer to reducing > duplication among schedulers. > # Adding a filtering/matching API would simplify node labels and locality > significantly. > # An API that returns a sorted list for a custom comparator would help > YARN-1011 where we want to sort by allocation and utilization for > continuous/asynchronous and opportunistic scheduling respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits
[ https://issues.apache.org/jira/browse/YARN-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-1547: --- Attachment: (was: YARN-1547.pdf) > Prevent DoS of ApplicationMasterProtocol by putting in limits > - > > Key: YARN-1547 > URL: https://issues.apache.org/jira/browse/YARN-1547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Giovanni Matteo Fumarola > > Points of DoS in ApplicationMasterProtocol > - Host and trackingURL in RegisterApplicationMasterRequest > - Diagnostics, final trackingURL in FinishApplicationMasterRequest > - Unlimited number of resourceAsks, containersToBeReleased and > resourceBlacklistRequest in AllocateRequest > -- Unbounded number of priorities and/or resourceRequests in each ask. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
[ https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193832#comment-15193832 ] Eric Payne commented on YARN-4751: -- Thanks, [~sunilg]. I will look into trunk's version of {{TestCapacitySchedulerNodeLabelUpdate}}. Regarding the larger issue of how to fix this problem in 2.7, I am fine if you want to provide a backport of some of the patches you mentioned. However, my biggest concern is the time and effort that will take, along with the added risk. As I mentioned above, there seems to be a lot of inter-dependencies that involve adding more features and fixes than just the one documented by this JIRA. I'm not sure we want all of that complexity going back into 2.7. > In 2.7, Labeled queue usage not shown properly in capacity scheduler UI > --- > > Key: YARN-4751 > URL: https://issues.apache.org/jira/browse/YARN-4751 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: 2.7 CS UI No BarGraph.jpg, > YARH-4752-branch-2.7.001.patch, YARH-4752-branch-2.7.002.patch > > > In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs > separated by partition. When applications are running on a labeled queue, no > color is shown in the bar graph, and several of the "Used" metrics are zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193769#comment-15193769 ] Robert Joseph Evans commented on YARN-4757: --- [~aw], I am not expert on DNS so it is good to hear that you have thought through this and done your homework. I read up a little on SRV records and it looks like a good fit. It still does not change the need for 2 way authentication and making sure that we can restrict who registers for a service, but because SRV records are not a drop in replacement for A/CNAME records it should not be as big of an issue. Clients are likely going to need to make changes to support SRV records, and from what I can tell java does not come with built in support, not the end of the world, but also likely non-trivial. Especially when it looks like the industry has not decided on how they want to support http. (Although I could be wrong on all of that, because like I said I am not an expert here) I just want to be sure that you are thinking things through, and it looks like you are so I am happy. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. 
> A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193764#comment-15193764 ] Sangjin Lee commented on YARN-4712: --- Also some quick comments on the latest patch: (ContainersMonitorImpl.java) - l.469-473: We need to note other usages of {{cpuUsageTotalCoresPercentage}}. It is used in tracking the container resource utilization, as well as passed to {{ContainerMetrics.forContainer()}}. If we're no longer going to use this for the {{NMTimelinePublisher}}, we might need to do it differently? (NMTimelinePublisher.java) - l.117: we should change the argument name from {{cpuUsageTotalCoresPercentage}} to {{cpuUsagePercentPerCore}} > CPU Usage Metric is not captured properly in YARN-2928 > -- > > Key: YARN-4712 > URL: https://issues.apache.org/jira/browse/YARN-4712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4712-YARN-2928.v1.001.patch, > YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, > YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch > > > There are 2 issues with CPU usage collection > * I was able to observe that that many times CPU usage got from > {{pTree.getCpuUsagePercent()}} is > ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do > the calculation i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore > /resourceCalculatorPlugin.getNumProcessors()}} because of which UNAVAILABLE > check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not > encountered. so proper checks needs to be handled > * {{EntityColumnPrefix.METRIC}} uses always LongConverter but > ContainerMonitor is publishing decimal values for the CPU usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4812: --- Description: This test has failed in the past, and there seem to be more issues. {noformat} java.lang.AssertionError: expected:<2> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3816) {noformat} was:This test has failed in the past, and there seem to be more issues. > TestFairScheduler#testContinuousScheduling fails intermittently > --- > > Key: YARN-4812 > URL: https://issues.apache.org/jira/browse/YARN-4812 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > This test has failed in the past, and there seem to be more issues. > {noformat} > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3816) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4794: -- Target Version/s: 2.8.0, 2.7.3, 2.9.0 Tentatively targeting all unreleased versions.. > Distributed shell app gets stuck on stopping containers after App completes > --- > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4771) Some containers can be skipped during log aggregation after NM restart
[ https://issues.apache.org/jira/browse/YARN-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4771: -- Priority: Critical (was: Major) Sounds bad, especially given the high possibility of a leak on the file-system with non-aggregated / non-deleted container-logs; bumping priority. > Some containers can be skipped during log aggregation after NM restart > -- > > Key: YARN-4771 > URL: https://issues.apache.org/jira/browse/YARN-4771 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Priority: Critical > Attachments: YARN-4771.001.patch, YARN-4771.002.patch > > > A container can be skipped during log aggregation after a work-preserving > nodemanager restart if the following events occur: > # Container completes more than > yarn.nodemanager.duration-to-track-stopped-containers milliseconds before the > restart > # At least one other container completes after the above container and before > the restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193732#comment-15193732 ] Sangjin Lee commented on YARN-4712: --- I also think that {{cpuUsagePercentPerCore}} might be a better metric to record than {{cpuUsageTotalCoresPercentage}}. One way to understand the difference: with the former the unit is cores, and with the latter it is machines. Other aspects are entirely similar. Thus, it follows that {{cpuUsagePercentPerCore}} is a finer-grained value than {{cpuUsageTotalCoresPercentage}}. For example, to come up with a relative utilization of an app against the full cluster, you need the number of cores as the denominator with the former, and the number of machines with the latter. Granted, obtaining the number of cores can be more difficult than the number of machines. Either model breaks down when those units are no longer interchangeable. For example, with {{cpuUsageTotalCoresPercentage}}, it causes inaccurate values if the machines are not of equal size (e.g. machines with different numbers of cores). With {{cpuUsagePercentPerCore}}, it can report inaccurate utilization of the cluster if clock speeds are different between machines.
\[1\] cpuUsagePercentPerCore
- pro: more accurate and finer-grained reporting of utilization
- con: requires the number of cores to come up with the cluster-wide utilization of anything
- con: still doesn’t account for different core performance
\[2\] cpuUsageTotalCoresPercentage
- pro: easier to come up with cluster-wide utilization
- con: coarser-grained metric that breaks down the moment machines are not equivalent
\[other points\]
\[1\] stick with pure utilization
One point to consider is whether we should take into account the available capacity as opposed to full machine capacity. There are a couple of ways the available capacity can be different from the full capacity. 
One is via {{nodeCpuPercentageForYARN}} (coming from the cpu-limit config). Another mechanism is via the allocated vcores mechanism. Either way, for example, one may allocate only 6 cores out of an 8-core machine. If a container is using 6 cores, the question is whether that should be reported as 100% utilization or 75% utilization. Although an argument can be made for either outcome, I think it might be simpler to stick with a pure utilization approach. It would be easier to match those numbers against CPU measurements coming from direct means. We should consider CPU reported by the NM as plain utilization numbers.
\[2\] stick with physical cores vs. vcores
Another potentially complicating factor is whether we should consider using vcores. Using vcores would put this closer to YARN’s resource scheduling model. However, IMO it would make things unnecessarily more complicated. Again, in the vein of treating the CPU as plain utilization that can be matched against the direct measurements, I think we should stick with physical cores. Thoughts? > CPU Usage Metric is not captured properly in YARN-2928 > -- > > Key: YARN-4712 > URL: https://issues.apache.org/jira/browse/YARN-4712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4712-YARN-2928.v1.001.patch, > YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, > YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch > > > There are 2 issues with CPU usage collection > * I was able to observe that that many times CPU usage got from > {{pTree.getCpuUsagePercent()}} is > ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do > the calculation i.e. 
{{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore > / resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE > check in {{NMTimelinePublisher.reportContainerResourceUsage}} is never > triggered, so proper checks need to be added > * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but > ContainersMonitor publishes decimal values for the CPU usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
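The first issue above amounts to a missing guard before the division. A minimal, illustrative sketch (not the actual YARN patch; the method name and the -1 sentinel are assumptions based on the report) that propagates UNAVAILABLE instead of dividing it by the processor count, so the publisher's UNAVAILABLE check still fires:

```java
public class CpuUsageGuard {
    // Assumed to mirror ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1).
    static final float UNAVAILABLE = -1.0F;

    // Returns UNAVAILABLE unchanged rather than a misleading fraction of it,
    // so downstream UNAVAILABLE checks still trigger.
    static float totalCoresPercentage(float cpuUsagePercentPerCore, int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }
}
```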
[jira] [Created] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently
Karthik Kambatla created YARN-4812: -- Summary: TestFairScheduler#testContinuousScheduling fails intermittently Key: YARN-4812 URL: https://issues.apache.org/jira/browse/YARN-4812 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Karthik Kambatla Assignee: Karthik Kambatla This test has failed in the past, and there seem to be more issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193627#comment-15193627 ] Allen Wittenauer commented on YARN-4757: bq. As far as I know there is no standard for including port(s) in a DNS entry. The proposed solution had better use SRV records and not just some naive approach with A/CNAME records. SRV is built for long-lived service discovery using DNS and covers such things as port numbers, weighting, etc. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry-specific (Java) API and a REST interface. In practice, this makes it > very difficult to wire up existing clients and services. E.g., dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS-based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
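For reference, an SRV record (RFC 2782) carries priority, weight, port, and target, so a port needs no non-standard encoding. A hypothetical zone fragment (all names and values here are illustrative, not from this JIRA) for a service registered under example.com:

```
; _service._proto.name   TTL class SRV priority weight port target
_registry._tcp.example.com. 300 IN  SRV 10       50     8080 nm-host-1.example.com.
```

A client resolves the SRV name, picks a target by priority/weight, and connects to the advertised port.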
[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4686: -- Attachment: YARN-4686.005.patch I moved the total capacity check out of the waitForNodeManagersToConnect method and into the TestYarnClient#testReservationAPIs test so as to not fail other tests. > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, > YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, > YARN-4686.005.patch > > > TestRMNMInfo fails intermittently. Below is trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193577#comment-15193577 ] Robert Joseph Evans commented on YARN-4757: --- I am +1 on the idea of using DNS for long-lived service discovery, but we need to be very, very careful about security. If we are not, all of the problems possible with https://en.wikipedia.org/wiki/DNS_spoofing would likely be possible with this too. We need to be positive that we can restrict the names allowed so there are no conflicts with other servers on the network/internet. Additionally, if we make this super simple, which is the entire goal here, then we are covering up some potentially serious issues with client code that a normal server running off YARN would not expect to have. It really comes down to this: any service running on YARN that wants to be secure needs two-way authentication, where the client authenticates the server and the server authenticates its clients. There are timing attacks and other things that can happen when a process crashes and lets go of a port. Internal web services feel especially vulnerable: unless you enable SSL they will be insecure, and many groups avoid SSL on internal services because of the extra overhead of encryption. Do you plan on handling ephemeral ports in some way? As far as I know there is no standard for including port(s) in a DNS entry. If we do come up with something that is non-standard, doesn't that still necessitate client-side changes, avoiding which was an expressed goal of this JIRA? If we don't handle ephemeral ports, are we going to add in Mesos-like scheduling of ports? 
> [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry-specific (Java) API and a REST interface. In practice, this makes it > very difficult to wire up existing clients and services. E.g., dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS-based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4811) Generate histograms for actual container resource usage
Varun Vasudev created YARN-4811: --- Summary: Generate histograms for actual container resource usage Key: YARN-4811 URL: https://issues.apache.org/jira/browse/YARN-4811 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev The ContainerMetrics class stores some details about actual container resource usage. It would be useful to generate histograms for the actual resource usage as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
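As a sketch of what such histograms could look like, here is a minimal, thread-safe fixed-width bucketing of per-container utilization samples. This is illustrative only; the class and method names are assumptions, not the actual ContainerMetrics API:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Fixed-width histogram of utilization samples (e.g. CPU percent 0-100).
// Samples past the last bucket are clamped into it.
public class UsageHistogram {
    private final AtomicLongArray buckets;
    private final int bucketWidth;

    UsageHistogram(int numBuckets, int bucketWidth) {
        this.buckets = new AtomicLongArray(numBuckets);
        this.bucketWidth = bucketWidth;
    }

    void record(int sample) {
        int idx = Math.min(Math.max(sample, 0) / bucketWidth, buckets.length() - 1);
        buckets.incrementAndGet(idx);
    }

    long countAt(int bucketIndex) {
        return buckets.get(bucketIndex);
    }
}
```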
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193505#comment-15193505 ] Hadoop QA commented on YARN-4686: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 0m 55s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 14s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 31s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 37s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 21s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} |
[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler
[ https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193476#comment-15193476 ] Varun Vasudev commented on YARN-4785: - [~jhsenjaliya] - Sorry to annoy you about this - can you give me some details about your environment? I added a test for the type field but TestRMWebServicesCapacitySched passed for me. I tested on a Mac OS X - 10.11.2 with JDK 1.7.0_71 and on Ubuntu 14.04 with OpenJDK 1.7.0_79 and 1.8.0_72. > inconsistent value type of the "type" field for LeafQueueInfo in response of > RM REST API - cluster/scheduler > > > Key: YARN-4785 > URL: https://issues.apache.org/jira/browse/YARN-4785 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.6.0 >Reporter: Jayesh > Labels: REST_API > > I see inconsistent value type ( String and Array ) of the "type" field for > LeafQueueInfo in response of RM REST API - cluster/scheduler > as per the spec it should be always String. > here is the sample output ( removed non-relevant fields ) > {code} > { > "scheduler": { > "schedulerInfo": { > "type": "capacityScheduler", > "capacity": 100, > ... > "queueName": "root", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 0.1, > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 0.1, > "queueName": "test-queue", > "state": "RUNNING", > > }, > { > "type": [ > "capacitySchedulerLeafQueueInfo" > ], > "capacity": 2.5, > > }, > { > "capacity": 25, > > "state": "RUNNING", > "queues": { > "queue": [ > { > "capacity": 6, > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > > }, > { > "capacity": 6, > ... > "state": "RUNNING", > "queues": { > "queue": [ > { > "type": "capacitySchedulerLeafQueueInfo", > "capacity": 100, > ... > } > ] > }, > ... > }, > ... > ] > }, > ... 
> } > ] > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
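Until the serialization is fixed server-side, a client can defensively normalize the field. A hypothetical workaround sketch (not part of any YARN API) that accepts both shapes the REST response has been observed to produce:

```java
import java.util.List;

public class QueueTypeNormalizer {
    // The "type" field may be deserialized as a plain String or as a
    // one-element array depending on how the JSON was emitted; normalize
    // both shapes to a String before comparing against
    // "capacitySchedulerLeafQueueInfo".
    static String normalizeType(Object type) {
        if (type instanceof List) {
            List<?> values = (List<?>) type;
            return values.isEmpty() ? null : String.valueOf(values.get(0));
        }
        return type == null ? null : type.toString();
    }
}
```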
[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs
[ https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193474#comment-15193474 ] Hudson commented on YARN-4545: -- FAILURE: Integrated in Hadoop-trunk-Commit #9458 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9458/]) YARN-4545. Allow YARN distributed shell to use ATS v1.5 APIs. Li Lu via (junping_du: rev f291d82cd49c04a81380bc45c97c279d791b571c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/DistributedShellTimelinePlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/test/java/org/apache/hadoop/yarn/server/timeline/PluginStoreTestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-project/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineVersion.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineVersionWatcher.java > Allow YARN distributed shell to use ATS v1.5 APIs > - > > Key: YARN-4545 > URL: https://issues.apache.org/jira/browse/YARN-4545 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4545-YARN-4265.001.patch, > YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, > YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, > YARN-4545-trunk.005.patch, YARN-4545-trunk.006.patch, > YARN-4545-trunk.007.patch, YARN-4545-trunk.008.patch, > YARN-4545-trunk.009.patch > > > We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to > allow distributed shell post data with ATS v1.5 API if 1.5 is enabled in the > system. We also need to provide a sample plugin to read those data out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs
[ https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193458#comment-15193458 ] Junping Du commented on YARN-4545: -- LGTM too. +1. Committing it now. > Allow YARN distributed shell to use ATS v1.5 APIs > - > > Key: YARN-4545 > URL: https://issues.apache.org/jira/browse/YARN-4545 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4545-YARN-4265.001.patch, > YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, > YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, > YARN-4545-trunk.005.patch, YARN-4545-trunk.006.patch, > YARN-4545-trunk.007.patch, YARN-4545-trunk.008.patch, > YARN-4545-trunk.009.patch > > > We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to > allow distributed shell post data with ATS v1.5 API if 1.5 is enabled in the > system. We also need to provide a sample plugin to read those data out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page
[ https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193451#comment-15193451 ] Varun Saxena commented on YARN-4517: Thanks [~gtCarrera9] for the review. bq. One main question: maybe we want to unify all application/container views? I noticed that right now, the "application views" from the application list and from the NM are different. Ideally, we'd like to provide one unified place to show one application, no matter whether the user arrives from the app list, flow list, NM app list or anywhere else? We can also integrate. A similar story also applies to the container view? The application and container states in the NM will be distinct from those in the RM. The applications and containers seen here are running ones (applications are visible a bit longer, based on the keep-alive time, which depends on config). This information, including NM states, can be useful by itself. However, I think we can fit some container-related information (which is fetched from the NM) on the main container page. We can leave out some unnecessary info too. This largely mimics what was there in the old UI. We will have to discuss page layouts and organization in detail. bq. Meanwhile, maybe it's time to start detailed page style designs. With unified app/container views, we need to address questions like where to put node id/app ids in the page, and how to organize all available data on the page? Sure. Suggestions are welcome. I agree we need to discuss this in detail and reach a consensus on what goes where. I have not given a great deal of thought to this either. Maybe after we move this branch's code into trunk, because this is important to go in for YARN-2928. bq. I noticed one workflow related problem: once an NM is in shutdown state, it is not possible to go into the node page. What is the assumed debug workflow on this? The link to the node page has been disabled because we cannot reach the NM in this state. What information are you expecting to give here? 
A way to display NM logs, for instance? bq. On my local machine, links to application logs are broken with just a 500 error. Maybe we can improve this in the future. Where? On the app page? bq. Seems like there's no need to show "node labels" if node label is not enabled? In the UI, we cannot know whether node labels are enabled (unless we iterate over the whole output and assume that if no node has labels, labels are not enabled). Even if labels are enabled but not attached to any node, the output will be the same. Maybe once we start Ember from within the RM (there is a JIRA for this), we can think about using these configurations. Thoughts? bq. I'm not sure about the meaning of the row in node status showing "Node Health Report". In the NM, we can configure the disk health check (on by default) and health-check scripts. The Node Health Report will contain output from those. It will contain information about which disks are bad, for instance. In the normal case, it will be empty. > [YARN-3368] Add nodes page > -- > > Key: YARN-4517 > URL: https://issues.apache.org/jira/browse/YARN-4517 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Wangda Tan >Assignee: Varun Saxena > Labels: webui > Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, > Screenshot_after_4709.png, Screenshot_after_4709_1.png, > YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch > > > We need a nodes page added to the next-generation web UI, similar to the existing > RM/nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4686: -- Attachment: YARN-4686.004.patch The new patch changes the waitForNodeManagersToConnect method in MiniYARNCluster.java so that it waits for the total plan capacity to be greater than 0 (to ensure that reservations can be made). It fixes the TestYarnClient#testReservationAPIs test failure locally on my machine. However, I'm not sure whether this check should be in the MiniYARNCluster itself or in the test code that calls it, since only a small number of tests will actually be concerned with the reservation system. > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, > YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch > > > TestRMNMInfo fails intermittently. Below is the trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
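The wait described above follows the usual poll-until-timeout pattern. A generic sketch under stated assumptions (the names are illustrative, not the MiniYARNCluster API), where the condition would be something like "total reservation-plan capacity > 0":

```java
import java.util.function.Supplier;

public class WaitUtil {
    // Poll a condition until it holds or the timeout expires; one last
    // check after the deadline avoids missing a late state change.
    static boolean waitFor(Supplier<Boolean> condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.get()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.get();
    }
}
```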
[jira] [Updated] (YARN-4810) NM applicationpage cause internal error 500
[ https://issues.apache.org/jira/browse/YARN-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4810: --- Description: Use url /node/application/ *Case 1* {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.(AppInfo.java:45) at org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) ... 
44 more {noformat} *Case 2* {noformat} at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:131) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:126) at org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:79) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) ... 
44 more {noformat} was: Use url /node/application/ {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.(AppInfo.java:45) at org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) ... 44 more {noformat} > NM applicationpage cause internal error 500 > --- > > Key: YARN-4810 > URL: https://issues.apache.org/jira/browse/YARN-4810 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Use url /node/application/ > *Case 1* > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.(AppInfo.java:45) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at >
[jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192988#comment-15192988 ] Surendra Singh Lilhore commented on YARN-4783: -- Thanks [~jlowe] for the comment. bq. in the general case we can't leave it around forever because it will eventually expire on its own. Therefore we can't support arbitrary delays between the application completing and the log aggregation starting. I agree with you. > Log aggregation failure for application when Nodemanager is restarted > -- > > Key: YARN-4783 > URL: https://issues.apache.org/jira/browse/YARN-4783 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore > > Scenario : > = > 1. Start NM with user dsperf:hadoop > 2. Configure the linux-execute user as dsperf > 3. Submit an application as the yarn user > 4. Once a few containers are allocated to NM 1 > 5. NodeManager 1 is stopped (wait for expiry) > 6. Start the node manager after the application is completed > 7. Check that log aggregation happens for the container logs in the NM local > directory > Expected Output : > === > Log aggregation should be successful > Actual Output : > === > Log aggregation not successful -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4810) NM applicationpage cause internal error 500
Bibin A Chundatt created YARN-4810: -- Summary: NM applicationpage cause internal error 500 Key: YARN-4810 URL: https://issues.apache.org/jira/browse/YARN-4810 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Use url /node/application/ {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.(AppInfo.java:45) at org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) ... 44 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
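Both the NullPointerException and the NoSuchElementException stem from an empty or malformed application id in the /node/application/ URL. A defensive check along these lines (a hypothetical helper, not the actual fix) would let the page return a clean client error instead of a 500:

```java
public class AppIdValidator {
    // Validate the expected textual form of an ApplicationId,
    // "application_<clusterTimestamp>_<sequenceNumber>", before any
    // parsing or rendering is attempted.
    static boolean isValidAppId(String appId) {
        return appId != null && appId.matches("application_\\d+_\\d+");
    }
}
```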
[jira] [Commented] (YARN-2670) Adding feedback capability to capacity scheduler from external systems
[ https://issues.apache.org/jira/browse/YARN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192982#comment-15192982 ] Ha Son Hai commented on YARN-2670: -- Is there any news on this JIRA? I'm also very interested in this. > Adding feedback capability to capacity scheduler from external systems > -- > > Key: YARN-2670 > URL: https://issues.apache.org/jira/browse/YARN-2670 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Mayank Bansal >Assignee: Mayank Bansal > > The sheer growth in data volume and Hadoop cluster size makes it a significant > challenge to diagnose and locate problems in a production-level cluster > environment efficiently and within a short period of time. Oftentimes, the > distributed monitoring systems are not capable of detecting a problem well in > advance when a large-scale Hadoop cluster starts to deteriorate in > performance or becomes unavailable. Thus, incoming workloads, scheduled > between the time when the cluster starts to deteriorate and the time when the > problem is identified, suffer from longer execution times. As a result, both > the reliability and the throughput of the cluster drop significantly. We address > this problem by proposing a system called Astro, which consists of a > predictive model and an extension to the Capacity Scheduler. The predictive > model in Astro takes into account a rich set of cluster behavioral > information that is collected by monitoring processes and models it using > machine learning algorithms to predict the future behavior of the cluster. The > Astro predictive model detects anomalies in the cluster and also identifies a > ranked set of metrics that have contributed the most towards the problem. 
The > Astro scheduler uses the prediction outcome and the list of metrics to decide > whether it needs to move workloads off the problematic cluster > nodes or to prevent additional workload allocations to them, in order to > improve both the throughput and the reliability of the cluster. > This JIRA is only for adding feedback capabilities to the Capacity Scheduler > so that it can take feedback from external systems.
[jira] [Commented] (YARN-4809) De-duplicate container completion across schedulers
[ https://issues.apache.org/jira/browse/YARN-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192791#comment-15192791 ] Sunil G commented on YARN-4809: --- Hi [~kasha], I could help take this up. Please let me know if you were already planning to work on it. > De-duplicate container completion across schedulers > --- > > Key: YARN-4809 > URL: https://issues.apache.org/jira/browse/YARN-4809 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Karthik Kambatla > > CapacityScheduler and FairScheduler implement containerCompleted in exactly the > same way. This duplication across the schedulers can be avoided.
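The refactor being proposed can be sketched as hoisting the duplicated method into a shared parent class. This is only an illustration of the shape of the change under simplified, hypothetical class names — in the real code both schedulers extend a common abstract scheduler class and the containerCompleted body is far more involved:

```java
import java.util.ArrayList;
import java.util.List;

// Before the change, both schedulers carry an identical containerCompleted
// implementation; after it, the common logic lives once in the base class.
abstract class SchedulerBase {
  // Simplified stand-in for the scheduler's bookkeeping of live containers.
  protected final List<String> liveContainers = new ArrayList<>();

  // Shared implementation, written once instead of copied into each
  // subclass; returns whether the container was actually tracked.
  public boolean containerCompleted(String containerId) {
    return liveContainers.remove(containerId);
  }
}

// The concrete schedulers now inherit the single implementation.
class CapacitySchedulerSketch extends SchedulerBase {}
class FairSchedulerSketch extends SchedulerBase {}
```

Subclasses that genuinely need different behavior can still override the base method, so the hoist loses no flexibility.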