[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065517#comment-15065517 ] Hadoop QA commented on YARN-4350: (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 8m 47s | feature-YARN-2928 passed |
| +1 | compile | 1m 56s | feature-YARN-2928 passed with JDK v1.8.0_66 |
| +1 | compile | 2m 14s | feature-YARN-2928 passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 28s | feature-YARN-2928 passed |
| +1 | mvnsite | 0m 48s | feature-YARN-2928 passed |
| +1 | mvneclipse | 0m 35s | feature-YARN-2928 passed |
| +1 | findbugs | 0m 58s | feature-YARN-2928 passed |
| +1 | javadoc | 0m 28s | feature-YARN-2928 passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 33s | feature-YARN-2928 passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 1m 53s | the patch passed with JDK v1.8.0_66 |
| -1 | javac | 4m 3s | hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 15, now 15). |
| +1 | javac | 1m 53s | the patch passed |
| +1 | compile | 2m 16s | the patch passed with JDK v1.7.0_91 |
| -1 | javac | 6m 19s | hadoop-yarn-project_hadoop-yarn-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 15, now 15). |
| +1 | javac | 2m 16s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 10s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 34s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 5m 42s | hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 7m 55s | hadoop-yarn-applications-distributedshell in the patch passed with JDK v1.8.0_66. |
| -1 | unit | 5m 50s | hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_91. |
| +1 | unit | 10m 23s | hadoop-yarn-applications-distributedshell in the patch passed with JDK v1.7.0_91. |
| +1 |
[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065529#comment-15065529 ] Varun Saxena commented on YARN-4350: Committed this to feature-YARN-2928. Thanks [~Naganarasimha] for the contribution and [~sjlee0] for the reviews. > TestDistributedShell fails for V2 scenarios > --- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Fix For: YARN-2928 > > Attachments: YARN-4350-feature-YARN-2928.001.patch, > YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. 
> (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834
[ https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065532#comment-15065532 ] Jian He commented on YARN-4032: --- Hi [~kasha], YARN-4347 may have fixed this inconsistent-state issue, which can cause the RM to crash with an NPE. > Corrupted state from a previous version can still cause RM to fail with NPE > due to same reasons as YARN-2834 > > > Key: YARN-4032 > URL: https://issues.apache.org/jira/browse/YARN-4032 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Jian He >Priority: Critical > Attachments: YARN-4032.prelim.patch > > > YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if > someone is upgrading from a previous version, the state can still be > inconsistent and then RM will still fail with NPE after upgrade to 2.6.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
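The recovery-hardening idea behind YARN-2834 and this follow-up can be sketched roughly as follows. This is a hypothetical illustration with invented names, not the actual RMStateStore code: tolerate corrupted or partial records loaded from an older state store instead of dereferencing missing fields and crashing with an NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoverySketch {
    // Simplified stand-in for a persisted app-attempt record; the id may be
    // null if the record was written by an older, buggier version.
    public static class AttemptState {
        public final String attemptId;
        public AttemptState(String attemptId) { this.attemptId = attemptId; }
    }

    // Filter out unusable records on recovery instead of crashing later.
    public static List<AttemptState> recoverable(List<AttemptState> loaded) {
        List<AttemptState> ok = new ArrayList<>();
        for (AttemptState s : loaded) {
            if (s != null && s.attemptId != null) {
                ok.add(s); // usable record
            }
            // corrupted record: drop (and log) rather than fail with an NPE
        }
        return ok;
    }

    public static void main(String[] args) {
        List<AttemptState> loaded = Arrays.asList(
            new AttemptState("appattempt_1"), // good record
            new AttemptState(null),           // corrupted: missing id
            null);                            // corrupted: unreadable entry
        System.out.println(recoverable(loaded).size()); // 1
    }
}
```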
[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or QUEUEABLE
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065547#comment-15065547 ] Karthik Kambatla commented on YARN-2882: Synced up with [~asuresh], [~kkaranasos] and [~subru] offline to discuss the commonalities with YARN-1011. The notion of *opportunistic* containers is common, and is governed by the following semantics: # A trusted external agent (RM or LocalRM or NM) can initiate/approve running an opportunistic container. # Additional policies on execution - queueable or over-subscription - are determined by the node's configuration. YARN-2877 would add the queueable flag and logic. YARN-1011 would add the over-subscription flag and logic. This logic may include having to monitor the usage of the node. # Only the RM can approve the promotion of an OPPORTUNISTIC container to a GUARANTEED container. In the case of YARN-1011, the RM instigates this directly. I haven't looked at the patch closely enough, but some high-level comments: # Rename QUEUEABLE to OPPORTUNISTIC. # Since a GUARANTEED container may be preempted, how about calling it REGULAR instead? # The ExecutionType is something YARN decides on; I don't think the client API should include it. > Add ExecutionType to denote if a container execution is GUARANTEED or > QUEUEABLE > --- > > Key: YARN-2882 > URL: https://issues.apache.org/jira/browse/YARN-2882 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-2882-yarn-2877.001.patch, > YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, yarn-2882.patch > > > This JIRA introduces the notion of container types. > We propose two initial types of containers: guaranteed-start and queueable > containers. > Guaranteed-start are the existing containers, which are allocated by the > central RM and are instantaneously started, once allocated. 
> Queueable is a new type of container, which allows containers to be queued in > the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
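To make the semantics discussed above concrete, here is a minimal, hypothetical Java sketch of the two execution types (using the proposed rename of QUEUEABLE to OPPORTUNISTIC) and the rule that only the RM can approve a promotion. The names are illustrative only, not the actual YARN API:

```java
public class ExecutionTypeSketch {
    // Proposed rename: OPPORTUNISTIC instead of QUEUEABLE.
    public enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // Rule 3 above: only the RM can approve promoting an OPPORTUNISTIC
    // container to GUARANTEED; a GUARANTEED container has nothing to promote.
    public static boolean canPromote(ExecutionType current, boolean rmApproved) {
        return current == ExecutionType.OPPORTUNISTIC && rmApproved;
    }

    public static void main(String[] args) {
        System.out.println(canPromote(ExecutionType.OPPORTUNISTIC, true));  // true
        System.out.println(canPromote(ExecutionType.OPPORTUNISTIC, false)); // false
        System.out.println(canPromote(ExecutionType.GUARANTEED, true));     // false
    }
}
```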
[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065522#comment-15065522 ] Varun Saxena commented on YARN-4350: Will commit this shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065559#comment-15065559 ] Karthik Kambatla commented on YARN-4478: I am fine with using components too. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing either timed out or new bug fix caused > impact. Many test faiures JIRA are raised and are in progress. > This is to track all the test failures JIRA's -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065558#comment-15065558 ] Karthik Kambatla commented on YARN-4478: We'll likely always have failing unit tests that need fixing. Should we just use a label to track these instead of an umbrella JIRA? Maybe create additional labels for common failure kinds (timeouts etc.) for better tracking and look-up? > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing either timed out or new bug fix caused > impact. Many test faiures JIRA are raised and are in progress. > This is to track all the test failures JIRA's -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065570#comment-15065570 ] Karthik Kambatla commented on YARN-1856: Thanks [~vvasudev] for working on this, and [~sidharta-s] and [~vinodkv] for the reviews. Excited to see this land. Just checking - is there a JIRA for using memory.oom_control? If we don't disable oom_control, the new cgroups-based monitoring/enforcing would be a lot stricter than the proc-fs based checks and could lead to several task/job failures on existing clusters. OTOH, we might want to enable oom_control for the opportunistic containers to be used in YARN-2877 and YARN-1011. If there is no JIRA yet and you guys are caught up, I am happy to file one and work on it. > cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > Fix For: 2.9.0 > > Attachments: YARN-1856.001.patch, YARN-1856.002.patch, > YARN-1856.003.patch, YARN-1856.004.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
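As a concrete sketch of the policy being discussed: in the cgroup v1 memory controller, writing 1 to a cgroup's memory.oom_control file disables the kernel OOM killer for that cgroup, so the NM could keep the killer enabled for guaranteed containers (strict enforcement) while disabling it for opportunistic ones. The class, method names, and path layout below are invented for illustration; this is not the actual NM cgroups code:

```java
public class OomControlPolicySketch {
    // cgroup v1 semantics: writing "1" to memory.oom_control disables the
    // kernel OOM killer for that cgroup; "0" leaves it enabled.
    public static String oomControlValue(boolean opportunistic) {
        return opportunistic ? "1" : "0";
    }

    // Hypothetical per-container memory cgroup layout.
    public static String oomControlPath(String cgroupRoot, String containerId) {
        return cgroupRoot + "/memory/" + containerId + "/memory.oom_control";
    }

    public static void main(String[] args) {
        System.out.println(oomControlValue(true));   // 1
        System.out.println(oomControlValue(false));  // 0
        System.out.println(oomControlPath("/sys/fs/cgroup", "container_01"));
    }
}
```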
[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065599#comment-15065599 ] Naganarasimha G R commented on YARN-4350: - Thanks for the review and commit [~varun_saxena] & [~sjlee0]; I have added a comment in YARN-4385 regarding this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4489) Limit flow runs returned while querying flows
Varun Saxena created YARN-4489: -- Summary: Limit flow runs returned while querying flows Key: YARN-4489 URL: https://issues.apache.org/jira/browse/YARN-4489 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065637#comment-15065637 ] Dian Fu commented on YARN-4100: --- Hi [~Naganarasimha], Very sorry for the late response. It LGTM overall. Just a few small comments: {quote} When "yarn.nodemanager.node-labels.provider" is configured with "config", "Script" {quote} The {{S}} in {{Script}} should be lower case. {quote} When "yarn.nodemanager.node-labels.provider" is configured with "config" then {quote} A comma can be added before {{then}}. {quote} which queries the Node labels. {quote} {{Node}} can be {{node}}. Actually, {{node label}}, {{Node Label}}, {{Node label}} and {{node Label}} all appear many times in the doc; I think they should be made consistent. {quote} In case of multiple lines have this pattern, then last one will be considered {quote} A period should be added at the end. {quote} Configured class needs to extend {quote} There are two white spaces between {{Configured}} and {{class}}. > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch, > YARN-4100.v1.002.patch > > > Add Documentation for Distributed Node Labels feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065649#comment-15065649 ] Uma Maheswara Rao G commented on YARN-4480: --- +1 committing it. {noformat} -1 asflicense 0m 26s Patch generated 1 ASF License warnings. {noformat} This is due to HDFS-9582 > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng > Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch > > > It was noticed there are some unnecessary dependency into Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065598#comment-15065598 ] Naganarasimha G R commented on YARN-4385: - Faced one more intermittent failure in 2928 branch but not related to ATS v2 code {code} -- T E S T S --- Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 476.165 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 29.211 sec <<< FAILURE! java.lang.AssertionError: expected:<2> but was:<3> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV1(TestDistributedShell.java:356) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:317) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:195) Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.703 sec - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDSAppMaster Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.508 sec - in org.apache.hadoop.yarn.applications.distributedshell.TestDSAppMaster Results : Failed tests: TestDistributedShell.testDSShellWithDomain:195->testDSShell:317->checkTimelineV1:356 expected:<2> but was:<3> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0 {code} {{TestDistributedShell.checkTimelineV1}} checks that exactly 2 (the requested number of) containers are launched, but in reality more than 2 are getting launched. Possible reasons: * The RM has assigned additional containers and the distributed shell AM is launching them. I had observed similar over-assignment behavior in MR too, but the MR AM takes care of returning the extra containers assigned by the RM; a similar approach should exist in the distributed shell AM. * A container has been killed for some reason and an extra container is started. Not sure which of these cases is causing the additional containers; to analyze this we need more RM and AM logs. Possible solutions: * Instead of checking for exactly 2, check for at least 2, so that the test case does not fail if more than 2 containers are launched. * Try to ensure that no more than the desired number of containers are launched, even if the RM allocates more. > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tsuyoshi Ozawa >Assignee: Naganarasimha G R > Attachments: > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
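The first proposed fix, relaxing the equality check to a lower bound, could look roughly like this. It is an illustrative stand-in under the assumptions above, not the real TestDistributedShell code:

```java
public class ContainerCountCheckSketch {
    // Relaxed check: the test passes as long as at least the requested
    // number of containers was launched, tolerating RM over-allocation.
    public static boolean enoughContainers(int launched, int requested) {
        return launched >= requested;
    }

    public static void main(String[] args) {
        System.out.println(enoughContainers(3, 2)); // true: extra container tolerated
        System.out.println(enoughContainers(2, 2)); // true: exact match still passes
        System.out.println(enoughContainers(1, 2)); // false: too few is still a failure
    }
}
```

The trade-off is that the relaxed assertion no longer catches genuine over-allocation bugs, which is why the second proposed solution (having the AM return extra containers) is the stronger fix.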
[jira] [Commented] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065651#comment-15065651 ] Hudson commented on YARN-4480: -- FAILURE: Integrated in Hadoop-trunk-Commit #9004 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9004/]) YARN-4480. Clean up some inappropriate imports. (Kai Zheng via umamahesh: rev 0f82b5d878a76b1626c9e07b2fbb55ce2a79232a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java * hadoop-yarn-project/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng > Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch > > > It was noticed there are some unnecessary dependency into Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065655#comment-15065655 ] Uma Maheswara Rao G commented on YARN-4480: --- Committed to trunk and branch-2. Thanks, Kai. > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng > Fix For: 2.8.0 > > Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch > > > It was noticed there are some unnecessary dependency into Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4472) Introduce additional states in the app and app attempt state machines to keep track of the upgrade process
[ https://issues.apache.org/jira/browse/YARN-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065363#comment-15065363 ] Steve Loughran commented on YARN-4472: -- If this is exposed in the {{YarnApplicationState}} it's going to break a lot of code. > Introduce additional states in the app and app attempt state machines to keep > track of the upgrade process > -- > > Key: YARN-4472 > URL: https://issues.apache.org/jira/browse/YARN-4472 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Marco Rabozzi > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065611#comment-15065611 ] Rohith Sharma K S commented on YARN-4478: - I agree that labels and/or components can be named *Test*. The point of concern is that when a QA run reports test failures, contributors/committers have to search for the corresponding test-failure JIRA IDs and comment on their respective JIRAs with something like "test failures are unrelated to this patch; the failures are tracked by YARN-". This is a very painful task when there are test failures across multiple modules. Instead of having to remember all the test-failure JIRAs, an umbrella JIRA would make them easy to find. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing either timed out or new bug fix caused > impact. Many test faiures JIRA are raised and are in progress. > This is to track all the test failures JIRA's -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4472) Introduce additional states in the app and app attempt state machines to keep track of the upgrade process
[ https://issues.apache.org/jira/browse/YARN-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-4472: - Hadoop Flags: Incompatible change > Introduce additional states in the app and app attempt state machines to keep > track of the upgrade process > -- > > Key: YARN-4472 > URL: https://issues.apache.org/jira/browse/YARN-4472 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Marco Rabozzi > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4470) Application Master in-place upgrade
[ https://issues.apache.org/jira/browse/YARN-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065369#comment-15065369 ] Steve Loughran commented on YARN-4470: -- In SLIDER-787 we've already implemented AM upgrade. Specifically, we just have the AM commit suicide and rely on AM restart to bring itself back up, getting the list of containers back and then rebuilding our state. We also rely on the RM to update the HDFS and other tokens as well as the AM/RM token. As the NMs download the resources again, we pick up the new binaries. What we can't do currently is (a) change AM resource requirements or (b) avoid that AM restart being mistaken for a failure. YARN-3417 proposes a specific exit code for the latter. Accordingly, I'm not convinced we need to do anything here other than treat a specific AM exit code/reported exit as "restart is not a failure". It does require the AM to initiate the upgrade, but it needs to do this for container upgrades anyway. Without the AM doing that part of the process, you'd end up with the AM at, say, v1.3 and the containers at v1.2. The AM needs to think about version mismatch in AM/container communications, and how to upgrade the containers by selective restart. The clients don't need to worry about handoff across versions provided they don't cache URLs/IPC connections, but they need to recover those for AM failover anyway. Same for containers, which need to cope with the AM coming up somewhere else. We use the YARN-913 registry binding for that. The main enhancements of this proposal are (a) side-by-side startup & handoff and (b) rollback. Rollback isn't necessarily something that an app can easily do: what happens if the upgraded AM fails in "that short time period" after changing some state in HDFS, ZK, the containers, etc.? You may be able to roll back the binaries, but the persistent state can have changed. W.r.t. side-by-side, again, there's that time window.
In Slider we build up our internal state on a restart based on the containers we get in AM registration, updating it as queued container failure events start coming in. We actually have to synchronize the AM rebuild process so that container callbacks don't come in until that state has been rebuilt. If the AM came up alongside the existing one, it'd get confused pretty fast in the presence of container failures during this handoff period. Either it'd be told of them (state current, new container requests triggered) or not told of them (state inconsistent). You'd have to do a lot of work. To summarise: even if this feature existed, I don't think we'd move Slider to it; all we'd like is the YARN-3417 exit code, the ability to restart in the same container (== no queuing delay), and the ability to request expanded AM resources. I could imagine actually separating the two: request a resize of the AM container, then, once granted, trigger the restart. Otherwise, we've got the complexity in the code for AM upgrades, with the hard part actually being dealing with AM restart midway through a rolling container upgrade, and rollback of container upgrades. I think before trying to implement this feature, have a go at implementing rolling upgrades in an existing app and see what's missing. > Application Master in-place upgrade > --- > > Key: YARN-4470 > URL: https://issues.apache.org/jira/browse/YARN-4470 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Attachments: AM in-place upgrade design. rev1.pdf > > > It would be nice if clients could ask for an AM in-place upgrade. > It will give YARN the possibility to upgrade the AM without losing the > work done within its containers. This allows deploying bug-fixes and new versions > of the AM without incurring long service downtimes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
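The "restart is not a failure" idea referenced above can be sketched in a few lines. This is hypothetical: YARN-3417 discusses reserving a real exit code for this, and the constant value below is invented purely for illustration, not the actual YARN contract:

```java
public class AmExitCodeSketch {
    // Invented value purely for illustration; YARN-3417 proposes reserving
    // a dedicated code for "AM is restarting, e.g. for an in-place upgrade".
    public static final int EXIT_RESTART_REQUESTED = 64;

    // An exit counts against the AM max-attempt failure limit only if it is
    // neither a clean exit nor an explicit restart request.
    public static boolean countsAsFailure(int exitCode) {
        return exitCode != 0 && exitCode != EXIT_RESTART_REQUESTED;
    }

    public static void main(String[] args) {
        System.out.println(countsAsFailure(1));                      // true: real failure
        System.out.println(countsAsFailure(0));                      // false: clean exit
        System.out.println(countsAsFailure(EXIT_RESTART_REQUESTED)); // false: upgrade restart
    }
}
```

With such a rule in place, the Slider-style "commit suicide and restart" upgrade would not burn an attempt from the failure budget, which is the main gap Steve identifies.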