[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065517#comment-15065517
 ] 

Hadoop QA commented on YARN-4350:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
47s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
58s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 3s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 15, now 15). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 19s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_91 with JDK v1.7.0_91 
generated 1 new issues (was 15, now 15). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 42s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 55s 
{color} | {color:green} hadoop-yarn-applications-distributedshell in the patch 
passed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 50s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 23s 
{color} | {color:green} hadoop-yarn-applications-distributedshell in the patch 
passed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065529#comment-15065529
 ] 

Varun Saxena commented on YARN-4350:


Committed this to feature-YARN-2928.
Thanks [~Naganarasimha] for the contribution and [~sjlee0] for the reviews.

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Fix For: YARN-2928
>
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834

2015-12-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065532#comment-15065532
 ] 

Jian He commented on YARN-4032:
---

Hi [~kasha], YARN-4347 may have fixed this inconsistent issue that may cause RM 
to crash with NPE.

> Corrupted state from a previous version can still cause RM to fail with NPE 
> due to same reasons as YARN-2834
> 
>
> Key: YARN-4032
> URL: https://issues.apache.org/jira/browse/YARN-4032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4032.prelim.patch
>
>
> YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if 
> someone is upgrading from a previous version, the state can still be 
> inconsistent and then RM will still fail with NPE after upgrade to 2.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or QUEUEABLE

2015-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065547#comment-15065547
 ] 

Karthik Kambatla commented on YARN-2882:


Synced up with [~asuresh], [~kkaranasos] and [~subru] offline on this to 
discuss the commonalities with YARN-1011.

The notion of *opportunistic* containers is common, and is governed by the 
following semantics:
# A trusted external agent (RM or LocalRM or NM) can initiate/approve running 
an opportunistic container. 
# Additional policies on execution - queueable or over-subscription - are 
determined by the node's configuration. YARN-2877 would add the queueable flag 
and logic; YARN-1011 would add the over-subscription flag and logic. This logic 
may include having to monitor the usage of the node.
# Only the RM can approve the promotion of an OPPORTUNISTIC container to a 
GUARANTEED container. In the case of YARN-1011, the RM instigates this directly. 

Haven't looked at the patch closely enough, but high-level comments:
# Rename QUEUEABLE to OPPORTUNISTIC
# Since a GUARANTEED container may be preempted, how about calling it REGULAR 
instead? 
# The ExecutionType is something YARN decides on. I don't think the client API 
should include it. 
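The naming discussed above can be sketched as a simple enum. This is a hypothetical illustration of the proposal (rename QUEUEABLE to OPPORTUNISTIC, possibly rename GUARANTEED to REGULAR), not the committed YARN API; the class name is made up:

```java
// Hypothetical sketch of the proposed container execution types; not YARN code.
enum ExecutionType {
    GUARANTEED,     // allocated by the central RM, starts immediately (REGULAR?)
    OPPORTUNISTIC   // may be queued at the NM (YARN-2877) or run on
                    // over-subscribed capacity (YARN-1011)
}

public class ExecutionTypeSketch {
    public static void main(String[] args) {
        // Per the semantics above, only the RM may promote
        // OPPORTUNISTIC -> GUARANTEED.
        System.out.println(ExecutionType.OPPORTUNISTIC);
    }
}
```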

> Add ExecutionType to denote if a container execution is GUARANTEED or 
> QUEUEABLE
> ---
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2882-yarn-2877.001.patch, 
> YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start are the existing containers, which are allocated by the 
> central RM and are instantaneously started, once allocated.
> Queueable is a new type of container, which allows containers to be queued in 
> the NM, thus their execution may be arbitrarily delayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065522#comment-15065522
 ] 

Varun Saxena commented on YARN-4350:


Will commit this shortly

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2015-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065559#comment-15065559
 ] 

Karthik Kambatla commented on YARN-4478:


I am fine with using components too. 

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases are failing, either timing out or being impacted by 
> new bug fixes. Many test failure JIRAs have been raised and are in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2015-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065558#comment-15065558
 ] 

Karthik Kambatla commented on YARN-4478:


We'll likely always have failing unit tests that need fixing. Should we just 
use a label to track these instead of an umbrella JIRA? Maybe create 
additional labels for common failure kinds - timeouts etc. - for better 
tracking and look-up? 

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases are failing, either timing out or being impacted by 
> new bug fixes. Many test failure JIRAs have been raised and are in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2015-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065570#comment-15065570
 ] 

Karthik Kambatla commented on YARN-1856:


Thanks [~vvasudev] for working on this, [~sidharta-s] and [~vinodkv] for the 
reviews. Excited to see this land. 

Just checking - is there a JIRA for using memory.oom_control? If we don't 
disable oom_control, the new cgroups-based monitoring/enforcement would be a 
lot stricter than the proc-fs based checks and could lead to several task/job 
failures on existing clusters. OTOH, we might want to enable oom_control for 
the opportunistic containers to be used in YARN-2877 and YARN-1011. If there 
is no JIRA yet and you guys are caught up, I am happy to file one and work on 
it. 
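The oom_control behavior being discussed is controlled by a file in each container's memory cgroup. Below is a minimal sketch, not YARN code: the class name, helper names, and the per-execution-type policy (disable the kernel OOM killer for regular containers, leave it on for opportunistic ones) are all assumptions made for illustration; only the memory.oom_control file name comes from the comment above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: writing "1" to memory.oom_control (cgroup v1) disables
// the kernel OOM killer for that cgroup, so a monitor can observe the limit
// being hit instead of the kernel killing the task outright -- closer in
// spirit to the lenient proc-fs based checks.
public class OomControlSketch {

    // Assumed policy for illustration only: strict (OOM killer on) for
    // opportunistic containers, lenient (OOM killer off) for regular ones.
    static String oomControlValue(boolean opportunistic) {
        return opportunistic ? "0" : "1";
    }

    static void apply(Path containerCgroupDir, boolean opportunistic)
            throws IOException {
        Files.write(containerCgroupDir.resolve("memory.oom_control"),
                    oomControlValue(opportunistic).getBytes());
    }

    public static void main(String[] args) {
        System.out.println(oomControlValue(false)); // prints "1"
    }
}
```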

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Fix For: 2.9.0
>
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch, YARN-1856.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065599#comment-15065599
 ] 

Naganarasimha G R commented on YARN-4350:
-

Thanks for the review and commit, [~varun_saxena] & [~sjlee0]. I have added a 
comment in YARN-4385 regarding this issue.

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Fix For: YARN-2928
>
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4489) Limit flow runs returned while querying flows

2015-12-19 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4489:
--

 Summary: Limit flow runs returned while querying flows
 Key: YARN-4489
 URL: https://issues.apache.org/jira/browse/YARN-4489
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature

2015-12-19 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065637#comment-15065637
 ] 

Dian Fu commented on YARN-4100:
---

Hi [~Naganarasimha],
Very sorry for the late response. It LGTM overall. Just a few small comments as 
follows:
{quote}
When "yarn.nodemanager.node-labels.provider" is configured with "config", 
"Script"
{quote}
{{S}} should be lower case in {{Script}}.
{quote}
When "yarn.nodemanager.node-labels.provider" is configured with "config" then
{quote}
A comma can be added before {{then}}.
{quote}
which queries the Node labels.
{quote}
{{Node}} can be {{node}}. Actually, {{node label}}, {{Node Label}}, {{Node 
label}} and {{node Label}} all appear many times in the doc; I think they 
should be consistent.
{quote}
In case of multiple lines have this pattern, then last one will be considered
{quote}
A period should be added at the end.
{quote}
Configured  class needs to extend
{quote}
There are two white spaces between {{Configured}} and {{class}}.
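For context, the property under review is set in yarn-site.xml. A minimal fragment follows; the property name and the "config"/"script" values are taken from the quoted doc text, while the layout is only illustrative (see the YARN-4100 patch for the authoritative documentation):

```xml
<!-- Illustrative fragment only; consult the YARN-4100 documentation patch
     for the authoritative property names and allowed values. -->
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <!-- "config" reads node labels from NM configuration;
       "script" runs a user-supplied script that reports the node's labels. -->
  <value>config</value>
</property>
```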




 

> Add Documentation for Distributed and Delegated-Centralized Node Labels 
> feature
> ---
>
> Key: YARN-4100
> URL: https://issues.apache.org/jira/browse/YARN-4100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodeLabel.html, YARN-4100.v1.001.patch, 
> YARN-4100.v1.002.patch
>
>
> Add Documentation for Distributed Node Labels feature



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4480) Clean up some inappropriate imports

2015-12-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065649#comment-15065649
 ] 

Uma Maheswara Rao G commented on YARN-4480:
---

+1 committing it. 
{noformat}
-1  asflicense  0m 26s  Patch generated 1 ASF License warnings.
{noformat}
This is due to HDFS-9582.

> Clean up some inappropriate imports
> ---
>
> Key: YARN-4480
> URL: https://issues.apache.org/jira/browse/YARN-4480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kai Zheng
> Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch
>
>
> It was noticed that there are some unnecessary dependencies on Directory classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4385) TestDistributedShell times out

2015-12-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065598#comment-15065598
 ] 

Naganarasimha G R commented on YARN-4385:
-

Faced one more intermittent failure in the 2928 branch, but it is not related 
to ATS v2 code:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 476.165 sec 
<<< FAILURE! - in 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 29.211 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV1(TestDistributedShell.java:356)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:317)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:195)

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.703 sec - in 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.yarn.applications.distributedshell.TestDSAppMaster
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.508 sec - in 
org.apache.hadoop.yarn.applications.distributedshell.TestDSAppMaster

Results :

Failed tests: 
  
TestDistributedShell.testDSShellWithDomain:195->testDSShell:317->checkTimelineV1:356
 expected:<2> but was:<3>

Tests run: 16, Failures: 1, Errors: 0, Skipped: 0
{code}
{{TestDistributedShell.checkTimelineV1}} checks that only the 2 requested 
containers are launched, but in reality more than 2 are getting launched. 
Possible reasons for this are:
* The RM has assigned additional containers and the Distributed Shell AM is 
launching them. I had observed similar over-assignment behavior in MR as well, 
but the MR AM takes care of returning the extra containers assigned by the RM. 
A similar approach should exist in the Distributed Shell AM too.
* A container has been killed for some reason and an extra container is started.

Not sure which of these cases is causing the additional containers; to analyze 
this we require more RM and AM logs.
Possible solutions are:
* Instead of checking for exactly 2 containers, we can check for at least 2, so 
that the test case will not fail if more than 2 containers are launched.
* Try to ensure that no more than the desired number of containers are launched 
even if the RM allocates more.
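The first suggested fix amounts to replacing the exact-count assertion with a lower bound. A minimal sketch of that idea follows; the class, method, and variable names are hypothetical, not taken from TestDistributedShell:

```java
// Hypothetical "at least N" check replacing an exact-count assertEquals;
// names are illustrative, not the actual TestDistributedShell code.
public class AtLeastCheck {

    static void assertAtLeast(String message, int expectedMin, int actual) {
        if (actual < expectedMin) {
            throw new AssertionError(message + " expected at least <"
                + expectedMin + "> but was <" + actual + ">");
        }
    }

    public static void main(String[] args) {
        // 3 containers launched when 2 were requested no longer fails the test.
        int launchedContainers = 3;
        assertAtLeast("launched containers", 2, launchedContainers);
        System.out.println("check passed");
    }
}
```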
 

> TestDistributedShell times out
> --
>
> Key: YARN-4385
> URL: https://issues.apache.org/jira/browse/YARN-4385
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Tsuyoshi Ozawa
>Assignee: Naganarasimha G R
> Attachments: 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4480) Clean up some inappropriate imports

2015-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065651#comment-15065651
 ] 

Hudson commented on YARN-4480:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9004 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9004/])
YARN-4480. Clean up some inappropriate imports. (Kai Zheng via (umamahesh: rev 
0f82b5d878a76b1626c9e07b2fbb55ce2a79232a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java


> Clean up some inappropriate imports
> ---
>
> Key: YARN-4480
> URL: https://issues.apache.org/jira/browse/YARN-4480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kai Zheng
> Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch
>
>
> It was noticed that there are some unnecessary dependencies on Directory classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4480) Clean up some inappropriate imports

2015-12-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065655#comment-15065655
 ] 

Uma Maheswara Rao G commented on YARN-4480:
---

Committed to trunk and branch-2. Thanks, Kai.

> Clean up some inappropriate imports
> ---
>
> Key: YARN-4480
> URL: https://issues.apache.org/jira/browse/YARN-4480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kai Zheng
> Fix For: 2.8.0
>
> Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch
>
>
> It was noticed that there are some unnecessary dependencies on Directory classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4472) Introduce additional states in the app and app attempt state machines to keep track of the upgrade process

2015-12-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065363#comment-15065363
 ] 

Steve Loughran commented on YARN-4472:
--

If this is exposed in the {{YarnApplicationState}} it's going to break a lot of 
code.

> Introduce additional states in the app and app attempt state machines to keep 
> track of the upgrade process
> --
>
> Key: YARN-4472
> URL: https://issues.apache.org/jira/browse/YARN-4472
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Marco Rabozzi
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2015-12-19 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065611#comment-15065611
 ] 

Rohith Sharma K S commented on YARN-4478:
-

I agree that the labels and/or components can be named *Test*. 
The point of concern is that when a QA run reports test failures, 
contributors/committers have to search for the test failure JIRA IDs and 
comment on their respective JIRAs, something like "test failures are unrelated 
to this patch; the test failure is tracked by YARN-". This is a very painful 
task when there are test failures across multiple modules. Instead of 
remembering all the test failure JIRAs, an umbrella JIRA would make them easy 
to find.

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases are failing, either timing out or being impacted by 
> new bug fixes. Many test failure JIRAs have been raised and are in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4472) Introduce additional states in the app and app attempt state machines to keep track of the upgrade process

2015-12-19 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4472:
-
Hadoop Flags: Incompatible change

> Introduce additional states in the app and app attempt state machines to keep 
> track of the upgrade process
> --
>
> Key: YARN-4472
> URL: https://issues.apache.org/jira/browse/YARN-4472
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Marco Rabozzi
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4470) Application Master in-place upgrade

2015-12-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065369#comment-15065369
 ] 

Steve Loughran commented on YARN-4470:
--

In SLIDER-787 we've already implemented AM upgrade. Specifically, we just have 
the AM commit suicide and rely on AM restart to bring itself back up, getting 
the list of containers back and then rebuilding our state. We also rely on the 
RM to update the HDFS and other tokens as well as the AM/RM token.

As the NMs download the resources again, we pick up the new binaries. What we 
can't do currently is (a) change AM resource requirements or (b) avoid that AM 
restart being mistaken for a failure. YARN-3417 proposes a specific exit code 
there.

Accordingly, I'm not convinced we need to do anything here other than treat a 
specific AM failure exit code/reported exit as "restart is not a failure".

It does require the AM to initiate the upgrade, but it needs to do this for 
container upgrades anyway. Without the AM doing that part of the process, you'd 
end up with the AM at, say, v1.3 and the containers at 1.2. The AM needs to 
think about version mismatch in AM/container communications, and how to upgrade 
the containers by selective restart.

The clients don't need to worry about handoff across versions provided they 
don't cache URLs/IPC connections, but they need to recover those for AM 
failover anyway. Same for containers, which need to cope with the AM coming up 
somewhere else. We use the YARN-913 registry binding for that.

The main enhancements of this proposal there are (a) side-by-side startup & 
handoff and (b) rollback. Rollback isn't necessarily something that an app can 
easily do: what happens if the upgrade AM fails in "that short time period" 
after changing some state in HDFS, ZK, the containers, etc: you may be able to 
rollback the binaries, but the persistent state can have changed.

W.r.t. side-by-side, again, there's that time window. In Slider we build up our 
internal state on a restart based on the containers we get in AM registration, 
updating it as queued container failure events start coming in. We actually 
have to synchronize the AM rebuild process so that container callbacks don't 
come until that state has been rebuilt. If the AM came up alongside the 
existing one, it'd get confused pretty fast in the presence of container 
failures during this handoff period. Either it'd be told of them (state 
current, new container requests triggered) or not told of them (state 
inconsistent). You'd have to do a lot of work.

To summarise: even if this feature existed I don't think we'd move slider to 
it; all we'd like is the YARN-3417 exit code, the ability to restart in the 
same container (==no queuing delay) and the ability to request expanded AM 
resources. I could imagine actually separating the two: request a resize in the 
AM container, then, once granted, triggering the restart. Otherwise, we've got 
the complexity in the code for AM upgrades, with the hard part actually dealing 
with AM restart midway through rolling container upgrade, and rollback of 
container upgrades.

I think before trying to implement this feature, have a go at implementing 
rolling upgrades in an existing app and see what's missing.

> Application Master in-place upgrade
> ---
>
> Key: YARN-4470
> URL: https://issues.apache.org/jira/browse/YARN-4470
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Attachments: AM in-place upgrade design. rev1.pdf
>
>
> It would be nice if clients could ask for an AM in-place upgrade.
> It would give YARN the possibility to upgrade the AM without losing the work
> done within its containers. This allows deploying bug-fixes and new versions 
> of the AM without incurring long service downtimes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)