[jira] [Commented] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430060#comment-15430060
 ] 

Hadoop QA commented on YARN-5327:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 55s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 0s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824762/YARN-5327.004.patch |
| JIRA Issue | YARN-5327 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux 917d417a30bf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 115ecb5 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 

[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-08-21 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430059#comment-15430059
 ] 

Bibin A Chundatt commented on YARN-5545:


Thank you [~sunilg] for looking into the issue. I also had an offline discussion 
with [~Naganarasimha Garla].

It is better to handle application limits across the overall partition set:
# A submitted application can ask for its AM resource from one partition and 
other resources from another partition, so the limit should be at the queue level.
# The user/tenant-level application limit should also be based on the queue.
# The configuration (maxclusterapplication) is the cluster-wide application limit; 
when we break it down to lower levels it should be based on the queue.

*Approach*
Consider the queue's absolute percentage over all partitions combined, not the 
average of the absolute percentages per partition, since partitions can differ 
widely in size; e.g. label1 can be 10% of 20 GB while the default partition can 
be 50% of 100 GB.

# Get the queue's overall capacity percentage as [ sum of queue A's resources 
across all partitions (*X*) / total cluster resource (*Y*) ] = absolute 
percentage over the whole cluster (*Z*).
# max applications of the queue = *Z* * *maxclusterapplication*
# The max applications value has to be updated on every node registration and 
removal (a rough sketch follows).
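A rough, illustrative sketch of the proposed calculation (the class and field 
names below are hypothetical helpers, not existing YARN code; resources are 
reduced to memory for simplicity):

{code}
// Sketch only: derive a queue's max-applications limit from its share of the
// whole cluster, summed over every partition, instead of the default partition
// alone. Recompute whenever node registration/removal changes partition sizes.
final class QueueMaxAppsSketch {

  /** One partition's view of the queue, expressed in memory (MB) for simplicity. */
  static final class PartitionShare {
    final long queueResourceMb;   // resource configured for the queue in this partition
    final long partitionTotalMb;  // total resource of this partition
    PartitionShare(long queueResourceMb, long partitionTotalMb) {
      this.queueResourceMb = queueResourceMb;
      this.partitionTotalMb = partitionTotalMb;
    }
  }

  static int computeMaxApplications(Iterable<PartitionShare> partitions,
                                    int maxClusterApplications) {
    long x = 0;  // X: sum of the queue's resources across all partitions
    long y = 0;  // Y: total cluster resource across all partitions
    for (PartitionShare p : partitions) {
      x += p.queueResourceMb;
      y += p.partitionTotalMb;
    }
    double z = (y == 0) ? 0.0 : (double) x / y;  // Z: overall absolute share
    return (int) (z * maxClusterApplications);   // Z * maxclusterapplication
  }
}
{code}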




> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept 

[jira] [Commented] (YARN-3998) Add support in the NodeManager to re-launch containers

2016-08-21 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430039#comment-15430039
 ] 

Jun Gong commented on YARN-3998:


Thanks [~asuresh] for pointing it out; I had not noticed it. Yes, 
CURRENT_VERSION_INFO needs to be updated, and it should be enough to bump the 
minor version since the change is compatible. I noticed that YARN-5049, which was 
committed after this issue, has already changed CURRENT_VERSION_INFO, so do we 
still need to change it again in this issue?

> Add support in the NodeManager to re-launch containers
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.9.0
>
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, 
> YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch, YARN-3998.09.patch
>
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM 
> launches containers, it could specify this value, and the NM will then 
> re-launch the container up to 'retry-times' times when it fails to run (e.g. 
> the exit code is not 0).
> This saves a lot of time: it avoids container localization, the RM does not 
> need to re-schedule the container, and local files in the container's working 
> directory are left in place for re-use (if the container has downloaded some 
> big files, it does not need to re-download them when running again).
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-21 Thread Sangeetha Abdu Jyothi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeetha Abdu Jyothi updated YARN-5327:

Attachment: YARN-5327.004.patch

> API changes required to support recurring reservations in the YARN 
> ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, 
> YARN-5327.003.patch, YARN-5327.004.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes needed 
> in ApplicationClientProtocol to accomplish it. Please refer to the design doc 
> in the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-21 Thread Sangeetha Abdu Jyothi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeetha Abdu Jyothi updated YARN-5327:

Attachment: (was: YARN-5327.004.patch)

> API changes required to support recurring reservations in the YARN 
> ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, 
> YARN-5327.003.patch, YARN-5327.004.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes needed 
> in ApplicationClientProtocol to accomplish it. Please refer to the design doc 
> in the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429982#comment-15429982
 ] 

Hadoop QA commented on YARN-5327:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s {color} 
| {color:red} YARN-5327 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824758/YARN-5327.004.patch |
| JIRA Issue | YARN-5327 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12845/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> API changes required to support recurring reservations in the YARN 
> ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, 
> YARN-5327.003.patch, YARN-5327.004.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes needed 
> in ApplicationClientProtocol to accomplish it. Please refer to the design doc 
> in the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-21 Thread Sangeetha Abdu Jyothi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeetha Abdu Jyothi updated YARN-5327:

Attachment: YARN-5327.004.patch

> API changes required to support recurring reservations in the YARN 
> ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, 
> YARN-5327.003.patch, YARN-5327.004.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes needed 
> in ApplicationClientProtocol to accomplish it. Please refer to the design doc 
> in the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3673) Create a FailoverProxy for Federation services

2016-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429927#comment-15429927
 ] 

Hadoop QA commented on YARN-3673:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 59s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
35s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 52s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
51s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
58s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
45s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 2 
new + 235 unchanged - 0 fixed = 237 total (was 235) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 17s 
{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 52s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824749/YARN-3673-YARN-2915-v3.patch
 |
| JIRA Issue | YARN-3673 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 028a955fdd21 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / 8abb0a8 |
| Default Java | 1.8.0_101 |
| findbugs | 

[jira] [Updated] (YARN-3673) Create a FailoverProxy for Federation services

2016-08-21 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3673:
-
Attachment: YARN-3673-YARN-2915-v3.patch

Adding detailed code comments (v3) as suggested by [~jianhe]

> Create a FailoverProxy for Federation services
> --
>
> Key: YARN-3673
> URL: https://issues.apache.org/jira/browse/YARN-3673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3673-YARN-2915-v1.patch, 
> YARN-3673-YARN-2915-v2.patch, YARN-3673-YARN-2915-v3.patch
>
>
> This JIRA proposes creating a failover proxy for Federation based on the 
> cluster membership information in the StateStore that can be used by both 
> Router & AMRMProxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5543) ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread

2016-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429879#comment-15429879
 ] 

Hadoop QA commented on YARN-5543:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 25s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824743/YARN-5543.001.patch |
| JIRA Issue | YARN-5543 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f75f3eb6fcad 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 115ecb5 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12843/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12843/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12843/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Updated] (YARN-5543) ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread

2016-08-21 Thread Min Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Shen updated YARN-5543:
---
Attachment: YARN-5543.001.patch

Attaching the patch with the proposed changes.

> ResourceManager SchedulingMonitor could potentially terminate the preemption 
> checker thread
> ---
>
> Key: YARN-5543
> URL: https://issues.apache.org/jira/browse/YARN-5543
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Min Shen
> Attachments: YARN-5543.001.patch
>
>
> In SchedulingMonitor.java, when the service starts, it starts a checker 
> thread to perform Capacity Scheduler's preemption. However, the 
> implementation of this checker thread has the following issue:
> {code}
> while (!stopped && !Thread.currentThread().isInterrupted()) {
>   // ...
>   try {
>     Thread.sleep(monitorInterval);
>   } catch (InterruptedException e) {
>     // ...
>     break;
>   }
> }
> {code}
> The above code snippet will terminate the checker thread whenever it is 
> interrupted.
> We noticed in our cluster that this could lead to the CapacityScheduler's 
> preemption being disabled unexpectedly because the checker thread got terminated.
> We propose to use ScheduledExecutorService to improve the robustness of this 
> part of the code and ensure the liveness of the CapacityScheduler's preemption 
> functionality.
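A minimal sketch of what a ScheduledExecutorService-based checker could look like 
(illustrative only, assuming a Runnable that performs one preemption pass; this 
is not the attached patch or the actual SchedulingMonitor code):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// scheduleAtFixedRate keeps invoking the policy at a fixed interval; catching
// Throwable inside the task prevents one bad run from cancelling the schedule.
public class PreemptionCheckerSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private ScheduledFuture<?> handle;

  public void start(Runnable invokePolicy, long monitorIntervalMs) {
    handle = scheduler.scheduleAtFixedRate(() -> {
      try {
        invokePolicy.run();                       // one preemption-policy pass
      } catch (Throwable t) {
        // log and continue; an uncaught Throwable would suppress future runs
        System.err.println("Preemption check failed: " + t);
      }
    }, 0, monitorIntervalMs, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    if (handle != null) {
      handle.cancel(false);
    }
    scheduler.shutdown();
  }
}
{code}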



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-08-21 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429802#comment-15429802
 ] 

Sunil G commented on YARN-5545:
---

Thanks [~bibinchundatt] for reporting this.

We need a config for maximum applications per queue per label if we want to solve 
the problem cleanly. For the long term, this may be better. With this, we might 
also need to revisit metrics, UI, etc. too.
Otherwise we need to introduce a few hacks for when the default capacity is not 
configured.
I prefer the first option. Thoughts?

> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
>   ... 25 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5540) Capacity Scheduler spends too much time looking at empty priorities

2016-08-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429777#comment-15429777
 ] 

Arun Suresh edited comment on YARN-5540 at 8/21/16 4:03 PM:


This would actually affect the FairScheduler too.
It looks like {{AppSchedulingInfo::decResourceRequest()}} should be removing the 
empty HashMap if there are no entries left against that priority / 
schedulerRequestKey.


was (Author: asuresh):
This would actually affect the FairScheduler too.
It looks like {{AppSchedulingInfo::decResourceRequest()}} should be removing the 
empty HashMap if there are no entries left against that priority / 
schedulerKey.
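A rough sketch of the suggested cleanup (the field and type names are simplified 
placeholders, not the actual AppSchedulingInfo internals):

{code}
import java.util.HashMap;
import java.util.Map;

// Once the last resource request for a scheduler key is decremented away, drop
// that key's map so the scheduler no longer iterates over empty priorities.
class AppSchedulingInfoSketch<K, R> {
  private final Map<K, Map<String, R>> requests = new HashMap<>();

  void decResourceRequest(K schedulerKey, String resourceName) {
    Map<String, R> perKey = requests.get(schedulerKey);
    if (perKey == null) {
      return;
    }
    perKey.remove(resourceName);
    if (perKey.isEmpty()) {
      requests.remove(schedulerKey);  // remove the now-empty map for this key
    }
  }
}
{code}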

> Capacity Scheduler spends too much time looking at empty priorities
> ---
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, fairscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
>
> We're starting to see the capacity scheduler run out of scheduling horsepower 
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many 
> more priorities (sometimes in the hundreds) than typical MR applications and 
> therefore the loop in the scheduler which examines every priority within 
> every running application, starts to be a hotspot. The priorities appear to 
> stay around forever, even when there is no remaining resource request at that 
> priority causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x7fc2d453e800 
> nid=0x22f3 runnable [0x7fc2a8be2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
> - eliminated <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
> - locked <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> - locked <0x0003006fcf60> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
> - locked <0x000300041e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5540) Capacity Scheduler spends too much time looking at empty priorities

2016-08-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429777#comment-15429777
 ] 

Arun Suresh commented on YARN-5540:
---

This would actually affect the FairScheduler too.
It looks like {{AppSchedulingInfo::decResourceRequest()}} should be removing the 
empty HashMap if there are no entries left against that priority / 
schedulerKey.

> Capacity Scheduler spends too much time looking at empty priorities
> ---
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, fairscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
>
> We're starting to see the capacity scheduler run out of scheduling horsepower 
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many 
> more priorities (sometimes in the hundreds) than typical MR applications and 
> therefore the loop in the scheduler which examines every priority within 
> every running application, starts to be a hotspot. The priorities appear to 
> stay around forever, even when there is no remaining resource request at that 
> priority causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x7fc2d453e800 
> nid=0x22f3 runnable [0x7fc2a8be2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
> - eliminated <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
> - locked <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> - locked <0x0003006fcf60> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
> - locked <0x000300041e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5540) Capacity Scheduler spends too much time looking at empty priorities

2016-08-21 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5540:
--
Component/s: fairscheduler

> Capacity Scheduler spends too much time looking at empty priorities
> ---
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, fairscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
>
> We're starting to see the capacity scheduler run out of scheduling horsepower 
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many 
> more priorities (sometimes in the hundreds) than typical MR applications and 
> therefore the loop in the scheduler which examines every priority within 
> every running application, starts to be a hotspot. The priorities appear to 
> stay around forever, even when there is no remaining resource request at that 
> priority causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x7fc2d453e800 
> nid=0x22f3 runnable [0x7fc2a8be2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
> - eliminated <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
> - locked <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> - locked <0x0003006fcf60> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
> - locked <0x000300041e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information

2016-08-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429775#comment-15429775
 ] 

Arun Suresh commented on YARN-5049:
---

Thanks for bringing this up [~jlowe].
I am now wondering what the criteria are for updating the major and minor 
version numbers of the {{NMLevelDBStateStore}}.
This particular patch adds a new key suffix */queued* to the state store. Thus, 
the schema has changed, but the old schema is still readable by the new version 
of the NM. However, rolling the NM back to the old version will not be possible 
if at least one container is still queued at the time of the rollback.
Given the above, I feel this warrants just a minor version bump (instead of the 
major version update that this patch included), in which case the exception you 
specified will not be thrown (since minor versions are compatible).
If you agree, I can update the patch (or create a new JIRA) to just modify the 
minor version, and everything should be fine... else I will revert the patch 
from branch-2 until we have a migration script.
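For reference, a minimal sketch of the compatibility rule being discussed 
(illustrative only, assuming the usual convention that only the major version 
must match; this is not the actual NM state store code):

{code}
import java.io.IOException;

// Same major version: the store is readable and loading proceeds even if the
// minor version differs. Different major version: loading fails with an error.
final class StateStoreVersionSketch {
  final int major;
  final int minor;

  StateStoreVersionSketch(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  boolean isCompatibleWith(StateStoreVersionSketch loaded) {
    return this.major == loaded.major;  // minor-only bumps are tolerated
  }

  void checkVersion(StateStoreVersionSketch loadedFromDb) throws IOException {
    if (!isCompatibleWith(loadedFromDb)) {
      throw new IOException("Incompatible state-store schema " + loadedFromDb.major
          + "." + loadedFromDb.minor + "; expected major version " + major);
    }
  }
}
{code}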


> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 2.9.0
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-08-21 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429731#comment-15429731
 ] 

He Tianyi commented on YARN-4090:
-

Hi there.
I tried to backport this to 2.6.0 and it seems a deadlock occurs (or possibly 
non-fair synchronization).
Containers only get assigned periodically.

Any clue? Thanks.

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, YARN-4090.002.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429717#comment-15429717
 ] 

Bibin A Chundatt commented on YARN-5537:


The failure is not related to the attached patch. A JIRA is already open for the 
same failure.

> Intermittent test failure of 
> TestAMRMClient#testAMRMClientWithContainerResourceChange
> -
>
> Key: YARN-5537
> URL: https://issues.apache.org/jira/browse/YARN-5537
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Bibin A Chundatt
> Attachments: Failure.txt, Failure_allocate.txt, YARN-5537.0001.patch
>
>
> Refer to test report 
> https://builds.apache.org/job/PreCommit-YARN-Build/12692/testReport/
> {noformat}
> Running org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.018 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 1.183 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1019)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:909)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5546) NodeManager crashes due to SIGSEGV

2016-08-21 Thread Daniel Haviv (JIRA)
Daniel Haviv created YARN-5546:
--

 Summary: NodeManager crashes due to SIGSEGV
 Key: YARN-5546
 URL: https://issues.apache.org/jira/browse/YARN-5546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Daniel Haviv


The NodeManager crashed due to SIGSEGV.
The hs_err file includes the following Java stack:
j  org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
j  org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
j  org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeDeletionTask(ILorg/apache/hadoop/yarn/proto/YarnServerNodemanagerRecoveryProtos$DeletionServiceDeleteTaskProto;)V+32
j  org.apache.hadoop.yarn.server.nodemanager.DeletionService.recordDeletionTaskInStateStore(Lorg/apache/hadoop/yarn/server/nodemanager/DeletionService$FileDeletionTask;)V+245
j  org.apache.hadoop.yarn.server.nodemanager.DeletionService.delete(Ljava/lang/String;Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/fs/Path;)V+44
j  org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run()V+271
v  ~StubRoutines::call_stub

and the culprit seems to be:
# Problematic frame:
# C  [libleveldbjni-64-1-5625225739273738004.8+0x2aaac]  leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned long)+0x7c




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429713#comment-15429713
 ] 

Hadoop QA commented on YARN-5537:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 57s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824730/YARN-5537.0001.patch |
| JIRA Issue | YARN-5537 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux eaab4479470b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 0faee62 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12841/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12841/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12841/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12841/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Intermittent test failure of 
> TestAMRMClient#testAMRMClientWithContainerResourceChange
> -
>
>   

[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-08-21 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429708#comment-15429708
 ] 

Bibin A Chundatt commented on YARN-5545:


Application submission is handled as below; the maximum number of applications
is derived from the queue's capacity in the default partition.
{{LeafQueue#submitApplication}}
{code}
// Check submission limits for queues
if (getNumApplications() >= getMaxApplications()) {
  String msg = "Queue " + getQueuePath()
      + " already has " + getNumApplications() + " applications,"
      + " cannot accept submission of application: " + applicationId;
  LOG.info(msg);
  throw new AccessControlException(msg);
}
{code}
 
In {{LeafQueue#setupQueueConfigs}}, maxApplications is set based on the queue's
absolute capacity in the default partition when a per-queue maximum is not
configured.
{code}
maxApplications = conf.getMaximumApplicationsPerQueue(getQueuePath());
if (maxApplications < 0) {
  int maxSystemApps = conf.getMaximumSystemApplications();
  maxApplications =
      (int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
}
{code}

We should consider the maximum absolute capacity across all partitions in this
case; a rough sketch of the idea is below.
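A minimal sketch of that change in {{LeafQueue#setupQueueConfigs}}, assuming
{{QueueCapacities}} exposes the configured partitions via
{{getExistingNodeLabels()}} and per-label absolute capacity via
{{getAbsoluteCapacity(label)}} (method names to be confirmed against trunk):
{code}
maxApplications = conf.getMaximumApplicationsPerQueue(getQueuePath());
if (maxApplications < 0) {
  int maxSystemApps = conf.getMaximumSystemApplications();
  // Use the largest absolute capacity over all partitions of the queue,
  // not only the default partition, so a queue that only has capacity
  // under a node label can still accept application submissions.
  float maxAbsCapacity = queueCapacities.getAbsoluteCapacity();
  for (String label : queueCapacities.getExistingNodeLabels()) {
    maxAbsCapacity =
        Math.max(maxAbsCapacity, queueCapacities.getAbsoluteCapacity(label));
  }
  maxApplications = (int) (maxSystemApps * maxAbsCapacity);
}
{code}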

Any thoughts??

> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept 

[jira] [Created] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-08-21 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-5545:
--

 Summary: App submit failure on queue with label when default queue 
partition capacity is zero
 Key: YARN-5545
 URL: https://issues.apache.org/jira/browse/YARN-5545
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
 Attachments: capacity-scheduler.xml

Configure capacity scheduler 

yarn.scheduler.capacity.root.default.capacity=0
yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50


Submit application as below

./yarn jar 
../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
 sleep -Dmapreduce.job.node-label-expression=labelx 
-Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1

{noformat}
2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to 
submit application_1471670113386_0001 to YARN : 
org.apache.hadoop.security.AccessControlException: Queue root.default already 
has 0 applications, cannot accept submission of application: 
application_1471670113386_0001
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at 
org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
at 
org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
application_1471670113386_0001 to YARN : 
org.apache.hadoop.security.AccessControlException: Queue root.default already 
has 0 applications, cannot accept submission of application: 
application_1471670113386_0001
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
at 
org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
... 25 more
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-08-21 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5545:
---
Attachment: capacity-scheduler.xml

> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
>   ... 25 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5537:
---
Attachment: Failure_allocate.txt

> Intermittent test failure of 
> TestAMRMClient#testAMRMClientWithContainerResourceChange
> -
>
> Key: YARN-5537
> URL: https://issues.apache.org/jira/browse/YARN-5537
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Bibin A Chundatt
> Attachments: Failure.txt, Failure_allocate.txt, YARN-5537.0001.patch
>
>
> Refer to test report 
> https://builds.apache.org/job/PreCommit-YARN-Build/12692/testReport/
> {noformat}
> Running org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.018 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 1.183 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1019)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:909)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429703#comment-15429703
 ] 

Bibin A Chundatt edited comment on YARN-5537 at 8/21/16 12:47 PM:
--

*Analysis*
The AM container (container1) is allocated on node {{localhost:43931}};
container2, container3 and container4 are allocated on node {{localhost:36489}}.

The container-increase request only gets served once a heartbeat is received
from {{localhost:36489}}.
In the intermittent failure, the node heartbeat from {{localhost:36489}} is
received while the allocate request is being processed, so the increase request
is served at that point, earlier than the test expects.




was (Author: bibinchundatt):
*Analysis*
AM container allocated to node {{localhost:43931}} - container1
container2,container3,container4 allocated to node {{localhost:36489}}

Increase container request gets served once heartbeat is received from 
{{localhost:36489}}.
During random failure node heartbeat request is received from node 
{{localhost:36489}} during allocate request and increase request is served.

{noformat}
2016-08-20 21:57:42,952 DEBUG [IPC Server handler 2 on 40133] ipc.Server: IPC 
Server handler 2 on 40133: 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
127.0.0.1:46570 Call#16 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2016-08-20 21:57:42,952 DEBUG [IPC Server handler 2 on 40133] 
security.UserGroupInformation: PrivilegedAction 
as:appattempt_1471710454871_0001_01 (auth:TOKEN) 
from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2419)
2016-08-20 21:57:42,956 DEBUG [IPC Server handler 2 on 40133] 
rmcontainer.RMContainerImpl: Processing container_1471710454871_0001_01_04 
of type RELEASED
2016-08-20 21:57:42,957 INFO  [IPC Server handler 2 on 40133] 
rmcontainer.RMContainerImpl: container_1471710454871_0001_01_04 Container 
Transitioned from RUNNING to RELEASED
2016-08-20 21:57:42,958 INFO  [IPC Server handler 2 on 40133] 
resourcemanager.RMAuditLogger: USER=root  IP=127.0.0.1OPERATION=AM Released 
Container TARGET=SchedulerApp RESULT=SUCCESS  
APPID=application_1471710454871_0001
CONTAINERID=container_1471710454871_0001_01_04  RESOURCE=
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher: Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.event.RMAppAttemptStatusupdateEvent.EventType:
 STATUS_UPDATE
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl: Processing event for 
appattempt_1471710454871_0001_01 of type STATUS_UPDATE
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher: Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeCleanContainerEvent.EventType:
 CLEANUP_CONTAINER
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
rmnode.RMNodeImpl: Processing localhost:46667 of type CLEANUP_CONTAINER
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher: Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.event.RMAppAttemptContainerFinishedEvent.EventType:
 CONTAINER_FINISHED
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl: Processing event for 
appattempt_1471710454871_0001_01 of type CONTAINER_FINISHED
2016-08-20 21:57:42,958 DEBUG [IPC Server handler 2 on 40133] 
scheduler.SchedulerNode: Released container 
container_1471710454871_0001_01_04 of capacity  on 
host localhost:46667, which currently has 2 containers,  
used and  available, release resources=true
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.LeafQueue: User limit computation for root in queue default 
userLimitPercent=100 userLimitFactor=1.0 required:  
consumed:  user-limit-resource:  
queueCapacity:  qconsumed:  
consumedRatio: 0.0 currentCapacity:  activeUsers: 0 
clusterCapacity:  resourceByLabel:  usageratio: 0.25 Partition: 
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.LeafQueue: default used= numContainers=3 
user=root user-resources=
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.ParentQueue: completedContainer root: numChildQueue= 1, capacity=1.0, 
absoluteCapacity=1.0, usedResources=usedCapacity=0.25, 
numApps=1, numContainers=3, cluster=
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.ParentQueue: Re-sorting completed queue: default: capacity=1.0, 
absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, 

[jira] [Updated] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5537:
---
Attachment: YARN-5537.0001.patch

> Intermittent test failure of 
> TestAMRMClient#testAMRMClientWithContainerResourceChange
> -
>
> Key: YARN-5537
> URL: https://issues.apache.org/jira/browse/YARN-5537
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Bibin A Chundatt
> Attachments: Failure.txt, YARN-5537.0001.patch
>
>
> Refer to test report 
> https://builds.apache.org/job/PreCommit-YARN-Build/12692/testReport/
> {noformat}
> Running org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.018 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 1.183 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1019)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:909)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429703#comment-15429703
 ] 

Bibin A Chundatt commented on YARN-5537:


*Analysis*
The AM container (container1) is allocated on node {{localhost:43931}};
container2, container3 and container4 are allocated on node {{localhost:36489}}.

The container-increase request only gets served once a heartbeat is received
from {{localhost:36489}}.
In the intermittent failure, the node heartbeat from {{localhost:36489}} is
received while the allocate request is being processed, so the increase request
is served at that point, earlier than the test expects. One way to make the
check deterministic is sketched below; the relevant RM debug log follows it.
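Just as an illustrative sketch (not necessarily what the attached patch does):
instead of asserting on the result of a single {{allocate()}} call, the test
could poll with {{org.apache.hadoop.test.GenericTestUtils#waitFor}} (Guava
{{Supplier}}) until the increase is visible. {{getIncreasedContainerCount()}}
and {{expectedIncreased}} are hypothetical stand-ins for however the test
counts increased containers, and {{amClient}} is the test's AMRMClient
instance.
{code}
// Illustrative only: tolerate the heartbeat arriving during allocate by
// polling until the RM has served the expected number of increase requests.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    try {
      AllocateResponse response = amClient.allocate(0.1f);
      // hypothetical helper that tallies increased containers seen so far
      return getIncreasedContainerCount(response) == expectedIncreased;
    } catch (Exception e) {
      return false;
    }
  }
}, 100, 10000);
{code}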

{noformat}
2016-08-20 21:57:42,952 DEBUG [IPC Server handler 2 on 40133] ipc.Server: IPC 
Server handler 2 on 40133: 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
127.0.0.1:46570 Call#16 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2016-08-20 21:57:42,952 DEBUG [IPC Server handler 2 on 40133] 
security.UserGroupInformation: PrivilegedAction 
as:appattempt_1471710454871_0001_01 (auth:TOKEN) 
from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2419)
2016-08-20 21:57:42,956 DEBUG [IPC Server handler 2 on 40133] 
rmcontainer.RMContainerImpl: Processing container_1471710454871_0001_01_04 
of type RELEASED
2016-08-20 21:57:42,957 INFO  [IPC Server handler 2 on 40133] 
rmcontainer.RMContainerImpl: container_1471710454871_0001_01_04 Container 
Transitioned from RUNNING to RELEASED
2016-08-20 21:57:42,958 INFO  [IPC Server handler 2 on 40133] 
resourcemanager.RMAuditLogger: USER=root  IP=127.0.0.1OPERATION=AM Released 
Container TARGET=SchedulerApp RESULT=SUCCESS  
APPID=application_1471710454871_0001
CONTAINERID=container_1471710454871_0001_01_04  RESOURCE=
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher: Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.event.RMAppAttemptStatusupdateEvent.EventType:
 STATUS_UPDATE
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl: Processing event for 
appattempt_1471710454871_0001_01 of type STATUS_UPDATE
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher: Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeCleanContainerEvent.EventType:
 CLEANUP_CONTAINER
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
rmnode.RMNodeImpl: Processing localhost:46667 of type CLEANUP_CONTAINER
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher: Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.event.RMAppAttemptContainerFinishedEvent.EventType:
 CONTAINER_FINISHED
2016-08-20 21:57:42,958 DEBUG [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl: Processing event for 
appattempt_1471710454871_0001_01 of type CONTAINER_FINISHED
2016-08-20 21:57:42,958 DEBUG [IPC Server handler 2 on 40133] 
scheduler.SchedulerNode: Released container 
container_1471710454871_0001_01_04 of capacity  on 
host localhost:46667, which currently has 2 containers,  
used and  available, release resources=true
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.LeafQueue: User limit computation for root in queue default 
userLimitPercent=100 userLimitFactor=1.0 required:  
consumed:  user-limit-resource:  
queueCapacity:  qconsumed:  
consumedRatio: 0.0 currentCapacity:  activeUsers: 0 
clusterCapacity:  resourceByLabel:  usageratio: 0.25 Partition: 
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.LeafQueue: default used= numContainers=3 
user=root user-resources=
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.ParentQueue: completedContainer root: numChildQueue= 1, capacity=1.0, 
absoluteCapacity=1.0, usedResources=usedCapacity=0.25, 
numApps=1, numContainers=3, cluster=
2016-08-20 21:57:42,963 DEBUG [IPC Server handler 2 on 40133] 
capacity.ParentQueue: Re-sorting completed queue: default: capacity=1.0, 
absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, 
absoluteUsedCapacity=0.25, numApps=1, numContainers=3
2016-08-20 21:57:42,967 WARN  [IPC Server handler 2 on 40133] 
scheduler.AbstractYarnScheduler: Error happens when checking increase request, 
Ignoring.. exception=
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Failed to 
get rmContainer for increase request, with 
container-id=container_1471710454871_0001_01_04
at 

[jira] [Updated] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5537:
---
Attachment: Failure.txt

> Intermittent test failure of 
> TestAMRMClient#testAMRMClientWithContainerResourceChange
> -
>
> Key: YARN-5537
> URL: https://issues.apache.org/jira/browse/YARN-5537
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Bibin A Chundatt
> Attachments: Failure.txt
>
>
> Refer to test report 
> https://builds.apache.org/job/PreCommit-YARN-Build/12692/testReport/
> {noformat}
> Running org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.018 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 1.183 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1019)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:909)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5537) Intermittent test failure of TestAMRMClient#testAMRMClientWithContainerResourceChange

2016-08-21 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-5537:
--

Assignee: Bibin A Chundatt

> Intermittent test failure of 
> TestAMRMClient#testAMRMClientWithContainerResourceChange
> -
>
> Key: YARN-5537
> URL: https://issues.apache.org/jira/browse/YARN-5537
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Bibin A Chundatt
>
> Refer to test report 
> https://builds.apache.org/job/PreCommit-YARN-Build/12692/testReport/
> {noformat}
> Running org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.018 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 1.183 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1019)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:909)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-08-21 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429644#comment-15429644
 ] 

Sunil G commented on YARN-4945:
---

bq.My assertion is that regardless of what containers are already in the 
selectedCandidates list, the intra-queue preemption policy would always need to 
select more.
Yes, I meant the same. The intra-queue preemption logic may select a container
which is already selected; in that case we simply deduct it from the
intra-queue resourceToObtain and continue. I added this point to emphasize
that the intra-queue logic will not do anything further to an already-selected
container.

bq.we may want to consider intra-queue preemption configs for dead zone, 
natural completion,
Makes sense, I will add this point.

bq.Is this step calculating the total of preemptable resources for apps in this 
queue, per partition?
When we look at resource distribution within a queue, there can be resource
over-subscription, since there was no competing demand at the time those
resources were allocated to the queue/app. Later, as more apps come in, the
desired distribution changes based on priority or user-limit. In such cases we
consider priority and user-limit separately:
- priority: all pending resource requests of the app become the "resource to
obtain for this app"
- user-limit: a partial or full pending resource request becomes the "resource
to obtain for this app", depending on "user-limit_headroom - current_used";
that much can be considered the demand from this app.
I used "pending" because that is the scheduler's notion; in the preemption
world it maps to resourceToObtain.

And yes, we consider this resourceToObtain at the partition level, and all
calculations are done per partition.

bq.Is this saying that, when marking containers for preemption, if an app is 
under its user limit percent, its containers will not be marked?
I can clarify this. Intra-queue preemption will first calculate
resourceToObtain from the apps which are of high priority (for user-limit:
from the apps which are over-subscribing, i.e. crossing their user-limit quota
at that instant). From these selected apps we see how much is pending, and
that contributes to resourceToObtain (for user-limit: in this case we find the
apps which are starving and not getting their user-limit quota).

While doing this we will come across apps which have already met or exceeded
their user-limit quota (in the priority case). Such apps will be skipped when
attributing to resourceToObtain. A rough, illustrative sketch of this
selection is below.
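To make the two demand calculations concrete, here is a small, self-contained
toy sketch (plain {{long}} memory values stand in for {{Resource}}, and every
class/field name here is invented for illustration, not taken from the POC
patch):
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of per-app "resource to obtain" inside one queue/partition. */
public class IntraQueueDemandSketch {

  static class App {
    final String id;
    final int priority;
    final long used;       // resources the app currently holds
    final long pending;    // outstanding ask of the app
    final long userLimit;  // user-limit quota of the app's user

    App(String id, int priority, long used, long pending, long userLimit) {
      this.id = id; this.priority = priority; this.used = used;
      this.pending = pending; this.userLimit = userLimit;
    }
  }

  /** Priority policy: all pending of each selected app becomes demand,
   *  but apps already at/over their user-limit quota are skipped. */
  static long resourceToObtainByPriority(List<App> apps) {
    List<App> byPriority = new ArrayList<>(apps);
    // walk from the highest priority down (a fuller policy would also stop
    // once enough demand has been gathered)
    byPriority.sort(Comparator.comparingInt((App a) -> a.priority).reversed());
    long toObtain = 0;
    for (App app : byPriority) {
      if (app.used >= app.userLimit) {
        continue; // already met or exceeded its user-limit quota: skip
      }
      toObtain += app.pending;
    }
    return toObtain;
  }

  /** User-limit policy: only the part of pending within the headroom
   *  (userLimit - used) becomes demand. */
  static long resourceToObtainByUserLimit(List<App> apps) {
    long toObtain = 0;
    for (App app : apps) {
      long headroom = Math.max(0, app.userLimit - app.used);
      toObtain += Math.min(app.pending, headroom);
    }
    return toObtain;
  }

  public static void main(String[] args) {
    List<App> apps = new ArrayList<>();
    apps.add(new App("app1", 5, 8192, 4096, 8192)); // already at its limit
    apps.add(new App("app2", 3, 2048, 6144, 8192)); // starving, has headroom
    System.out.println("priority demand   = " + resourceToObtainByPriority(apps));
    System.out.println("user-limit demand = " + resourceToObtainByUserLimit(apps));
  }
}
{code}
In the real policy these would of course be {{Resource}} calculations via the
{{Resources}} utilities, done per partition as described above.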

bq.Perhaps these should be totally separate policies.
My idea is to come up with an intra-queue preemption framework and apply
policies such as priority and user-limit on top of it. With this POC I am
starting with the framework plus priority preemption; user-limit can be added
as a new policy on top of the same framework, and it will cover the points you
mentioned. For the doc, however, it would be good to keep it common for
priority and user-limit, and we can add the point from your comment to the
doc. That will give better insight into intra-queue preemption. Thoughts?

> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: IntraQueuepreemption-CapacityScheduler (Design).pdf, 
> YARN-2009-wip.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5544) TestNodeBlacklistingOnAMFailures fails on trunk

2016-08-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429621#comment-15429621
 ] 

Varun Saxena commented on YARN-5544:


Looks likely.

> TestNodeBlacklistingOnAMFailures fails on trunk
> ---
>
> Key: YARN-5544
> URL: https://issues.apache.org/jira/browse/YARN-5544
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Sunil G
>
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.28 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures
> testNodeBlacklistingOnAMFailure(org.apache.hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures)
>   Time elapsed: 0.241 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttemptState should still be SCHEDULED if 
> currentNode is blacklisted correctly expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailure(TestNodeBlacklistingOnAMFailures.java:110)
> {noformat}
> {noformat}
> java.lang.AssertionError: After blacklisting, AM should have run on the other 
> node expected:<127.0.0.2:2345> but was:<127.0.0.1:1234>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailure(TestNodeBlacklistingOnAMFailures.java:131)
> {noformat}
> https://builds.apache.org/job/PreCommit-YARN-Build/12840/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org