[jira] [Commented] (YARN-4102) Add a "skip existing table" mode for timeline schema creator

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740238#comment-14740238
 ] 

Hadoop QA commented on YARN-4102:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  7s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 10s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 17s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 51s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 35s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  40m  0s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755269/YARN-4102-YARN-2928.004.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / e6afe26 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9085/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9085/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9085/console |


This message was automatically generated.

> Add a "skip existing table" mode for timeline schema creator
> 
>
> Key: YARN-4102
> URL: https://issues.apache.org/jira/browse/YARN-4102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4102-YARN-2928.001.patch, 
> YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch, 
> YARN-4102-YARN-2928.004.patch
>
>
> When debugging timeline POCs, we may need to create HBase tables that are 
> added in ongoing patches. Right now, our schema creator exits as soon as it 
> hits an existing table. While this is correct behavior for end users, it 
> makes debugging POCs painful: every time, we have to disable all existing 
> tables, drop them, rerun the schema creator to generate all tables, and 
> regenerate all test data. 
> Maybe we'd like to add an "incremental" mode so that the creator only 
> creates the tables that do not exist yet? This would be handy when deploying 
> our POCs. Of course, consistency across tables has to be kept in mind. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-10 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740221#comment-14740221
 ] 

Bibin A Chundatt commented on YARN-4126:


Hi [~jianhe]

The test classes that required updates are:

TestClientRMService
TestRMDelegationTokens
TestRMWebServicesDelegationTokens

All token-related test cases were written for non-secure mode. After the fix 
that disables delegation tokens in non-secure mode, all of them started 
failing, because under the new condition a delegation token is only issued in 
secure mode with AuthenticationMethod=kerberos.

Also, {{TestClientRMService}} contains test cases for both non-secure mode (12) 
and secure mode (8). For the token-related test cases I had to set up a 
*Kerberos+Secured* state, so I added the method *initializeUserGroupSecureMode*, 
which sets the authentication method to kerberos on {{UserGroupInformation}}.

The *UserGroupInformation* state had to be set for only a few test cases, so 
initializing it in a before-class method was not an option.
If you have any other suggestion, please do share; I will try my best.
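
For reference, a helper that flips {{UserGroupInformation}} into secure/Kerberos 
mode for selected test cases could look roughly like the sketch below (the body 
is an assumption on my side, not necessarily what the patch does):

{code}
// Rough sketch, not the actual patch: put UserGroupInformation into
// secure mode with Kerberos authentication for token-related test cases.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod;

public class SecureModeTestUtil {
  public static void initializeUserGroupSecureMode(Configuration conf) {
    // Select Kerberos as the authentication method in the configuration.
    SecurityUtil.setAuthenticationMethod(AuthenticationMethod.KERBEROS, conf);
    // UserGroupInformation derives its static security state from this configuration.
    UserGroupInformation.setConfiguration(conf);
  }
}
{code}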

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740214#comment-14740214
 ] 

Hadoop QA commented on YARN-2005:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 29s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 27s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m 27s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 39s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  54m 33s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 102m  1s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755286/YARN-2005.009.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f103a70 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9084/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9084/console |


This message was automatically generated.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740114#comment-14740114
 ] 

Hadoop QA commented on YARN-1651:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 19s | Findbugs (version ) appears to 
be broken on YARN-1197. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 23 new or modified test files. |
| {color:red}-1{color} | javac |   8m 18s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  11m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 30s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |  43m 12s | The patch has 177  line(s) 
that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 46s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   9m 13s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 52s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   1m  0s | Tests passed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |   6m 43s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  55m 30s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 171m 51s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient |
| Timed out tests | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755257/YARN-1651-6.YARN-1197.patch
 |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-1197 / f86eae1 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/diffJavacWarnings.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9079/console |


This message was automatically generated.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-09-10 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740093#comment-14740093
 ] 

Jonathan Eagles commented on YARN-2513:
---

I'll need to take a look at this to see why two UIs don't work. 

> Host framework UIs in YARN for use with the ATS
> ---
>
> Key: YARN-2513
> URL: https://issues.apache.org/jira/browse/YARN-2513
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>  Labels: 2.6.1-candidate
> Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
> YARN-2513.v3.patch
>
>
> Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
> infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740069#comment-14740069
 ] 

Hadoop QA commented on YARN-4145:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   7m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m  4s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  74m  4s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755259/YARN-4145.001.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / f103a70 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9081/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9081/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9081/console |


This message was automatically generated.

> Make RMHATestBase abstract so its not run when running all tests under that 
> namespace
> -
>
> Key: YARN-4145
> URL: https://issues.apache.org/jira/browse/YARN-4145
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: YARN-4145.001.patch
>
>
> Make it abstract to avoid running it as a test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-10 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739974#comment-14739974
 ] 

Xianyin Xin commented on YARN-4120:
---

Link to YARN-4134, the two can be solved together.

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code in 
> {{FSAppAttempt.getResourceUsage}} is involved:
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated up to FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # It describes something in the future: even though these resources are 
> marked as preempted, they are still being used by the app, and they will be 
> subtracted from {{currentConsumption}} once the preemption actually finishes. 
> It is not reasonable to account for that ahead of time. 
> # There is another problem. Consider the following hierarchy:
> {code}
>         root
>        /    \
>   queue1    queue2
>    /    \
> queue1.3  queue1.4
> {code}
> Suppose queue1.3 needs resources and can preempt resources from queue1.4; the 
> preemption happens entirely within queue1. But when we compute the resource 
> usage of queue1, the current code gives 
> {{queue1.resourceUsage = its_current_resource_usage - preemption}}, which is 
> unfair to queue2 when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3985:

Attachment: YARN-3985.001.patch

Added a patch that calls into the state store, along with a unit test that 
verifies that after recovery the new RM sees the reservations saved by the 
previous RM.
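
For context, RM-restart tests usually follow a pattern roughly like the sketch 
below (the class and method names here are illustrative assumptions, not 
necessarily what the attached patch does):

{code}
// Rough sketch of the RM-restart test pattern: a second RM started against the
// same in-memory state store should recover the reservations written by the first.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.recovery.MemoryRMStateStore;

public class ReservationRecoverySketch {
  public void verifyReservationStateIsRecovered() throws Exception {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);

    MemoryRMStateStore memStore = new MemoryRMStateStore();
    memStore.init(conf);

    MockRM rm1 = new MockRM(conf, memStore);
    rm1.start();
    // ... submit a reservation here so the ReservationSystem persists it ...

    // A second RM pointed at the same store recovers the saved state.
    MockRM rm2 = new MockRM(conf, memStore);
    rm2.start();
    // ... assert on rm2.getRMContext().getReservationSystem() and its plans ...

    rm2.stop();
    rm1.stop();
  }
}
{code}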

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-10 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739939#comment-14739939
 ] 

Anubhav Dhoot commented on YARN-3985:
-

Since updateReservation does an add and a remove, we do not need a separate 
update of the reservation state in the state store. I can remove it if needed, 
either in this patch or in a separate one.

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2005:

Attachment: YARN-2005.009.patch

Addressed feedback

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4102) Add a "skip existing table" mode for timeline schema creator

2015-09-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739910#comment-14739910
 ] 

Sangjin Lee commented on YARN-4102:
---

The latest patch (v.4) LGTM. Once jenkins is green, I'll commit it.

> Add a "skip existing table" mode for timeline schema creator
> 
>
> Key: YARN-4102
> URL: https://issues.apache.org/jira/browse/YARN-4102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4102-YARN-2928.001.patch, 
> YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch, 
> YARN-4102-YARN-2928.004.patch
>
>
> When debugging timeline POCs, we may need to create HBase tables that are 
> added in ongoing patches. Right now, our schema creator exits as soon as it 
> hits an existing table. While this is correct behavior for end users, it 
> makes debugging POCs painful: every time, we have to disable all existing 
> tables, drop them, rerun the schema creator to generate all tables, and 
> regenerate all test data. 
> Maybe we'd like to add an "incremental" mode so that the creator only 
> creates the tables that do not exist yet? This would be handy when deploying 
> our POCs. Of course, consistency across tables has to be kept in mind. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4102) Add a "skip existing table" mode for timeline schema creator

2015-09-10 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4102:

Attachment: YARN-4102-YARN-2928.004.patch

Sorry for the delay. Here's the updated patch. Thanks folks! 

> Add a "skip existing table" mode for timeline schema creator
> 
>
> Key: YARN-4102
> URL: https://issues.apache.org/jira/browse/YARN-4102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4102-YARN-2928.001.patch, 
> YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch, 
> YARN-4102-YARN-2928.004.patch
>
>
> When debugging timeline POCs, we may need to create HBase tables that are 
> added in ongoing patches. Right now, our schema creator exits as soon as it 
> hits an existing table. While this is correct behavior for end users, it 
> makes debugging POCs painful: every time, we have to disable all existing 
> tables, drop them, rerun the schema creator to generate all tables, and 
> regenerate all test data. 
> Maybe we'd like to add an "incremental" mode so that the creator only 
> creates the tables that do not exist yet? This would be handy when deploying 
> our POCs. Of course, consistency across tables has to be kept in mind. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-10 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3901:
-
Attachment: YARN-3901-YARN-2928.6.patch

Uploading patch v6, which addresses [~jrottinghuis]'s and [~sjlee0]'s 
discussion points about timestamp values being in milliseconds vs. nanoseconds. 

Each cell timestamp is now multiplied by 1000, and in the flow run table the 
last 3 digits of the appId's id are added to it. That way we take care of 
collisions in this table.

The read path, ColumnHelper#readResultsWithTimestamps, accordingly truncates 
the last 3 digits of the cell timestamp value. 

I checked that all the tests in timelineservice are passing. 
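
In other words, the cell timestamp is encoded and decoded roughly along these 
lines (a minimal sketch of the scheme described above; the helper names are my 
own, not the actual code):

{code}
// Minimal sketch of the encoding: scale the millisecond timestamp by 1000 and
// fold in the last 3 digits of the application id so that cells written by
// different apps at the same millisecond do not collide; the read path simply
// drops those 3 digits again.
public final class FlowRunTimestampSketch {
  private static final long APP_ID_DIGITS = 1000L;

  static long encode(long timestampMs, long appIdSequenceNum) {
    return timestampMs * APP_ID_DIGITS + (appIdSequenceNum % APP_ID_DIGITS);
  }

  static long decode(long cellTimestamp) {
    // Truncate the 3 app-id digits to recover the original millisecond value.
    return cellTimestamp / APP_ID_DIGITS;
  }
}
{code}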

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing a jira to track the creation and population of data in the flow run 
> table. 
> Some points that are being considered:
> - The table stores per-flow-run information aggregated across applications, 
> plus the flow version. The RM's collector writes to it on app creation and 
> app completion.
> - The per-app collector writes metric updates to it at a slower frequency 
> than the metric updates to the application table.
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application levels keep a timeseries.
> - The running_apps column will be incremented on app creation and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with a tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value among all the cells of this 
> column will be written to a cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for max_end_time, except that the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> completed app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don't 
> want to re-aggregate those upon replay.
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing

2015-09-10 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739800#comment-14739800
 ] 

MENG DING commented on YARN-1644:
-

Hi, [~leftnoteasy]

There are 7 findbugs warnings, but they already existed before the patch. This 
patch does not generate new findbugs warnings. 

I took a quick look at some of the warnings:

* The warnings in {{NodeStatusPBImpl}} are most likely because 
getContainersUtilization/setContainersUtilization/getNodeUtilization/setNodeUtilization
 are not synchronized consistently (see the sketch below).
* The warnings in {{WebServices}} are probably because of a potential NPE.
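
For illustration, this is the kind of pattern that triggers the FindBugs 
"inconsistent synchronization" warning; the class below is a minimal 
self-contained example, not YARN code:

{code}
// Minimal illustration (not YARN code): once some accessors of a shared field
// take a lock, all of them should, otherwise FindBugs flags the field as
// inconsistently synchronized and readers may observe stale state.
public class UtilizationHolder {
  private double utilization;  // shared mutable state

  public synchronized double getUtilization() {   // reader takes the lock
    return utilization;
  }

  public synchronized void setUtilization(double value) {  // writer takes the same lock
    utilization = value;
  }
}
{code}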

I will open a ticket to fix them.
Meng

> RM-NM protocol changes and NodeStatusUpdater implementation to support 
> container resizing
> -
>
> Key: YARN-1644
> URL: https://issues.apache.org/jira/browse/YARN-1644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Fix For: YARN-1197
>
> Attachments: YARN-1644-YARN-1197.4.patch, 
> YARN-1644-YARN-1197.5.patch, YARN-1644-YARN-1197.6.patch, YARN-1644.1.patch, 
> YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace

2015-09-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4145:

Attachment: YARN-4145.001.patch

> Make RMHATestBase abstract so its not run when running all tests under that 
> namespace
> -
>
> Key: YARN-4145
> URL: https://issues.apache.org/jira/browse/YARN-4145
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: YARN-4145.001.patch
>
>
> Trivial patch to make it abstract



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace

2015-09-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4145:

Description: Make it abstract to avoid running it as a test  (was: Trivial 
patch to make it abstract)

> Make RMHATestBase abstract so its not run when running all tests under that 
> namespace
> -
>
> Key: YARN-4145
> URL: https://issues.apache.org/jira/browse/YARN-4145
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: YARN-4145.001.patch
>
>
> Make it abstract to avoid running it as a test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace

2015-09-10 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4145:
---

 Summary: Make RMHATestBase abstract so its not run when running 
all tests under that namespace
 Key: YARN-4145
 URL: https://issues.apache.org/jira/browse/YARN-4145
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Minor


Trivial patch to make it abstract



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: (was: YARN-1651-6.YARN-1197.patch)

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-6.YARN-1197.patch

Found a test failure in the ver.6 patch; removed and re-added the patch before 
anybody had a chance to look at it...

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-6.YARN-1197.patch

Uploaded ver.6 patch.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739742#comment-14739742
 ] 

Wangda Tan commented on YARN-1651:
--

Thanks for the review, [~jianhe]!
bq. I think for now we can fail the allocate call explicitly in those very 
clear situations in checkAndNormalizeContainerChangeRequest, e.g. the 
situation where rmContainer doesn't exist. That's more explicit to users. 
Digging through logs is not an easy thing for an application writer.
Done. We now check in both the AMS and the scheduler, and the exception is 
thrown in the AMS. We do both checks because the AMS doesn't acquire the 
scheduler lock, so it is still possible that the RMContainer state changes by 
the time the request reaches the scheduler.
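
To make the flow concrete, the early check in the AMS could look roughly like 
this (a sketch only; the method name and message are my own, the real logic 
lives in checkAndNormalizeContainerChangeRequest):

{code}
// Rough sketch, not the actual YARN-1651 code: reject clearly invalid change
// requests up front in the AMS; the scheduler re-validates under its own lock
// because the RMContainer state may have changed in between.
private void checkContainerChangeRequest(ContainerId containerId,
    AbstractYarnScheduler<?, ?> scheduler) throws YarnException {
  RMContainer rmContainer = scheduler.getRMContainer(containerId);
  if (rmContainer == null) {
    // Fail the allocate call explicitly instead of silently dropping the request,
    // so the application writer does not have to dig through RM logs.
    throw new YarnException("Container " + containerId
        + " does not exist or is no longer running; cannot change its resource.");
  }
}
{code}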

bq. RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a 
list? And therefore the NodeHeartBeatResponse and Impl change is not needed; 
similarly nmReportedIncreasedContainers can be a list.
This is to avoid the AM decreasing the same container multiple times between 
NM heartbeats, which is a rare edge case. Similarly for the NM-reported 
increasedContainers: if we decouple the NM heartbeat from scheduler 
allocation, a container could be increased multiple times between the 
scheduler's looks at the NM.

bq. When decreasing a container, should it send RMNodeDecreaseContainerEvent 
too?
Done, and added a test to confirm this as well.

bq. Looks like when decreasing a reservedIncreasedContainer, it will unreserve 
the whole extra reserved resource; should it only unreserve the extra 
resources being decreased?
Decreasing a container means decreasing its resource to below the confirmed 
resource. If a container is 2G and the AM asks to increase it to 4G, it can 
only decrease it to less than 2G before the increase is issued. So I think we 
need to unreserve the whole extra reserved resource.

bq. In general, I think we should be able to decrease/increase a regular 
reserved container or an increasedReservedContainer?
Container reservation is an internal state of the scheduler; the AM doesn't 
know about the reserved container at all, so for now I don't think we need to 
expose that to the user.

bq. The allocate call is specifically marked as noLock, but now every allocate 
call holds the global scheduler lock, which is too expensive. We can move 
decreaseContainer to the application itself.
DecreaseContainer is the same as completedContainer: both acquire the 
scheduler lock and the queue lock. I think we can optimize this in the future 
by adding them to something like a "pendingReleased" list that is traversed 
periodically.
I added comments to CS#allocate to explain this; the "NoLock" label is not 
100% accurate.

And addressed all other comments.

[~mding]
Comment addressed.


> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3975) WebAppProxyServlet should not redirect to RM page if AHS is enabled

2015-09-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-3975:

Attachment: YARN-3975.7.patch

[~jlowe] Thanks for taking a look.
I have updated the patch and incorporated your comments. Can you please have 
another look?

> WebAppProxyServlet should not redirect to RM page if AHS is enabled
> ---
>
> Key: YARN-3975
> URL: https://issues.apache.org/jira/browse/YARN-3975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-3975.2.b2.patch, YARN-3975.3.patch, 
> YARN-3975.4.patch, YARN-3975.5.patch, YARN-3975.6.patch, YARN-3975.7.patch
>
>
> WebAppProxyServlet should be updated to handle the case where the app report 
> doesn't have a tracking URL and the Application History Server is enabled.
> Since we would have already tried the RM and gotten an 
> ApplicationNotFoundException, we should not direct the user to the RM app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-10 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739708#comment-14739708
 ] 

Anubhav Dhoot commented on YARN-2005:
-

Added YARN-4144 to also add the node that causes a LaunchFailedTransition to 
the AM blacklist.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4144) Add NM that causes LaunchFailedTransition to blacklist

2015-09-10 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4144:
---

 Summary: Add NM that causes LaunchFailedTransition to blacklist
 Key: YARN-4144
 URL: https://issues.apache.org/jira/browse/YARN-4144
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


During the discussion of YARN-2005 we found that we need to add more cases 
where blacklisting can occur. This jira tracks making any launch failure via 
LaunchFailedTransition also contribute to blacklisting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-10 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739700#comment-14739700
 ] 

Anubhav Dhoot commented on YARN-2005:
-

[~sunilg], that's a good suggestion. Added a follow-up for this: YARN-4143.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-10 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4143:
---

 Summary: Optimize the check for AMContainer allocation needed by 
blacklisting and ContainerType
 Key: YARN-4143
 URL: https://issues.apache.org/jira/browse/YARN-4143
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


In YARN-2005, checks are made to determine whether the allocation is for an AM 
container. This happens on every allocate call and should be optimized away, 
since the answer changes only once per SchedulerApplicationAttempt.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-4143:
---

Assignee: Anubhav Dhoot

> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> In YARN-2005, checks are made to determine whether the allocation is for an 
> AM container. This happens on every allocate call and should be optimized 
> away, since the answer changes only once per SchedulerApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739679#comment-14739679
 ] 

Wangda Tan commented on YARN-1644:
--

[~mding], could you take a look at findbugs? I can reproduce it locally.

You can run "mvn clean findbugs:findbugs" under yarn-server-common. Please open 
a ticket to track the findbugs fix if you can reproduce it.

> RM-NM protocol changes and NodeStatusUpdater implementation to support 
> container resizing
> -
>
> Key: YARN-1644
> URL: https://issues.apache.org/jira/browse/YARN-1644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Fix For: YARN-1197
>
> Attachments: YARN-1644-YARN-1197.4.patch, 
> YARN-1644-YARN-1197.5.patch, YARN-1644-YARN-1197.6.patch, YARN-1644.1.patch, 
> YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739642#comment-14739642
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #354 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/354/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> Node labels do not get updated on the RM side.
> *This jira also handles the below issue too:*
> The timer task for label updates is not getting triggered in the NodeManager 
> for distributed scheduling.
> The task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739625#comment-14739625
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2293 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2293/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> Node labels do not get updated on the RM side.
> *This jira also handles the below issue too:*
> The timer task for label updates is not getting triggered in the NodeManager 
> for distributed scheduling.
> The task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-10 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739618#comment-14739618
 ] 

Anubhav Dhoot commented on YARN-2005:
-

[~He Tianyi], yes, we are using the ContainerExitStatus for this. We can refine 
the conditions in a follow-up if needed.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-10 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739614#comment-14739614
 ] 

Anubhav Dhoot commented on YARN-2005:
-

Hi [~kasha], thanks for your comments.

2.4 - We do not need to update the systemBlacklist, as it is updated to the 
complete list by the RMAppAttemptImpl#ScheduleTransition call every time. 
11, 12 - The changes were needed because we now need a valid submission context 
for isWaitingForAMContainer.
9 - This is needed by the new test added in TestAMRestart.
8.3 - Yes, I can file a follow-up for that.
Addressed the rest of them.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-09-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3700:
--
Attachment: YARN-3700-branch-2.7.2.txt

Attaching the 2.7.2 patch that I committed.

> ATS Web Performance issue at load time when large number of jobs
> 
>
> Key: YARN-3700
> URL: https://issues.apache.org/jira/browse/YARN-3700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: YARN-3700-branch-2.6.1.txt, YARN-3700-branch-2.7.2.txt, 
> YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, 
> YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch
>
>
> Currently, we load all the apps when we try to load the yarn timelineservice 
> web page. If we have a large number of jobs, it will be very slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-09-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3700:
--
Fix Version/s: 2.7.2

Just pulled this into branch-2.7 (release 2.7.2) as it already exists in 2.6.1.

The branch-2 patch had merge conflicts. Ran compilation and 
TestApplicationHistoryClientService and TestApplicationHistoryManagerOnTimelineStore 
before the push.

> ATS Web Performance issue at load time when large number of jobs
> 
>
> Key: YARN-3700
> URL: https://issues.apache.org/jira/browse/YARN-3700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: YARN-3700-branch-2.6.1.txt, YARN-3700.1.patch, 
> YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, 
> YARN-3700.3.patch, YARN-3700.4.patch
>
>
> Currently, we will load all the apps when we try to load the yarn 
> timelineservice web page. If we have large number of jobs, it will be very 
> slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739454#comment-14739454
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2316 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2316/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so

2015-09-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2890:
--
Fix Version/s: 2.7.2

Just pulled this into branch-2.7 (release 2.7.2) as it already exists in 2.6.1.

The branch-2 patch applies cleanly. Ran compilation and TestJobHistoryEventHandler, 
TestMRTimelineEventHandling, TestDistributedShell, and TestMiniYarnCluster before 
the push.

> MiniYarnCluster should turn on timeline service if configured to do so
> --
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch, YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739428#comment-14739428
 ] 

Hadoop QA commented on YARN-4141:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  1 
new checkstyle issues (total was 33, now 34). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 37s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 31s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755171/0002-YARN-4141.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7766610 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9078/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9078/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9078/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9078/console |


This message was automatically generated.

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
>  , its good that if YARN can suppress exceptions during change application 
> priority calls for applications at its finishing stages.
> Currently it will be difficult for clients to handle this. This will be 
> similar to kill application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739383#comment-14739383
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1106 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1106/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java
* hadoop-yarn-project/CHANGES.txt


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739368#comment-14739368
 ] 

Sunil G commented on YARN-4140:
---

Yes, [~leftnoteasy]. Thank you for clarifying; that makes sense.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4142) add a way for an attempt to report an attempt failure

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739361#comment-14739361
 ] 

Sunil G commented on YARN-4142:
---

Hi [~steve_l],
I have a doubt here. In the NodeManager's {{ContainerImpl}}, we set the diagnostics 
and exit code for a few error cases. So here, does "application explicitly terminates 
an attempt" mean the AM kills itself for some reason, or that the AM container/attempt 
is killed by a CLI command?

> add a way for an attempt to report an attempt failure
> -
>
> Key: YARN-4142
> URL: https://issues.apache.org/jira/browse/YARN-4142
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>
> Currently AMs can report a failure with exit code and diagnostics text —but 
> only when exiting to a failed state. If the AM terminates for any other 
> reason there's no information held in the RM, just the logs somewhere —and we 
> know they don't always last.
> When an application explicitly terminates an attempt, it would be nice if it 
> could  optionally report something to the RM before it exited. The most 
> recent set of these could then be included in Application Reports, so 
> allowing client apps to count attempt failures and get exit details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739349#comment-14739349
 ] 

Wangda Tan commented on YARN-4140:
--

We force the client not to set a node-label expression per resourceName (in YARN-2694) 
because we don't want the client to set different node-label-expressions for different 
resourceNames at the same priority (for priority=2, "rack-1"'s 
node-label-expression="x", but "*"'s node-label-expression="y"). Remember that we 
count pendingResource using the "*" request of each priority.

But we can normalize the node-label-expression once the requests are sent to the 
scheduler.

Does that make sense?
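
A minimal sketch of that normalization idea, assuming requests are grouped by 
priority. The {{Request}} type below is a stand-in for illustration only; it is not 
the real ResourceRequest class or the actual scheduler code:

{code}
import java.util.Arrays;
import java.util.List;

// Toy model: for one priority, copy the ANY ("*") request's node-label-expression
// onto node-local/rack-local requests that did not set one.
public class NodeLabelNormalizationSketch {
  static class Request {
    final String resourceName;   // "*", a rack, or a host
    String nodeLabelExpression;  // may be null/empty if the AM did not set it
    Request(String resourceName, String nodeLabelExpression) {
      this.resourceName = resourceName;
      this.nodeLabelExpression = nodeLabelExpression;
    }
  }

  // All requests here are assumed to belong to a single priority.
  static void normalize(List<Request> requestsForOnePriority) {
    String anyLabel = null;
    for (Request r : requestsForOnePriority) {
      if ("*".equals(r.resourceName)) {
        anyLabel = r.nodeLabelExpression;
      }
    }
    if (anyLabel == null || anyLabel.isEmpty()) {
      return; // nothing to propagate
    }
    for (Request r : requestsForOnePriority) {
      if (r.nodeLabelExpression == null || r.nodeLabelExpression.isEmpty()) {
        r.nodeLabelExpression = anyLabel;
      }
    }
  }

  public static void main(String[] args) {
    List<Request> reqs = Arrays.asList(
        new Request("*", "3"),
        new Request("/default-rack", ""),
        new Request("host-10-19-92-143", ""));
    normalize(reqs);
    for (Request r : reqs) {
      System.out.println(r.resourceName + " -> " + r.nodeLabelExpression);
    }
  }
}
{code}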

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739309#comment-14739309
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #374 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/374/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4142) add a way for an attempt to report an attempt failure

2015-09-10 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4142:


 Summary: add a way for an attempt to report an attempt failure
 Key: YARN-4142
 URL: https://issues.apache.org/jira/browse/YARN-4142
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.8.0
Reporter: Steve Loughran


Currently AMs can report a failure with an exit code and diagnostics text, but only 
when exiting to a failed state. If the AM terminates for any other reason there's no 
information held in the RM, just the logs somewhere, and we know they don't always 
last.

When an application explicitly terminates an attempt, it would be nice if it could 
optionally report something to the RM before it exited. The most recent set of these 
could then be included in Application Reports, allowing client apps to count attempt 
failures and get exit details.
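
For context, the reporting an AM can do today is limited to the unregister call. A 
rough sketch using the existing AMRMClient API (registration and real work elided) is 
below; the per-attempt reporting proposed in this JIRA would go beyond this:

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch of the existing mechanism: exit status and diagnostics text are only
// passed to the RM when the attempt unregisters into a final state.
public class AmDiagnosticsToday {
  public static void main(String[] args) throws IOException, YarnException {
    AMRMClient<ContainerRequest> amClient = AMRMClient.createAMRMClient();
    amClient.init(new YarnConfiguration());
    amClient.start();
    try {
      // ... registerApplicationMaster, request containers, do work ...
    } finally {
      // Diagnostics survive in the RM only via this exit path today.
      amClient.unregisterApplicationMaster(
          FinalApplicationStatus.FAILED,
          "worker initialization failed: see container logs",
          null /* tracking URL */);
      amClient.stop();
    }
  }
}
{code}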



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739290#comment-14739290
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #368 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/368/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739282#comment-14739282
 ] 

Sunil G commented on YARN-4140:
---

Hi [~leftnoteasy],

I have a doubt here. The node-label expression is set on the ANY request by the AM. 
Is there any reason why it is not updated for the node-local and rack-local requests 
there itself? Could you please help to clarify?

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4102) Add a "skip existing table" mode for timeline schema creator

2015-09-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739270#comment-14739270
 ] 

Sangjin Lee commented on YARN-4102:
---

Hi [~gtCarrera9], it looks good to me too. Do you mind fixing that one little 
checkstyle issue, though? Then I think we can commit this.

> Add a "skip existing table" mode for timeline schema creator
> 
>
> Key: YARN-4102
> URL: https://issues.apache.org/jira/browse/YARN-4102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4102-YARN-2928.001.patch, 
> YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch
>
>
> When debugging timeline POCs, we may need to create hbase tables that are 
> added in some ongoing patches. Right now, our schema creator will exit when 
> it hits one existing table. While this is a correct behavior with end users, 
> this introduces much trouble in debugging POCs: every time we have to disable 
> all existing tables, drop them, run the schema creator to generate all 
> tables, and regenerate all test data. 
> Maybe we'd like to add an "incremental" mode so that the creator will only 
> create non-existing tables? This is pretty handy in deploying our POCs. Of 
> course, consistency has to be kept in mind across tables. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API

2015-09-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739204#comment-14739204
 ] 

Naganarasimha G R commented on YARN-3717:
-

Thanks for the feedback [~leftnoteasy],
bq. We can improve this in later patches.
OK, I will set it as "" in the CLI and web UI, and REST will have null.

bq. This is more important to me, now we cannot do this through REST API, which 
will block effort of YARN-3368 to support showing labels metrics as well.
I will start working on this immediately after I finish this JIRA.


> Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739172#comment-14739172
 ] 

Hudson commented on YARN-4106:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8430 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8430/])
YARN-4106. NodeLabels for NM in distributed mode is not updated even after 
clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 
77666105b4557d5706e5844a4ca286917d966c5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739121#comment-14739121
 ] 

Sunil G commented on YARN-4091:
---

Thank you [~leftnoteasy] for sharing your thoughts.

Yes, the REST framework looks fine. But after the first response returns as 
"pending fetching", a second REST query has to be made to see the real result. 
Alternatively, we could dump this information to the logs; still, I feel that getting 
the information back as REST output is better, and we can reuse this framework in the 
new UI. Hence the timing of the second REST query is important, as the intended node 
heartbeat has to happen first (or, by the time the query arrives, more heartbeats from 
the same node may have come in). Showing aggregated debug information until the second 
query is good, but I worry about the load on the RM and the volume of data produced. 
A time limit (or a minimum number of heartbeats to debug) could help in this case. 
Thoughts?
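
A rough client-side sketch of the two-step REST flow described above. The endpoint 
path, query parameter, and response wording are all hypothetical and only illustrate 
the workflow; no such RM endpoint exists today:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical flow: the first call asks the scheduler to start collecting
// diagnostics and may return "pending fetching"; a later call retrieves the
// aggregated result after at least one node heartbeat has been processed.
public class SchedulerDebugPollSketch {
  static String get(String url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      StringBuilder body = new StringBuilder();
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
      return body.toString();
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws Exception {
    // Made-up endpoint, for illustration of the two-query pattern only.
    String base = "http://rm-host:8088/ws/v1/cluster/scheduler/debug";
    String first = get(base + "?node=host-10-19-92-143");
    System.out.println("first response: " + first); // e.g. "pending fetching"

    // Wait long enough for at least one heartbeat from the node to arrive.
    Thread.sleep(5000);

    String second = get(base + "?node=host-10-19-92-143");
    System.out.println("aggregated diagnostics: " + second);
  }
}
{code}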

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers are improved with various new capabilities, more configurations 
> which tunes the schedulers starts to take actions such as limit assigning 
> containers to an application, or introduce delay to allocate container etc. 
> There are no clear information passed down from scheduler to outerworld under 
> these various scenarios. This makes debugging very tougher.
> This ticket is an effort to introduce more defined states on various parts in 
> scheduler where it skips/rejects container assignment, activate application 
> etc. Such information will help user to know whats happening in scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739113#comment-14739113
 ] 

Wangda Tan commented on YARN-4106:
--

bq. ... may be Labels Manager can support additional method which adds the 
missing labels first and then updates the mapping
Doing this could be hard to manage: for example, how would we deal with node label 
removal? You could remove a label when its reference count becomes zero, but resource 
requests could then be rejected if we remove an existing label. I would prefer not to 
add more flexibility to node partitions, since it will likely break something.
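
A toy sketch of the reference-counting pitfall mentioned above; the names are 
illustrative only and do not reflect the actual node labels manager code:

{code}
import java.util.HashMap;
import java.util.Map;

// Once the last node drops a label, auto-removal deletes it; a queued resource
// request that still names that label would then be rejected.
public class LabelRefCountSketch {
  private final Map<String, Integer> labelRefCount = new HashMap<>();

  public void addReference(String label) {
    labelRefCount.merge(label, 1, Integer::sum);
  }

  public void removeReference(String label) {
    // Drop the mapping entirely when the count would reach zero.
    labelRefCount.computeIfPresent(label,
        (l, c) -> (c > 1) ? Integer.valueOf(c - 1) : null);
  }

  // A request mentioning a label that was just auto-removed would fail here.
  public boolean isKnownLabel(String label) {
    return labelRefCount.containsKey(label);
  }

  public static void main(String[] args) {
    LabelRefCountSketch labels = new LabelRefCountSketch();
    labels.addReference("GPU");
    labels.removeReference("GPU");          // ref count hits zero, label removed
    System.out.println(labels.isKnownLabel("GPU")); // false -> request rejected
  }
}
{code}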

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-10 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4141:
--
Attachment: 0002-YARN-4141.patch

In the meantime, attaching a new patch that addresses point 1. We will wait for input 
on point 2.

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
>  , its good that if YARN can suppress exceptions during change application 
> priority calls for applications at its finishing stages.
> Currently it will be difficult for clients to handle this. This will be 
> similar to kill application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739097#comment-14739097
 ] 

Naganarasimha G R commented on YARN-4106:
-

bq.  Without a centralized node partition collection, capacity planning will be 
not straightforward.
Yes, the idea is to still have this collection; the only difference is that if the 
labels sent from the NM are not present at the RM, then maybe the Labels Manager can 
support an additional method which adds the missing labels first and then updates the 
mapping. Thoughts?
Yes, it makes more sense when we support node constraints, but if we want to support 
more flexibility then we can consider supporting this too.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739078#comment-14739078
 ] 

Wangda Tan commented on YARN-4106:
--

[~Naganarasimha], 
bq. Wangda Tan As far as this patch its fine but was wondering to increase 
usability do we need to support YARN-2728, Support for disabling the 
Centralized NodeLabel validation in Distributed Node Label Configuration setup ?
Since we only support node partitions, and node partitions relate to capacity 
planning, etc., capacity planning would not be straightforward without a centralized 
node partition collection.

I think YARN-2728 makes more sense when we support node constraints.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739074#comment-14739074
 ] 

Sunil G commented on YARN-4141:
---

Hi [~rohithsharma],
Thank you for the comments. I have one input on the second comment.
As we are not updating the priority here, it is not a success, correct? Hence I 
reported it as a failure.

How do you feel about that?

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
>  , its good that if YARN can suppress exceptions during change application 
> priority calls for applications at its finishing stages.
> Currently it will be difficult for clients to handle this. This will be 
> similar to kill application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739064#comment-14739064
 ] 

Naganarasimha G R commented on YARN-4106:
-

+1. I applied the latest patch, ran the test cases, and applied YARN-2729 on top of 
this patch; the script was also running successfully. The latest patch LGTM. 
[~leftnoteasy], as far as this patch goes it is fine, but to increase usability I was 
wondering: do we need to support YARN-2728, ??Support for disabling the Centralized 
NodeLabel validation in Distributed Node Label Configuration setup???

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739052#comment-14739052
 ] 

Wangda Tan commented on YARN-4106:
--

+1 to the latest patch. 

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM the NM
> # Once NM is registration is done add nodelabels in RM
> Nodelabels not getting updated in RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API

2015-09-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739046#comment-14739046
 ] 

Wangda Tan commented on YARN-3717:
--

[~Naganarasimha],
bq. Well for a naive user will atleast know to what to look at. Or how about 
your idea of (For example, showing queue's label when the app's label doesn't 
set, etc.)
I think we should do this -- "showing queue's label when the app's label doesn't 
set.." -- but it may need some effort. I haven't thought it through; it may need some 
changes on the scheduler side so that RMApp can get the label of its queue. We can 
improve this in later patches.
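
A minimal sketch of that fallback for the UI/REST layer; the method below is an 
illustration only and does not use the real RMApp or queue APIs:

{code}
// Fallback idea: when the application did not set an AM node-label-expression,
// display the queue's default-node-label-expression instead.
public class AmLabelDisplaySketch {
  static String displayLabel(String appAmLabelExpression,
                             String queueDefaultLabelExpression) {
    if (appAmLabelExpression != null && !appAmLabelExpression.isEmpty()) {
      return appAmLabelExpression;
    }
    // Fall back to the queue's default expression; empty string if none.
    return queueDefaultLabelExpression == null ? "" : queueDefaultLabelExpression;
  }

  public static void main(String[] args) {
    System.out.println(displayLabel(null, "gpu"));   // gpu
    System.out.println(displayLabel("fast", "gpu")); // fast
  }
}
{code}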

bq. Ok, Will raise and start working on them. please inform if the priorty is 
more so that can finish them faster.
This is more important to me; right now we cannot do this through the REST API, which 
will block the effort in YARN-3368 to show label metrics as well.

Thanks,

> Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739036#comment-14739036
 ] 

Vrushali C commented on YARN-3901:
--

Hi [~gtCarrera9]

The start and end times for a flow run can be evaluated if you have all the known 
start and end times of the applications in that flow run, so the min/max timestamps 
can be computed. Hence this can be determined from the flow run table. But the purpose 
of the flow activity table is to note that a flow was "active" on that day, meaning an 
application in that flow either started, completed, or was running on that day. So 
when Joep and I reviewed my patch together, we realized that calculating the min/max 
in the flow activity table won't work for apps that span day boundaries, and so in his 
comment on Aug 29th there is a note: "No timestamp needed in FlowActivity table. Runs 
can start one day and end another. Probably start without, add later if needed." That 
meant we did not need the coprocessor to determine the min or max in the flow activity 
table, hence I removed it.

HTH
Vrushali
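
To make the first point above concrete, here is a tiny illustrative sketch (not code from the patch) of how a flow run's start/end can be derived once all application start and end times are known. FlowRunTimes and the long[] convention are made up for the example.

{code:title=Deriving flow run start/end from app times (illustrative sketch)}
import java.util.Arrays;
import java.util.Collection;

public class FlowRunTimes {
  /**
   * Each element is {appStartTime, appEndTime}; returns {minStart, maxEnd},
   * which is what the flow run table can expose without any per-day tracking.
   */
  public static long[] fromApps(Collection<long[]> appStartEndTimes) {
    long minStart = Long.MAX_VALUE;
    long maxEnd = Long.MIN_VALUE;
    for (long[] times : appStartEndTimes) {
      minStart = Math.min(minStart, times[0]);
      maxEnd = Math.max(maxEnd, times[1]);
    }
    return new long[] { minStart, maxEnd };
  }

  public static void main(String[] args) {
    // Two apps in the same flow run, the second one spanning a day boundary.
    Collection<long[]> apps = Arrays.asList(
        new long[] { 1000L, 5000L },
        new long[] { 2000L, 90000000L });
    System.out.println(Arrays.toString(fromApps(apps)));  // [1000, 90000000]
  }
}
{code}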



> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per flow run information aggregated across applications, flow version
> - RM’s collector writes to it on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to the application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738973#comment-14738973
 ] 

Jason Lowe commented on YARN-2410:
--

+1 for the latest patch.  Committing this.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v10.patch, 
> YARN-2410-v11.patch, YARN-2410-v2.patch, YARN-2410-v3.patch, 
> YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch, 
> YARN-2410-v7.patch, YARN-2410-v8.patch, YARN-2410-v9.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in 
> a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file. This will 
> theoretically happen 6000*40=240,000 times, which will run the NM out of 
> file descriptors and cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 
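As an aside, a purely illustrative sketch of the "open the fd only when needed" idea from the last paragraph. This is not the attached patch, and LazyMapOutput is an invented class, but it shows the shape of deferring the open until the bytes are actually about to be transferred.

{code:title=Lazy map-output open (illustrative sketch)}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.WritableByteChannel;

public class LazyMapOutput {
  private final File file;
  private final long offset;
  private final long length;

  public LazyMapOutput(File file, long offset, long length) {
    // Only the path and byte range are kept while the request is queued,
    // so no file descriptor is consumed yet.
    this.file = file;
    this.offset = offset;
    this.length = length;
  }

  /** Opens the fd only at send time and releases it as soon as the bytes are out. */
  public long transferTo(WritableByteChannel target) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      return raf.getChannel().transferTo(offset, length, target);
    }
  }
}
{code}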



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738865#comment-14738865
 ] 

Hadoop QA commented on YARN-4131:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 19s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 44s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 13s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests | 107m 50s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  1s | Tests passed in 
hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests |   2m  2s | Tests failed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 38s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  54m 21s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 233m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754928/YARN-4131-v1.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7b5b2c5 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9077/console |


This message was automatically generated.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738830#comment-14738830
 ] 

MENG DING commented on YARN-1651:
-

Hi, [~leftnoteasy]

One comment I forgot to post is that we may want to synchronize the 
RMContainerImpl.getAllocatedResource() call? Because the container resource may 
be updated at any time, e.g:
{code:title=RMContainerImpl.java}
   @Override
   public Resource getAllocatedResource() {
-return container.getResource();
+try {
+  readLock.lock();
+  return Resources.clone(container.getResource());
+} finally {
+  readLock.unlock();
+}
   }
{code}

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738821#comment-14738821
 ] 

Sunil G commented on YARN-4068:
---

Thank you very much [~Naganarasimha Garla].

> Support appUpdated event in TimelineV2 to publish details for movetoqueue, 
> change in priority
> -
>
> Key: YARN-4068
> URL: https://issues.apache.org/jira/browse/YARN-4068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sunil G
>Assignee: Sunil G
>
> YARN-4044 supports appUpdated event changes to TimelineV1. This jira is to 
> track and port appUpdated changes in V2 for
> - movetoqueue
> - updateAppPriority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-10 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738773#comment-14738773
 ] 

Nathan Roberts commented on YARN-2410:
--

Thanks for the additional code comments. +1


> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v10.patch, 
> YARN-2410-v11.patch, YARN-2410-v2.patch, YARN-2410-v3.patch, 
> YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch, 
> YARN-2410-v7.patch, YARN-2410-v8.patch, YARN-2410-v9.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in 
> a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file. This will 
> theoretically happen 6000*40=240,000 times, which will run the NM out of 
> file descriptors and cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-09-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738728#comment-14738728
 ] 

Junping Du commented on YARN-313:
-

Thanks [~elgoiri] for updating the patch! The current patch LGTM overall, but 
just a few nits:
1. After thinking about it again, we should mark the newly added APIs as 
Evolving instead of Stable, e.g. RefreshResourcesRequest and 
RefreshResourcesResponse (see the sketch after this list).
2. Tests for the PB implementations of RefreshResourcesRequest and 
RefreshResourcesResponse need to be added to TestPBImplRecords.java like the 
other protocol records.
3. Fix the checkstyle issues reported by Jenkins (ignore the first one, as we 
can do nothing about it).
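
On nit 1, a minimal sketch of what the annotation change could look like, assuming the RefreshResourcesRequest record added by the patch follows the usual Records.newRecord pattern; the class body here is illustrative, not the patch's actual content.

{code:title=RefreshResourcesRequest.java (sketch)}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceStability.Evolving;
import org.apache.hadoop.yarn.util.Records;

@Private
@Evolving   // Evolving instead of Stable: the API may still change while the feature matures
public abstract class RefreshResourcesRequest {
  public static RefreshResourcesRequest newInstance() {
    return Records.newRecord(RefreshResourcesRequest.class);
  }
}
{code}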

> Add Admin API for supporting node resource configuration in command line
> 
>
> Key: YARN-313
> URL: https://issues.apache.org/jira/browse/YARN-313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Junping Du
>Assignee: Inigo Goiri
>Priority: Critical
> Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
> YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, 
> YARN-313-v6.patch, YARN-313-v7.patch, YARN-313-v8.patch, YARN-313-v9.patch
>
>
> We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" 
> to support changes of node's resource specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2015-09-10 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4134:
--
Attachment: YARN-4134.003.patch

A tiny fix.

> FairScheduler preemption stops at queue level that all child queues are not 
> over their fairshare
> 
>
> Key: YARN-4134
> URL: https://issues.apache.org/jira/browse/YARN-4134
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4134.001.patch, YARN-4134.002.patch, 
> YARN-4134.003.patch
>
>
> Now FairScheduler uses a choose-a-candidate method to select a container to be 
> preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}},
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
> if (candidateQueue == null ||
> comparator.compare(queue, candidateQueue) > 0) {
>   candidateQueue = queue;
> }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its container to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> a candidate child queue is selected. However, if the queue's usage isn't over 
> its fair share, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
>  A scenario:
> {code}
> root
>/\
>   queue1   queue2
>/\
>   queue2.3, (  queue2.4  )
> {code}
> Suppose there are 8 containers, and queues at any level have the same weight. 
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their 
> fair share. Now we submit an app in queue2.4 that needs 4 containers; it 
> should preempt 2 from queue2.3, but the candidate-container selection 
> procedure will stop at queue1, so none of the containers will be preempted.
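One possible direction, shown only as an illustrative fragment and not the attached patch: instead of committing to a single candidate child, walk the children from most to least over fair share and fall back whenever a child yields nothing to preempt. It assumes the same {{childQueues}}, {{comparator}} and {{readLock}} fields as the snippet in the description.

{code:title=FSParentQueue.preemptContainer fallback (illustrative fragment)}
RMContainer toBePreempted = null;
readLock.lock();
try {
  // Visit children from most to least deserving instead of picking only one.
  List<FSQueue> sorted = new ArrayList<FSQueue>(childQueues);
  Collections.sort(sorted, Collections.reverseOrder(comparator));
  for (FSQueue queue : sorted) {
    toBePreempted = queue.preemptContainer();
    if (toBePreempted != null) {
      break;   // first subtree that actually has something preemptable wins
    }
  }
} finally {
  readLock.unlock();
}
return toBePreempted;
{code}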



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2609) Example of use for the ReservationSystem

2015-09-10 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738670#comment-14738670
 ] 

Bibin A Chundatt commented on YARN-2609:


Minor comment from my side.
If parameters are not passed:
{code}
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.hadoop.examples.ReservationClientDemo.run(ReservationClientDemo.java:95)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
{code}
# A parameter check can be done (see the sketch below)
# Adding a usage message for the example class would also be good
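
A minimal sketch of the kind of check meant above; ReservationClientDemoSketch and the <queue> argument are illustrative stand-ins, not the actual example class from the patch.

{code:title=Argument check in a Tool-based example (sketch)}
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReservationClientDemoSketch extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    if (args.length < 1) {
      // Fail fast with a usage message instead of an ArrayIndexOutOfBoundsException.
      System.err.println("Usage: ReservationClientDemo <queue> [...]");
      ToolRunner.printGenericCommandUsage(System.err);
      return -1;
    }
    // ... the real example would build a reservation request and submit a Pi job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new ReservationClientDemoSketch(), args));
  }
}
{code}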

> Example of use for the ReservationSystem
> 
>
> Key: YARN-2609
> URL: https://issues.apache.org/jira/browse/YARN-2609
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
> Attachments: YARN-2609.docx, YARN-2609.patch
>
>
> This JIRA provides a simple new example in mapreduce-examples that request a 
> reservation and submit a Pi computation in the reservation. This is meant 
> just to show how to interact with the reservation system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2015-09-10 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4134:
--
Attachment: YARN-4134.002.patch

Remove testing remnant.

> FairScheduler preemption stops at queue level that all child queues are not 
> over their fairshare
> 
>
> Key: YARN-4134
> URL: https://issues.apache.org/jira/browse/YARN-4134
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4134.001.patch, YARN-4134.002.patch
>
>
> Now FairScheduler uses a choose-a-candidate method to select a container to be 
> preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}},
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
> if (candidateQueue == null ||
> comparator.compare(queue, candidateQueue) > 0) {
>   candidateQueue = queue;
> }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its container to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> a candidate child queue is selected. However, if the queue's usage isn't over 
> its fair share, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
>  A scenario:
> {code}
> root
>/\
>   queue1   queue2
>/\
>   queue2.3, (  queue2.4  )
> {code}
> Suppose there are 8 containers, and queues at any level have the same weight. 
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their 
> fair share. Now we submit an app in queue2.4 that needs 4 containers; it 
> should preempt 2 from queue2.3, but the candidate-container selection 
> procedure will stop at queue1, so none of the containers will be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2015-09-10 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4134:
--
Attachment: YARN-4134.001.patch

Upload a patch for preview.

> FairScheduler preemption stops at queue level that all child queues are not 
> over their fairshare
> 
>
> Key: YARN-4134
> URL: https://issues.apache.org/jira/browse/YARN-4134
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4134.001.patch
>
>
> Now FairScheduler uses a choose-a-candidate method to select a container to be 
> preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}},
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
> if (candidateQueue == null ||
> comparator.compare(queue, candidateQueue) > 0) {
>   candidateQueue = queue;
> }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its container to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> a candidate child queue is selected. However, if the queue's usage isn't over 
> its fair share, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
>  A scenario:
> {code}
> root
>/\
>   queue1   queue2
>/\
>   queue2.3, (  queue2.4  )
> {code}
> Suppose there are 8 containers, and queues at any level have the same weight. 
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their 
> fair share. Now we submit an app in queue2.4 that needs 4 containers; it 
> should preempt 2 from queue2.3, but the candidate-container selection 
> procedure will stop at queue1, so none of the containers will be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738637#comment-14738637
 ] 

Jian He edited comment on YARN-4126 at 9/10/15 12:15 PM:
-

[~bibinchundatt], thanks for working on this !
Calling initializeUserGroupSecureMode everywhere in all related test cases does 
not seem like an elegant solution. Why is this call needed?
Could you do it in a cleaner way?


was (Author: jianhe):
[~bibindeve...@gmail.com], thanks for working on this !
calling initializeUserGroupSecureMode everywhere in all related test cases does 
not seem like an elegant solution. why is this call needed?
Could you do it in a more clean way?

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738637#comment-14738637
 ] 

Jian He commented on YARN-4126:
---

[~bibindeve...@gmail.com], thanks for working on this !
Calling initializeUserGroupSecureMode everywhere in all related test cases does 
not seem like an elegant solution. Why is this call needed?
Could you do it in a cleaner way?

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738611#comment-14738611
 ] 

Hadoop QA commented on YARN-4081:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 18s | Pre-patch YARN-3926 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 32s | The applied patch generated  1 
new checkstyle issues (total was 10, now 3). |
| {color:green}+1{color} | whitespace |   0m  7s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  2s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 38s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755106/YARN-4081-YARN-3926.008.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-3926 / 1dbd8e3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9076/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9076/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9076/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9076/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9076/console |


This message was automatically generated.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, 
> YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, 
> YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch, 
> YARN-4081-YARN-3926.008.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738588#comment-14738588
 ] 

Junping Du commented on YARN-4131:
--

I think we need coordination between the YARN-445 and YARN-4131 work. Maybe an 
offline call could be more feasible. Will send an invitation to the related 
people.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738568#comment-14738568
 ] 

Jian He edited comment on YARN-1651 at 9/10/15 10:45 AM:
-

bq. I think we may need add such information to AMRMProtocol to make sure AM 
will be notified. For now, we can keep them as-is. Users can still get such 
information from RM logs.
I think for now we can fail the allocate call explicitly in those very clear 
situations in checkAndNormalizeContainerChangeRequest, e.g. the situation 
where rmContainer doesn't exist. That's more explicit to users. Digging through 
logs is not an easy thing for an application writer.

Thanks for updating, Wangda! Some more comments focusing on the decreasing code 
path.

- this may not be correct, because the reserve event can happen in the RESERVED 
state too, i.e. reReservation
{code}
  if (container.getState() != RMContainerState.NEW) {
container.hasIncreaseReservation = true;
  }
{code}
 - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? 
and therefore NodeHeartBeatResponse and Impl change is not needed; similarly 
nmReportedIncreasedContainers can be a list.
 - When decreasing a container, should it send RMNodeDecreaseContainerEvent too 
?
 - revert ContainerManagerImpl change
 - Remove SchedulerApplicationAttempt#getIncreaseRequests
 - In AbstractYarnScheduler#decreaseContainers() move 
checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same 
place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for 
consistency.
- this if condition is not needed.
{code}
  public boolean unreserve(Priority priority,
  FiCaSchedulerNode node, RMContainer rmContainer) {
if (rmContainer.hasIncreaseReservation()) {
  rmContainer.cancelIncreaseReservation();
}
{code}
 - looks like when decreasing a reservedIncreasedContainer, it will unreserve the 
*whole* extra reserved resource; should it only unreserve the extra resources 
being decreased?
 - In general, I think we should be able to decrease/increase a regular 
reserved container or an increasedReservedContainer?
- In ParentQueue, this null check is not needed.
{code}
  @Override
  public void decreaseContainer(Resource clusterResource,
  SchedContainerChangeRequest decreaseRequest,
  FiCaSchedulerApp app) {
if (app != null) {
{code}

- the allocate call is specifically marked as noLock, but now every allocate call 
holds the global scheduler lock, which is too expensive. We can move 
decreaseContainer to the application itself.
{code}   protected synchronized void decreaseContainer( {code}
It is also now holding the queue lock on allocate, which is also expensive, because 
that means a bunch of AMs calling allocate very frequently can effectively 
block the queue's execution.


was (Author: jianhe):
bq. I think we may need add such information to AMRMProtocol to make sure AM 
will be notified. For now, we can keep them as-is. Users can still get such 
information from RM logs.
I think for now we can fail the allocate call explicitly on those very clear 
situations in checkAndNormalizeContainerChangeRequest ?, e.g. the situation 
that rmContainer doesn't exist That's more explicit to users. Digging through 
logs is not an easy thing for application writer.

thanks for updating, Wangda ! some more comments focusing on decreasing code 
path.

- this may be not correct, because reserve event can happen on RESERVE state 
too, i.e. reReservation
{code}
  if (container.getState() != RMContainerState.NEW) {
container.hasIncreaseReservation = true;
  }
{code}
 - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? 
and therefore NodeHeartBeatResponse and Impl change is not needed; similarly 
nmReportedIncreasedContainers can be a list.
 - When decreasing a container, should it send RMNodeDecreaseContainerEvent too 
?
 - revert ContainerManagerImpl change
 - Remove SchedulerApplicationAttempt#getIncreaseRequests
 - In AbstractYarnScheduler#deceraseContainers() move 
checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same 
place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for 
consistency.
- this if condition is not needed.
{code}
  public boolean unreserve(Priority priority,
  FiCaSchedulerNode node, RMContainer rmContainer) {
if (rmContainer.hasIncreaseReservation()) {
  rmContainer.cancelIncreaseReservation();
}
{code}
 - looks like when decreasing reservedIncreasedContainer, it will unreserve the 
*whole* extra reserved resource, should it only unreserve the extra resources 
being decresed ?
 - In general, I think we should be able to decrease/increase a regular 
reserved container or a increasedReservedContainer ? 
- In ParentQueue, this null check is not needed.
{code}
  @Override
  public void decreaseContain

[jira] [Comment Edited] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738568#comment-14738568
 ] 

Jian He edited comment on YARN-1651 at 9/10/15 10:46 AM:
-

bq. I think we may need add such information to AMRMProtocol to make sure AM 
will be notified. For now, we can keep them as-is. Users can still get such 
information from RM logs.
I think for now we can fail the allocate call explicitly in those very clear 
situations in checkAndNormalizeContainerChangeRequest, e.g. the situation 
where rmContainer doesn't exist. That's more explicit to users. Digging through 
logs is not an easy thing for an application writer.

Thanks for updating, Wangda! Some more comments focusing on the decreasing code 
path.

- this may not be correct, because the reserve event can happen in the RESERVED 
state too, i.e. reReservation
{code}
  if (container.getState() != RMContainerState.NEW) {
container.hasIncreaseReservation = true;
  }
{code}
 - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? 
and therefore NodeHeartBeatResponse and Impl change is not needed; similarly 
nmReportedIncreasedContainers can be a list.
 - When decreasing a container, should it send RMNodeDecreaseContainerEvent too 
?
 - revert ContainerManagerImpl change
 - Remove SchedulerApplicationAttempt#getIncreaseRequests
 - In AbstractYarnScheduler#decreaseContainers() move 
checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same 
place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for 
consistency.
- this if condition is not needed.
{code}
  public boolean unreserve(Priority priority,
  FiCaSchedulerNode node, RMContainer rmContainer) {
if (rmContainer.hasIncreaseReservation()) {
  rmContainer.cancelIncreaseReservation();
}
{code}
 - looks like when decreasing a reservedIncreasedContainer, it will unreserve the 
*whole* extra reserved resource; should it only unreserve the extra resources 
being decreased?
 - In general, I think we should be able to decrease/increase a regular 
reserved container or an increasedReservedContainer?
- In ParentQueue, this null check is not needed.
{code}
  @Override
  public void decreaseContainer(Resource clusterResource,
  SchedContainerChangeRequest decreaseRequest,
  FiCaSchedulerApp app) {
if (app != null) {
{code}

- the allocate call is specifically marked as noLock, but now every allocate call 
holds the global scheduler lock, which is too expensive. We can move 
decreaseContainer to the application itself.
{code}   protected synchronized void decreaseContainer( {code}
It is also now holding the queue lock on allocate, which is also expensive, because 
that means a bunch of AMs calling allocate very frequently can effectively 
block the queues' execution.


was (Author: jianhe):
bq. I think we may need add such information to AMRMProtocol to make sure AM 
will be notified. For now, we can keep them as-is. Users can still get such 
information from RM logs.
I think for now we can fail the allocate call explicitly on those very clear 
situations in checkAndNormalizeContainerChangeRequest ?, e.g. the situation 
that rmContainer doesn't exist That's more explicit to users. Digging through 
logs is not an easy thing for application writer.

thanks for updating, Wangda ! some more comments focusing on decreasing code 
path.

- this may be not correct, because reserve event can happen on RESERVE state 
too, i.e. reReservation
{code}
  if (container.getState() != RMContainerState.NEW) {
container.hasIncreaseReservation = true;
  }
{code}
 - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? 
and therefore NodeHeartBeatResponse and Impl change is not needed; similarly 
nmReportedIncreasedContainers can be a list.
 - When decreasing a container, should it send RMNodeDecreaseContainerEvent too 
?
 - revert ContainerManagerImpl change
 - Remove SchedulerApplicationAttempt#getIncreaseRequests
 - In AbstractYarnScheduler#deceraseContainers() move 
checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same 
place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for 
consistency.
- this if condition is not needed.
{code}
  public boolean unreserve(Priority priority,
  FiCaSchedulerNode node, RMContainer rmContainer) {
if (rmContainer.hasIncreaseReservation()) {
  rmContainer.cancelIncreaseReservation();
}
{code}
 - looks like when decreasing reservedIncreasedContainer, it will unreserve the 
*whole* extra reserved resource, should it only unreserve the extra resources 
being decresed ?
 - In general, I think we should be able to decrease/increase a regular 
reserved container or a increasedReservedContainer ? 
- In ParentQueue, this null check is not needed.
{code}
  @Override
  public void decreaseCo

[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738568#comment-14738568
 ] 

Jian He commented on YARN-1651:
---

bq. I think we may need add such information to AMRMProtocol to make sure AM 
will be notified. For now, we can keep them as-is. Users can still get such 
information from RM logs.
I think for now we can fail the allocate call explicitly in those very clear 
situations in checkAndNormalizeContainerChangeRequest, e.g. the situation 
where rmContainer doesn't exist. That's more explicit to users. Digging through 
logs is not an easy thing for an application writer.

Thanks for updating, Wangda! Some more comments focusing on the decreasing code 
path.

- this may not be correct, because the reserve event can happen in the RESERVED 
state too, i.e. reReservation
{code}
  if (container.getState() != RMContainerState.NEW) {
container.hasIncreaseReservation = true;
  }
{code}
 - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? 
and therefore NodeHeartBeatResponse and Impl change is not needed; similarly 
nmReportedIncreasedContainers can be a list.
 - When decreasing a container, should it send RMNodeDecreaseContainerEvent too 
?
 - revert ContainerManagerImpl change
 - Remove SchedulerApplicationAttempt#getIncreaseRequests
 - In AbstractYarnScheduler#decreaseContainers() move 
checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same 
place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for 
consistency.
- this if condition is not needed.
{code}
  public boolean unreserve(Priority priority,
  FiCaSchedulerNode node, RMContainer rmContainer) {
if (rmContainer.hasIncreaseReservation()) {
  rmContainer.cancelIncreaseReservation();
}
{code}
 - looks like when decreasing a reservedIncreasedContainer, it will unreserve the 
*whole* extra reserved resource; should it only unreserve the extra resources 
being decreased?
 - In general, I think we should be able to decrease/increase a regular 
reserved container or an increasedReservedContainer?
- In ParentQueue, this null check is not needed.
{code}
  @Override
  public void decreaseContainer(Resource clusterResource,
  SchedContainerChangeRequest decreaseRequest,
  FiCaSchedulerApp app) {
if (app != null) {
{code}

- the allocate call is specifically marked as noLock, but now every allocate call 
holds the global scheduler lock, which is too expensive. We can move 
decreaseContainer to the application itself.
{code}   protected synchronized void decreaseContainer( {code}
It is also now holding the queue lock on allocate, which is also expensive, because 
that means a bunch of malicious AMs can effectively block the queue's execution.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class

2015-09-10 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4081:

Attachment: YARN-4081-YARN-3926.008.patch

Attaching a new version of the patch without the web services changes. 
[~leftnoteasy] had concerns that we don't have existing tests to make sure the 
web services changes won't break existing APIs. This will lead to failing unit 
tests, which will be addressed in later patches (once we add unit tests to 
validate that we won't break the REST API).

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, 
> YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, 
> YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch, 
> YARN-4081-YARN-3926.008.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-10 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738559#comment-14738559
 ] 

Varun Saxena commented on YARN-4075:


Ok, will rebase the patch. Maybe after reviewing 3901 and 4074

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.POC.1.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738377#comment-14738377
 ] 

Steve Loughran commented on YARN-4131:
--

For fault injection/chaos monkey I want containers killed without warning, so 
as to test how the app and its AM handle it. It should look exactly like any of 
the infrastructure failures: container exit, YARN OOM event, pre-emption, node 
failure, ...

Signalling is meant to give the AM the opportunity to send events, like a clean 
shutdown signal, to apps.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority

2015-09-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738364#comment-14738364
 ] 

Naganarasimha G R commented on YARN-4068:
-

Already linked YARN-4129.

> Support appUpdated event in TimelineV2 to publish details for movetoqueue, 
> change in priority
> -
>
> Key: YARN-4068
> URL: https://issues.apache.org/jira/browse/YARN-4068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sunil G
>Assignee: Sunil G
>
> YARN-4044 supports appUpdated event changes to TimelineV1. This jira is to 
> track and port appUpdated changes in V2 for
> - movetoqueue
> - updateAppPriority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738348#comment-14738348
 ] 

Hadoop QA commented on YARN-4111:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  1 
new checkstyle issues (total was 299, now 300). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 19s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 35s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755071/YARN-4111_2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f153710 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9075/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9075/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9075/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9075/console |


This message was automatically generated.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch
>
>
> Application can be killed either by the *user via ClientRMService* OR *from the 
> scheduler*. Currently the diagnostic message is set statically, i.e. {{Application 
> killed by user.}}, regardless of whether the application was killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, but the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4136) LinuxContainerExecutor loses info when forwarding ResourceHandlerException

2015-09-10 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738326#comment-14738326
 ] 

Varun Vasudev commented on YARN-4136:
-

+1 for the patch. I'll commit this tomorrow if no one objects.

> LinuxContainerExecutor loses info when forwarding ResourceHandlerException
> --
>
> Key: YARN-4136
> URL: https://issues.apache.org/jira/browse/YARN-4136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Bibin A Chundatt
>Priority: Trivial
> Attachments: 0001-YARN-4136.patch
>
>
> The Linux container executor {{launchContainer}} method throws 
> {{ResourceHandlerException}} when there are problems setting up the container 
> -but these aren't propagated in the raised IOE. They should be nested with 
> the string value included in the message text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738293#comment-14738293
 ] 

Sunil G commented on YARN-4111:
---

Hi [~nijel]

one minor nit:

{{RMAppKilledAttemptEvent}} is used for both RMApp and RMAppAttempt. The name is 
slightly confusing; I think we can use it only for RMApp. Also, in 
RMAppAttempt, {{RMAppFailedAttemptEvent}} is changed to 
{{RMAppKilledAttemptEvent}}. Could we generalize RMAppFailedAttemptEvent for 
both Failed and Killed, so that it can also take the diagnostics?
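
A rough sketch of the generalization suggested above, purely hypothetical in its details (the actual RM event classes differ): one event class used for both failed and killed attempts, carrying a cause-specific diagnostics string.

{code:title=Sketch of a generalized attempt-failure event}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType;

// Hypothetical shape only: the event type distinguishes ATTEMPT_FAILED from
// ATTEMPT_KILLED, and the diagnostics replaces the static message.
public class RMAppFailedAttemptEvent extends RMAppEvent {
  private final String diagnostics;

  public RMAppFailedAttemptEvent(ApplicationId appId, RMAppEventType event,
      String diagnostics) {
    super(appId, event);
    this.diagnostics = diagnostics;
  }

  public String getDiagnostics() {
    return diagnostics;
  }
}
{code}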

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch
>
>
> Application can be killed either by the *user via ClientRMService* OR *from the 
> scheduler*. Currently the diagnostic message is set statically, i.e. {{Application 
> killed by user.}}, regardless of whether the application was killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, but the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)