[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-08-28 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720988#comment-14720988
 ] 

Varun Saxena commented on YARN-4075:


Yup it can progress. 

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-28 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720963#comment-14720963
 ] 

Shiwei Guo commented on YARN-3933:
--

Thanks for adding me to the contributor list, so exciting! 

I have noticed Jenkins' complaints and submitted a new patch in 
[YARN-4089|https://issues.apache.org/jira/browse/YARN-4089]. Unfortunately it 
still does not conform to the QA standard. I'm working on adding a unit test 
for this issue and will submit it here soon.

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess container reservations for an application; if they 
> are no longer needed, it calls queue.completedContainer(), which drives the 
> resources negative even though they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?
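As a purely illustrative sketch (not the attached patch; the class and method names are hypothetical), one way to keep a racing duplicate completedContainer() call from releasing the same resources twice is to make the completion idempotent:

{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical guard: only the first completion of a given container id wins,
// so available memory/cores cannot be decremented twice for the same container.
public class CompletedContainerGuard {
  private final Set<ContainerId> completed =
      Collections.newSetFromMap(new ConcurrentHashMap<ContainerId, Boolean>());

  /** Returns true only for the first completion of the given container id. */
  public boolean markCompleted(ContainerId containerId) {
    return completed.add(containerId);
  }
}
{code}

Inside completedContainer(), the queue resource release would then run only when markCompleted(...) returns true.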



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720962#comment-14720962
 ] 

Hadoop QA commented on YARN-4093:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 17s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 13s | The applied patch generated  2 
new checkstyle issues (total was 24, now 26). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 27s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 52s | Tests passed in 
hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests |   1m 56s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  52m 47s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 110m 39s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-common |
| Failed unit tests | hadoop.yarn.api.TestPBImplRecords |
| Failed build | hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753110/YARN-4093.v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e2c9b28 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8940/console |


This message was automatically generated.

> Encapsulate additional group information in the AM to RM heartbeat
> --
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
>  Labels: patch
> Attachments: AllocateRequest_extension.docx, YARN-4093.patch, 
> YARN-4093.v2.patch
>
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720961#comment-14720961
 ] 

Hadoop QA commented on YARN-3528:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753117/YARN-3528-branch2.patch
 |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / e2c9b28 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8941/console |


This message was automatically generated.

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, 
> YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible for scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep for "12345" turns up many places in the test suite where this 
> practice has developed:
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through port scanning and dynamic port allocation. 
> Please can someone do this.
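As a purely illustrative sketch of the dynamic port allocation suggested above (not any of the attached patches; the class and method names are made up), a test can ask the OS for a free ephemeral port instead of hard-coding 12345:

{code}
import java.io.IOException;
import java.net.ServerSocket;

// Illustrative sketch only: obtain a port that was free at the time of the call
// and hand it to the service under test instead of the hard-coded 12345.
public final class FreePortUtil {
  private FreePortUtil() {}

  public static int findFreePort() throws IOException {
    ServerSocket socket = new ServerSocket(0); // 0 = let the OS pick a free port
    try {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    } finally {
      socket.close();
    }
  }
}
{code}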



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-28 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3528:
---
Attachment: YARN-3528-branch2.patch

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, 
> YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible for scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep for "12345" turns up many places in the test suite where this 
> practice has developed:
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through port scanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-28 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720958#comment-14720958
 ] 

Brahma Reddy Battula commented on YARN-3528:


[~rkanter], I have uploaded the branch-2 patch. Thanks!

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, 
> YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible for scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep for "12345" turns up many places in the test suite where this 
> practice has developed:
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through port scanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720937#comment-14720937
 ] 

Hadoop QA commented on YARN-3920:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 55s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  53m 25s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m  8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753105/YARN-3920.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e2c9b28 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8939/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8939/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8939/console |


This message was automatically generated.

> FairScheduler Reserving a node for a container should be configurable to 
> allow it used only for large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed to prevent large containers 
> from being starved by small requests that keep arriving at a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it to large container requests, as originally intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat

2015-08-28 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-4093:

Attachment: YARN-4093.v2.patch

> Encapsulate additional group information in the AM to RM heartbeat
> --
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
>  Labels: patch
> Attachments: AllocateRequest_extension.docx, YARN-4093.patch, 
> YARN-4093.v2.patch
>
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-28 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: YARN-3920.005.patch

Fixed whitespace

> FairScheduler Reserving a node for a container should be configurable to 
> allow it used only for large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed to prevent large containers 
> from being starved by small requests that keep arriving at a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it to large container requests, as originally intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720877#comment-14720877
 ] 

Hadoop QA commented on YARN-4093:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  6s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  8s | The applied patch generated  1 
new checkstyle issues (total was 24, now 25). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 45s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 57s | Tests failed in 
hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests |   2m  0s | Tests failed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  58m 37s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 117m 48s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-common |
| Failed unit tests | hadoop.yarn.client.TestYarnApiClasses |
|   | hadoop.yarn.api.TestPBImplRecords |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753093/YARN-4093.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e2c9b28 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8938/console |


This message was automatically generated.

> Encapsulate additional group information in the AM to RM heartbeat
> --
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
>  Labels: patch
> Attachments: AllocateRequest_extension.docx, YARN-4093.patch
>
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-28 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720837#comment-14720837
 ] 

Joep Rottinghuis commented on YARN-3901:


Reviewed / discussed 1.patch with [~vrushalic]
Comments may sound cryptic to others, but roughly we discussed these changes to 
make things generic (and clearer / more reusable in the long run):

No timestamp needed in FlowActivity table. Runs can start one day and end 
another.
Probably start without, add later if needed. 

Min/Max: why would the app id need to be used?
FlowScanner currentMinCell should not consider the app ID.
If there is a start time for an app id, and then later another start, we should 
still keep the min, not the latest value.

UI based on FlowActivity can enumerate active flows for that day, plus show 
number of runs, and # of distinct versions.

Update javadoc on FlowRunKey.

FlowRunTable: add increment and decrement for the number of running apps (on 
app start and app end).

MIN, MAX, SUM, SUM_FINAL should be AggOps

Aggregation dimension = metric name (stored in column)
Aggregation compaction dimension = application id

For store, make the Attributes... the last argument.
An attribute is a tuple of String, byte[]

The MIN AggregationOperation should have a createAttribute method that takes
an AggCompactionDimension as argument and returns an Attribute (sketched below).

Assumption is that all the cells in a put are the same operation.

In the general coprocessor, read the attribute (it does not have to be unique).
Always add a tag with the aggregationCompaction dimension.
Set the compaction tag only if compaction needs to be done (i.e. if the 
operation is SUM_FINAL).
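
A very rough sketch of the attribute handling above (names and shapes here are only what was discussed, not the final patch; the compaction dimension is passed as raw bytes for simplicity):

{code}
// Rough sketch only: MIN, MAX, SUM, SUM_FINAL as aggregation operations, and an
// Attribute as a (String name, byte[] value) tuple stamped onto every Put.
public enum AggregationOperation {
  MIN, MAX, SUM, SUM_FINAL;

  public static final class Attribute {
    public final String name;
    public final byte[] value;

    public Attribute(String name, byte[] value) {
      this.name = name;
      this.value = value;
    }
  }

  /**
   * Builds the attribute carried by a Put: the operation name plus the
   * aggregation compaction dimension (the application id).
   */
  public Attribute createAttribute(byte[] compactionDimension) {
    return new Attribute(this.name(), compactionDimension);
  }
}
{code}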



> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per-flow-run information aggregated across applications and flow 
> version.
> - The RM’s collector writes to it on app creation and app completion.
> - The per-app collector writes to it for metric updates, at a slower frequency 
> than the metric updates to the application table.
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat

2015-08-28 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-4093:

Summary: Encapsulate additional group information in the AM to RM heartbeat 
 (was: Encapsulate additional information through AM to RM heartbeat)

> Encapsulate additional group information in the AM to RM heartbeat
> --
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
>  Labels: patch
> Attachments: AllocateRequest_extension.docx, YARN-4093.patch
>
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4093) Encapsulate additional information through AM to RM heartbeat

2015-08-28 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-4093:

Attachment: YARN-4093.patch

> Encapsulate additional information through AM to RM heartbeat
> -
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: AllocateRequest_extension.docx, YARN-4093.patch
>
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4093) Encapsulate additional information through AM to RM heartbeat

2015-08-28 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-4093:

Attachment: AllocateRequest_extension.docx

Added a proposed design doc

> Encapsulate additional information through AM to RM heartbeat
> -
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: AllocateRequest_extension.docx
>
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4093) Encapsulate additional information through AM to RM heartbeat

2015-08-28 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-4093:

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-2745

> Encapsulate additional information through AM to RM heartbeat
> -
>
> Key: YARN-4093
> URL: https://issues.apache.org/jira/browse/YARN-4093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, yarn
>Affects Versions: 2.7.1
>Reporter: Robert Grandl
>Assignee: Robert Grandl
>
> In this JIRA we propose to enhance the AM-RM protocol with a new message 
> that encapsulates additional information about groups of tasks. The RM 
> scheduler will benefit from this additional information to make better 
> scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4093) Encapsulate additional information through AM to RM heartbeat

2015-08-28 Thread Robert Grandl (JIRA)
Robert Grandl created YARN-4093:
---

 Summary: Encapsulate additional information through AM to RM 
heartbeat
 Key: YARN-4093
 URL: https://issues.apache.org/jira/browse/YARN-4093
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, yarn
Reporter: Robert Grandl
Assignee: Robert Grandl


In this JIRA we propose to enhance the AM-RM protocol with a new message that 
encapsulates additional information about groups of tasks. The RM scheduler 
will benefit from this additional information to make better scheduling 
decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720708#comment-14720708
 ] 

Hadoop QA commented on YARN-4092:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 57s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 15s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client compilation is broken. |
| {color:red}-1{color} | findbugs |   0m 30s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. |
| {color:red}-1{color} | findbugs |   0m 46s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 compilation is broken. |
| {color:green}+1{color} | findbugs |   0m 46s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:red}-1{color} | yarn tests |   0m 15s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  0s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  54m  2s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 56s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-yarn-client |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753069/YARN-4092.2.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / cbb2495 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8936/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8936/console |


This message was automatically generated.

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible; it will keep redirecting between the two RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720668#comment-14720668
 ] 

Jason Lowe commented on YARN-3942:
--

Yeah that's going to be tricky, especially if we need to move most of the code 
into YARN.  Haven't had time to give this much thought, but the only way I can 
think of to keep most of the functionality in YARN is to have the timeline 
client be able to specify when a new "session" starts (i.e.: entity file writer 
should start writing to a new file and user provides some clue/hint as to what 
to name the file).  We can then have a plugin on the entity file server side 
that allows apps to override the getTimelineStoreForRead functionality.

If that was in place then the Tez side could start a new session (dag file) 
each time the dag changed.  The Tez-specific plugin on the timeline server side 
could then translate dag/vertex/task/attempt IDs into the appropriate dag file 
to cache.  There would still be some questions as to how the timeline store 
cache would be managed on the server side and how to support multiple 
framework-specific plugins simultaneously.
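
To make that concrete, a hypothetical shape for such a plugin could look like the following (nothing like this exists yet; the interface and method names are invented for illustration):

{code}
// Hypothetical only: a per-framework hook on the entity file server side that
// maps an entity being read to the "session" file to load and cache.
public interface TimelineEntityFilePlugin {
  /**
   * For example, a Tez plugin would translate a dag/vertex/task/attempt id
   * into the name of the dag file that holds its events.
   */
  String getTimelineStoreForRead(String entityType, String entityId);
}
{code}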

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-08-28 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720667#comment-14720667
 ] 

Joep Rottinghuis commented on YARN-3862:


If we need to retrieve exactly known columns (and in addition we know whether 
each is a metric, a config value, etc.), then we can add these to the scan (or 
get) directly through
{code}
addColumn(byte [] family, byte [] qualifier)
{code}

The ColumnPrefixFilter case is also clear: it just restricts which columns are 
returned (it filters on the column qualifiers).
The confusion starts with org.apache.hadoop.hbase.filter.QualifierFilter. That 
can be used to retrieve only some columns, specifically when combined with a 
WhileMatchFilter.

In addition, we have to consider whether we want to push these restrictions 
down to HBase (which is preferable) or just pull back everything from HBase and 
restrict what we serialize in the result.

I think it would be cleaner to have a direct, separate API (method argument) 
for specifying which columns to retrieve. Whether we then add specific columns 
to the scan or prefix patterns to a filter is up to the implementation.
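
For example, with the plain HBase client API (the column families and qualifiers below are placeholders, not the actual schema):

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnSelectionSketch {

  // Exactly known columns: add them to the Scan directly.
  public static Scan exactColumns() {
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("c"), Bytes.toBytes("mapreduce.map.memory.mb"));
    scan.addColumn(Bytes.toBytes("m"), Bytes.toBytes("MAP_INPUT_RECORDS"));
    return scan;
  }

  // Prefix match: push the restriction down to HBase with a filter.
  public static Scan prefixMatch() {
    Scan scan = new Scan();
    scan.setFilter(new ColumnPrefixFilter(Bytes.toBytes("mapreduce.")));
    return scan;
  }
}
{code}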

> Decide which contents to retrieve and send back in response in TimelineReader
> -
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3862-YARN-2928.wip.01.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4092:

Attachment: YARN-4092.3.patch

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible; it will keep redirecting between the two RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2928) YARN Timeline Service: Next generation

2015-08-28 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C reassigned YARN-2928:


Assignee: Vrushali C  (was: Sangjin Lee)

> YARN Timeline Service: Next generation
> --
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720574#comment-14720574
 ] 

Hitesh Shah commented on YARN-3942:
---

[~jlowe] [~rajesh.balamohan] observed that the timeline server was running out 
of memory in a certain scenario. In this scenario, we are using Hive-on-Tez but 
Hive re-uses the application to run 100s of DAGs/queries (doAs=false with 
perimeter security using, say, Ranger or Sentry). The EntityFileStore sizes its 
cache based on the number of applications it can cache, but in the above 
scenario even a single app could be very large. Ideally, if each "dag" were in 
a separate file and all of its entries were treated as a single cache entity, 
that would probably work better, but making this generic enough may be a bit 
tricky.

Any suggestions here? 



> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-28 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720571#comment-14720571
 ] 

Robert Kanter commented on YARN-3528:
-

It looks like this doesn't apply cleanly to branch-2.  [~brahmareddy], can you 
create a branch-2 version of the patch?

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible for scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep for "12345" turns up many places in the test suite where this 
> practice has developed:
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through port scanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-28 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3528:

Affects Version/s: (was: 3.0.0)
   2.8.0
 Target Version/s: 2.8.0  (was: 3.0.0)
   Issue Type: Improvement  (was: Bug)

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible for scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep for "12345" turns up many places in the test suite where this 
> practice has developed:
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through port scanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4092:

Attachment: YARN-4092.2.patch

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible; it will keep redirecting between the two RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720560#comment-14720560
 ] 

Xuan Gong commented on YARN-4092:
-

added a test case

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible; it will keep redirecting between the two RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4080) Capacity planning for long running services on YARN

2015-08-28 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720538#comment-14720538
 ] 

Subru Krishnan commented on YARN-4080:
--

[~mding], your proposal looks interesting, and thanks for taking a look at 
YARN-1051. You are right that the main use case of the reservation system is to 
address SLAs, but it can also be used for capacity planning for long running 
services by specifying the start time as now and the deadline as infinity. This 
should provide more predictability for long running services: since YARN-1051 
allows expressing time-varying capacity, you can handle the dynamic resource 
requirements of a service. Additionally, in combination with YARN-2877, you 
should be able to achieve the dynamic host-based reservation mechanics you have 
proposed.
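
For illustration, a rough sketch of such a "now to infinity" reservation built with the YARN-1051 records (the queue name, container count, and resource sizes below are placeholders):

{code}
import java.util.Collections;

import org.apache.hadoop.yarn.api.protocolrecords.ReservationSubmissionRequest;
import org.apache.hadoop.yarn.api.records.ReservationDefinition;
import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.ReservationRequestInterpreter;
import org.apache.hadoop.yarn.api.records.ReservationRequests;
import org.apache.hadoop.yarn.api.records.Resource;

public class LongRunningServiceReservation {

  public static ReservationSubmissionRequest build() {
    // 10 containers of 4 GB / 2 vcores for the long running service.
    ReservationRequest containers =
        ReservationRequest.newInstance(Resource.newInstance(4096, 2), 10);
    ReservationRequests requests = ReservationRequests.newInstance(
        Collections.singletonList(containers), ReservationRequestInterpreter.R_ALL);
    ReservationDefinition definition = ReservationDefinition.newInstance(
        System.currentTimeMillis(), // arrival: now
        Long.MAX_VALUE,             // deadline: effectively infinity
        requests, "long-running-service");
    return ReservationSubmissionRequest.newInstance(definition, "default");
  }
}
{code}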

> Capacity planning for long running services on YARN
> ---
>
> Key: YARN-4080
> URL: https://issues.apache.org/jira/browse/YARN-4080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, resourcemanager
>Reporter: MENG DING
>
> YARN-1197 addresses the functionality of container resource resize. One major 
> use case of this feature is for long running services managed by Slider to 
> dynamically flex up and down resource allocation of individual components 
> (e.g., HBase region server), based on application metrics/alerts obtained 
> through third-party monitoring and policy engine. 
> One key issue with increasing container resource at any point of time is that 
> the additional resource needed by the application component may not be 
> available *on the specific node*. In this case, we need to rely on preemption 
> logic to reclaim the required resource back from other (preemptable) 
> applications running on the same node. But this may not be possible today 
> because:
> * preemption doesn't consider constraints of pending resource requests, such 
> as hard locality requirements, user limits, etc (being addressed in YARN-2154 
> and possibly in YARN-3769?) 
> * there may not be any preemptable container available due to the fact that 
> no queue is over its guaranteed capacity.
> What we need, ideally, is a way for YARN to support future capacity planning 
> of long running services. At the minimum, we need to provide a way to let 
> YARN know about the resource usage prediction/pattern of a long running 
> service. And given this knowledge, YARN should be able to preempt resources 
> from other applications to accommodate the resource needs of the long running 
> service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720446#comment-14720446
 ] 

Hadoop QA commented on YARN-4092:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 35s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m  7s | The patch appears to introduce 2 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  50m 10s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 26s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestMaxRunningAppsEnforcer
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753032/YARN-4092.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / beb65c9 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8935/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8935/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8935/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8935/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8935/console |


This message was automatically generated.

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible; it will keep redirecting between the two RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-08-28 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720406#comment-14720406
 ] 

Li Lu commented on YARN-4075:
-

Hi [~varun_saxena], is this JIRA still blocked by YARN-4074, or can it progress 
now that some of the interface discussions are reaching agreement? Thanks! 

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-28 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720392#comment-14720392
 ] 

Vrushali C commented on YARN-3901:
--

Hi [~gtCarrera9]

Thanks for the first review pass! To answer your questions:
bq.  IIUC, we now directly write data into the flow run related tables upon 
application start, finish, and periodic flush, and we only perform the 
aggregations in our coprocessors
Yes, data is written to the flow run and flow activity tables in a quick, 
simple write, but the correct values to return are determined at read time AND 
(TBD) at flush/compaction time. During flush/compaction, the data from the 
various cells will be 'merged' into a smaller number of cells so that 
subsequent reads are faster.

bq.  How are those coprocessors connected? Is it through an HBase configuration 
externally, or are there some lines setting them up in this patch that I missed 
(which is quite possible)?

At table creation time, we specify the coprocessor class. This can also be 
done later with an alter table command as desired.
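
To make that concrete, here is a minimal sketch, assuming the HBase 1.x client 
API and placeholder table/coprocessor names (not the ones used in the patch), 
of how a coprocessor can be attached at table creation time:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlowRunTableSetup {
  // Illustrative only: the table name, the "i" column family and the
  // coprocessor class name below are assumptions, not the patch's names.
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor flowRunTable =
          new HTableDescriptor(TableName.valueOf("timelineservice.flowrun"));
      flowRunTable.addFamily(new HColumnDescriptor("i"));
      // Registering the coprocessor per table means it sees every scan,
      // flush and compaction for this table and can merge cells there.
      flowRunTable.addCoprocessor("org.example.timeline.FlowRunCoprocessor");
      admin.createTable(flowRunTable);
    }
  }
}
{code}

The same table attribute can also be set on an existing table later, e.g. via 
the HBase shell's alter command, which is what the alter table route above 
refers to.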

bq. I noticed you're performing aggregation work in the coprocessor 
(FlowScanner); this is slightly different from the approach in YARN-3816 (app 
level aggregation). My hunch is that we may need some sort of common API for 
aggregating metrics, so that we can centralize the aggregation logic? Or, why 
is the flow run level aggregation significantly different from app level 
aggregation (so that we cannot share the same aggregation logic)?

There are some differences between the two aggregations, I think. I'm not sure 
the classes can be reused without complicating development efforts. For the PoC 
I would like to focus on these tables independently. We could file follow-up 
JIRAs to refactor the code as we see fit once the whole picture emerges; does 
that sound good?

Keep the questions coming, thanks!


> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-28 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720377#comment-14720377
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], thanks for the work! While looking at the latest patch, I have 
some general questions. IIUC, we now directly write data into the flow run 
related tables upon application start, finish, and periodic flush, and we only 
perform the aggregations in our coprocessors? I remember this design and I 
think this looks fine, but w.r.t the coprocessors, I'm unclear about:
# How are those coprocessors connected? Is it through an HBase configuration 
externally, or are there some lines setting them up in this patch that I missed 
(which is quite possible)?
# I noticed you're performing aggregation work in the coprocessor 
(FlowScanner); this is slightly different from the approach in YARN-3816 (app 
level aggregation). My hunch is that we may need some sort of common API for 
aggregating metrics, so that we can centralize the aggregation logic? Or, why 
is the flow run level aggregation significantly different from app level 
aggregation (so that we cannot share the same aggregation logic)?

I'll keep looking at this patch later today; more comments may come over the 
weekend or next Monday.

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720320#comment-14720320
 ] 

Xuan Gong commented on YARN-4092:
-

The purpose: when the RM finds that there is no active RM at that time, it 
sends the request back to itself with a delay rather than redirecting to the 
other (also standby) RM.
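
A rough sketch of that idea, using plain servlet API calls and assumed 
class/method names rather than the actual RMWebAppFilter code:

{code}
import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hedged sketch, not the actual patch code: when no active RM can be found,
// ask the client to retry the same URL after a short delay via the Refresh
// header, instead of redirecting to the other (also standby) RM and looping.
public class StandbyRetryHelper {
  public static void retryLater(HttpServletRequest request,
      HttpServletResponse response) throws IOException {
    response.setHeader("Refresh", "3; url=" + request.getRequestURI());
    response.getWriter().println(
        "This RM is in standby mode and no active RM was found; retrying.");
  }
}
{code}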

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4092:

Attachment: YARN-4092.1.patch

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-28 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720310#comment-14720310
 ] 

Varun Vasudev commented on YARN-4082:
-

Thanks for the patch [~leftnoteasy].

Couple of minor fixes -
1.
{code}
+  public void incUsedResource(String nodeLabel, Resource resourceToInc, 
SchedulerApplicationAttempt application) {
{code}
and
{code}
+  public void decUsedResource(String nodeLabel, Resource resourceToDec, 
SchedulerApplicationAttempt application) {
{code}
need to be formatted for line length.

2.
{code}
+String newPartition;
+if (newLabels.isEmpty()) {
+  newPartition = RMNodeLabelsManager.NO_LABEL;
+} else {
+  newPartition = newLabels.iterator().next();
+}
+
+String oldPartition = node.getPartition();
{code}
Can you add a comment explaining that only one label is allowed per node? Also, 
can you move this code outside the for loop? It seems unnecessary to evaluate 
it for every application; see the sketch below.
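
A hedged sketch of that suggestion, reusing the names from the snippet above 
(the collection being iterated is an assumed name):

{code}
// Compute the partitions once, before iterating over the applications.
// Only one label is allowed per node, so the new partition is either the
// single element of newLabels or NO_LABEL when the set is empty.
String newPartition = newLabels.isEmpty()
    ? RMNodeLabelsManager.NO_LABEL
    : newLabels.iterator().next();
String oldPartition = node.getPartition();

for (SchedulerApplicationAttempt application : runningApps) { // assumed name
  // per-application usage accounting uses oldPartition/newPartition here
}
{code}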

Rest of the patch looks good to me.

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4082.1.patch, YARN-4082.2.patch
>
>
> From YARN-2920, containers will be killed if partition of a node changed. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when node's partition updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-28 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-4092:
---

 Summary: RM HA UI redirection needs to be fixed when both RMs are 
in standby mode
 Key: YARN-4092
 URL: https://issues.apache.org/jira/browse/YARN-4092
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong


In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
accessible. It will keep redirecting between both RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720249#comment-14720249
 ] 

Hudson commented on YARN-1556:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #308 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/308/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-28 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720246#comment-14720246
 ] 

Li Lu commented on YARN-4074:
-

bq. One thing I forgot to mention is that the current POC patch is a diff 
against the patch for YARN-3901, to be able to isolate the changes for this 
JIRA. 
Thanks for pointing this out! I'll take a look at it shortly.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720243#comment-14720243
 ] 

Sunil G commented on YARN-4091:
---

Thank you [~Naganarasimha] for linking this issue. Yes, that issue will be a 
subset of this one.

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers are improved with various new capabilities, more configurations 
> which tunes the schedulers starts to take actions such as limit assigning 
> containers to an application, or introduce delay to allocate container etc. 
> There are no clear information passed down from scheduler to outerworld under 
> these various scenarios. This makes debugging very tougher.
> This ticket is an effort to introduce more defined states on various parts in 
> scheduler where it skips/rejects container assignment, activate application 
> etc. Such information will help user to know whats happening in scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720242#comment-14720242
 ] 

Sangjin Lee commented on YARN-4058:
---

Done. Thanks.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720234#comment-14720234
 ] 

Hudson commented on YARN-1556:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2246 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2246/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720233#comment-14720233
 ] 

Sunil G commented on YARN-3970:
---

Thank you [~Naganarasimha]

Yes. We can take the improvement that avoids an extra save to the state store, 
and the fix for it looks good.

Few comments

1. updateAppPriority --> updateApplicationPriority. I prefer the fully 
expanded name here, as it is a separate class used to identify a web app 
object.
2. {{priority.getPriority() != targetPriority.getPriority()}} We could use 
{{!priority.equals(targetPriority)}}
3. 
{code}
+AppPriority effectivePriority = new AppPriority(
+app.getApplicationSubmissionContext().getPriority().getPriority());
{code}
If {{app.getApplicationSubmissionContext().getPriority()}} is NULL, we will 
get an NPE here; a sketch of a possible guard is below.
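
A minimal sketch of such a guard, reusing the names from the snippet above; 
the fallback value of 0 is only an assumption for illustration:

{code}
// Guard against a null submission-context priority before building the
// web-app response object.
Priority submissionPriority =
    app.getApplicationSubmissionContext().getPriority();
int priorityValue =
    (submissionPriority == null) ? 0 : submissionPriority.getPriority();
AppPriority effectivePriority = new AppPriority(priorityValue);
{code}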


> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720212#comment-14720212
 ] 

Junping Du commented on YARN-4058:
--

Yes. Adding a new commit to correct hadoop-yarn/CHANGES.txt is the right way.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720204#comment-14720204
 ] 

Sangjin Lee commented on YARN-4058:
---

Thanks for finding that [~djp].

I'd love to edit that commit, but then that would disrupt you all again because 
I need to force push. How about adding a new commit that fixes 
hadoop-yarn/CHANGES.txt?

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720101#comment-14720101
 ] 

Hadoop QA commented on YARN-3970:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 52s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  6s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  5 
new checkstyle issues (total was 164, now 169). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 18s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753002/YARN-3970.20150828-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / beb65c9 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/8934/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8934/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8934/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8934/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8934/console |


This message was automatically generated.

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720075#comment-14720075
 ] 

Hudson commented on YARN-1556:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2265 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2265/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-08-28 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3260:

Assignee: (was: Naganarasimha G R)

> NPE if AM attempts to register before RM processes launch event
> ---
>
> Key: YARN-3260
> URL: https://issues.apache.org/jira/browse/YARN-3260
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>
> The RM on one of our clusters was running behind on processing 
> AsyncDispatcher events, and this caused AMs to fail to register due to an 
> NPE.  The AM was launched and attempting to register before the 
> RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token 
> had not been generated yet.  The NPE occurred because the 
> ApplicationMasterService tried to encode the missing token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720017#comment-14720017
 ] 

Junping Du commented on YARN-4058:
--

[~sjlee0], the JIRA number is not correct in your commits. Please update it to 
the correct one.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719329#comment-14719329
 ] 

Hudson commented on YARN-1556:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1049 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1049/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* hadoop-yarn-project/CHANGES.txt


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719318#comment-14719318
 ] 

Junping Du commented on YARN-3933:
--

I cancelled the patch; please see Jenkins' comment: "The patch file was not 
named according to hadoop's naming conventions". Basically, you should rename 
your patch so that it starts with "YARN-3933" and ends with ".patch" (for 
example, YARN-3933.001.patch).

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719319#comment-14719319
 ] 

Hudson commented on YARN-1556:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #321 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/321/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* hadoop-yarn-project/CHANGES.txt


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719313#comment-14719313
 ] 

Junping Du commented on YARN-3933:
--

This is not a matter of luck but the default behavior of YARN resource 
scheduling. A negative available resource simply indicates that the committed 
resources (consumption + reservation) are larger than the current system 
resources, which means YARN supports resource over-commitment, just as most 
modern OSes and distributed OSes do. I just commented on YARN-4067, which 
seems to be an invalid JIRA to me.
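
To illustrate the arithmetic behind such negative values (toy numbers, not 
taken from any real cluster):

{code}
// With over-commitment, the committed amount (consumption + reservation)
// can exceed the total, so "available" legitimately goes negative.
int totalMB = 8192;
int usedMB = 6144;
int reservedMB = 4096;
int availableMB = totalMB - (usedMB + reservedMB); // = -2048
System.out.println("available = " + availableMB + " MB");
{code}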

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3933:
-
Assignee: Shiwei Guo  (was: Lavkesh Lahngir)

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719298#comment-14719298
 ] 

Junping Du commented on YARN-3933:
--

Looks like you are not a YARN contributor yet; adding you to this elite group. :)

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3933:
-
Summary: Race condition when calling 
AbstractYarnScheduler.completedContainer.  (was: Resources(both core and 
memory) are being negative)

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719292#comment-14719292
 ] 

Junping Du commented on YARN-3933:
--

Hi [~guoshiwei], we should just update the description and title for this JIRA 
instead of creating a new one. No worries. I will mark YARN-4089 as a duplicate 
of this JIRA and assign this JIRA to you, given that you would like to work on 
it and already have a patch to fix it.

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4067) available resource could be set negative

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719279#comment-14719279
 ] 

Junping Du commented on YARN-4067:
--

I don't think we should cap the negative value to zero in any case. In some 
cases, YARN's flexible resource model allows resources to be over-committed. 
Just as an OS can claim/allocate more memory than is physically present, 
backed by the virtual memory mechanism, YARN's resource model is also flexible 
here, backed by mechanisms like resource/container preemption, dynamic 
resource configuration (YARN-291), etc. We have never assumed that the 
available resource cannot be negative, and this negative value can notify YARN 
to rebalance resource consumption in some way. 
Thus, I propose to resolve this JIRA as Not A Problem or Invalid.
> available resource could be set negative
> 
>
> Key: YARN-4067
> URL: https://issues.apache.org/jira/browse/YARN-4067
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4067.patch
>
>
> as mentioned in YARN-4045 by [~leftnoteasy], available memory could be 
> negative due to reservation, propose to use componentwiseMax to 
> updateQueueStatistics in order to cap negative value to zero



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-3933) Resources(both core and memory) are being negative

2015-08-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3933:
-
Comment: was deleted

(was: So I should better open a new issue instead?)

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3970) REST api support for Application Priority

2015-08-28 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3970:

Attachment: YARN-3970.20150828-1.patch

Hi [~sunilg] & [~rohithsharma], attaching the first patch as per the previous 
discussion. There was also one issue in 
CapacityScheduler.updateApplicationPriority: if an application is already 
running at the cluster max priority and the user specifies a priority greater 
than MaxPriority, the RMStateStore and the queue's TreeSet are unnecessarily 
updated with MaxPriority again. A sketch of the intended check follows.
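
A hedged sketch of that check, with assumed method and variable names rather 
than the actual patch code:

{code}
// Clamp the requested priority to the cluster max, then skip the state-store
// update and queue re-ordering when the effective priority is unchanged.
Priority clusterMax = getMaxClusterLevelAppPriority(); // assumed accessor
Priority newPriority = requestedPriority;              // assumed input
if (newPriority.getPriority() > clusterMax.getPriority()) {
  newPriority = clusterMax;
}
if (newPriority.equals(application.getPriority())) {
  return; // nothing to persist in the RMStateStore or re-sort in the queue
}
// ... otherwise persist the new priority and update the queue's TreeSet ...
{code}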

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718641#comment-14718641
 ] 

Junping Du commented on YARN-4087:
--

Patch LGTM.
bq. +1, if fail-fast hasn't been in any prior release and we are not 
drastically altering the behavior.
I believe fail-fast was only introduced recently. However, the default 
behavior when the RM/NM state store fails could differ from previous releases: 
previously it failed the NM/RM daemons, whereas now we tolerate the failure 
and keep running while logging some error messages. We should definitely note 
this in our release notes. Also, maybe we should mark this JIRA as 
incompatible (for behavior)?


> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch, YARN-4087.2.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718597#comment-14718597
 ] 

Jason Lowe commented on YARN-4088:
--

bq. See the problem with slower heartbeats is that if the tasks are 
short-running, there will be a cluster-wide throughput drop due to the feedback 
delay.
The nodemanager will do an out-of-band heartbeat if a container is killed, and 
IMHO should do the same when a container completes (not sure what's so special 
about killed vs. exiting wrt. scheduling).  Of course you can still get storms 
of heartbeats even though you explicitly tuned down the heartbeat interval if 
the cluster is churning containers at a very fast rate.


> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer / more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.
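
A minimal sketch of the kind of parallelism being proposed, entirely hypothetical and not RM code: heartbeats are handed to an executor, with per-node ordering preserved by hashing node ids onto single-threaded lanes so two heartbeats from the same NM are never processed concurrently.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of asynchronous NM-heartbeat processing; not RM code.
public class AsyncHeartbeatSketch {
  private final ExecutorService[] lanes;

  AsyncHeartbeatSketch(int laneCount) {
    lanes = new ExecutorService[laneCount];
    for (int i = 0; i < laneCount; i++) {
      // One thread per lane keeps heartbeats from the same node ordered.
      lanes[i] = Executors.newSingleThreadExecutor();
    }
  }

  void onHeartbeat(String nodeId) {
    int lane = Math.floorMod(nodeId.hashCode(), lanes.length);
    lanes[lane].execute(() -> process(nodeId));
  }

  private void process(String nodeId) {
    // The scheduler's shared data structures would still need their own synchronization here.
    System.out.println("processed heartbeat from " + nodeId);
  }

  void shutdown() {
    for (ExecutorService lane : lanes) {
      lane.shutdown();
    }
  }

  public static void main(String[] args) {
    AsyncHeartbeatSketch rm = new AsyncHeartbeatSketch(4);
    for (int i = 0; i < 8; i++) {
      rm.onHeartbeat("node-" + i);
    }
    rm.shutdown();
  }
}
{code}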



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1556:
-
Priority: Minor  (was: Trivial)

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}
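
The guard statement the description asks for could be as simple as a null check that fails with an explicit message before the map lookup. A hedged sketch follows; it is not the committed patch, and the method name is illustrative.

{code}
// Illustrative only; not the patch committed on this JIRA.
public class NullAppIdGuard {
  static void checkApplicationId(Object applicationId) {
    if (applicationId == null) {
      throw new IllegalArgumentException(
          "ApplicationId must not be null when requesting an application report");
    }
  }

  public static void main(String[] args) {
    try {
      checkApplicationId(null);
    } catch (IllegalArgumentException e) {
      // The caller now sees a descriptive message instead of a bare NPE from the map lookup.
      System.out.println(e.getMessage());
    }
  }
}
{code}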



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718568#comment-14718568
 ] 

Hudson commented on YARN-1556:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #316 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/316/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718525#comment-14718525
 ] 

Hudson commented on YARN-1556:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8363 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8363/])
YARN-1556. NPE getting application report with a null appId. Contributed by 
Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718497#comment-14718497
 ] 

Steve Loughran commented on YARN-4085:
--

+1 for some YARN_CORES value that reflects whatever vcore => physical-core 
mapping the cluster has applied.

> Generate file with container resource limits in the container work dir
> --
>
> Key: YARN-4085
> URL: https://issues.apache.org/jira/browse/YARN-4085
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
>
> Currently, a container doesn't know what resource limits are being imposed on 
> it. It would be helpful if the NM generated a simple file in the container 
> work dir with the resource limits specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718483#comment-14718483
 ] 

Junping Du commented on YARN-1556:
--

+1. Patch LGTM. Will fix the whitespace issue reported by Mr Jenkins while committing.

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718352#comment-14718352
 ] 

Hadoop QA commented on YARN-1556:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 54s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  53m 33s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752960/YARN-1556.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e166c03 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8933/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8933/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8933/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8933/console |


This message was automatically generated.

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
>

[jira] [Assigned] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-28 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin reassigned YARN-4090:
-

Assignee: Xianyin Xin

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.
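
One generic way to cut that cost, assuming only the most-preferred child queue is consumed per assignment (an assumption about the scheduling round, not a statement about the actual fix on this JIRA): a single O(n) scan with Collections.min can replace a full O(n log n) sort. A toy sketch with a stand-in comparator; the real FSParentQueue would use its SchedulingPolicy comparator.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy illustration only; not the FairScheduler code.
public class TopQueueSelection {
  public static void main(String[] args) {
    List<Integer> queueUsages = Arrays.asList(7, 3, 9, 1, 5);
    Comparator<Integer> byUsage = Comparator.naturalOrder();

    // A full sort touches every element even though only the head is consumed.
    // A single scan is enough when just the first candidate is needed:
    Integer mostStarved = Collections.min(queueUsages, byUsage);
    System.out.println("pick child queue with usage " + mostStarved);
  }
}
{code}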



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-1556:
--
Component/s: client

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-1556:
--
Affects Version/s: 2.7.1

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-1556:
--
Attachment: YARN-1556.patch

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718251#comment-14718251
 ] 

Weiwei Yang commented on YARN-1556:
---

I recently ran into this problem, so I created a patch to resolve it. Please 
kindly help review it. Thanks.

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1556) NPE getting application report with a null appId

2015-08-28 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YARN-1556:
-

Assignee: Weiwei Yang  (was: haosdent)

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Trivial
>
> If you accidentally pass in a null appId to get application report, you get 
> an NPE back. This is arguably as intended, except that maybe a guard 
> statement could report this in such a way as to make it easy for callers to 
> track down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718231#comment-14718231
 ] 

Steve Loughran commented on YARN-4085:
--

Make it an env var, maybe one per limit (if a var is unset, that resource is not 
limited); this allows new resource limits to be added later 
(YARN_CONTAINER_LIMIT_IO, ...).
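
A sketch of what consuming such variables might look like from inside a container. The variable names (YARN_CONTAINER_LIMIT_MEMORY_MB, YARN_CONTAINER_LIMIT_VCORES) are purely hypothetical placeholders, since no names have been agreed on this JIRA.

{code}
// Hypothetical consumer-side sketch; the variable names are placeholders only.
public class ContainerLimitsReader {
  static Integer readLimit(String name) {
    String value = System.getenv(name);
    // Per the proposal above, an unset variable means "not limited".
    return value == null ? null : Integer.valueOf(value);
  }

  public static void main(String[] args) {
    Integer memoryMb = readLimit("YARN_CONTAINER_LIMIT_MEMORY_MB");
    Integer vcores = readLimit("YARN_CONTAINER_LIMIT_VCORES");
    System.out.println("memory limit (MB): " + (memoryMb == null ? "unlimited" : memoryMb));
    System.out.println("vcore limit: " + (vcores == null ? "unlimited" : vcores));
  }
}
{code}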

> Generate file with container resource limits in the container work dir
> --
>
> Key: YARN-4085
> URL: https://issues.apache.org/jira/browse/YARN-4085
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
>
> Currently, a container doesn't know what resource limits are being imposed on 
> it. It would be helpful if the NM generated a simple file in the container 
> work dir with the resource limits specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler addresss

2015-08-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718228#comment-14718228
 ] 

Steve Loughran commented on YARN-4083:
--

+1 for making this dynamic: either the AM declares it or ZK does the heavy 
lifting (the YARN registry can publish the info).

What's the security story here? That is: how do AM IP filters know when to 
bounce an HTTP Request over to the proxy?

> Add a discovery mechanism for the scheduler addresss
> 
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
> HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
> address. This JIRA proposes the addition of an explicit discovery mechanism 
> for the scheduler address
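
For context, the classpath-based discovery described above boils down to loading the NM-provided configuration and reading the scheduler address key. A small sketch of that implicit mechanism, shown only to illustrate what an explicit discovery mechanism would replace; it relies on yarn-site.xml from HADOOP_CONF_DIR being on the classpath.

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of today's implicit discovery via the NM's configuration on the classpath.
public class SchedulerAddressLookup {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    String schedulerAddress = conf.get(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS);
    System.out.println("ApplicationMaster would register with: " + schedulerAddress);
  }
}
{code}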



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-08-28 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718157#comment-14718157
 ] 

Robert Metzger commented on YARN-3337:
--

For those looking for a very simple YARN chaos monkey that works much like what 
[~steve_l] described here, I have something at: 
https://github.com/rmetzger/yarn-chaos-monkey
It does not run within the AM; to kill the containers, I basically ssh into the 
remote host and kill the process.

Maybe the link is helpful for somebody who needs such a tool right away.

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or implement 
> Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & probability is the core activity 
> here, one that could be handled by a CLI App/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability of AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations
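
A toy sketch of the core loop those parameters imply; it is hypothetical and not tied to any YARN API, and the kill actions are stubs: wait out the startup delay, then on each wakeup roll against the configured probabilities.

{code}
import java.util.Random;

// Hypothetical chaos-monkey core loop; the kill actions are print stubs, not YARN calls.
public class ChaosMonkeySketch {
  public static void main(String[] args) throws InterruptedException {
    long startupDelayMs = 5_000;   // 1. delay before acting
    long intervalMs = 1_000;       // 2. chaos wakeup frequency
    int amKillPercent = 10;        // 3. probability of AM failure (0-100)
    int containerKillPercent = 25; // 4. probability of non-AM container kill

    Random random = new Random();
    Thread.sleep(startupDelayMs);
    for (int round = 0; round < 5; round++) {
      if (random.nextInt(100) < amKillPercent) {
        System.out.println("round " + round + ": kill the AM");
      } else if (random.nextInt(100) < containerKillPercent) {
        System.out.println("round " + round + ": kill a random non-AM container");
      }
      Thread.sleep(intervalMs);
    }
  }
}
{code}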



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718143#comment-14718143
 ] 

Sunil G commented on YARN-3970:
---

Hi Naga,
Yes, I could see that you are planning to use {{getClientRMService}}. That saves 
the direct API invocation to AbstractYarnScheduler, and it looks fine as all 
validations are handled there.
Thank you.




> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4065) container-executor error should include effective user id

2015-08-28 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned YARN-4065:
-

Assignee: Casey Brotherton

> container-executor error should include effective user id
> -
>
> Key: YARN-4065
> URL: https://issues.apache.org/jira/browse/YARN-4065
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Casey Brotherton
>Assignee: Casey Brotherton
>Priority: Trivial
>
> When container-executor fails to access its config file, the following 
> message will be thrown:
> {code}
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container executor initialization is : 24
> ExitCodeException exitCode=24: Invalid conf file provided : 
> /etc/hadoop/conf/container-executor.cfg
> {code}
> The real problem may be a change that leaves the container-executor no longer 
> running as setuid root.
> From:
> https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html
> {quote}
> The container-executor program must be owned by root and have the permission 
> set ---sr-s---.
> {quote}
> The error message could be improved by including the effective user id, and 
> possibly the path of the executable that is trying to access the 
> config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-28 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718133#comment-14718133
 ] 

Naganarasimha G R commented on YARN-3970:
-

Thanks for replying [~sunilg],
bq. And it's also good to verify whether app is in accepted state or running 
state before invoking scheduler api to change priority
When calling {{rm.getClientRMService().updateApplicationPriority()}}, the above 
check is taken care of inside it, and all ACL-related checks are handled there 
as well.

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)