[jira] [Comment Edited] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855396#comment-16855396
 ] 

Peter Bacsko edited comment on YARN-9595 at 6/4/19 6:50 AM:


Thanks for committing this quickly [~tangzhankun]


was (Author: pbacsko):
Thanks for commit this quickly [~tangzhankun]

> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()
> -
>
> Key: YARN-9595
> URL: https://issues.apache.org/jira/browse/YARN-9595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9595-001.patch
>
>
> YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
> {{currentFpgaInfo}} is not set, resulting in an NPE being thrown:
> {noformat}
> 2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
> {noformat}
> The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} 
> if the following condition is true:
> {noformat}
> if (allowed == null || allowed.equalsIgnoreCase(
> YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
>   return list;
> } else if (allowed.matches("(\\d,)*\\d")){
> ...
> {noformat}
> The solution is simple: initialize it on both code paths.
> The unit tests should be enhanced to verify that it's set properly.
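
To illustrate the fix, here is a minimal, self-contained sketch (illustrative names except {{currentFpgaInfo}} and the quoted condition; not the actual FpgaDiscoverer code) showing the field being assigned on both code paths:

{noformat}
import java.util.Collections;
import java.util.List;

class FpgaDiscovererSketch {
  // Stand-in for YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES
  private static final String AUTO_DISCOVER = "auto";

  // Read later by FpgaNodeResourceUpdateHandler.updateConfiguredResource();
  // leaving it null on the early-return path is what caused the NPE.
  private List<String> currentFpgaInfo;

  List<String> discover(String allowed, List<String> list) {
    if (allowed == null || allowed.equalsIgnoreCase(AUTO_DISCOVER)) {
      currentFpgaInfo = Collections.unmodifiableList(list); // the missing assignment
      return list;
    } else if (allowed.matches("(\\d,)*\\d")) {
      List<String> filtered = list; // filtering by allowed minor numbers elided
      currentFpgaInfo = Collections.unmodifiableList(filtered); // already set here
      return filtered;
    }
    return list;
  }
}
{noformat}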



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855396#comment-16855396
 ] 

Peter Bacsko commented on YARN-9595:


Thanks for committing this quickly [~tangzhankun]

> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()
> -
>
> Key: YARN-9595
> URL: https://issues.apache.org/jira/browse/YARN-9595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9595-001.patch
>
>
> YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
> {{currentFpgaInfo}} is not set, resulting in an NPE being thrown:
> {noformat}
> 2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
> {noformat}
> The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} 
> if the following condition is true:
> {noformat}
> if (allowed == null || allowed.equalsIgnoreCase(
> YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
>   return list;
> } else if (allowed.matches("(\\d,)*\\d")){
> ...
> {noformat}
> The solution is simple: initialize it on both code paths.
> The unit tests should be enhanced to verify that it's set properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9578) Add limit/actions/summarize options for app activities REST API

2019-06-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855392#comment-16855392
 ] 

Hadoop QA commented on YARN-9578:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 
28 unchanged - 0 fixed = 29 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 59s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
41s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities |
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9578 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12970766/YARN-9578.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d4770726f920 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |

[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855366#comment-16855366
 ] 

Hadoop QA commented on YARN-9580:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
57s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
51s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:63396be |
| JIRA Issue | YARN-9580 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12970765/YARN-9580.branch-3.2.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 797c6d086e44 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 
13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / 2f01204 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24214/testReport/ |
| Max. process+thread count | 914 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24214/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Updated] (YARN-9578) Add limit/actions/summarize options for app activities REST API

2019-06-03 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9578:
---
Attachment: YARN-9578.004.patch

> Add limit/actions/summarize options for app activities REST API
> ---
>
> Key: YARN-9578
> URL: https://issues.apache.org/jira/browse/YARN-9578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9578.001.patch, YARN-9578.002.patch, 
> YARN-9578.003.patch, YARN-9578.004.patch
>
>
> Currently, all completed activities of the specified application in the cache 
> are returned by the application activities REST API. Most results may be 
> redundant in scenarios that only need the few latest results; for example, 
> perhaps only one result needs to be shown on the UI for debugging.
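
For illustration, a hypothetical request using the proposed options might look like this (the parameter names are taken from the issue title; the exact endpoint path and semantics depend on the patch):

{noformat}
GET http://<rm-host>:8088/ws/v1/cluster/scheduler/app-activities/<app-id>?limit=1&summarize=true
{noformat}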



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855289#comment-16855289
 ] 

Tao Yang commented on YARN-9580:


Sorry about forgetting to check the imports in the UT class.

Attached the v2 patch for branch-3.2 to correct it.

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch, YARN-9580.branch-3.2.001.patch, 
> YARN-9580.branch-3.2.002.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.
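
To make the description concrete, here is a minimal sketch of the implied fix (a stand-in CSAssignment with only the two relevant fields; the real class and the patch may differ):

{noformat}
class CSAssignmentSketch {
  boolean fulfilledReservation;
  Object fulfilledReservedContainer; // an RMContainer in the real class

  // When ParentQueue#assignContainers transfers a child queue's assignment
  // into its own CSAssignment, the fulfilled-reservation fields must be
  // copied as well; dropping them is what made proposals unacceptable.
  static CSAssignmentSketch transfer(CSAssignmentSketch fromChild) {
    CSAssignmentSketch toParent = new CSAssignmentSketch();
    // ... copying of resource and type information (elided) ...
    toParent.fulfilledReservation = fromChild.fulfilledReservation;
    toParent.fulfilledReservedContainer = fromChild.fulfilledReservedContainer;
    return toParent;
  }
}
{noformat}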



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9580:
---
Attachment: YARN-9580.branch-3.2.002.patch

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch, YARN-9580.branch-3.2.001.patch, 
> YARN-9580.branch-3.2.002.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.

2019-06-03 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855280#comment-16855280
 ] 

Tao Yang commented on YARN-8995:


Thanks [~zhuqi] for the patch.

I prefer not to maintain a global map (Map eventTypeRecord) that would be 
updated twice (in & out) for every event; after all, it's only needed when 
something goes wrong, which should rarely happen. I think counting the events 
in the queue on demand may be enough. Thoughts?

As for the latest event, we can also record it only when necessary, for 
example by using a boolean flag to control whether to record the next event, 
recording one event at a time.

{quote}

now i hard code to 5000

{quote}

I suppose it should be configurable; you can set 5000 as the default.

{quote}

if we need print the event type size in order?

{quote}

I'm not sure what you mean. For example, printing "E1:3,E2:2,E1:1,..." when 
the event types in the queue are "E1,E1,E1,E2,E2,E1,..."? If so, I think it's 
unnecessary.
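
To make the suggestion concrete, here is a minimal sketch of on-demand counting (illustrative names; not the actual AsyncDispatcher API), with the threshold configurable and 5000 as the default:

{noformat}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;

class QueueSizeMonitorSketch {
  private final int threshold; // configurable; 5000 as the suggested default

  QueueSizeMonitorSketch(int threshold) { this.threshold = threshold; }

  <E> void maybeLogEventTypes(BlockingQueue<E> eventQueue) {
    if (eventQueue.size() <= threshold) {
      return; // common case: no global map is maintained, nothing to do
    }
    // Count event types only now, when something looks wrong.
    Map<String, Long> counts = new HashMap<>();
    for (E event : eventQueue) { // a weakly consistent snapshot is acceptable here
      counts.merge(event.getClass().getSimpleName(), 1L, Long::sum);
    }
    System.err.println("Event queue size " + eventQueue.size()
        + " exceeds " + threshold + "; event type counts: " + counts);
  }
}
{noformat}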

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> 
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-8995.001.patch
>
>
> In our growing cluster, there are unexpected situations that cause some event 
> queues to degrade the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event types when the event queue size grows too big, to add that 
> information to the metrics, and to make the queue-size threshold a 
> configurable parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855277#comment-16855277
 ] 

Weiwei Yang commented on YARN-9580:
---

Hi [~Tao Yang]

There seem to be issues in the patch for branch-3.2; could you please take a 
look?

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch, YARN-9580.branch-3.2.001.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855274#comment-16855274
 ] 

Hadoop QA commented on YARN-9580:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
24s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
49s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
46s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 46s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
30s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
37s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 55s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:63396be |
| JIRA Issue | YARN-9580 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12970763/YARN-9580.branch-3.2.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1f1e5c31ef5e 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 
13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / 2f01204 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/24213/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-YARN-Build/24213/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager

[jira] [Comment Edited] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855221#comment-16855221
 ] 

Tao Yang edited comment on YARN-9580 at 6/4/19 2:51 AM:


Sure, thanks [~cheersyang] for the review and commit.

I have found other problems related to reservation and commented above; can 
you take a look? I think we should have a discussion about reservation and 
make it work properly when multi-node placement is enabled.

Created YARN-9598 to track the remaining problems; we can have the discussion 
over there.


was (Author: tao yang):
Sure, thanks [~cheersyang] for the review and commit.

I have found another problems about reservation and commented above, can you 
take a look and I think we should have a discuss about reservation and make it 
able to work when multi-node enabled.

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch, YARN-9580.branch-3.2.001.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9598) Make reservation work well when multi-node enabled

2019-06-03 Thread Tao Yang (JIRA)
Tao Yang created YARN-9598:
--

 Summary: Make reservation work well when multi-node enabled
 Key: YARN-9598
 URL: https://issues.apache.org/jira/browse/YARN-9598
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Tao Yang
Assignee: Tao Yang


This issue is to solve problems with reservation when multi-node placement is 
enabled:
 # As discussed in YARN-9576, a re-reservation proposal may always be generated 
on the same node and break scheduling for this app and later apps. I think 
re-reservation is unnecessary, and we can replace it with LOCALITY_SKIPPED to 
give the scheduler a chance to look at subsequent candidates for this app when 
multi-node placement is enabled.
 # The scheduler iterates all nodes and tries to allocate for the reserved 
container in LeafQueue#allocateFromReservedContainer. There are two problems 
here:
 ** The node of the reserved container should be taken as the candidate 
instead of all nodes when calling FiCaSchedulerApp#assignContainers; otherwise 
the scheduler may later generate a reservation-fulfilled proposal on another 
node, which will always be rejected in 
FiCaSchedulerApp#commonCheckContainerAllocation.
 ** The assignment returned by FiCaSchedulerApp#assignContainers can never be 
null, even if the allocation was just skipped; this breaks the normal 
scheduling process for the leaf queue because of the if clause in 
LeafQueue#assignContainers: "if (null != assignment) \{ return assignment;}"
 # Nodes that already hold a reservation should be skipped when iterating 
candidates in RegularContainerAllocator#allocate (see the sketch below); 
otherwise the scheduler may generate an allocation or reservation proposal on 
these nodes, which will always be rejected in 
FiCaSchedulerApp#commonCheckContainerAllocation.
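
A minimal sketch of the candidate filtering in point 3 (illustrative types; RegularContainerAllocator's real signature differs):

{noformat}
import java.util.ArrayList;
import java.util.List;

class CandidateFilterSketch {
  interface SchedulerNode {
    boolean hasReservedContainer();
  }

  // Skip nodes that already hold a reserved container: proposals on them
  // would be rejected in FiCaSchedulerApp#commonCheckContainerAllocation.
  static List<SchedulerNode> allocatableCandidates(List<SchedulerNode> candidates) {
    List<SchedulerNode> result = new ArrayList<>();
    for (SchedulerNode node : candidates) {
      if (!node.hasReservedContainer()) {
        result.add(node);
      }
    }
    return result;
  }
}
{noformat}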



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened YARN-9580:
---

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch, YARN-9580.branch-3.2.001.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9580:
---
Attachment: YARN-9580.branch-3.2.001.patch

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch, YARN-9580.branch-3.2.001.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855235#comment-16855235
 ] 

Hudson commented on YARN-9595:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16658 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16658/])
YARN-9595. FPGA plugin: NullPointerException in (ztang: rev 
606061aa147dc6d619d6240b7ea31d8f8f220e5d)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/TestFpgaDiscoverer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/FpgaDiscoverer.java


> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()
> -
>
> Key: YARN-9595
> URL: https://issues.apache.org/jira/browse/YARN-9595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9595-001.patch
>
>
> YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
> {{currentFpgaInfo}} is not set, resulting in an NPE being thrown:
> {noformat}
> 2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
> {noformat}
> The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} 
> if the following condition is true:
> {noformat}
> if (allowed == null || allowed.equalsIgnoreCase(
> YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
>   return list;
> } else if (allowed.matches("(\\d,)*\\d")){
> ...
> {noformat}
> The solution is simple: initialize it on both code paths.
> The unit tests should be enhanced to verify that it's set properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855228#comment-16855228
 ] 

Zhankun Tang commented on YARN-9595:


Thanks, Peter! LGTM. +1. Committing shortly.

> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()
> -
>
> Key: YARN-9595
> URL: https://issues.apache.org/jira/browse/YARN-9595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9595-001.patch
>
>
> YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
> {{currentFpgaInfo}} is not set, resulting in an NPE being thrown:
> {noformat}
> 2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
> {noformat}
> The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} 
> if the following condition is true:
> {noformat}
> if (allowed == null || allowed.equalsIgnoreCase(
> YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
>   return list;
> } else if (allowed.matches("(\\d,)*\\d")){
> ...
> {noformat}
> The solution is simple: initialize it on both code paths.
> The unit tests should be enhanced to verify that it's set properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855221#comment-16855221
 ] 

Tao Yang commented on YARN-9580:


Sure, thanks [~cheersyang] for the review and commit.

I have found other problems related to reservation and commented above; can 
you take a look? I think we should have a discussion about reservation and 
make it work properly when multi-node placement is enabled.

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch
>
>
> When transferring an assignment from a child queue to the parent queue, the 
> fulfilled reservation information in the assignment (fulfilledReservation and 
> fulfilledReservedContainer) is lost.
> When multi-node placement is enabled, this loss causes a problem: an 
> allocation proposal is generated but can never be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The resulting endless loop persists and the 
> resources of that node can no longer be used.
> In heartbeat-driven scheduling mode, a fulfilled reservation can be allocated 
> via another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we rarely encountered this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9597) Memory efficiency in speculator

2019-06-03 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855155#comment-16855155
 ] 

Ahmed Hussein commented on YARN-9597:
-

By looking at previous Jira reports, it is clear that "memory leakage" is a 
persistent bug.

My concern is: how many Hadoop modules have hidden, bloating data structures? 
If this is a common issue, would it be useful to investigate test cases and 
tools to detect memory leaks in Hadoop?

[~ste...@apache.org] are you aware of ongoing efforts in that direction?

> Memory efficiency in speculator 
> 
>
> Key: YARN-9597
> URL: https://issues.apache.org/jira/browse/YARN-9597
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Priority: Minor
>
> The data structures in the speculator and runtime estimator keep bloating. 
> Data elements (taskID, TA-ID, task stats, tasks speculated, tasks finished, 
> etc.) are added to the concurrent maps but never removed.
> For long-running jobs, there are a couple of issues:
>  # Memory leakage: the speculator's memory usage increases over time.
>  # Performance: keeping large structures on the heap hurts performance due 
> to locality and cache misses.
> *Suggested Fixes:*
> - When a TA transitions via {{MoveContainerToSucceededFinishingTransition}}, 
> the TA notifies the speculator, which handles the event by cleaning its 
> internal structures accordingly.
> - When a task transitions to failed/killed, the speculator is notified to 
> clean its internal data structures.
>  
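
To make the suggested fixes concrete, here is a minimal sketch of the cleanup handler (illustrative names; not the real speculator API):

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SpeculatorCleanupSketch {
  // Stand-ins for the bloating per-task structures named in the description.
  private final Map<String, Long> taskStats = new ConcurrentHashMap<>();
  private final Map<String, Long> tasksSpeculated = new ConcurrentHashMap<>();

  // Called when a task attempt succeeds or a task is failed/killed, so the
  // per-task entries are reclaimed instead of accumulating for the job's life.
  void handleTaskFinished(String taskId) {
    taskStats.remove(taskId);
    tasksSpeculated.remove(taskId);
  }
}
{noformat}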



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9597) Memory efficiency in speculator

2019-06-03 Thread Ahmed Hussein (JIRA)
Ahmed Hussein created YARN-9597:
---

 Summary: Memory efficiency in speculator 
 Key: YARN-9597
 URL: https://issues.apache.org/jira/browse/YARN-9597
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ahmed Hussein


The data structures in the speculator and runtime estimator keep bloating. 
Data elements (taskID, TA-ID, task stats, tasks speculated, tasks finished, 
etc.) are added to the concurrent maps but never removed.

For long-running jobs, there are a couple of issues:
 # Memory leakage: the speculator's memory usage increases over time.
 # Performance: keeping large structures on the heap hurts performance due to 
locality and cache misses.

*Suggested Fixes:*

- When a TA transitions via {{MoveContainerToSucceededFinishingTransition}}, 
the TA notifies the speculator, which handles the event by cleaning its 
internal structures accordingly.
- When a task transitions to failed/killed, the speculator is notified to 
clean its internal data structures.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855127#comment-16855127
 ] 

Hadoop QA commented on YARN-9595:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
27s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9595 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12970696/YARN-9595-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e2a84b474983 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 277e9a8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24211/testReport/ |
| Max. process+thread count | 447 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24211/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()

[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-06-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855115#comment-16855115
 ] 

Hadoop QA commented on YARN-9596:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
37s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
37s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 37s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 22 new + 110 unchanged - 0 fixed = 132 total (was 110) 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
37s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m  
7s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 38s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9596 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12970748/YARN-9596.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2a8c4ac1e613 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 277e9a8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/24212/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-YARN-Build/24212/artifact/out/patch-c

[jira] [Assigned] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-06-03 Thread Muhammad Samir Khan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammad Samir Khan reassigned YARN-9596:
-

Assignee: Muhammad Samir Khan

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596.001.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternately:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following line at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10 GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-06-03 Thread Muhammad Samir Khan (JIRA)
Muhammad Samir Khan created YARN-9596:
-

 Summary: QueueMetrics has incorrect metrics when labelled 
partitions are involved
 Key: YARN-9596
 URL: https://issues.apache.org/jira/browse/YARN-9596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Muhammad Samir Khan
 Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
2019-06-03 at 4.44.15 PM.png

After YARN-6467, QueueMetrics should only be tracking metrics for the default 
partition. However, the metrics are incorrect when labelled partitions are 
involved.

Steps to reproduce

==
 # Configure capacity-scheduler.xml with label configuration
 # Add label "test" to cluster and replace label on node1 to be "test"
 # Note down "totalMB" at 
/ws/v1/cluster/metrics
 # Start first job on test queue.
 # Start second job on default queue (does not work if the order of two jobs is 
swapped).
 # While the two applications are running, the "totalMB" at 
/ws/v1/cluster/metrics will go down by the 
amount of MB used by the first job (screenshots attached).

Alternately:

In 
TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
add the following lines at the end of the test, before rm1.close():

{code:java}
CSQueue rootQueue = cs.getRootQueue();
assertEquals(10 * GB,
    rootQueue.getMetrics().getAvailableMB()
        + rootQueue.getMetrics().getAllocatedMB());
{code}

There are two nodes of 10 GB each and only one of them has a non-default 
label. The test will also fail against a 20*GB check.
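
A plausible shape for the fix is to make the QueueMetrics updates 
partition-aware. A minimal sketch of that idea (hypothetical method and field 
names, not the attached patch):

{code:java}
// Hypothetical sketch: only reflect default-partition resources in
// QueueMetrics, in line with YARN-6467. Names below are assumptions.
private void incAvailableResource(String partition, Resource res) {
  if (!RMNodeLabelsManager.NO_LABEL.equals(partition)) {
    return; // non-default partitions must not change these metrics
  }
  availableMB.incr(res.getMemorySize());
  availableVCores.incr(res.getVirtualCores());
}
{code}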



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-06-03 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854941#comment-16854941
 ] 

Eric Payne commented on YARN-8625:
--

[~Prabhu Joseph], in order to put these changes into branch-2.8 and branch-2.7, 
I will need to put them into branch-2 and branch-2.9 first. The patch for trunk 
backports cleanly to branch-2, but it does not compile in branch-2. So, I will 
need another patch for branch-2 specifically.
{noformat}
[ERROR] 
/home/ericp/hadoop/source/Apache/YARN-8625/branch-2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java:[117,48]
 cannot find symbol
[ERROR]   symbol:   method getResourceSecondsMap()
[ERROR]   location: class 
org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport
{noformat}

The branch-2.8 and branch-2.7 patches don't apply cleanly to branch-2, either.
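
For the branch-2 patch, the change probably just needs to fall back to the 
accessors that exist there; a hedged sketch (branch-2's 
ApplicationResourceUsageReport predates getResourceSecondsMap()):

{code:java}
// Use the scalar accessors available in branch-2 instead of the map.
long memorySeconds = resourceUsageReport.getMemorySeconds();
long vcoreSeconds = resourceUsageReport.getVcoreSeconds();
{code}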

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> YARN-8625-branch-2.7.001.patch, YARN-8625-branch-2.8.001.patch, yarn-site.xml
>
>
> Aggregate Resource Allocation shown on the RM UI for a finished job is a 
> very useful metric for understanding how much resource a job has consumed. 
> But it does not get stored in ATS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6055) ContainersMonitorImpl need be adjusted when NM resource changed.

2019-06-03 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854940#comment-16854940
 ] 

Íñigo Goiri commented on YARN-6055:
---

{{TestContainerSchedulerQueuing#testQueueShedding()}} is a little worrisome.
I think we should make it more resilient, probably by waiting for the expected 
value instead of asserting on it immediately.
[~abmodi] any thoughts here?
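
For example, a hedged sketch of waiting for the value in the test (the polled 
expression is an assumption about what the assertion checks):

{code:java}
// Poll instead of asserting immediately: re-check every 100 ms,
// give up after 10 s.
GenericTestUtils.waitFor(
    () -> containerScheduler.getNumQueuedContainers() == expectedQueued,
    100, 10000);
{code}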

> ContainersMonitorImpl need be adjusted when NM resource changed.
> 
>
> Key: YARN-6055
> URL: https://issues.apache.org/jira/browse/YARN-6055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: YARN-6055.000.patch, YARN-6055.001.patch, 
> YARN-6055.002.patch, YARN-6055.003.patch, YARN-6055.004.patch
>
>
> Per Ravi's comments in YARN-4832, we need to check some limits in 
> ContainersMonitorImpl to make sure they also get updated when the NM 
> resource is updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9569) Auto-created leaf queues do not honor cluster-wide min/max memory/vcores

2019-06-03 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854763#comment-16854763
 ] 

Suma Shivaprasad commented on YARN-9569:


+1. Patch LGTM

> Auto-created leaf queues do not honor cluster-wide min/max memory/vcores
> 
>
> Key: YARN-9569
> URL: https://issues.apache.org/jira/browse/YARN-9569
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
> Attachments: YARN-9569.001.patch, YARN-9569.002.patch
>
>
> Auto-created leaf queues do not honor cluster-wide settings for maximum 
> memory/vcores allocation.
> To reproduce:
>  # Set auto-create-child-queue.enabled=true for a parent queue.
>  # Set leaf-queue-template.maximum-allocation-mb=16384.
>  # Set yarn.resource-types.memory-mb.maximum-allocation=16384 in 
> resource-types.xml
>  # Launch a YARN app with a container requesting 16 GB RAM.
>  
> This scenario should work, but instead you get an error similar to this:
> {code:java}
> java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger 
> than the cluster setting for queue root.auto.test max allocation per queue: 
>  cluster setting:    {code}
>  
> This seems to be caused by this code in 
> ManagedParentQueue.getLeafQueueConfigs:
> {code:java}
> CapacitySchedulerConfiguration leafQueueConfigTemplate = new
> CapacitySchedulerConfiguration(new Configuration(false), false);{code}
>  
> This initializes a new leaf queue configuration that does not read 
> resource-types.xml (or any other config). Later, this 
> CapacitySchedulerConfiguration instance calls 
> ResourceUtils.fetchMaximumAllocationFromConfig()  from its 
> getMaximumAllocationPerQueue() method and passes itself as the configuration 
> to use. Since the resource types are not present, ResourceUtils falls back to 
> compiled-in defaults of 8GB RAM, 4 cores.
>  
> I was able to work around this with a custom AutoCreatedQueueManagementPolicy 
> implementation which does something like this in init() and reinitialize():
> {code:java}
> for (Map.Entry<String, String> entry : this.scheduler.getConfiguration()) {
> if (entry.getKey().startsWith("yarn.resource-types")) {
>   parentQueue.getLeafQueueTemplate().getLeafQueueConfigs()
> .set(entry.getKey(), entry.getValue());
>   }
> }
> {code}
> However, this is obviously a very hacky way to solve the problem.
> I can submit a proper patch if someone can provide some direction as to the 
> best way to proceed.
>  
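
Another possible direction, sketched here with assumed access to the scheduler 
context (not a tested patch), is to seed the template configuration from the 
scheduler's own configuration so the yarn.resource-types.* keys are visible:

{code:java}
// Hypothetical: inherit resource-type definitions from the live scheduler
// configuration rather than starting from new Configuration(false).
CapacitySchedulerConfiguration leafQueueConfigTemplate =
    new CapacitySchedulerConfiguration(csContext.getConfiguration(), false);
{code}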



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854721#comment-16854721
 ] 

Hudson commented on YARN-9580:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16654 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16654/])
YARN-9580. Fulfilled reservation information in assignment is lost when (wwei: 
rev bd2590d71ba1f3db1c686f7afeaf51382f8d8a2f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerMultiNodes.java


> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch
>
>
> When transferring an assignment from a child queue to its parent queue, the 
> fulfilled reservation information in the assignment, including 
> fulfilledReservation and fulfilledReservedContainer, is lost.
> When multi-node placement is enabled, this loss can cause a problem: an 
> allocation proposal is generated but can't be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The result is an endless loop, and the node's 
> resources can't be used anymore.
> In HB-driven scheduling mode, a fulfilled reservation can be allocated via 
> another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In that path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we hardly ever hit this problem before.
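
Conceptually, the fix carries the fulfilled-reservation fields along when the 
parent queue adopts the child's assignment; a hedged sketch (accessor names 
assumed from the description, not copied from the patch):

{code:java}
// In ParentQueue#assignContainers, when transferring the child's result:
assignment.setFulfilledReservation(
    assignedToChild.isFulfilledReservation());
assignment.setFulfilledReservedContainer(
    assignedToChild.getFulfilledReservedContainer());
{code}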



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854701#comment-16854701
 ] 

Weiwei Yang commented on YARN-9580:
---

Committed to trunk. [~Tao Yang], can you please provide a patch for branch-3.2 
too?

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9580.001.patch
>
>
> When transferring an assignment from a child queue to its parent queue, the 
> fulfilled reservation information in the assignment, including 
> fulfilledReservation and fulfilledReservedContainer, is lost.
> When multi-node placement is enabled, this loss can cause a problem: an 
> allocation proposal is generated but can't be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The result is an endless loop, and the node's 
> resources can't be used anymore.
> In HB-driven scheduling mode, a fulfilled reservation can be allocated via 
> another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In that path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we hardly ever hit this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-06-03 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854697#comment-16854697
 ] 

Weiwei Yang commented on YARN-9580:
---

[~Tao Yang], thanks for the patch, it makes sense to me. +1.

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9580.001.patch
>
>
> When transferring an assignment from a child queue to its parent queue, the 
> fulfilled reservation information in the assignment, including 
> fulfilledReservation and fulfilledReservedContainer, is lost.
> When multi-node placement is enabled, this loss can cause a problem: an 
> allocation proposal is generated but can't be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. The result is an endless loop, and the node's 
> resources can't be used anymore.
> In HB-driven scheduling mode, a fulfilled reservation can be allocated via 
> another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In that path the assignment 
> is generated by the leaf queue and submitted directly, which I think is why 
> we hardly ever hit this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9595:
---
Attachment: YARN-9595-001.patch

> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()
> -
>
> Key: YARN-9595
> URL: https://issues.apache.org/jira/browse/YARN-9595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9595-001.patch
>
>
> YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
> {{currentFpgaInfo}} is not set, resulting in an NPE being thrown:
> {noformat}
> 2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
> {noformat}
> The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} 
> if the following condition is true:
> {noformat}
> if (allowed == null || allowed.equalsIgnoreCase(
> YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
>   return list;
> } else if (allowed.matches("(\\d,)*\\d")){
> ...
> {noformat}
> Solution is simple: initialize it in both code-paths.
> Unit tests should be enhanced to verify that it's set properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9595:
---
Description: 
YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
{{currentFpgaInfo}} is not set, resulting in an NPE being thrown:

{noformat}
2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: Service 
NodeManager failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
{noformat}

The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} if 
the following condition is true:

{noformat}
if (allowed == null || allowed.equalsIgnoreCase(
YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
  return list;
} else if (allowed.matches("(\\d,)*\\d")){
...
{noformat}

Solution is simple: initialize it in both code-paths.

Unit tests should be enhanced to verify that it's set properly.

  was:
YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
{{currentFpgaInfo}} is not set, resulting in an NPE being thrown:

{noformat}
2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: Service 
NodeManager failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
{noformat}

The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} if 
the following condition is true:

{noformat}
if (allowed == null || allowed.equalsIgnoreCase(
YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
  return list;
} else if (allowed.matches("(\\d,)*\\d")){
...
{noformat}

Solution is simple, it should always be initialized, just like before.

Unit tests should be enhanced to verify that it's set properly.


> FPGA plugin: NullPointerException in 
> FpgaNodeResourceUpdateHandler.updateConfiguredResource()
> -
>
> Key: YARN-9595
> URL: https://issues.apache.org/jira/browse/YARN-9595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
> {{currentFpgaInfo}} is not set, resulting in an NPE being thrown:
> {noformat}
> 2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesVia

[jira] [Commented] (YARN-9587) TestDistributedShell#testDSShellWithoutDomainV2 fails intermittently

2019-06-03 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854649#comment-16854649
 ] 

Prabhu Joseph commented on YARN-9587:
-

TestDistributedShell logs are filled with
{code:java}
2019-05-31 17:54:08,972 WARN  [IPC Server Responder] ipc.Server 
(Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468){code}

> TestDistributedShell#testDSShellWithoutDomainV2 fails intermittently
> 
>
> Key: YARN-9587
> URL: https://issues.apache.org/jira/browse/YARN-9587
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell, test
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> *TestDistributedShell#testDSShellWithoutDomainV2 fails intermittently*
> {code}
> ERROR] 
> testDSShellWithoutDomainV2(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 71.567 s  <<< ERROR!
> java.io.FileNotFoundException: File does not exist: 
> /tmp/junit4945469048766836979/junit6000197014225386358/entities/yarn_cluster/jenkins/DistributedShell/1/1559078030997/application_1559078027605_0001/YARN_CONTAINER/container_1559078027605_0001_01_01.thist.tmp
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2377)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2372)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:627)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:318)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2(TestDistributedShell.java:314)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(F

[jira] [Commented] (YARN-9573) DistributedShell cannot specify LogAggregationContext

2019-06-03 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854648#comment-16854648
 ] 

Prabhu Joseph commented on YARN-9573:
-

No, [~adam.antal], I will analyze this as part of YARN-9587.

> DistributedShell cannot specify LogAggregationContext
> -
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch, YARN-9573.002.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, e.g. log aggregation in 
> a rolling fashion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9595) FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

2019-06-03 Thread Peter Bacsko (JIRA)
Peter Bacsko created YARN-9595:
--

 Summary: FPGA plugin: NullPointerException in 
FpgaNodeResourceUpdateHandler.updateConfiguredResource()
 Key: YARN-9595
 URL: https://issues.apache.org/jira/browse/YARN-9595
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Peter Bacsko
Assignee: Peter Bacsko


YARN-9264 accidentally introduced a bug in FpgaDiscoverer. Sometimes 
{{currentFpgaInfo}} is not set, resulting in an NPE being thrown:

{noformat}
2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: Service 
NodeManager failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
{noformat}

The problem is that in {{FpgaDiscoverer}}, we don't set {{currentFpgaInfo}} if 
the following condition is true:

{noformat}
if (allowed == null || allowed.equalsIgnoreCase(
YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
  return list;
} else if (allowed.matches("(\\d,)*\\d")){
...
{noformat}

Solution is simple, it should always be initialized, just like before.

Unit tests should be enhanced to verify that it's set properly.
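
A hedged sketch of that fix in {{FpgaDiscoverer}} (variable and helper names 
are assumptions based on the snippet above, not the actual patch):

{code:java}
// Set currentFpgaInfo on *both* return paths of the discovery method.
if (allowed == null || allowed.equalsIgnoreCase(
    YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
  currentFpgaInfo = ImmutableList.copyOf(list);   // was missing here
  return list;
} else if (allowed.matches("(\\d,)*\\d")) {
  // filterByMinorNumbers is a hypothetical helper for the minor-number list.
  List<FpgaDevice> filtered = filterByMinorNumbers(list, allowed);
  currentFpgaInfo = ImmutableList.copyOf(filtered);
  return filtered;
}
{code}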



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9573) DistributedShell cannot specify LogAggregationContext

2019-06-03 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854624#comment-16854624
 ] 

Adam Antal commented on YARN-9573:
--

Added patch v2 with the \t characters removed and a UT added, as [~snemeth] 
suggested.

I'd recommend fixing this, because it has cost me a LOT of overhead to search 
through that 800 MB log file, only because some stack trace is thrown every 
2-3 milliseconds. Do you have an open issue for this, [~Prabhu Joseph]?
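
For reference, specifying the context from the DShell client is a small 
change; a hedged sketch against the public API (the patterns below are 
made-up examples, not what the patch configures):

{code:java}
// Attach a LogAggregationContext to the ApplicationSubmissionContext so
// rolling/filtered aggregation settings reach the RM.
LogAggregationContext logAggregationContext =
    LogAggregationContext.newInstance(
        null, null,                       // include/exclude at app completion
        ".*stdout.*|.*stderr.*", null);   // rolled include/exclude patterns
appContext.setLogAggregationContext(logAggregationContext);
{code}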

> DistributedShell cannot specify LogAggregationContext
> -
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch, YARN-9573.002.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, e.g. log aggregation in 
> a rolling fashion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9573) DistributedShell cannot specify LogAggregationContext

2019-06-03 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9573:
-
Attachment: YARN-9573.002.patch

> DistributedShell cannot specify LogAggregationContext
> -
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch, YARN-9573.002.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, e.g. log aggregation in 
> a rolling fashion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.

2019-06-03 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854573#comment-16854573
 ] 

Bibin A Chundatt commented on YARN-9557:


Thank you [~BilwaST] for the patch.

The latest patch looks good to me. I will wait for a day before committing.

> Application fails in diskchecker when ReadWriteDiskValidator is configured.
> ---
>
> Key: YARN-9557
> URL: https://issues.apache.org/jira/browse/YARN-9557
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
> Environment: Configure:
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
>Reporter: Anuruddh Nayak
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9557-001.patch, YARN-9557-002.patch, 
> YARN-9557-003.patch
>
>
> Application fails to execute successfully when ReadWriteDiskValidator is 
> configured.
> {code:java}
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
> {code}
> {noformat}
> Exception thrown while starting Container:
> java.io.IOException: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> Disk Check failed!
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1233)
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check 
> failed!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:82)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:255)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:312)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
>  ... 2 more
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> /opt/HA/AN0805/nmlocal/usercache/dsperf/appcache/application_1557736108162_0009/filecache/11
>  is not a directory!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:50)
> {noformat}
>  
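
Until the fix lands, a possible workaround (sketch only) is to fall back to 
the default {{basic}} validator, since only the read-write validator trips on 
this path:

{code:java}
<property>
  <name>yarn.nodemanager.disk-validator</name>
  <value>basic</value>
</property>
{code}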



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9578) Add limit/actions/summarize options for app activities REST API

2019-06-03 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854553#comment-16854553
 ] 

Tao Yang edited comment on YARN-9578 at 6/3/19 12:49 PM:
-

To serve users better, we should support more options for the app activities 
REST API:
 # limit - caps the number of results. Users often need only one or a few of 
the latest activities, so this option can lower the cost on both the server 
and client sides in some scenarios.
 # actions - the required actions of app activities, including UPDATE and GET. 
Some consumers, such as the app attempt UI, want to refresh and show the 
latest activities, perhaps triggering an update first and then fetching the 
latest activities after a while; customized actions can drop unnecessary 
actions to cut the cost.
 # summarize - whether app activities from multiple scheduling processes 
should be summarized. It's useful when multi-node placement is disabled, since 
only one node can be considered in a single scheduling process; enabling it 
may give us a summary with diagnostics on all nodes for better debugging.
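
As a usage illustration, the options could be combined like this (hypothetical 
query spellings based on this comment, not necessarily the final patch):

{noformat}
GET /ws/v1/cluster/scheduler/app-activities/{appid}?limit=1
GET /ws/v1/cluster/scheduler/app-activities/{appid}?actions=get
GET /ws/v1/cluster/scheduler/app-activities/{appid}?actions=update,get&summarize=true
{noformat}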


was (Author: tao yang):
To serve users better, we should support more options for the app activities 
REST API:
 # limit - caps the number of results. Users often need only one or a few of 
the latest activities, so this option can lower the cost on both the server 
and client sides in some scenarios.
 # actions - the required actions of app activities, including UPDATE and GET. 
Some consumers, such as the app attempt UI, want to refresh and show the 
latest activities, perhaps triggering an update first and then fetching the 
latest activities after a while; customized actions can drop unnecessary 
actions to cut the cost.
 # summarize - whether app activities from multiple scheduling processes 
should be summarized. It's useful when multi-node placement is disabled, since 
only one node can be considered in a single scheduling process. 

> Add limit/actions/summarize options for app activities REST API
> ---
>
> Key: YARN-9578
> URL: https://issues.apache.org/jira/browse/YARN-9578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9578.001.patch, YARN-9578.002.patch, 
> YARN-9578.003.patch
>
>
> Currently, all completed activities of a specified application in the cache 
> are returned by the application activities REST API. Most results may be 
> redundant in scenarios that need only a few of the latest results; for 
> example, perhaps only one result needs to be shown on the UI for debugging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9525) IFile format is not working against s3a remote folder

2019-06-03 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854556#comment-16854556
 ] 

Adam Antal edited comment on YARN-9525 at 6/3/19 12:47 PM:
---

Setting the rollover size to 0 still fails, for the following reason:

In {{LogAggregationIndexedFileController$initializeWriterInRolling}} when we 
initialize the writer:
{code:java}
// recreate checksum file if needed before aggregate the logs
if (overwriteCheckSum) {
  final long currentAggregatedLogFileLength = fc
  .getFileStatus(aggregatedLogFile).getLen();
  FSDataOutputStream checksumFileOutputStream = null;
  try {
checksumFileOutputStream = fc.create(remoteLogCheckSumFile,
EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
new Options.CreateOpts[] {});
String fileName = aggregatedLogFile.getName();
checksumFileOutputStream.writeInt(fileName.length());
checksumFileOutputStream.write(fileName.getBytes(
Charset.forName("UTF-8")));
checksumFileOutputStream.writeLong(
currentAggregatedLogFileLength);
checksumFileOutputStream.flush();
  } finally {
IOUtils.cleanupWithLogger(LOG, checksumFileOutputStream);
  }
{code}

We fail on the getFileStatus call, because we want to get the status of the 
file we just wrote, and then try to fetch its length. I wonder if we only do 
this because of the length - this information can be calculated while 
writing, so there would be no need to query it through 
{{S3AFileSystem$s3GetFileStatus}}.
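
If the length is all we need, a hedged alternative (stream variable name 
assumed) is to take it from the writer side and skip the remote lookup 
entirely:

{code:java}
// FSDataOutputStream tracks how many bytes were written, so the aggregated
// length can come from the stream instead of a getFileStatus() round trip.
long currentAggregatedLogFileLength = fsDataOStream.getPos();
{code}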


was (Author: adam.antal):
Setting the rollover size to 0 still fails because of the following reason:

In {{LogAggregationIndexedFileController$initializeWriterInRolling}} we 
attempt the following after successfully writing out the part of the log:
{code:java}
// recreate checksum file if needed before aggregate the logs
if (overwriteCheckSum) {
  final long currentAggregatedLogFileLength = fc
  .getFileStatus(aggregatedLogFile).getLen();
  FSDataOutputStream checksumFileOutputStream = null;
  try {
checksumFileOutputStream = fc.create(remoteLogCheckSumFile,
EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
new Options.CreateOpts[] {});
String fileName = aggregatedLogFile.getName();
checksumFileOutputStream.writeInt(fileName.length());
checksumFileOutputStream.write(fileName.getBytes(
Charset.forName("UTF-8")));
checksumFileOutputStream.writeLong(
currentAggregatedLogFileLength);
checksumFileOutputStream.flush();
  } finally {
IOUtils.cleanupWithLogger(LOG, checksumFileOutputStream);
  }
{code}

We fail on the getFileStatus call, because we want to get the status of the 
file we just wrote, and then try to fetch its length. I wonder if we only do 
this because of the length - this information can be calculated while 
writing, so there would be no need to query it through 
{{S3AFileSystem$s3GetFileStatus}}.

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch
>
>
> Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runW

[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-06-03 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854556#comment-16854556
 ] 

Adam Antal commented on YARN-9525:
--

Setting the rollover size to 0 still fails, for the following reason:

In {{LogAggregationIndexedFileController$initializeWriterInRolling}} we 
attempt the following after successfully writing out the part of the log:
{code:java}
// recreate checksum file if needed before aggregate the logs
if (overwriteCheckSum) {
  final long currentAggregatedLogFileLength = fc
  .getFileStatus(aggregatedLogFile).getLen();
  FSDataOutputStream checksumFileOutputStream = null;
  try {
checksumFileOutputStream = fc.create(remoteLogCheckSumFile,
EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
new Options.CreateOpts[] {});
String fileName = aggregatedLogFile.getName();
checksumFileOutputStream.writeInt(fileName.length());
checksumFileOutputStream.write(fileName.getBytes(
Charset.forName("UTF-8")));
checksumFileOutputStream.writeLong(
currentAggregatedLogFileLength);
checksumFileOutputStream.flush();
  } finally {
IOUtils.cleanupWithLogger(LOG, checksumFileOutputStream);
  }
{code}

We fail on the getFileStatus call, because we want to get the status of the 
file we just wrote, and then try to fetch its length. I wonder if we only do 
this because of the length - this information can be calculated while 
writing, so there would be no need to query it through 
{{S3AFileSystem$s3GetFileStatus}}.

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch
>
>
> Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.y

[jira] [Commented] (YARN-9578) Add limit/actions/summarize options for app activities REST API

2019-06-03 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854553#comment-16854553
 ] 

Tao Yang commented on YARN-9578:


To serve users better, we should support more options for the app activities 
REST API:
 # limit - caps the number of results. Users often need only one or a few of 
the latest activities, so this option can lower the cost on both the server 
and client sides in some scenarios.
 # actions - the required actions of app activities, including UPDATE and GET. 
Some consumers, such as the app attempt UI, want to refresh and show the 
latest activities, perhaps triggering an update first and then fetching the 
latest activities after a while; customized actions can drop unnecessary 
actions to cut the cost.
 # summarize - whether app activities from multiple scheduling processes 
should be summarized. It's useful when multi-node placement is disabled, since 
only one node can be considered in a single scheduling process. 

> Add limit/actions/summarize options for app activities REST API
> ---
>
> Key: YARN-9578
> URL: https://issues.apache.org/jira/browse/YARN-9578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9578.001.patch, YARN-9578.002.patch, 
> YARN-9578.003.patch
>
>
> Currently, all completed activities of a specified application in the cache 
> are returned by the application activities REST API. Most results may be 
> redundant in scenarios that need only a few of the latest results; for 
> example, perhaps only one result needs to be shown on the UI for debugging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9578) Add limit/actions/summarize options for app activities REST API

2019-06-03 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9578:
---
Summary: Add limit/actions/summarize options for app activities REST API  
(was: Add limit option to control number of results for app activities REST API)

> Add limit/actions/summarize options for app activities REST API
> ---
>
> Key: YARN-9578
> URL: https://issues.apache.org/jira/browse/YARN-9578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9578.001.patch, YARN-9578.002.patch, 
> YARN-9578.003.patch
>
>
> Currently, all completed activities of a specified application in the cache 
> are returned by the application activities REST API. Most results may be 
> redundant in scenarios that need only a few of the latest results; for 
> example, perhaps only one result needs to be shown on the UI for debugging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.

2019-06-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854502#comment-16854502
 ] 

Hadoop QA commented on YARN-9557:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
46s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9557 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12970663/YARN-9557-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0e1bf7dee046 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 
13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 59719dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24208/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24208/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Application fails in diskchecker when

[jira] [Commented] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart

2019-06-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854463#comment-16854463
 ] 

Hudson commented on YARN-8906:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16653 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16653/])
YARN-8906. [UI2] NM hostnames not displayed correctly in Node Heatmap (sunilg: 
rev 59719dc560cf67f485d8e5b4a6f0f38ef97d536b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/nodes-heatmap.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/nodes-heatmap.js


> [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
> 
>
> Key: YARN-8906
> URL: https://issues.apache.org/jira/browse/YARN-8906
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Akhil PB
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: Node_Heatmap_Chart.png, Node_Heatmap_Chart_Fixed.png, 
> YARN-8906.001.patch, YARN-8906.002.patch
>
>
> Hostnames displayed on the Node Heatmap Chart look garbled and are not 
> clearly visible. See the attached screenshot.
> cc [~akhilpb]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.

2019-06-03 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9557:

Attachment: YARN-9557-003.patch

> Application fails in diskchecker when ReadWriteDiskValidator is configured.
> ---
>
> Key: YARN-9557
> URL: https://issues.apache.org/jira/browse/YARN-9557
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
> Environment: Configure:
> <property>
>  <name>yarn.nodemanager.disk-validator</name>
>  <value>read-write</value>
> </property>
>Reporter: Anuruddh Nayak
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9557-001.patch, YARN-9557-002.patch, 
> YARN-9557-003.patch
>
>
> Application fails to execute successfully when ReadWriteDiskValidator is 
> configured.
> {code:xml}
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
> {code}
> {noformat}
> Exception thrown while starting Container:
> java.io.IOException: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> Disk Check failed!
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1233)
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check 
> failed!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:82)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:255)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:312)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
>  ... 2 more
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> /opt/HA/AN0805/nmlocal/usercache/dsperf/appcache/application_1557736108162_0009/filecache/11
>  is not a directory!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:50)
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart

2019-06-03 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854440#comment-16854440
 ] 

Sunil Govindan commented on YARN-8906:
--

Thanks [~akhilpb]

> [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
> 
>
> Key: YARN-8906
> URL: https://issues.apache.org/jira/browse/YARN-8906
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Akhil PB
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: Node_Heatmap_Chart.png, Node_Heatmap_Chart_Fixed.png, 
> YARN-8906.001.patch, YARN-8906.002.patch
>
>
> Hostnames displayed on the Node Heatmap Chart look garbled and are not 
> clearly visible. See the attached screenshot.
> cc [~akhilpb]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart

2019-06-03 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854439#comment-16854439
 ] 

Sunil Govindan commented on YARN-8906:
--

+1

> [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
> 
>
> Key: YARN-8906
> URL: https://issues.apache.org/jira/browse/YARN-8906
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Akhil PB
>Priority: Major
> Attachments: Node_Heatmap_Chart.png, Node_Heatmap_Chart_Fixed.png, 
> YARN-8906.001.patch, YARN-8906.002.patch
>
>
> Hostnames displayed on the Node Heatmap Chart look garbled and are not 
> clearly visible. See the attached screenshot.
> cc [~akhilpb]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9594) Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED

2019-06-03 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854436#comment-16854436
 ] 

Bibin A Chundatt commented on YARN-9594:


Good catch. +1 for the patch.
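
For reference, the fix is presumably just the missing break, so control no 
longer falls through to the default branch (a sketch assuming the surrounding 
ContainerScheduler code stays as quoted below):

{code:java}
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
      queuedGuaranteedContainers.size());
  break; // previously missing: without it, the default branch logged a bogus error
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
      + event.toString());
{code}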

> Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED
> --
>
> Key: YARN-9594
> URL: https://issues.apache.org/jira/browse/YARN-9594
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-9594_1.patch
>
>
> It seems that we are missing a break in this switch-case:
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>  queuedGuaranteedContainers.size());
> //break;missed
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
> + event.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9594) Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED

2019-06-03 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-9594:

Description: 
It seems that we miss a break in switch-case
{code:java}
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
 queuedGuaranteedContainers.size());
//break;missed
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
+ event.toString());
{code}

  was:
It seems that we miss a break in switch-case
{code:java}
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
 queuedGuaranteedContainers.size());
//break;
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
+ event.toString());
{code}


> Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED
> --
>
> Key: YARN-9594
> URL: https://issues.apache.org/jira/browse/YARN-9594
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-9594_1.patch
>
>
> It seems that we are missing a break in this switch-case:
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>  queuedGuaranteedContainers.size());
> //break;missed
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
> + event.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9594) Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED

2019-06-03 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-9594:

Description: 
It seems that we miss a break in switch-case
{code:java}
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
 queuedGuaranteedContainers.size());
//break;
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
+ event.toString());
{code}

  was:
Seem that we forget a break
{code:java}
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
 queuedGuaranteedContainers.size());
//break;
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
+ event.toString());
{code}


> Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED
> --
>
> Key: YARN-9594
> URL: https://issues.apache.org/jira/browse/YARN-9594
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-9594_1.patch
>
>
> It seems that we are missing a break in this switch-case:
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>  queuedGuaranteedContainers.size());
> //break;
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
> + event.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8199) Logging fileSize of log files under NM Local Dir

2019-06-03 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854350#comment-16854350
 ] 

Prabhu Joseph commented on YARN-8199:
-

[~giovanni.fumarola] Thanks for checking this.

1. I think the application team can control their log files with Log4j 
settings, either size-based or time-based. They can also configure rolling log 
aggregation to HDFS while the job is running, so the NM removes the rolled 
files from the local directory. A cluster-wide option to control or truncate 
log files could affect long-running jobs and jobs with DEBUG logging enabled.

2. This debug option is meant to help administrators who are debugging an 
application that consumes too much space in the NM log directory. An 
application that writes an overly verbose log by mistake is hard for an admin 
to track down; this patch makes it easy to find by grepping the NM log files.
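
For context, the kind of size-based Log4j rollover an application team might 
configure looks roughly like this (illustrative only: the appender name and 
limits are arbitrary, and yarn.app.container.log.dir is the container log 
directory commonly referenced in container log4j configs):

{noformat}
# log4j.properties (illustrative values)
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${yarn.app.container.log.dir}/syslog
log4j.appender.RFA.MaxFileSize=10MB
log4j.appender.RFA.MaxBackupIndex=5
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
{noformat}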


> Logging fileSize of log files under NM Local Dir
> 
>
> Key: YARN-8199
> URL: https://issues.apache.org/jira/browse/YARN-8199
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: 0001-YARN-8199.patch, 0002-YARN-8199.patch, 
> YARN-8199-003.patch
>
>
> Logging the fileSize of log files such as syslog, stderr and stdout under the 
> NM local dir before the NodeManager cleans them up will help find applications 
> that have written overly verbose logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8499) ATS v2 Generic TimelineStorageMonitor

2019-06-03 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-8499:

Attachment: YARN-8499-010.patch

> ATS v2 Generic TimelineStorageMonitor
> -
>
> Key: YARN-8499
> URL: https://issues.apache.org/jira/browse/YARN-8499
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Sunil Govindan
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: atsv2
> Attachments: YARN-8499-001.patch, YARN-8499-002.patch, 
> YARN-8499-003.patch, YARN-8499-004.patch, YARN-8499-005.patch, 
> YARN-8499-006.patch, YARN-8499-007.patch, YARN-8499-008.patch, 
> YARN-8499-009.patch, YARN-8499-010.patch
>
>
> Post YARN-8302, HBase connection issues are handled in ATSv2. However, this 
> could be made generic by introducing an API in the storage interface and 
> implementing it in each storage back-end according to that store's semantics.
>  
> cc [~rohithsharma] [~vinodkv] [~vrushalic]
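
A generic monitor hook might look roughly like the following (an illustrative 
sketch only; the interface and method names are assumptions, not the committed 
API):

{code:java}
// Hypothetical health-probe hook on the timeline storage interface: each
// backing store (HBase, filesystem, ...) implements it per its own
// semantics, and a shared storage monitor polls it periodically.
public interface TimelineHealthProbe {
  /** Probe the backing store; throw if it is currently unreachable. */
  void healthCheck() throws Exception;
}
{code}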



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8947) [UI2] Active User info missing from UI2

2019-06-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854281#comment-16854281
 ] 

Hudson commented on YARN-8947:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16652 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16652/])
YARN-8947. [UI2] Active User info missing from UI2. Contributed by Akhil 
(sunilg: rev 7f46dda513fb79c349acb73bdb90b689df9cc18d)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-queue/capacity-queue.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/styles/app.scss
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-user.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-queue/apps.hbs


> [UI2] Active User info missing from UI2
> ---
>
> Key: YARN-8947
> URL: https://issues.apache.org/jira/browse/YARN-8947
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: Active_User_Info_RM_UI1.png, 
> Active_User_Info_RM_UI2_Fixed.png, Active_User_Info_RM_UI2_Fixed_2.png, 
> YARN-8947.001.patch, YARN-8947.002.patch, YARN-8947.003.patch, 
> YARN-8947.004.patch
>
>
> The UI1 Scheduler section has an Active User info panel showing the active 
> users and the applications scheduled for each.
> UI2 is missing that information; there is no way to get a per-user summary of 
> apps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org