[jira] [Commented] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-12-09 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456719#comment-17456719 ] Eric Badger commented on YARN-11018: I think [~epayne] is probably more qualified to review this

[jira] [Commented] (YARN-9818) test_docker_util.cc:test_add_mounts doesn't correctly test for parent dir of container-executor.cfg

2021-10-11 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427319#comment-17427319 ] Eric Badger commented on YARN-9818: --- I believe when you do a native build there is a file created called

[jira] [Commented] (YARN-9818) test_docker_util.cc:test_add_mounts doesn't correctly test for parent dir of container-executor.cfg

2021-10-11 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427309#comment-17427309 ] Eric Badger commented on YARN-9818: --- This is from a few years ago so I don't quite remember the details,

[jira] [Commented] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.

2021-09-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419340#comment-17419340 ] Eric Badger commented on YARN-10935: Also thanks to [~ahussein] for the additional review! > AM

[jira] [Updated] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.

2021-09-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10935: --- Fix Version/s: 3.1.5 3.2.4 2.10.2 Thanks for the additional

[jira] [Updated] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.

2021-09-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10935: --- Fix Version/s: 3.3.2 3.4.0 [~epayne], looks like it's clean back to branch-3.3.

[jira] [Commented] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.

2021-09-14 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415243#comment-17415243 ] Eric Badger commented on YARN-10935: [~epayne], +1 the patch looks good to me. However, trunk

[jira] [Commented] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-22 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385635#comment-17385635 ] Eric Badger commented on YARN-10860: Thanks, [~zhuqi]! > Make max container per heartbeat configs

[jira] [Commented] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-21 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385024#comment-17385024 ] Eric Badger commented on YARN-10860: [~zhuqi], thanks for the review and commit! And thanks

[jira] [Commented] (YARN-10867) YARN should expose a ENV used to map a custom device into docker container

2021-07-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384513#comment-17384513 ] Eric Badger commented on YARN-10867:

[jira] [Updated] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10860: --- Attachment: (was: YARN-10860.001.patch) > Make max container per heartbeat configs refreshable >

[jira] [Updated] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10860: --- Attachment: YARN-10860.001.patch > Make max container per heartbeat configs refreshable >

[jira] [Updated] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10860: --- Attachment: YARN-10860.branch-2.10.001.patch > Make max container per heartbeat configs refreshable

[jira] [Updated] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10860: --- Attachment: YARN-10860.001.patch > Make max container per heartbeat configs refreshable >

[jira] [Created] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-19 Thread Eric Badger (Jira)
Eric Badger created YARN-10860: -- Summary: Make max container per heartbeat configs refreshable Key: YARN-10860 URL: https://issues.apache.org/jira/browse/YARN-10860 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340347#comment-17340347 ] Eric Badger commented on YARN-10761: Thanks for the patch, [~zhuqi]. Is there a reason we need to

[jira] [Commented] (YARN-10745) Change Log level from info to debug for few logs and remove unnecessary debuglog checks

2021-05-05 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339816#comment-17339816 ] Eric Badger commented on YARN-10745: Hi [~dmmkr], thanks for the patch. Overall I think it has

[jira] [Commented] (YARN-10648) NM local logs are not cleared after uploading to hdfs

2021-05-04 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339350#comment-17339350 ] Eric Badger commented on YARN-10648: The patch looks good, but I'll wait for [~grepas], [~rkanter],

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-29 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335804#comment-17335804 ] Eric Badger commented on YARN-9927: --- {noformat} +// Test multi thread dispatcher +

[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-29 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335726#comment-17335726 ] Eric Badger commented on YARN-10707: Thanks for the updates, [~zhuqi]! +1 I've committed this to

[jira] [Updated] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-29 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10707: --- Fix Version/s: 3.3.1 3.4.0 > Support custom resources in ResourceUtilization, and

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-04-27 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333560#comment-17333560 ] Eric Badger commented on YARN-10493: bq. In theory we could change that if there is a benefit in your

[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-27 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333520#comment-17333520 ] Eric Badger commented on YARN-10707: Thanks for the update, [~zhuqi]! The content looks good, I just

[jira] [Commented] (YARN-7713) Add parallel copying of directories into FSDownload

2021-04-27 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333483#comment-17333483 ] Eric Badger commented on YARN-7713: --- Thanks for taking this up, [~ChrisKarampeazis]. I noticed that you

[jira] [Assigned] (YARN-7713) Add parallel copying of directories into FSDownload

2021-04-27 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reassigned YARN-7713: - Assignee: Christos Karampeazis-Papadakis > Add parallel copying of directories into FSDownload >

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-04-27 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1773#comment-1773 ] Eric Badger commented on YARN-10493: What I'm saying on the split thing is that in the current state

[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-26 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332739#comment-17332739 ] Eric Badger commented on YARN-10707: Thanks for the updated patch, [~zhuqi]! It's much cleaner and

[jira] [Updated] (YARN-10749) Can't remove all node labels after add node label without nodemanager port, broken by YARN-10647

2021-04-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10749: --- Fix Version/s: 3.2.3 2.10.2 3.1.5 3.3.1

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-04-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326150#comment-17326150 ] Eric Badger commented on YARN-10493: Thanks for the latest patch. I tested out the patch along with

[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326097#comment-17326097 ] Eric Badger commented on YARN-10707: Thanks for the patch, [~zhuqi]. To decrease the size of the

[jira] [Commented] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326048#comment-17326048 ] Eric Badger commented on YARN-10743: I don't really have a big issue with adding this as an option

[jira] [Updated] (YARN-10723) Change CS nodes page in UI to support custom resource.

2021-04-20 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10723: --- Fix Version/s: 3.2.3 3.1.5 3.3.1 3.4.0

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10460: --- Fix Version/s: 2.10.2 3.1.5 Thanks for the review, [~Jim_Brennan]. The spotbugs

[jira] [Commented] (YARN-10715) Remove hardcoded resource values (e.g. GPU/FPGA) in code.

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325387#comment-17325387 ] Eric Badger commented on YARN-10715: Finally getting around to looking at this and I don't think

[jira] [Commented] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325343#comment-17325343 ] Eric Badger commented on YARN-10743: I have the same concern as [~Jim_Brennan]. If the flink logs are

[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325329#comment-17325329 ] Eric Badger commented on YARN-10460: Posting a branch-2.10 patch that doesn't use a lambda expression

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10460: --- Attachment: YARN-10460-branch-2.10.002.patch > Upgrading to JUnit 4.13 causes tests in

[jira] [Comment Edited] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325291#comment-17325291 ] Eric Badger edited comment on YARN-10460 at 4/19/21, 8:26 PM: -- Thanks for

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10460: --- Fix Version/s: 3.2.3 Thanks for the review, [~Jim_Brennan]! I've committed the 3.2 patch to

[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325231#comment-17325231 ] Eric Badger commented on YARN-10460: The unit tests seem unrelated and don't fail for me locally.

[jira] [Commented] (YARN-10723) Change CS nodes page in UI to support custom resource.

2021-04-19 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325210#comment-17325210 ] Eric Badger commented on YARN-10723: Looks like it still never ran. [~zhuqi], can you re-upload the

[jira] [Commented] (YARN-10723) Change CS nodes page in UI to support custom resource.

2021-04-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324097#comment-17324097 ] Eric Badger commented on YARN-10723: Precommit never ran on the latest patch, so I cancelled the

[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324085#comment-17324085 ] Eric Badger commented on YARN-10460: Reopening and attaching a patch for branch-3.2 that puts

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10460: --- Attachment: YARN-10460-branch-3.2.002.patch > Upgrading to JUnit 4.13 causes tests in

[jira] [Reopened] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reopened YARN-10460: > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail >

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10460: --- I backported this to branch-3.3. There's a merge conflict with branch-3.2 that I'm looking into.

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10460: --- Fix Version/s: 3.3.1 > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail >

[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-04-09 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10503: --- Fix Version/s: 3.3.1 Thanks for the patch, [~zhuqi]. +1 committed to branch-3.3. This has now been

[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-04-08 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10503: --- Fix Version/s: 3.4.0 Thanks for the updates, [~zhuqi]. +1 on patch 10. And thanks for the reviews,

[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-08 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10702: --- Fix Version/s: 3.2.3 3.1.5 Thanks for the additional patches, [~Jim_Brennan]. I

[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-06 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10702: --- Fix Version/s: 3.3.1 3.4.0 Thanks for the patch, [~Jim_Brennan]. I've committed

[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-05 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315191#comment-17315191 ] Eric Badger commented on YARN-10702: [~Jim_Brennan], thanks for the patch. +1 I've committed this to

[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-29 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10501: --- Fix Version/s: 2.10.2 Thanks for the patch/patience [~caozhiqiang]. Finally HadoopQA is back to

[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-03-26 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309788#comment-17309788 ] Eric Badger commented on YARN-10503: Thanks for the update, [~zhuqi]. This might be a little too

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-26 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309640#comment-17309640 ] Eric Badger commented on YARN-10501: bq. Backporting HADOOP-16870 to branch-2.10 should mitigate this

[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-03-25 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309058#comment-17309058 ] Eric Badger commented on YARN-10503: Thanks for the patch, [~zhuqi]! Here are a few comments

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-25 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309028#comment-17309028 ] Eric Badger commented on YARN-10501: [~aajisaka], can you help out here? The Yetus bug is blocking

[jira] [Updated] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.

2021-03-25 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10713: --- Fix Version/s: 3.3.1 3.4.0 Thanks for the patch, [~zhuqi]. I tested this out on

[jira] [Commented] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.

2021-03-25 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308817#comment-17308817 ] Eric Badger commented on YARN-10713: [~zhuqi], I very much appreciate the patches and am trying to

[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-03-24 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308187#comment-17308187 ] Eric Badger commented on YARN-10503: I'm fine with moving the effort of removing hardcoded resource

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-24 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308180#comment-17308180 ] Eric Badger commented on YARN-10493: I did have a weird umask set. Reverting back to the default

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-24 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308173#comment-17308173 ] Eric Badger commented on YARN-10493: Hmm, must be a default umask issue or something on my testing

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-24 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308169#comment-17308169 ] Eric Badger commented on YARN-10493: {noformat} [ebadger@foo hadoop]$ hadoop fs -ls /runc-root/*/*/*

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-24 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308167#comment-17308167 ] Eric Badger commented on YARN-10493: {noformat} 2021-03-24 20:21:56,225 WARN [Public Localizer]

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-24 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308165#comment-17308165 ] Eric Badger commented on YARN-10493: Yea, I think that would be a good improvement to the plugin

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307481#comment-17307481 ] Eric Badger commented on YARN-10493: Additionally, I've run into some issues while testing.

[jira] [Commented] (YARN-10493) RunC container repository v2

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307467#comment-17307467 ] Eric Badger commented on YARN-10493: [~MatthewSharp], thanks for the PR. Just starting to take a look

[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307457#comment-17307457 ] Eric Badger commented on YARN-10517: [~epayne], this change looks reasonable to me, but I'd like to

[jira] [Commented] (YARN-10707) Support gpu in ResourceUtilization, and update Node GPU Utilization to use.

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307427#comment-17307427 ] Eric Badger commented on YARN-10707: Similar to my

[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307421#comment-17307421 ] Eric Badger commented on YARN-10503: bq. Do we want to treat GPUs and FPGAs like that? In other parts

[jira] [Comment Edited] (YARN-9618) NodeListManager event improvement

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307413#comment-17307413 ] Eric Badger edited comment on YARN-9618 at 3/23/21, 8:52 PM: - bq. Actually,

[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307413#comment-17307413 ] Eric Badger commented on YARN-9618: --- bq. Actually, why we use an other async dispatcher here is try to

[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.

2021-03-23 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307379#comment-17307379 ] Eric Badger commented on YARN-10704: I'm not very familiar with the new YARN UI v2. Will this change

[jira] [Updated] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10701: --- Fix Version/s: 3.3.1 3.4.0 +1. Thanks for the patch, [~zhuqi]. I've committed

[jira] [Comment Edited] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304456#comment-17304456 ] Eric Badger edited comment on YARN-10616 at 3/18/21, 9:22 PM: -- The issue

[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304456#comment-17304456 ] Eric Badger commented on YARN-10616: The issue with graceful decommissioning is that you have to edit

[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304333#comment-17304333 ] Eric Badger commented on YARN-10495: I would suggest using a dockerfile with the same OS version as

[jira] [Updated] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10703: --- Fix Version/s: 3.3.1 I've also committed this to branch-3.3. This has now been committed to trunk

[jira] [Updated] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10692: --- Fix Version/s: 3.3.1 I cherry-picked this to branch-3.3. I would like all of the GPU stuff to go back

[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304313#comment-17304313 ] Eric Badger commented on YARN-10703: +1 I've committed this to trunk (3.4) > Fix potential null

[jira] [Updated] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10703: --- Fix Version/s: 3.4.0 > Fix potential null pointer error of gpuNodeResourceUpdateHandler in >

[jira] [Updated] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-17 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10688: --- Fix Version/s: 3.2.3 3.3.1 3.4.0 Thanks for the updated patch,

[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302956#comment-17302956 ] Eric Badger commented on YARN-10503: One initial question I have is whether we should generalize this

[jira] [Commented] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302931#comment-17302931 ] Eric Badger commented on YARN-10692: [~zhuqi], it looks like the unit test failure from Hadoop QA is

[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302893#comment-17302893 ] Eric Badger commented on YARN-10688: {noformat} @Metric("Vcore Utilization") MutableGaugeLong

[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302864#comment-17302864 ] Eric Badger commented on YARN-10616: bq. For the "updateNodeResource" issue, one question is that is

[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302860#comment-17302860 ] Eric Badger commented on YARN-9618: --- The patch looks reasonable to me. Agree with [~gandras] that some

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302782#comment-17302782 ] Eric Badger commented on YARN-10501: [~aajisaka], [~ahussein], most recent builds are failing due to

[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2021-03-16 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302761#comment-17302761 ] Eric Badger commented on YARN-10495: [~angerszhu], I don't think it's a good idea to ship glibc with

[jira] [Commented] (YARN-10690) ClusterMetrics should support GPU utilization related metrics.

2021-03-15 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302009#comment-17302009 ] Eric Badger commented on YARN-10690: [~zhuqi], can we convert the related JIRAs to be subtasks of

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-15 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17301993#comment-17301993 ] Eric Badger commented on YARN-10501: [~caozhiqiang], it doesn't need to be merged to 2.10.1. It has

[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-15 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17301987#comment-17301987 ] Eric Badger commented on YARN-10688: [~zhuqi], thanks for the updated patch. To make things a little

[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable

2021-03-15 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10495: --- Fix Version/s: 3.3.1 [~angerszhu], I backported this to branch-3.3. There's a conflict past that. If

[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU related metrics.

2021-03-11 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299824#comment-17299824 ] Eric Badger commented on YARN-10688: {noformat} 2021-03-11 19:25:11,183 ERROR

[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU related metrics.

2021-03-11 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299814#comment-17299814 ] Eric Badger commented on YARN-10688: {noformat} + Integer gpuIndex =

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-11 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299772#comment-17299772 ] Eric Badger commented on YARN-10501: [~aajisaka], looks like the precommit is still failing to

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-09 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298253#comment-17298253 ] Eric Badger commented on YARN-10501: [~ahussein], [~aajisaka], is this due to any of the recent yetus

[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-08 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297684#comment-17297684 ] Eric Badger commented on YARN-10501: Reopening and submitting patch so that Hadoop QA will run >

[jira] [Reopened] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-08 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reopened YARN-10501: > Can't remove all node labels after add node label without nodemanager port >

[jira] [Updated] (YARN-10664) Allow parameter expansion in NM_ADMIN_USER_ENV

2021-03-08 Thread Eric Badger (Jira)
[ https://issues.apache.org/jira/browse/YARN-10664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10664: --- Fix Version/s: 3.2.3 Thanks for the patch, [~Jim_Brennan]! +1 from me. The checkstyle warning should
