[jira] [Updated] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-4589: -- Attachment: YARN-4589-branch-3.2.001.patch > Diagnostics for localization timeouts is lacking >

[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264306#comment-17264306 ] Jim Brennan commented on YARN-4589: --- Thanks [~epayne]! I will put up a patch for branch-3.2. >

[jira] [Updated] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-4589: -- Attachment: YARN-4589.005.patch > Diagnostics for localization timeouts is lacking >

[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263700#comment-17263700 ] Jim Brennan commented on YARN-4589: --- patch 005 removes the extra file. > Diagnostics for

[jira] [Updated] (YARN-10562) Follow up changes for YARN-9833

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10562: --- Summary: Follow up changes for YARN-9833 (was: Alternate fix for DirectoryCollection.checkDirs()

[jira] [Commented] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263696#comment-17263696 ] Jim Brennan commented on YARN-10562: Patch 004 replaces the {{CopyOnWriteArrayLists}} with

[jira] [Updated] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10562: --- Attachment: YARN-10562.004.patch > Alternate fix for DirectoryCollection.checkDirs() race >

[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263691#comment-17263691 ] Jim Brennan commented on YARN-4589: --- I don't think I need to add a unit test for this, as it is only

[jira] [Commented] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263462#comment-17263462 ] Jim Brennan commented on YARN-10562: Thanks for the discussion and comment [~ebadger]! I agree that

[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-11 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262933#comment-17262933 ] Jim Brennan commented on YARN-4589: --- Forgot to attach the patch. Doh! > Diagnostics for localization

[jira] [Updated] (YARN-4589) Diagnostics for localization timeouts is lacking

2021-01-11 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-4589: -- Attachment: YARN-4589.004.patch > Diagnostics for localization timeouts is lacking >

[jira] [Updated] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10562: --- Attachment: YARN-10562.003.patch > Alternate fix for DirectoryCollection.checkDirs() race >

[jira] [Commented] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261403#comment-17261403 ] Jim Brennan commented on YARN-10562: patch 003 fixes the new checkstyle issues. > Alternate fix for

[jira] [Commented] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-07 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260614#comment-17260614 ] Jim Brennan commented on YARN-10562: Submitted patch 002 to fix the checkstyle issues and add unit

[jira] [Updated] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-07 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10562: --- Attachment: YARN-10562.002.patch > Alternate fix for DirectoryCollection.checkDirs() race >

[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2021-01-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260089#comment-17260089 ] Jim Brennan commented on YARN-9833: --- [~ebadger], [~pbacsko], I filed a new Jira so I could put up the

[jira] [Updated] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10562: --- Attachment: (was: YARN-9833.001.patch) > Alternate fix for DirectoryCollection.checkDirs() race

[jira] [Created] (YARN-10562) Alternate fix for DirectoryCollection.checkDirs() race

2021-01-06 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10562: -- Summary: Alternate fix for DirectoryCollection.checkDirs() race Key: YARN-10562 URL: https://issues.apache.org/jira/browse/YARN-10562 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254133#comment-17254133 ] Jim Brennan commented on YARN-10540: Thanks [~sunilg]! > Node page is broken in YARN UI1 and UI2

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-22 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253560#comment-17253560 ] Jim Brennan commented on YARN-10540: Thanks [~ebadger]! And thanks [~hexiaoqiao], [~ayushtkn] and

[jira] [Created] (YARN-10542) Node Utilization on UI is misleading if nodes don't report utilization

2020-12-21 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10542: -- Summary: Node Utilization on UI is misleading if nodes don't report utilization Key: YARN-10542 URL: https://issues.apache.org/jira/browse/YARN-10542 Project: Hadoop

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253117#comment-17253117 ] Jim Brennan commented on YARN-10540: I filed YARN-10542 as a follow-up. > Node page is broken in

[jira] [Assigned] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned YARN-10540: -- Assignee: Jim Brennan I have attached a patch that initializes nodeUtilization to a

[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10540: --- Attachment: YARN-10540.001.patch > Node page is broken in YARN UI1 and UI2 including RMWebService

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253090#comment-17253090 ] Jim Brennan commented on YARN-10540: I have manually reproduced this in trunk on a VM by setting the

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253022#comment-17253022 ] Jim Brennan commented on YARN-10540: Simpler fix might be to initialize {{nodeUtilization}} in

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253018#comment-17253018 ] Jim Brennan commented on YARN-10540: Just to clarify, we are seeing this only on Mac in branch-3.2.2?

[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253003#comment-17253003 ] Jim Brennan commented on YARN-10540: Trying to figure out why we might get an NPE in this case. One

[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-14 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249316#comment-17249316 ] Jim Brennan commented on YARN-9833: --- {quote} My worry with this is that code changes in the future will

[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-14 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249043#comment-17249043 ] Jim Brennan commented on YARN-9833: --- Thinking about it more over the weekend, I suspect that the reason

[jira] [Commented] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)

2020-12-11 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248173#comment-17248173 ] Jim Brennan commented on YARN-10494: [~ccondit], [~ebadger] I am OK with including this for now. >

[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-10 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247304#comment-17247304 ] Jim Brennan commented on YARN-9833: --- Did you consider that changing from a view to a copy changes the

[jira] [Commented] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)

2020-12-03 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243242#comment-17243242 ] Jim Brennan commented on YARN-10494: I'm not sure that a PR is better than a patch for something this

[jira] [Commented] (YARN-8558) NM recovery level db not cleaned up properly on container finish

2020-11-17 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233767#comment-17233767 ] Jim Brennan commented on YARN-8558: --- I have committed this to branch-2.10. > NM recovery level db

[jira] [Reopened] (YARN-8558) NM recovery level db not cleaned up properly on container finish

2020-11-17 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reopened YARN-8558: --- Re-opening so I can put up a patch for branch-2.10. > NM recovery level db not cleaned up properly on

[jira] [Resolved] (YARN-10485) TimelineConnector swallows InterruptedException

2020-11-16 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan resolved YARN-10485. Fix Version/s: 3.2.3 3.4.1 3.1.5 3.3.1

[jira] [Commented] (YARN-10485) TimelineConnector swallows InterruptedException

2020-11-16 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232803#comment-17232803 ] Jim Brennan commented on YARN-10485: Apologies, I marked this resolved by accident.  Got my tabs

[jira] [Updated] (YARN-10485) TimelineConnector swallows InterruptedException

2020-11-16 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10485: --- Fix Version/s: (was: 3.2.3) (was: 3.4.1) (was:

[jira] [Reopened] (YARN-10485) TimelineConnector swallows InterruptedException

2020-11-16 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reopened YARN-10485: > TimelineConnector swallows InterruptedException > --- >

[jira] [Updated] (YARN-10485) TimelineConnector swallows InterruptedException

2020-11-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10485: --- Fix Version/s: 3.2.3 3.4.1 3.1.5 3.3.1 >

[jira] [Resolved] (YARN-10485) TimelineConnector swallows InterruptedException

2020-11-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan resolved YARN-10485. Resolution: Fixed > TimelineConnector swallows InterruptedException >

[jira] [Commented] (YARN-8558) NM recovery level db not cleaned up properly on container finish

2020-11-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231778#comment-17231778 ] Jim Brennan commented on YARN-8558: --- Any objection to pulling this back to branch-2.10? It looks like

[jira] [Commented] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-05 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227069#comment-17227069 ] Jim Brennan commented on YARN-10479: Thanks [~epayne]! > RMProxy should retry on SocketTimeout

[jira] [Commented] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-05 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226770#comment-17226770 ] Jim Brennan commented on YARN-10479: All of the failed unit tests also fail in trunk, due to the

[jira] [Commented] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-04 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226303#comment-17226303 ] Jim Brennan commented on YARN-10479: I believe most of the YARN failures are unrelated to this

[jira] [Commented] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-04 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226271#comment-17226271 ] Jim Brennan commented on YARN-10479: patch 003 fixes the checkstyle issues. > RMProxy should retry

[jira] [Updated] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-04 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10479: --- Attachment: YARN-10479.003.patch > RMProxy should retry on SocketTimeout Exceptions >

[jira] [Commented] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-03 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225682#comment-17225682 ] Jim Brennan commented on YARN-10479: I added a test case in patch 002. > RMProxy should retry on

[jira] [Updated] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-03 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10479: --- Attachment: YARN-10479.002.patch > RMProxy should retry on SocketTimeout Exceptions >

[jira] [Updated] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-03 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10479: --- Attachment: YARN-10479.001.patch > RMProxy should retry on SocketTimeout Exceptions >

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-11-02 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224921#comment-17224921 ] Jim Brennan commented on YARN-10475: Thanks [~epayne]! > Scale RM-NM heartbeat interval based on

[jira] [Created] (YARN-10479) RMProxy should retry on SocketTimeout Exceptions

2020-11-02 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10479: -- Summary: RMProxy should retry on SocketTimeout Exceptions Key: YARN-10479 URL: https://issues.apache.org/jira/browse/YARN-10479 Project: Hadoop YARN Issue Type:

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-11-02 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224717#comment-17224717 ] Jim Brennan commented on YARN-10475: I have filed [YARN-10478] for making this pluggable. > Scale

[jira] [Created] (YARN-10478) Make RM-NM heartbeat scaling calculator pluggable

2020-11-02 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10478: -- Summary: Make RM-NM heartbeat scaling calculator pluggable Key: YARN-10478 URL: https://issues.apache.org/jira/browse/YARN-10478 Project: Hadoop YARN Issue

[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10475: --- Attachment: YARN-10475-branch-3.2.003.patch > Scale RM-NM heartbeat interval based on node

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223733#comment-17223733 ] Jim Brennan commented on YARN-10475: [~epayne], I have put up patches for branch-3.3 and branch-3.2

[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10475: --- Attachment: YARN-10475-branch-3.3.003.patch > Scale RM-NM heartbeat interval based on node

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223668#comment-17223668 ] Jim Brennan commented on YARN-10475: Thanks for the suggestion [~bibinchundatt]! I think a plugin

[jira] [Commented] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-30 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223656#comment-17223656 ] Jim Brennan commented on YARN-10471: [~epayne] I agree we don't need to go to branch-3.1 nor

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-29 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223217#comment-17223217 ] Jim Brennan commented on YARN-10475: Thanks [~epayne]! I put up patch 003, which adds documentation

[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-29 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10475: --- Attachment: YARN-10475.003.patch > Scale RM-NM heartbeat interval based on node utilization >

[jira] [Commented] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-29 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223195#comment-17223195 ] Jim Brennan commented on YARN-10471: Thanks [~epayne]! I have committed this to trunk, branch-3.3,

[jira] [Updated] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-29 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10471: --- Fix Version/s: 3.2.3 3.4.1 3.3.1 > Prevent logs for any

[jira] [Resolved] (YARN-10477) runc launch failure should not cause nodemanager to go unhealthy

2020-10-28 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan resolved YARN-10477. Resolution: Invalid Closing this as invalid. The problem was only there in our internal version

[jira] [Commented] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-28 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222409#comment-17222409 ] Jim Brennan commented on YARN-10471: Thanks [~epayne]! It looks like there is a problem in the last

[jira] [Created] (YARN-10477) runc launch failure should not cause nodemanager to go unhealthy

2020-10-28 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10477: -- Summary: runc launch failure should not cause nodemanager to go unhealthy Key: YARN-10477 URL: https://issues.apache.org/jira/browse/YARN-10477 Project: Hadoop YARN

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-28 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1794#comment-1794 ] Jim Brennan commented on YARN-10475: I put up patch 002 to address checkstyle/javac issues. > Scale

[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-28 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10475: --- Attachment: YARN-10475.002.patch > Scale RM-NM heartbeat interval based on node utilization >

[jira] [Commented] (YARN-10467) ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers

2020-10-28 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1735#comment-1735 ] Jim Brennan commented on YARN-10467: Thanks for reporting this and for the solution [~haibochen]!

[jira] [Commented] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-28 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1723#comment-1723 ] Jim Brennan commented on YARN-10471: Thanks for putting this up [~epayne]! I am +1 on patches for

[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-27 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10475: --- Attachment: YARN-10475.001.patch > Scale RM-NM heartbeat interval based on node utilization >

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-27 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221774#comment-17221774 ] Jim Brennan commented on YARN-10475: This adds the following {{yarn.resourcemanager.nodemanagers}}

[jira] [Created] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-27 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10475: -- Summary: Scale RM-NM heartbeat interval based on node utilization Key: YARN-10475 URL: https://issues.apache.org/jira/browse/YARN-10475 Project: Hadoop YARN

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-19 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216747#comment-17216747 ] Jim Brennan commented on YARN-10450: Thanks [~ebadger]! > Add cpu and memory utilization per node

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-15 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214879#comment-17214879 ] Jim Brennan commented on YARN-10450: Thanks [~ebadger]!  I've attached patches for branches 3.2, 3.1,

[jira] [Updated] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-15 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10450: --- Attachment: YARN-10450-branch-3.2.003.patch YARN-10450-branch-3.1.003.patch

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-15 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214760#comment-17214760 ] Jim Brennan commented on YARN-10450: patch 003 changes the Cluster page to use *Physical Mem Used %*

[jira] [Updated] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10450: --- Attachment: YARN-10450.003.patch > Add cpu and memory utilization per node and cluster-wide metrics

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-13 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213305#comment-17213305 ] Jim Brennan commented on YARN-10450: Thanks [~ebadger] and [~jhung]!  I will upload a new patch with

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212662#comment-17212662 ] Jim Brennan commented on YARN-10450: Thanks for the review and comments [~ebadger]!  I agree the

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212547#comment-17212547 ] Jim Brennan commented on YARN-10450: Anyone else available to review? [~jhung], [~ebadger]? > Add

[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-12 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-9667: -- Fix Version/s: 2.10.2 > Container-executor.c duplicates messages to stdout >

[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210465#comment-17210465 ] Jim Brennan commented on YARN-9667: --- I've committed this to branch-3.2 and branch-3.1, but the patch

[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-9667: -- Fix Version/s: 3.1.5 > Container-executor.c duplicates messages to stdout >

[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-9667: -- Fix Version/s: (was: 3.2.3) 3.2.2 > Container-executor.c duplicates messages to

[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-9667: -- Fix Version/s: 3.2.3 > Container-executor.c duplicates messages to stdout >

[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10455: --- Fix Version/s: (was: 3.2.1) 3.2.2 > TestNMProxy.testNMProxyRPCRetry is not

[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210457#comment-17210457 ] Jim Brennan commented on YARN-9667: --- +1 on the patch for branch-3.2 > Container-executor.c duplicates

[jira] [Commented] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210423#comment-17210423 ] Jim Brennan commented on YARN-10455: I have committed this to the following branches: trunk, 3.3,

[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10455: --- Fix Version/s: 3.1.2 > TestNMProxy.testNMProxyRPCRetry is not consistent >

[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10455: --- Fix Version/s: 3.2.1 > TestNMProxy.testNMProxyRPCRetry is not consistent >

[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10455: --- Fix Version/s: 3.3.1 3.4.0 > TestNMProxy.testNMProxyRPCRetry is not consistent >

[jira] [Commented] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent

2020-10-08 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210312#comment-17210312 ] Jim Brennan commented on YARN-10455: Thanks for the patch [~ahussein]!  I verified that this test

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-10-07 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209597#comment-17209597 ] Jim Brennan commented on YARN-10393: Thanks [~adam.antal]! > MR job live lock caused by completed

[jira] [Commented] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209068#comment-17209068 ] Jim Brennan commented on YARN-10451: I have committed this to trunk, branch-3.3, branch-3.2,

[jira] [Updated] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10451: --- Fix Version/s: 2.10.2 3.1.5 3.3.1 3.4.0

[jira] [Commented] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209014#comment-17209014 ] Jim Brennan commented on YARN-10451: Thanks [~epayne]!  I will commit this shortly. > RM (v1) UI

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-10-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208867#comment-17208867 ] Jim Brennan commented on YARN-10393: [~adam.antal] I have uploaded a patch for branch-2.10. > MR job

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-10-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10393: --- Attachment: YARN-10393-branch-2.10.001.patch > MR job live lock caused by completed state container

[jira] [Commented] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-06 Thread Jim Brennan (Jira)
[ https://issues.apache.org/jira/browse/YARN-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208793#comment-17208793 ] Jim Brennan commented on YARN-10451: I am +1 on patch 003.  The unit test that failed did not fail
