[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-10-04 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207797#comment-17207797 ] zhenzhao wang commented on YARN-10393: -- +1, LGTM, thanks. > MR job live lock caused by completed

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-26 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202738#comment-17202738 ] zhenzhao wang commented on YARN-10393: -- [~Jim_Brennan] And feel free to re-assign the ticket to you

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-26 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202737#comment-17202737 ] zhenzhao wang commented on YARN-10393: -- [~Jim_Brennan] Sorry, I missed the msg. Thanks a lot for all

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-02 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189046#comment-17189046 ] zhenzhao wang commented on YARN-10393: -- And one more thing to clarify. The following code in the

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-02 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189009#comment-17189009 ] zhenzhao wang commented on YARN-10393: -- Thanks all for the great discussion.  As stated earlier, I

[jira] [Commented] (YARN-10398) Every NM will try to upload Jar/Archives/Files/Resources to Yarn Shared Cache Manager Like DDOS

2020-08-23 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182949#comment-17182949 ] zhenzhao wang commented on YARN-10398: -- [~jiwq] I double checked and confirmed the PR is the fix for

[jira] [Comment Edited] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-20 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180989#comment-17180989 ] zhenzhao wang edited comment on YARN-10393 at 8/20/20, 7:03 AM: Thanks

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-20 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180989#comment-17180989 ] zhenzhao wang commented on YARN-10393: -- Thanks [~Jim_Brennan] [~yuanbo] for the comment! ??citation

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-13 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177462#comment-17177462 ] zhenzhao wang commented on YARN-10393: -- [~adam.antal] This is a great question. First, it's not

[jira] [Comment Edited] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-13 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177462#comment-17177462 ] zhenzhao wang edited comment on YARN-10393 at 8/14/20, 3:50 AM:

[jira] [Commented] (YARN-10398) Every NM will try to upload Jar/Archives/Files/Resources to Yarn Shared Cache Manager Like DDOS

2020-08-12 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176758#comment-17176758 ] zhenzhao wang commented on YARN-10398: -- [~templedf] Could you please help review this patch? Thanks!

[jira] [Created] (YARN-10398) Every NM will try to upload Jar/Archives/Files/Resources to Yarn Shared Cache Manager Like DDOS

2020-08-12 Thread zhenzhao wang (Jira)
zhenzhao wang created YARN-10398: Summary: Every NM will try to upload Jar/Archives/Files/Resources to Yarn Shared Cache Manager Like DDOS Key: YARN-10398 URL: https://issues.apache.org/jira/browse/YARN-10398

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-12 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176740#comment-17176740 ] zhenzhao wang commented on YARN-10393: -- [~bibinchundatt]

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-12 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176467#comment-17176467 ] zhenzhao wang commented on YARN-10393: -- I can see two issues here: # RM and NM have a different

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-12 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Affects Version/s: 3.4.0 3.3.0 2.6.1

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Description: This was a bug we had seen multiple times on Hadoop 2.6.2. And the following

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Description: This was a bug we had seen multiple times on Hadoop 2.4.x. And the following

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Description: This was a bug we had seen multiple times on Hadoop 2.4.x. And the following

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Description: This was a bug we had seen multiple times on Hadoop 2.4.x. And the following

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Description: This was a bug we had seen multiple times on Hadoop 2.4.x. And the following

[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-10393: - Summary: MR job live lock caused by completed state container leak in heartbeat between node

[jira] [Assigned] (YARN-10393) MR job live lock caused by completed state container leak between node manager and RM heartbeat.

2020-08-08 Thread zhenzhao wang (Jira)
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang reassigned YARN-10393: Assignee: zhenzhao wang > MR job live lock caused by completed state container leak

[jira] [Created] (YARN-10393) MR job live lock caused by completed state container leak between node manager and RM heartbeat.

2020-08-08 Thread zhenzhao wang (Jira)
zhenzhao wang created YARN-10393: Summary: MR job live lock caused by completed state container leak between node manager and RM heartbeat. Key: YARN-10393 URL: https://issues.apache.org/jira/browse/YARN-10393

[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900507#comment-16900507 ] zhenzhao wang commented on YARN-9616: - [~smarthan] Sorry, I missed the msg. I got a patch which works

[jira] [Updated] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-9616: Attachment: YARN-9616.001-2.9.patch > Shared Cache Manager Failed To Upload Unpacked Resources >

[jira] [Updated] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor

2019-06-12 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-5727: Attachment: YARN-5727-Design-v2.pdf > Improve YARN shared cache support for LinuxContainerExecutor

[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-06-10 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860417#comment-16860417 ] zhenzhao wang commented on YARN-9616: - I have seen this issue in 2.9 and 2.6. More checking is needed to

[jira] [Updated] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-06-10 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-9616: Affects Version/s: 2.8.3 2.9.2 > Shared Cache Manager Failed To Upload

[jira] [Updated] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-06-10 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-9616: Affects Version/s: 2.8.5 > Shared Cache Manager Failed To Upload Unpacked Resources >

[jira] [Created] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-06-10 Thread zhenzhao wang (JIRA)
zhenzhao wang created YARN-9616: --- Summary: Shared Cache Manager Failed To Upload Unpacked Resources Key: YARN-9616 URL: https://issues.apache.org/jira/browse/YARN-9616 Project: Hadoop YARN

[jira] [Assigned] (YARN-2774) shared cache service should authorize calls properly

2019-05-17 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang reassigned YARN-2774: --- Assignee: zhenzhao wang > shared cache service should authorize calls properly >

[jira] [Assigned] (YARN-6097) Add support for directories in the Shared Cache

2019-02-25 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang reassigned YARN-6097: --- Assignee: zhenzhao wang > Add support for directories in the Shared Cache >

[jira] [Assigned] (YARN-6910) Increase RM audit log coverage

2017-08-07 Thread zhenzhao wang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang reassigned YARN-6910: --- Assignee: zhenzhao wang > Increase RM audit log coverage > -- > >