[jira] [Commented] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438031#comment-17438031 ]

future commented on FLINK-24401:

Of course, I'd like to contribute. Please assign this issue to me. Thanks a lot.

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.14.1, 1.13.4
>
> Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
>
>
> Hi masters, from the code and the log we can see that an OOM calls terminateJVM directly, but a Metaspace OutOfMemoryError triggers a graceful shutdown. The code comment mentions: {{_it does not usually require more class loading to fail again with the Metaspace OutOfMemoryError_}}.
> But we encountered the following: after the Metaspace OutOfMemoryError, {{_java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} makes the TM unable to exit. It keeps retrying and keeps failing with NoClassDefFoundError (class loading failure) until the TM is killed manually.
> I want to add a catch Throwable in the onFatalError method and call terminateJVM() directly in the catch. Is there any problem with this strategy?
>
> [code link|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]
> picture:
>
> !image-2021-09-29-12-00-44-812.png|width=1337,height=692!
> !image-2021-09-29-12-00-28-510.png!

--
This message was sent by Atlassian Jira (v8.3.4#803005)
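For context, the repeated NoClassDefFoundError in the report is consistent with standard JVM behavior: once the static initialization of a class fails, the class is marked erroneous, and every later use fails with "NoClassDefFoundError: Could not initialize class ...". The sketch below only illustrates that mechanism. The class names are hypothetical, the Metaspace error is simulated, and none of this is Flink code.

```java
/** Hypothetical demo, not Flink code: a failed static initializer poisons the class. */
public class FailedInitDemo {

    /** Stands in for a class such as TaskManagerRunner$Result. */
    static class Result {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final Object VALUE = fail();

        private static Object fail() {
            // Simulated failure; a real Metaspace OOM would come from the JVM itself.
            throw new OutOfMemoryError("Metaspace (simulated)");
        }
    }

    /** Touches the class and reports which Throwable type, if any, was thrown. */
    public static String touch() {
        try {
            Object ignored = Result.VALUE;
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // The first use surfaces the original error from the initializer.
        System.out.println("first use:  " + touch());
        // Every later use fails with NoClassDefFoundError; initialization is not retried.
        System.out.println("second use: " + touch());
    }
}
```

Because the JVM does not retry initialization, a shutdown path that depends on such a class can fail the same way on every attempt, which would match the retry loop described in the issue.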
[jira] [Commented] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430283#comment-17430283 ]

future commented on FLINK-24401:

Hi [~pnowojski], thanks for your reply. I'm sorry, I didn't save the stack trace of the TM; I only saved the log.

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.14.1, 1.13.4
>
> Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

future updated FLINK-24401:
---
    Description:
Hi masters, from the code and the log we can see that an OOM calls terminateJVM directly, but a Metaspace OutOfMemoryError triggers a graceful shutdown. The code comment mentions: {{_it does not usually require more class loading to fail again with the Metaspace OutOfMemoryError_}}.
But we encountered the following: after the Metaspace OutOfMemoryError, {{_java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} makes the TM unable to exit. It keeps retrying and keeps failing with NoClassDefFoundError (class loading failure) until the TM is killed manually.
I want to add a catch Throwable in the onFatalError method and call terminateJVM() directly in the catch. Is there any problem with this strategy?

[code link|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]
picture:

!image-2021-09-29-12-00-44-812.png|width=1337,height=692!
!image-2021-09-29-12-00-28-510.png!

  was:
Hi masters, from the code and the log we can see that an OOM calls terminateJVM directly, but a Metaspace OutOfMemoryError triggers a graceful shutdown. The code comment mentions: {{_it does not usually require more class loading to fail again with the Metaspace OutOfMemoryError_}}.
But we encountered the following: after the Metaspace OutOfMemoryError, {{_java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} makes the TM unable to exit. It keeps retrying and keeps failing with NoClassDefFoundError (class loading failure) until the TM is killed manually.
I want to add a catch Throwable in the onFatalError method and call terminateJVM() directly in the catch. Is there any problem with this strategy?

[code link|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]
picture:

!image-2021-09-29-11-45-48-098.png|width=663,height=343!
!image-2021-09-29-11-47-47-157.png!

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.13.3, 1.14.1
>
> Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

future updated FLINK-24401:
---
    Attachment: image-2021-09-29-12-00-28-510.png

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.13.3, 1.14.1
>
> Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

future updated FLINK-24401:
---
    Attachment: image-2021-09-29-12-00-44-812.png

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.13.3, 1.14.1
>
> Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

future updated FLINK-24401:
---
    Attachment: (was: image-2021-09-29-11-47-47-157.png)

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.13.3, 1.14.1
>
> Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM
[ https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

future updated FLINK-24401:
---
    Attachment: (was: image-2021-09-29-11-45-48-098.png)

> TM cannot exit after Metaspace OOM
> --
>
> Key: FLINK-24401
> URL: https://issues.apache.org/jira/browse/FLINK-24401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.12.0, 1.13.0
> Reporter: future
> Priority: Major
> Fix For: 1.13.3, 1.14.1
>
> Attachments: image-2021-09-29-11-47-47-157.png
[jira] [Created] (FLINK-24401) TM cannot exit after Metaspace OOM
future created FLINK-24401:
--

Summary: TM cannot exit after Metaspace OOM
Key: FLINK-24401
URL: https://issues.apache.org/jira/browse/FLINK-24401
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.13.0, 1.12.0
Reporter: future
Fix For: 1.13.3, 1.14.1
Attachments: image-2021-09-29-11-45-48-098.png, image-2021-09-29-11-47-47-157.png

Hi masters, from the code and the log we can see that an OOM calls terminateJVM directly, but a Metaspace OutOfMemoryError triggers a graceful shutdown. The code comment mentions: {{_it does not usually require more class loading to fail again with the Metaspace OutOfMemoryError_}}.
But we encountered the following: after the Metaspace OutOfMemoryError, {{_java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} makes the TM unable to exit. It keeps retrying and keeps failing with NoClassDefFoundError (class loading failure) until the TM is killed manually.
I want to add a catch Throwable in the onFatalError method and call terminateJVM() directly in the catch. Is there any problem with this strategy?

[code link|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]
picture:

!image-2021-09-29-11-45-48-098.png|width=663,height=343!
!image-2021-09-29-11-47-47-157.png!
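The strategy proposed in this report could look roughly like the sketch below. It is an illustration under assumptions, not the actual TaskManagerRunner code and not necessarily the fix that was merged: the class name, the injected gracefulShutdown and jvmTerminator hooks, and the exit code are hypothetical. The terminator is injected so the behavior can be exercised without killing the JVM; in a real process it would typically wrap Runtime.getRuntime().halt(code).

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch of a fail-safe fatal error handler; not Flink's implementation. */
public class FailSafeFatalErrorHandler {

    /** Assumed exit code for illustration only. */
    static final int FAILURE_EXIT_CODE = 1;

    private final Runnable gracefulShutdown;  // stands in for the regular shutdown path
    private final IntConsumer jvmTerminator;  // e.g. code -> Runtime.getRuntime().halt(code)

    public FailSafeFatalErrorHandler(Runnable gracefulShutdown, IntConsumer jvmTerminator) {
        this.gracefulShutdown = gracefulShutdown;
        this.jvmTerminator = jvmTerminator;
    }

    public void onFatalError(Throwable cause) {
        try {
            // Preferred path: try to shut down cleanly.
            gracefulShutdown.run();
        } catch (Throwable t) {
            // The shutdown path itself failed (for example with NoClassDefFoundError).
            // Stop retrying and terminate the JVM directly.
            jvmTerminator.accept(FAILURE_EXIT_CODE);
        }
    }

    public static void main(String[] args) {
        int[] haltedWith = {-1};
        FailSafeFatalErrorHandler handler = new FailSafeFatalErrorHandler(
                () -> { throw new NoClassDefFoundError("Could not initialize class Example"); },
                code -> haltedWith[0] = code);

        handler.onFatalError(new OutOfMemoryError("Metaspace"));
        System.out.println("terminator called with exit code " + haltedWith[0]);
    }
}
```

One design consideration: the catch block should do as little as possible, since logging or building messages there may itself need class loading and fail again under Metaspace pressure.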