[jira] [Commented] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-11-03 Thread future (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438031#comment-17438031
 ] 

future commented on FLINK-24401:


Of course. I'd like to contribute. Please assign this issue to me. Thanks a lot.

> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.14.1, 1.13.4
>
>         Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png
>
>
> Hi masters, from the code and the log we can see that an OutOfMemoryError
> normally terminates the JVM directly (terminateJVM), but a Metaspace
> OutOfMemoryError triggers a graceful shutdown. The code comment explains:
> {{_it does not usually require more class loading to fail again with the
> Metaspace OutOfMemoryError_}}.
> But we encountered the following: after a Metaspace OutOfMemoryError,
> {{_java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} was thrown,
> which left the TM unable to exit. It kept retrying and kept hitting
> NoClassDefFoundError because class loading kept failing, until we killed the
> TM manually.
> I want to add a catch for Throwable to the onFatalError method and call
> terminateJVM() directly in the catch block. Is there any problem with this
> strategy?
>  
> [code link 
> |https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]
> picture:
>  
> !image-2021-09-29-12-00-44-812.png|width=1337,height=692!
>   !image-2021-09-29-12-00-28-510.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-10-18 Thread future (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430283#comment-17430283
 ] 

future commented on FLINK-24401:


Hi [~pnowojski], thanks for your reply. I'm sorry, I didn't save the TM's
stack trace; I only saved the log.

> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.14.1, 1.13.4
>
>         Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png





[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-09-28 Thread future (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

future updated FLINK-24401:
---
Description: 
Hi masters, from the code and the log we can see that an OutOfMemoryError
normally terminates the JVM directly (terminateJVM), but a Metaspace
OutOfMemoryError triggers a graceful shutdown. The code comment explains:
{{_it does not usually require more class loading to fail again with the
Metaspace OutOfMemoryError_}}.

But we encountered the following: after a Metaspace OutOfMemoryError,
{{_java.lang.NoClassDefFoundError: Could not initialize class
org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} was thrown,
which left the TM unable to exit. It kept retrying and kept hitting
NoClassDefFoundError because class loading kept failing, until we killed the
TM manually.

I want to add a catch for Throwable to the onFatalError method and call
terminateJVM() directly in the catch block. Is there any problem with this
strategy?

 

[code link 
|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]

picture:

 

!image-2021-09-29-12-00-44-812.png|width=1337,height=692!

  !image-2021-09-29-12-00-28-510.png!

 

 

  was:
Hi masters, from the code and the log we can see that an OutOfMemoryError
normally terminates the JVM directly (terminateJVM), but a Metaspace
OutOfMemoryError triggers a graceful shutdown. The code comment explains:
{{_it does not usually require more class loading to fail again with the
Metaspace OutOfMemoryError_}}.

But we encountered the following: after a Metaspace OutOfMemoryError,
{{_java.lang.NoClassDefFoundError: Could not initialize class
org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} was thrown,
which left the TM unable to exit. It kept retrying and kept hitting
NoClassDefFoundError because class loading kept failing, until we killed the
TM manually.

I want to add a catch for Throwable to the onFatalError method and call
terminateJVM() directly in the catch block. Is there any problem with this
strategy?

 

[code link 
|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]

picture:

!image-2021-09-29-11-45-48-098.png|width=663,height=343!

 

!image-2021-09-29-11-47-47-157.png!

 


> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.13.3, 1.14.1
>
>         Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png





[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-09-28 Thread future (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

future updated FLINK-24401:
---
Attachment: image-2021-09-29-12-00-28-510.png

> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.13.3, 1.14.1
>
>         Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png





[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-09-28 Thread future (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

future updated FLINK-24401:
---
Attachment: image-2021-09-29-12-00-44-812.png

> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.13.3, 1.14.1
>
>         Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png





[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-09-28 Thread future (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

future updated FLINK-24401:
---
Attachment: (was: image-2021-09-29-11-47-47-157.png)

> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.13.3, 1.14.1
>
>         Attachments: image-2021-09-29-12-00-28-510.png, image-2021-09-29-12-00-44-812.png





[jira] [Updated] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-09-28 Thread future (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

future updated FLINK-24401:
---
Attachment: (was: image-2021-09-29-11-45-48-098.png)

> TM cannot exit after Metaspace OOM
> ----------------------------------
>
>                 Key: FLINK-24401
>                 URL: https://issues.apache.org/jira/browse/FLINK-24401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: future
>            Priority: Major
>             Fix For: 1.13.3, 1.14.1
>
>         Attachments: image-2021-09-29-11-47-47-157.png





[jira] [Created] (FLINK-24401) TM cannot exit after Metaspace OOM

2021-09-28 Thread future (Jira)
future created FLINK-24401:
--

             Summary: TM cannot exit after Metaspace OOM
                 Key: FLINK-24401
                 URL: https://issues.apache.org/jira/browse/FLINK-24401
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.13.0, 1.12.0
            Reporter: future
             Fix For: 1.13.3, 1.14.1
         Attachments: image-2021-09-29-11-45-48-098.png, image-2021-09-29-11-47-47-157.png

Hi masters, from the code and the log we can see that an OutOfMemoryError
normally terminates the JVM directly (terminateJVM), but a Metaspace
OutOfMemoryError triggers a graceful shutdown. The code comment explains:
{{_it does not usually require more class loading to fail again with the
Metaspace OutOfMemoryError_}}.
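
To make the distinction above concrete, here is a minimal sketch of how a
handler might tell the two kinds of OOM apart. It is not Flink's actual code:
the class OomClassifier, its methods, and the message-based check are
assumptions for illustration (HotSpot reports this error with the message
"Metaspace").

```java
/** Hypothetical helper for illustration only; not part of Flink. */
final class OomClassifier {
    private OomClassifier() {}

    /** Assumption: a Metaspace OOM is recognised by its error message. */
    static boolean isMetaspaceOom(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().contains("Metaspace");
    }

    /** Mirrors the described behaviour: hard exit for OOM, except for Metaspace OOM. */
    static String chooseShutdown(Throwable t) {
        if (t instanceof OutOfMemoryError && !isMetaspaceOom(t)) {
            return "terminateJVM";
        }
        return "gracefulShutdown";
    }

    public static void main(String[] args) {
        System.out.println(chooseShutdown(new OutOfMemoryError("Java heap space")));
        System.out.println(chooseShutdown(new OutOfMemoryError("Metaspace")));
    }
}
```

With this sketch a heap OOM maps to the hard exit and a Metaspace OOM maps to
the graceful path, which is the path that later got stuck for us.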

But we encountered the following: after a Metaspace OutOfMemoryError,
{{_java.lang.NoClassDefFoundError: Could not initialize class
org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}} was thrown,
which left the TM unable to exit. It kept retrying and kept hitting
NoClassDefFoundError because class loading kept failing, until we killed the
TM manually.
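
The repeated NoClassDefFoundError matches general JVM behaviour: once the
initialization of a class has failed, every later use of that class fails
with "Could not initialize class". The sketch below reproduces this with a
stand-in class. InitFailureDemo and Fragile are hypothetical names, and the
OutOfMemoryError is thrown by hand to simulate the failure. I have not
confirmed that this is exactly what happened to TaskManagerRunner$Result.

```java
/** Hypothetical demo; the failure is simulated, not a real Metaspace OOM. */
final class InitFailureDemo {
    /** Stand-in for a class whose static initialization fails. */
    static final class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        private static int compute() {
            throw new OutOfMemoryError("Metaspace (simulated)");
        }
    }

    /** Touches the class and reports which kind of Throwable came back. */
    static String touch() {
        try {
            return String.valueOf(Fragile.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: OutOfMemoryError
        System.out.println(touch()); // later uses: NoClassDefFoundError
    }
}
```

Retrying cannot help here, because the JVM marks the class as erroneous and
does not run its initializer again.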

I want to add a catch for Throwable to the onFatalError method and call
terminateJVM() directly in the catch block. Is there any problem with this
strategy?
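
A minimal sketch of the proposed strategy, under stated assumptions:
FatalErrorHandlerSketch, FAILURE_EXIT_CODE and the two injected callbacks are
hypothetical and only stand in for the real shutdown logic and terminateJVM().
This is not the actual TaskManagerRunner code.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch of a fatal-error handler with a hard-exit fallback. */
final class FatalErrorHandlerSketch {
    static final int FAILURE_EXIT_CODE = 1; // assumed value, for illustration

    private final Runnable gracefulShutdown;
    private final IntConsumer terminateJvm;

    FatalErrorHandlerSketch(Runnable gracefulShutdown, IntConsumer terminateJvm) {
        this.gracefulShutdown = gracefulShutdown;
        this.terminateJvm = terminateJvm;
    }

    void onFatalError(Throwable error) {
        try {
            // Graceful path; may itself need classes that can no longer be loaded.
            gracefulShutdown.run();
        } catch (Throwable t) {
            // The graceful path failed (e.g. NoClassDefFoundError): exit immediately.
            terminateJvm.accept(FAILURE_EXIT_CODE);
        }
    }

    public static void main(String[] args) {
        FatalErrorHandlerSketch handler = new FatalErrorHandlerSketch(
                () -> { throw new NoClassDefFoundError("Could not initialize class Result"); },
                code -> System.out.println("terminateJVM(" + code + ")"));
        handler.onFatalError(new OutOfMemoryError("Metaspace"));
    }
}
```

In a real process the second callback could be something like
code -> Runtime.getRuntime().halt(code), which stops the JVM without running
shutdown hooks. Whether skipping the hooks is acceptable is part of my
question.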

 

[code link 
|https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]

picture:

!image-2021-09-29-11-45-48-098.png|width=663,height=343!

 

!image-2021-09-29-11-47-47-157.png!

 


