[jira] [Commented] (FLINK-22535) Resource leak would happen if exception thrown during AbstractInvokable#restore of task life

2021-04-29 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337088#comment-17337088
 ] 

Yun Gao commented on FLINK-22535:
-

Hi [~akalashnikov], sorry, I think that even if the exception is always 
re-thrown, there is still a problem: if invoke() throws an exception and 
execution enters the catch block, and the try block inside it then throws a 
second exception (for example in the cancelTask() method), cleanUpInvoke() 
would apparently not be called before the exception is re-thrown.
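
To make the scenario above concrete, here is a minimal sketch. It is not the actual StreamTask code: the class name, the stubbed invoke() and cancelTask() bodies and the two run* methods are hypothetical, and only the control flow is meant to match the discussion. The first method puts the cleanup inside the catch block after cancelTask(), the second moves it into a finally block as one possible way to make sure it runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the task lifecycle; not Flink's actual StreamTask code.
class CatchBlockCleanupSketch {
    final List<String> calls = new ArrayList<>();

    // Stubs that always fail, to reproduce the "second exception" case.
    void invoke() throws Exception {
        calls.add("invoke");
        throw new Exception("invoke failed");
    }

    void cancelTask() throws Exception {
        calls.add("cancelTask");
        throw new Exception("cancel failed");
    }

    void cleanUpInvoke() {
        calls.add("cleanUpInvoke");
    }

    // Cleanup placed after cancelTask() inside the catch block.
    void runWithCleanupInCatch() throws Exception {
        try {
            invoke();
        } catch (Exception invokeFailure) {
            cancelTask();     // a second exception here leaves the catch block at once
            cleanUpInvoke();  // skipped when cancelTask() throws
            throw invokeFailure;
        }
        cleanUpInvoke();      // success path
    }

    // Assumed alternative: cleanup in finally, cancel failure kept as suppressed.
    void runWithCleanupInFinally() throws Exception {
        try {
            invoke();
        } catch (Exception invokeFailure) {
            try {
                cancelTask();
            } catch (Exception cancelFailure) {
                invokeFailure.addSuppressed(cancelFailure);
            }
            throw invokeFailure;
        } finally {
            cleanUpInvoke();  // runs on success and on every failure path
        }
    }

    public static void main(String[] args) {
        CatchBlockCleanupSketch inCatch = new CatchBlockCleanupSketch();
        try {
            inCatch.runWithCleanupInCatch();
        } catch (Exception e) {
            System.out.println(inCatch.calls + " -> " + e.getMessage());
        }
        CatchBlockCleanupSketch inFinally = new CatchBlockCleanupSketch();
        try {
            inFinally.runWithCleanupInFinally();
        } catch (Exception e) {
            System.out.println(inFinally.calls + " -> " + e.getMessage());
        }
    }
}
```

In the first variant the caller sees the cancel failure and the cleanup never runs. In the second the caller sees the original invoke failure, with the cancel failure attached as a suppressed exception.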

> Resource leak would happen if exception thrown during 
> AbstractInvokable#restore of task life
>
> Key: FLINK-22535
> URL: https://issues.apache.org/jira/browse/FLINK-22535
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.13.0
> Reporter: Yun Tang
> Priority: Critical
> Fix For: 1.13.1
>
> FLINK-17012 introduced new initialization phases such as 
> {{AbstractInvokable.restore}}. However, if 
> [invokable.restore()|https://github.com/apache/flink/blob/79a521e08df550d96f97bb6915191d8496bb29ea/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L754-L759]
>  throws an exception, {{StreamTask#cleanUpInvoke}} is never called, 
> leading to a resource leak.
> We internally use managed memory in another way, by registering 
> specific operator identifiers in the memory manager. Skipping the stream 
> task cleanup means the stream operators are not disposed, which causes a 
> critical resource leak.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22535) Resource leak would happen if exception thrown during AbstractInvokable#restore of task life

2021-04-29 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335692#comment-17335692
 ] 

Yun Gao commented on FLINK-22535:
-

Hi [~akalashnikov], sorry, I mixed up the order of the braces. You are right 
that the exception is thrown at the end of the catch block. Sorry for the 
wrong information.



[jira] [Commented] (FLINK-22535) Resource leak would happen if exception thrown during AbstractInvokable#restore of task life

2021-04-29 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335685#comment-17335685
 ] 

Anton Kalashnikov commented on FLINK-22535:
---

[~gaoyunhaii] thanks for the observation. But how could cleanUpInvoke be 
called twice? As far as I can see, if an exception happens, only the 
cleanUpInvoke in the catch block is invoked; the exception is then rethrown, 
which avoids the second invocation.

Regarding the original problem, it looks like we should indeed finish 
StreamTask#restore with _cleanUpInvoke_. My fault, I will fix it.
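
As a rough illustration of what finishing restore with a cleanup could look like, here is a sketch under stated assumptions. The class, the restoreInternal() helper and the registeredOperators set are hypothetical and only stand in for whatever restore acquires, such as the managed memory registrations mentioned in the issue description. This is not the fix that was committed to Flink.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; this is not Flink's StreamTask or MemoryManager.
class RestoreCleanupSketch {
    // Stand-in for per-operator resources, e.g. managed memory registrations.
    final Set<String> registeredOperators = new HashSet<>();

    // Assumed helper: acquires a resource and then fails part-way through.
    void restoreInternal() throws Exception {
        registeredOperators.add("operator-1");
        throw new Exception("state restore failed");
    }

    void cleanUpInvoke() {
        registeredOperators.clear();
    }

    // restore() releases what it acquired when initialization fails,
    // then re-throws the original failure to the caller.
    void restore() throws Exception {
        try {
            restoreInternal();
        } catch (Exception restoreFailure) {
            try {
                cleanUpInvoke();
            } catch (RuntimeException cleanupFailure) {
                // Keep the restore failure as the primary exception.
                restoreFailure.addSuppressed(cleanupFailure);
            }
            throw restoreFailure;
        }
    }

    public static void main(String[] args) {
        RestoreCleanupSketch task = new RestoreCleanupSketch();
        try {
            task.restore();
        } catch (Exception e) {
            System.out.println("leaked registrations: " + task.registeredOperators.size());
        }
    }
}
```

The cleanup runs only on the failure path here, because after a successful restore the resources are still needed by the later invoke phase.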



[jira] [Commented] (FLINK-22535) Resource leak would happen if exception thrown during AbstractInvokable#restore of task life

2021-04-29 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335645#comment-17335645
 ] 

Yun Gao commented on FLINK-22535:
-

Another related possible issue: if StreamTask#invoke throws an exception, 
_cleanUpInvoke_ might be called twice, but we do not have a flag to skip the 
second execution.
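
For reference, a flag of the kind mentioned above could look roughly like the sketch below. The class and field names are hypothetical, and nothing here is taken from the existing StreamTask code. It only shows the usual compare-and-set guard that turns a second call into a no-op.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an idempotent cleanup; not Flink's StreamTask.
class IdempotentCleanupSketch {
    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // Counts how often resources were actually released, for demonstration only.
    final AtomicInteger releaseCount = new AtomicInteger();

    void cleanUpInvoke() {
        // Only the first caller flips the flag; later calls return immediately.
        if (!cleanedUp.compareAndSet(false, true)) {
            return;
        }
        releaseCount.incrementAndGet();
    }

    public static void main(String[] args) {
        IdempotentCleanupSketch task = new IdempotentCleanupSketch();
        task.cleanUpInvoke();
        task.cleanUpInvoke();
        System.out.println("releases performed: " + task.releaseCount.get());
    }
}
```

An AtomicBoolean is used here on the assumption that cleanup could be reached from more than one thread; a plain boolean field would be enough if it is only ever called from the task thread.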
