Github user zhijiangW commented on the issue:
https://github.com/apache/flink/pull/3360
@StephanEwen, if the exception bubbles out and causes the TaskExecutor to
exit as a result, I think the JobMaster can be assumed to end up in a sane
state, based on detection of TaskExecutor
Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/3360
Thanks for the clarification @zhijiangW. I now understand the problem that
we effectively introduce via `RpcEndpoint.runAsync` another message which might
get "lost" (e.g. due to OOM
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3360
Looking at this from another angle: If any Runnable that is scheduled ever
lets an exception bubble out, can we still assume that the JobManager is in a
sane state? Or should we actually make
Github user zhijiangW commented on the issue:
https://github.com/apache/flink/pull/3360
Hi @tillrohrmann, thank you for the review and the helpful suggestions!
Let me first try to explain the root cause of this issue:
From the JobMaster side, it sends the cancel RPC message and gets
Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/3360
I think adding this safety net makes sense and protects against a corrupted
state.
However, isn't the root cause of the described problem that the
JobMaster-TaskExecutor communication
Github user zhijiangW commented on the issue:
https://github.com/apache/flink/pull/3360
@StephanEwen, I have already submitted the modifications.
Github user zhijiangW commented on the issue:
https://github.com/apache/flink/pull/3360
@StephanEwen, thank you for such a quick review!
It is a good idea to add a uniform way of doing this to the utils, so we can
use it anywhere.
I will fix it according to your suggestions later
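
For illustration only, a uniform helper of this kind might look roughly like the sketch below. The class name `ThrowableHandling`, its method names, and the choice of which errors count as fatal are all hypothetical and are not taken from this pull request or from Flink's actual utilities.

```java
import java.util.function.Consumer;

// Hypothetical helper: all names here are illustrative, not Flink's API.
final class ThrowableHandling {

    private ThrowableHandling() {}

    // Assumption: these JVM errors are treated as unrecoverable.
    static boolean isFatal(Throwable t) {
        return t instanceof InternalError
                || t instanceof UnknownError
                || t instanceof OutOfMemoryError;
    }

    // Rethrows fatal errors unchanged; returns normally for everything else.
    static void rethrowIfFatal(Throwable t) {
        if (isFatal(t)) {
            // Safe cast: every type accepted by isFatal is a subclass of Error.
            throw (Error) t;
        }
    }

    // Wraps a Runnable so that non-fatal failures go to a handler
    // instead of bubbling out of the executing thread.
    static Runnable guarded(Runnable action, Consumer<Throwable> onFailure) {
        return () -> {
            try {
                action.run();
            } catch (Throwable t) {
                rethrowIfFatal(t);
                onFailure.accept(t);
            }
        };
    }
}
```

With a helper like this, every place that catches `Throwable` could call `rethrowIfFatal(t)` first and only then apply its own recovery logic, so the decision about what counts as fatal lives in one place.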
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3360
I would suggest that we adopt the following pattern for all the places like
the one in this pull request where we catch Throwables:
```java
try {
...
} catch (Throwable