[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-07 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010440#comment-17010440 ] Xintong Song commented on FLINK-15456: -- Thanks [~zhuzh] for looking into the problem. I agree with

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-07 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009644#comment-17009644 ] Zhu Zhu commented on FLINK-15456: - I just reproduced the issue with debug logs enabled. See attached

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007193#comment-17007193 ] Zhu Zhu commented on FLINK-15456: - Synced with [~xintongsong] offline, the RM recovered because a

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007187#comment-17007187 ] Zhu Zhu commented on FLINK-15456: - [~xintongsong], the RM was not really revoked leadership nor

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006794#comment-17006794 ] Xintong Song commented on FLINK-15456: -- [~zhuzh] If RM is not restarted, there will be no need to

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006779#comment-17006779 ] Zhu Zhu commented on FLINK-15456: - [~xintongsong], in this case, the RM recovered without a restart, you

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006738#comment-17006738 ] Xintong Song commented on FLINK-15456: -- When Flink RM recovers previous attempt containers from

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006719#comment-17006719 ] Zhu Zhu commented on FLINK-15456: - Thanks for the explanation [~xintongsong]. I still have one question

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006705#comment-17006705 ] Xintong Song commented on FLINK-15456: -- I think the problem of FLINK-13554 is that, in the time

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006696#comment-17006696 ] Zhu Zhu commented on FLINK-15456: - This issue looks like the case described in FLINK-13554.

[jira] [Commented] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

2020-01-02 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006694#comment-17006694 ] Xintong Song commented on FLINK-15456: -- It seems to be the same problem as FLINK-13554. But I