[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948730#comment-16948730 ] Till Rohrmann commented on FLINK-14010: --- I think we introduced a Yarn test instability with this fix. FLINK-14347 looks as if we are reacting to an {{onShutdownRequest}} during the test clean up phase. Since we are calling the {{FatalExceptionHandler}} we terminate the process and log a fatal error in the logs which fails the test. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.2, 1.9.0, 1.10.0 >Reporter: Zili Chen >Assignee: Zili Chen >Priority: Critical > Labels: pull-request-available > Fix For: 1.10.0, 1.9.1, 1.8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933140#comment-16933140 ] TisonKun commented on FLINK-14010: -- Will send a pull request in hours :-) > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933139#comment-16933139 ] Till Rohrmann commented on FLINK-14010: --- Ok, then let's do it like this. Do you have to time to work on this issue [~Tison]? > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933030#comment-16933030 ] TisonKun commented on FLINK-14010: -- [~trohrmann] Technically I agree that it is a valid solution. Give it another look I think we can complete shutdown future exceptionally "ResourceManager got closed when DispatcherResourceManagerComponent is running". It infers that the application goes into an UNKNOWN state so that the semantic is also correct. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932676#comment-16932676 ] Till Rohrmann commented on FLINK-14010: --- Can't we say that we always complete {{DispatcherResourceManagerComponent#shutDownFuture}} exceptionally if {{ResourceManager.getTerminationFuture()}} terminates while {{DispatcherResourceManagerComponent#isRunning}} is {{true}}? The contract could be that the {{ResourceManager}} always needs to run and if it stops, then this is an indicator that something went wrong and we should stop the {{ClusterEntrypoint}}. We could do this by completing {{DispatcherResourceManagerComponent#shutDownFuture}} exceptionally if {{DispatcherResourceManagerComponent#isRunning}} is {{true}}. However, one could similarly also simply call {{onFatalError}} from within the {{ResourceManager}} as you've initially proposed. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932423#comment-16932423 ] TisonKun commented on FLINK-14010: -- I have thought of this. The problem is that when the situation described here happens, we actually complete {{ResourceManager#getTerminationFuture}} normally, which cannot be sourced that it comes from {{YarnResourceManager#onShutdownRequest}}. If we achieve the function by using {{ResourceManager#getTerminationFuture}} to trigger the shut down of the {{DispatcherResourceManagerComponent}}, the assumption is: If ResourceManager is closed first(since termination future completes normally in both cases, we cannot distinguish by {{whenComplete}}), it infers an exceptionally status so that we should complete {{DispatcherResourceManagerComponent#getShutDownFuture}} exceptionally. Otherwise ResourceManager closes normally by other triggers, and the either {{DispatcherResourceManagerComponent#getShutDownFuture}} is already completed or {{ClusterEntrypoint#shutdownAsync}} is guarded to be executed once. I think this assumption is counter-intuitive that ResourceManager terminates "normally" but we complete shutdownFuture exceptionally. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932396#comment-16932396 ] Till Rohrmann commented on FLINK-14010: --- Couldn't we simply use {{ResourceManager#getTerminationFuture}} to trigger the shut down of the {{DispatcherResourceManagerComponent}} [~Tison]? > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931490#comment-16931490 ] TisonKun commented on FLINK-14010: -- Well, it's reasonable we try to gracefully shut down. I start to work on it but I'm not sure about what the future should look like. There are two options in my mind, both of which introduce a {{shutdownFuture}} in {{ResourceManager}}. 1. {{ResourceManager#shutdownFuture}} is completed on {{YarnResourceManager#onShutdownRequest}} gets called. And we register callback in {{DispatcherResourceManagerComponent#registerShutDownFuture}}, when {{ResourceManager#shutdownFuture}} complete, we complete {{DispatcherResourceManagerComponent#shutDownFuture}} exceptionally. Concern here is that {{ResourceManager#shutdownFuture}} is never completed if {{YarnResourceManager#onShutdownRequest}} never gets called. I'm not sure if it is well. 2. {{ResourceManager#shutdownFuture}} is completed normally on {{ResourceManager#stopResourceManagerServices}} gets called, while completed exceptionally on {{YarnResourceManager#onShutdownRequest}} gets called. Also we register callback in {{DispatcherResourceManagerComponent#registerShutDownFuture}}, when {{ResourceManager#shutdownFuture}} complete exceptionally, we complete {{DispatcherResourceManagerComponent#shutDownFuture}} exceptionally; when when {{ResourceManager#shutdownFuture}} complete normally we do nothing. It might be a bit more complex than 1 and we should ensure that codepaths {{ResourceManager}} exit are all covered. WDYT [~till.rohrmann]? > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931386#comment-16931386 ] Till Rohrmann commented on FLINK-14010: --- {{#onFatalError}} could also be an option but I would prefer to distinguish here. I would consider {{#onShutdownRequest}} as request and not an error case. Hence, I would suggest to try to gracefully shut down. If this does not work, then we could fail fatally. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931287#comment-16931287 ] TisonKun commented on FLINK-14010: -- [~till.rohrmann] Thanks to your explanation, I learn where the components layout comes from. Back to this issue, what if we call {{#onFatalError}} in {{YarnResourceManager#onShutdownRequest}}? {{#onShutdownRequest}} is only called when AM exceptionally switched and we can regard it as a fatal error. For implementation details, it calls {{System.exit}} that correctly shutdown the AM and release leadership. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925778#comment-16925778 ] Till Rohrmann commented on FLINK-14010: --- Thanks for reporting this issue [~Tison]. I think your analysis is correct that we currently do not react properly to {{onShutdownRequest}} signals. What happens is that we only terminate the {{ResourceManager}} but not the other Flink components ({{Dispatcher}} and {{JobMasters}}). Before looking into a potential solutions let's answer your questions because they are important for understanding the bigger picture. The two components {{Dispatcher}} and {{ResourceManager}} have been designed to run independently of the other component, potentially also in a separate process. The current implementation couples the {{Dispatcher}} with the {{JobMasters}} but there should be nothing fundamental preventing from decoupling them. The main thing one needs to do is to add logic to start {{JobMaster}} processes somewhere. So long story short, the idea is that all components can run independently (modulo the existing {{Dispatcher}} implementation). At the moment we run all these components in one process because it was easier and less work to implement it this way. This is reflected in the {{DispatcherResourceManagerComponent}}. Hence, for the current implementation we can assume that if the {{ResourceManager}} terminates, then the other components should shut down as well. This could be implemented by registering in the {{DispatcherResourceManagerComponent}} a callback on the {{ResourceManager's}} termination future which triggers the closing of the {{DispatcherResourceManagerComponent}}. Concerning using a single leader election service, I would say that this is not feasible if we still want to keep the option to deploy the components in separate processes eventually. Would you have time to work on the fix for this problem [~Tison]? > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down
[ https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925460#comment-16925460 ] TisonKun commented on FLINK-14010: -- CC [~StephanEwen] [~till.rohrmann] [~xiaogang.shi] Here comes a high-level problem, do we explicitly constrain Dispatcher, ResourceManager and JobManagers run on one process? 1. the usage of reference to {{JobManagerGateway}} in Dispatcher already infers that we require this. 2. back to the design of FLIP-6, we have a global singleton of Dispatcher, and for each job, launch a JobManager and ResourceManager. The implementation diverges quite a lot. Could you please provide any background? 3. if we explicitly constrain as above, we actually need not to start leader election services per components, actually, we can use the abstraction and layout as below: - start a leader election service per dispatcher-resource-manager component, in cluster entrypoint level. It will participant the election and all metadata commits are delegate to this service. - all cluster level components that need to publish their address, such as Dispatcher, ResourceManager and WebMonitor publish their address via this leader election service. - Actors can be started as {{PermanentlyFencedRpcEndpoint}} and thus we survive from handling a lot of mutable shared state among leadership epoch. Specifically, cluster entrypoint acts as DispatcherRunner and so on, like JobManagerRunner to JobMaster. See also [this branch|https://github.com/tillrohrmann/flink/commits/removeSuspendFromJobMaster]. - back to this issue, cluster entrypoint({{YARNClusterEntrypoint}} maybe) reacts to AMRM request and thus all components can be required to shutdown properly. > Dispatcher & JobManagers don't give up leadership when AM is shut down > -- > > Key: FLINK-14010 > URL: https://issues.apache.org/jira/browse/FLINK-14010 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.7.2, 1.8.1, 1.9.0, 1.10.0 >Reporter: TisonKun >Priority: Critical > > In YARN deployment scenario, YARN RM possibly launches a new AM for the job > even if the previous AM does not terminated, for example, when AMRM heartbeat > timeout. This is a common case that RM will send a shutdown request to the > previous AM and expect the AM shutdown properly. > However, currently in {{YARNResourceManager}}, we handle this request in > {{onShutdownRequest}} which simply close the {{YARNResourceManager}} *but not > Dispatcher and JobManagers*. Thus, Dispatcher and JobManager launched in new > AM cannot be granted leadership properly. Visually, > on previous AM: Dispatcher leader, JM leaders > on new AM: ResourceManager leader > since on client side or in per-job mode, JobManager address and port are > configured as the new AM, the whole cluster goes into an unrecoverable > inconsistent status: client all queries the dispatcher on new AM who is now > the leader. Briefly, Dispatcher and JobManagers on previous AM do not give up > their leadership properly. -- This message was sent by Atlassian Jira (v8.3.2#803003)