Hi,

The above exception may be caused by either the savepoint timing out or the job termination timing out. To distinguish between these two cases, could you please check the status of the savepoint and the tasks in the Flink Web UI? IIUC, after you get this exception on the client, the job is still running. Could you also check whether there are any exceptions in "Exceptions history" or in the logs?
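If the Web UI is awkward to reach (for example behind the YARN proxy on EMR), the same information can probably be read from the REST API that backs it. The following is only a rough sketch: the helper names `fetch_json` and `diagnose` are made up for illustration, and the JSON field names (`history`, `is_savepoint`, `status`, `state`) are assumed from the REST API documentation, so please verify them against the responses of your Flink version.

```python
import json
from urllib.request import urlopen


def fetch_json(base_url, path):
    """GET a Flink REST endpoint and parse the JSON body (hypothetical helper)."""
    with urlopen(f"{base_url}{path}", timeout=10) as resp:
        return json.load(resp)


def diagnose(job, checkpoints):
    """Guess which phase of stop-with-savepoint is still pending.

    job         -- parsed body of GET /jobs/<job_id>
    checkpoints -- parsed body of GET /jobs/<job_id>/checkpoints
    The field names used here are assumptions; check them on your cluster.
    """
    # Keep only the savepoint entries from the checkpoint history.
    savepoints = [c for c in checkpoints.get("history", []) if c.get("is_savepoint")]
    # Pick the most recent savepoint by checkpoint id.
    latest = max(savepoints, key=lambda c: c["id"], default=None)
    if latest is None:
        return "no savepoint found in checkpoint history"

    status = latest.get("status")
    if status == "IN_PROGRESS":
        return "savepoint still in progress"
    if status == "FAILED":
        return "savepoint failed"
    if job.get("state") == "RUNNING":
        # Savepoint is done but the job has not terminated yet.
        return "savepoint completed, job still running"
    return f"savepoint completed, job state {job.get('state')}"
```

With `base` pointing at the JobManager REST address, something like `diagnose(fetch_json(base, f"/jobs/{job_id}"), fetch_json(base, f"/jobs/{job_id}/checkpoints"))` would give a first hint. "savepoint still in progress" points towards the savepoint timing out, while "savepoint completed, job still running" points towards the termination. `GET /jobs/<job_id>/exceptions` should return what the Web UI shows under exceptions.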
Regards,
Roman

On Mon, Sep 27, 2021 at 6:49 AM Marco Villalobos <mvillalo...@kineteque.com> wrote:
>
> Today, I kept on receiving a timeout exception when stopping my job with a savepoint.
> This happened with Flink version 1.12.2 running in EMR.
>
> I had to use the deprecated cancel with savepoint feature instead.
>
> In fact, stopping with a savepoint, creating a savepoint, and cancelling with a savepoint all gave me the timeout exception.
>
> However, the cancel with savepoint started creating a savepoint on the cluster.
>
> The program finished with the following exception:
>
> org.apache.flink.util.FlinkException: Could not stop with a savepoint job "5d6100984035db9541e9f08ecbd311bf".
>     at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:585)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1006)
>     at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:573)
>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1073)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
> Caused by: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>     at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:583)
>     ... 9 more