Hi Dan,

I think you might be using an older version of Flink. This problem has been resolved by FLINK-16753 [1] after Flink-1.10.3.

[1] https://issues.apache.org/jira/browse/FLINK-16753

Best
Yun Tang
________________________________
From: Robert Metzger <rmetz...@apache.org>
Sent: Monday, April 26, 2021 14:46
To: Dan Hill <quietgol...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: Checkpoint error - "The job has failed"

Hi Dan,

can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using)

On Mon, Apr 26, 2021 at 7:20 AM Dan Hill <quietgol...@gmail.com> wrote:

My Flink job failed to checkpoint with a "The job has failed" error. The logs contained no other recent errors. I keep hitting the error even if I cancel the jobs and restart them. When I restarted my jobmanager and taskmanager, the error went away.

What error am I hitting? It looks like there is bad state that lives outside the scope of a job.

How often do people restart their jobmanagers and taskmanager to deal with errors like this?