Hi, I'm running Flink 1.14.4. Logs included: https://www.dropbox.com/s/8zjndt5rzd9o80f/flink-flink-taskexecutor-138-task-0002.log?dl=0
1- My task managers shut down with: Terminating TaskManagerRunner with exit code 1.
2- It seems to happen at the same time every day, which leads me to believe it's caused by our database indexing (see below for my reasoning).
3- Most of our jobs are ETL from Kafka to SQL Server.
4- We see the following exceptions in the logs:
   - Task 'Sink: jdbc (1/1)#10' did not react to cancelling signal - interrupting; it is stuck for 30 seconds in method: ... com.microsoft.sqlserver.jdbc.TDSChannel ...
   - Sink: jdbc (1/1)#9 (3aaf6d8a45df6c43198bc8297b42354c) switched from RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException: Disconnect from JobManager responsible for ...
5- We also see this: Failed to close consumer network client with type org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient java.lang.NoClassDefFoundError: org/apache/kafka/common/network/Selector$CloseMode

So my guess is that the indexing blocks the job, the task manager cannot cleanly cancel the stuck sink task, and after a while it decides to shut down completely. Is that what's happening?

Is there a way to pause the stream and restart it later, given that this always happens at the same wall-clock time? Or, alternatively, a way to let the JDBC sink shut down cleanly with a timeout?
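For the "JDBC timeout" option, here is a minimal sketch, assuming the sink uses the Microsoft SQL Server JDBC driver (the `com.microsoft.sqlserver.jdbc.TDSChannel` frame in the log suggests so). That driver accepts `loginTimeout` (seconds), `queryTimeout` (seconds) and `socketTimeout` (milliseconds) as connection properties in the URL. Setting `socketTimeout` below Flink's `task.cancellation.timeout` (180000 ms by default, after which a task that won't cancel is treated as a fatal TaskManager error) may let a blocked socket read fail with an exception instead of hanging until the watchdog kills the TaskManager. The helper `buildUrl`, the host, the database name and all timeout values are hypothetical placeholders; check which properties your driver version supports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SqlServerUrl {

    /**
     * Hypothetical helper: builds a URL of the form
     * jdbc:sqlserver://host:port;key1=value1;key2=value2
     * Property names are Microsoft JDBC driver properties; values are placeholders.
     */
    static String buildUrl(String host, int port, Map<String, String> props) {
        StringBuilder sb = new StringBuilder("jdbc:sqlserver://")
                .append(host).append(':').append(port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the URL is predictable.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("databaseName", "etl_db");   // hypothetical database name
        props.put("loginTimeout", "30");       // seconds to wait when opening a connection
        props.put("queryTimeout", "60");       // seconds to wait for a statement
        props.put("socketTimeout", "120000");  // ms per socket read; kept below task.cancellation.timeout
        System.out.println(buildUrl("sqlserver.example.com", 1433, props));
    }
}
```

The resulting string would go wherever the sink's URL is configured, for example `JdbcConnectionOptions.JdbcConnectionOptionsBuilder#withUrl` in the DataStream API or the `url` option of a `'connector' = 'jdbc'` table.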
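For the "pause at a known wall-clock time" option, one approach (an external workaround, not something Flink schedules by itself) is a scheduler such as cron that stops the job with a savepoint before the indexing window (`flink stop --savepointPath <dir> <jobId>`) and resubmits it from that savepoint afterwards (`flink run -s <savepointPath> ...`). The sketch below covers only the scheduling decision: a hypothetical `MaintenanceWindow` class that checks whether a time falls inside a window, including windows that cross midnight. The 23:30 to 01:30 window is a placeholder, not the actual indexing time.

```java
import java.time.LocalTime;

public class MaintenanceWindow {

    private final LocalTime start;
    private final LocalTime end;

    MaintenanceWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /**
     * True if t lies in [start, end). Windows that wrap past midnight are supported;
     * equal start and end is treated as a full-day window.
     */
    boolean contains(LocalTime t) {
        if (start.isBefore(end)) {
            // Same-day window, e.g. 01:00-03:00.
            return !t.isBefore(start) && t.isBefore(end);
        }
        // Window crosses midnight, e.g. 23:30-01:30.
        return !t.isBefore(start) || t.isBefore(end);
    }

    public static void main(String[] args) {
        // Hypothetical nightly index-maintenance window.
        MaintenanceWindow w = new MaintenanceWindow(LocalTime.of(23, 30), LocalTime.of(1, 30));
        if (w.contains(LocalTime.now())) {
            System.out.println("Inside window: job should be stopped with a savepoint");
        } else {
            System.out.println("Outside window: job may run");
        }
    }
}
```

Keeping the check outside the job means the sink never has to be cancelled while SQL Server is blocked, which is the situation the logs above show going wrong.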