Hi, I'm running Flink 1.14.4.

Logs included:
https://www.dropbox.com/s/8zjndt5rzd9o80f/flink-flink-taskexecutor-138-task-0002.log?dl=0

1- My task managers shut down with: Terminating TaskManagerRunner with exit
code 1.
2- It seems to happen at the same time every day, which leads me to believe
it's caused by our database indexing (see my reasoning below).
3- Most of our jobs are ETL from Kafka to SQL Server.
4- We see the following exceptions in the logs:
      - Task 'Sink: jdbc (1/1)#10' did not react to cancelling signal -
interrupting; it is stuck for 30 seconds in method:
... com.microsoft.sqlserver.jdbc.TDSChannel ...
      - Sink: jdbc (1/1)#9 (3aaf6d8a45df6c43198bc8297b42354c) switched from
RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException:
Disconnect from JobManager responsible for ...
5- We also see this: Failed to close consumer network client with type
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient
java.lang.NoClassDefFoundError:
org/apache/kafka/common/network/Selector$CloseMode

So my guess is that the indexing blocks the job, the task manager cannot
cleanly cancel the job, and after a while it gives up and shuts down
completely. Is that right?

Is there a way to pause the stream and resume it later, given that this
always happens at the same wall-clock time? Or, alternatively, to let the
JDBC sink shut down cleanly with a timeout?
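
For context on item 4: the "stuck for 30 seconds" message matches Flink's default task.cancellation.interval (30000 ms), and once task.cancellation.timeout (default 180000 ms) expires, the watchdog treats the task as stuck and fails the TaskManager fatally. That would fit the exit code 1. Below is a minimal sketch of the two ideas above, assuming a fixed nightly window. inMaintenanceWindow (a hypothetical helper) checks whether a wall-clock time falls inside the indexing window, including windows that cross midnight. sqlServerUrl (also hypothetical) builds a SQL Server JDBC URL with the mssql-jdbc queryTimeout (seconds) and socketTimeout (milliseconds) connection properties, so a write blocked by the indexing should fail with an exception instead of hanging in TDSChannel past the cancellation timeout. The host, database name, window times and timeout values are placeholders.

```java
import java.time.LocalTime;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helpers, not part of any Flink or driver API.
 * Host names, window times and timeout values are illustrative assumptions.
 */
public class NightlyWindowSketch {

    /** True if now falls inside [start, end); handles windows that wrap past midnight. */
    static boolean inMaintenanceWindow(LocalTime now, LocalTime start, LocalTime end) {
        if (start.equals(end)) {
            return false; // treat an empty window as "never"
        }
        if (start.isBefore(end)) {
            // Same-day window, e.g. 02:00-03:00.
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window crosses midnight, e.g. 23:30-01:00.
        return !now.isBefore(start) || now.isBefore(end);
    }

    /**
     * Builds a SQL Server JDBC URL with client-side timeouts.
     * queryTimeout is in seconds, socketTimeout in milliseconds (mssql-jdbc properties).
     */
    static String sqlServerUrl(String host, int port, String database,
                               int queryTimeoutSeconds, int socketTimeoutMillis) {
        // LinkedHashMap keeps the properties in insertion order in the URL.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("databaseName", database);
        props.put("queryTimeout", Integer.toString(queryTimeoutSeconds));
        props.put("socketTimeout", Integer.toString(socketTimeoutMillis));

        StringBuilder url = new StringBuilder("jdbc:sqlserver://")
                .append(host).append(':').append(port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(2, 0);
        LocalTime end = LocalTime.of(3, 0);
        System.out.println(inMaintenanceWindow(LocalTime.of(2, 30), start, end)); // true
        System.out.println(inMaintenanceWindow(LocalTime.of(3, 0), start, end));  // false
        // socketTimeout (120 s) is kept below the default task.cancellation.timeout (180 s).
        System.out.println(sqlServerUrl("sqlhost", 1433, "etl", 60, 120_000));
    }
}
```

One way to use the window check, assuming an external scheduler such as cron, would be to stop the job with a savepoint (flink stop --savepointPath <dir> <jobId>) shortly before the window and resubmit it from that savepoint (flink run -s <savepointPath> ...) afterwards. With the timeouts instead, a blocked write would surface as a sink failure handled by the JDBC sink's retries and the job's restart strategy, rather than as a hung thread that brings down the TaskManager.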
