Sanjana Kaundinya created KAFKA-9988:
----------------------------------------

             Summary: Log incorrectly reports task has failed when task takes 
too long to shutdown
                 Key: KAFKA-9988
                 URL: https://issues.apache.org/jira/browse/KAFKA-9988
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
            Reporter: Sanjana Kaundinya


If the OffsetStorageReader is closed while the task is shutting down, and the 
task is still trying to read offsets from the OffsetStorageReader, then we see 
the following in the logs.

{code:java}
[2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=replicator-18} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Failed to fetch offsets.
        at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:114)
        at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offset(OffsetStorageReaderImpl.java:63)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:205)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.connect.errors.ConnectException: Offset reader closed while attempting to read offsets. This is likely because the task was been scheduled to stop but has taken longer than the graceful shutdown period to do so.
        at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:103)
        ... 14 more
[2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=replicator-18} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
{code}

This is a bit misleading, because the task is already in the process of 
shutting down and doesn't actually need manual intervention to be restarted. We 
can see this later in the logs, where the same task throws another 
unrecoverable exception, which means it was running again.

{code:java}
[2020-05-05 05:40:39,361] ERROR WorkerSourceTask{id=replicator-18} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
{code}

If we know a task is already shutting down, we should not throw a 
ConnectException; instead we should log a warning, so that we don't report 
failures that did not actually happen.
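
As a rough illustration of the proposed behaviour, here is a minimal, 
self-contained sketch. It is not the actual WorkerTask code: the class 
ShutdownAwareTask, the Outcome enum and the stopping flag are hypothetical 
names made up for this example, and a plain RuntimeException stands in for 
ConnectException so that the snippet compiles without Kafka on the classpath.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a Connect worker task; not part of Kafka.
class ShutdownAwareTask {

    // Hypothetical result type, used only to make the behaviour observable.
    enum Outcome { COMPLETED, STOPPED_DURING_SHUTDOWN, FAILED }

    // Set once the task has been asked to stop.
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    private final Runnable body;

    ShutdownAwareTask(Runnable body) {
        this.body = body;
    }

    // Marks the task as shutting down; errors seen after this are expected.
    void stop() {
        stopping.set(true);
    }

    Outcome run() {
        try {
            body.run();
            return Outcome.COMPLETED;
        } catch (RuntimeException e) {
            if (stopping.get()) {
                // The task was already on its way out, so this is not a
                // failure that needs a manual restart: warn and move on.
                System.err.println("WARN Task stopped while shutting down: "
                        + e.getMessage());
                return Outcome.STOPPED_DURING_SHUTDOWN;
            }
            // No shutdown in progress: this is a genuine failure.
            System.err.println("ERROR Task threw an uncaught exception: "
                    + e.getMessage());
            return Outcome.FAILED;
        }
    }
}
```

With this shape, an exception raised after stop() has been called is reported 
as a warning, while the same exception raised during normal operation is still 
reported as an error.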




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
