Hello,

We are running Confluent Kafka 5.2.1 on a 3-node Kafka cluster with a
3-node ZooKeeper cluster.

We have a number of topics in the Kafka cluster, and processes that
continuously write data into some of them. Alongside these we also have
some test topics that are rarely used. For the past couple of months we
have been seeing strange behavior on these test topics that leads to a
Kafka node crashing. It appears that the Kafka scheduler throws an
exception and brings down the service when it cannot find a log segment
to delete. What I do not understand is why, even when there is no data,
the Kafka scheduler tries to delete a particular log segment from the
topic's log directory.
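For reference, the exception below shows the retention task failing to rename the segment file to its `.deleted` name. A simple check I can run on the broker host is to see whether the file still exists on disk at all (a minimal sketch; the paths are copied from the logged exception, and the small helper function is just for illustration):

```shell
#!/bin/sh
# check_segment FILE - report whether FILE exists on disk. This mirrors
# the precondition of the rename the kafka-log-retention task attempts
# (segment -> segment.deleted) before it throws NoSuchFileException.
check_segment() {
    if [ -e "$1" ]; then
        echo "present: $1"
    else
        echo "missing: $1"
    fi
}

# Paths taken from the logged exception below.
check_segment /tmp/kafka-logs/test-debug-0/00000000000000000000.log
check_segment /tmp/kafka-logs/test-debug-0/00000000000000000000.log.deleted
```

Running this on the affected broker would confirm whether the segment file had already disappeared from disk before the retention task tried to rename it.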

I am attaching the logged error details, with the original topic names
replaced. I have also copied and pasted the error below.

[2019-12-02 21:25:59,269] ERROR Error while deleting segments for
test-debug-0 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel:76)
java.nio.file.NoSuchFileException:
/tmp/kafka-logs/test-debug-0/00000000000000000000.log
    at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at
org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at
scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at
kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at
kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException:
/tmp/kafka-logs/test-debug-0/00000000000000000000.log ->
/tmp/kafka-logs/test-debug-0/00000000000000000000.log.deleted
        at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more
[2019-12-02 21:25:59,271] ERROR Uncaught exception in scheduled task
'kafka-log-retention' (kafka.utils.KafkaScheduler:76)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting
segments for test-debug-0 in dir /tmp/kafka-logs
Caused by: java.nio.file.NoSuchFileException:
/tmp/kafka-logs/test-debug-0/00000000000000000000.log
    at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at
org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at
scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at
kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at
kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException:
/tmp/kafka-logs/test-debug-0/00000000000000000000.log ->
/tmp/kafka-logs/test-debug-0/00000000000000000000.log.deleted
        at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more
[2019-12-02 21:25:59,272] INFO [ReplicaManager broker=1] Stopping serving
replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager:66)
[2019-12-02 21:25:59,287] INFO [ReplicaFetcherManager on broker 1] Removed
fetcher for partitions Set(test-debug-0,  <other topics.....>)
(kafka.server.ReplicaAlterLogDirsManager:66)
[2019-12-02 21:25:59,326] INFO [KafkaApi-1] Closing connection due to error
during produce request with correlation id 81507 from client id  with ack=0
Topic and partition to exceptions: topic_a-0 ->
org.apache.kafka.common.errors.KafkaStorageException
(kafka.server.KafkaApis:66)
[2019-12-02 21:25:59,409] INFO [ReplicaManager broker=1] Broker 1 stopped
fetcher for partitions test-debug-0, ,  <other topics.....> and stopped
moving logs for partitions  because they are in the failed log directory
/tmp/kafka-logs. (kafka.server.ReplicaManager:66)
[2019-12-02 21:25:59,409] INFO Stopping serving logs in dir /tmp/kafka-logs
(kafka.log.LogManager:66)
[2019-12-02 21:25:59,411] INFO [KafkaApi-1] Closing connection due to error
during produce request with correlation id 57755 from client id  with ack=0
Topic and partition to exceptions: topic_-2 ->
org.apache.kafka.common.errors.KafkaStorageException
(kafka.server.KafkaApis:66)
[2019-12-02 21:25:59,428] ERROR Shutdown broker because all log dirs in
/tmp/kafka-logs have failed (kafka.log.LogManager:143)

Could you please guide me on how to resolve this error? Any help would be
much appreciated.

Best,
Akhauri Prateek Shekhar
