Looks like you tried to attach a screenshot, which didn't come through.

Examining LogSegment.scala, I wonder whether the time index was being
appended to, which would explain why the deletion failed:

      // append an entry to the index (if needed)
      if(bytesSinceLastIndexEntry > indexIntervalBytes) {
        offsetIndex.append(largestOffset, physicalPosition)
        timeIndex.maybeAppend(maxTimestampSoFar, offsetOfMaxTimestamp)
        bytesSinceLastIndexEntry = 0
      }

Can you capture the stack trace and pastebin it?

If LogSegment shows up in the stack trace, we may have some idea for the
root cause.
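
For background, here is a minimal, self-contained sketch of the kind of
locking I suspect (hypothetical file names, standalone code, not Kafka's):
the index files are memory-mapped, and on Windows a file with a live
mapping typically cannot be renamed, which produces exactly the
FileSystemException in your log:

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel.MapMode
    import java.nio.file.{Files, Paths, StandardCopyOption}

    object MmapRenameRepro extends App {
      // create a small file to stand in for an index segment (name is made up)
      val path = Paths.get("segment.index")
      Files.write(path, Array.fill[Byte](4096)(0))

      // map it into memory, the way Kafka maps its offset/time index files
      val raf = new RandomAccessFile(path.toFile, "rw")
      val mmap = raf.getChannel.map(MapMode.READ_WRITE, 0, 4096)
      raf.close() // closing the channel does NOT release the mapping

      // on Windows this move typically fails with
      // java.nio.file.FileSystemException ("The process cannot access the
      // file because it is being used by another process") while the
      // mapping `mmap` is still live; on Linux the rename succeeds
      Files.move(path, Paths.get("segment.index.deleted"),
        StandardCopyOption.ATOMIC_MOVE)
      println(s"moved; mapping still holds ${mmap.capacity} bytes")
    }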

Thanks

On Mon, May 14, 2018 at 8:31 AM, M. Manna <manme...@gmail.com> wrote:

> I am having some difficulty debugging the cluster. In the Kafka CWIKI I
> can see a setup guide for Eclipse, but nothing comparable for debugger
> setup. The issue is that every time the log cleaner thread runs, the
> broker has a FATAL shutdown.
> I tried closing all producers and consumers, but the log and transaction
> index segments remain locked. Forcefully unmapping the offset and time
> index files works normally, but the log and transaction index files
> don't release properly.
>
> Could anyone please advise how to set up Kafka debugging in Eclipse? I
> have added the sdb jar noted on the scala-debugger.org site, but I
> couldn't connect to port 5005 at all.
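>
> (As I understand it, nothing listens on 5005 unless the broker JVM was
> started with the JDWP agent, e.g. by adding something like the following
> to the broker's JVM options:
>
>     -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
>
> On Linux, kafka-run-class.sh has a KAFKA_DEBUG switch that adds this,
> but I am not sure the Windows .bat scripts have an equivalent.)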
>
>
>
>
> On 13 May 2018 at 10:00, M. Manna <manme...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> I highly appreciate the response over the weekend, and thanks for
>> pointing out the JIRAs.
>>
>> I don't believe whole processes are responsible, but rather individual
>> threads that are still holding the log/index files through IO streams.
>> I am trying to walk a single-node setup through the debugger to find
>> out which thread is locking the file. Apologies, but it's a huge
>> application, so it might take me some time to get around it :)
>>
>> Please do update if you find something new.
>>
>> Regards,
>>
>> On 12 May 2018 at 22:15, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> There are two outstanding issues, KAFKA-6059 and KAFKA-6200, which bear
>>> some resemblance to this.
>>>
>>> Can you try to find out how the other process uses the file being
>>> deleted?
>>>
>>> https://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows
>>> https://www.computerhope.com/issues/ch000714.htm
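>>>
>>> (If you have Sysinternals at hand, Handle can do this from the command
>>> line; for example:
>>>
>>>     handle.exe C:\kafka1
>>>
>>> lists the processes holding handles under the log directory. It reports
>>> OS-level handles, so the locker may turn out to be the broker's own
>>> JVM process.)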
>>>
>>> Cheers
>>>
>>> On Sat, May 12, 2018 at 1:42 PM, M. Manna <manme...@gmail.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > We are still stuck on this issue: the 2.11-1.1.0 distribution fails to
>>> > clean up logs on Windows and brings the entire cluster down one broker
>>> > at a time. Extending the retention hours and sizes doesn't help,
>>> > because the extra retained data burdens the hard drive.
>>> >
>>> > Here is the log:
>>> >
>>> > > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Rolled new log segment at offset 45 in 105 ms. (kafka.log.Log)
>>> > > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Scheduling log segment [baseOffset 0, size 2290] for deletion. (kafka.log.Log)
>>> > > [2018-05-12 21:36:57,673] ERROR Error while deleting segments for test-0 in dir C:\kafka1 (kafka.server.LogDirFailureChannel)
>>> > > java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >
>>> > >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
>>> > >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >         at java.nio.file.Files.move(Files.java:1395)
>>> > >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
>>> > >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
>>> > >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
>>> > >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
>>> > >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> > >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
>>> > >         at kafka.log.Log.deleteSegments(Log.scala:1161)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
>>> > >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
>>> > >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>>> > >         at scala.collection.immutable.List.foreach(List.scala:392)
>>> > >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>>> > >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
>>> > >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
>>> > >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>>> > >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>>> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> > >         at java.lang.Thread.run(Thread.java:748)
>>> > >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >
>>> > >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
>>> > >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >                 at java.nio.file.Files.move(Files.java:1395)
>>> > >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
>>> > >                 ... 32 more
>>> > > [2018-05-12 21:36:57,689] INFO [ReplicaManager broker=1] Stopping serving replicas in dir C:\kafka1 (kafka.server.ReplicaManager)
>>> > > [2018-05-12 21:36:57,689] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
>>> > > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-0 in dir C:\kafka1
>>> > > Caused by: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >
>>> > >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
>>> > >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >         at java.nio.file.Files.move(Files.java:1395)
>>> > >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
>>> > >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
>>> > >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
>>> > >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
>>> > >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> > >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
>>> > >         at kafka.log.Log.deleteSegments(Log.scala:1161)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
>>> > >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
>>> > >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>>> > >         at scala.collection.immutable.List.foreach(List.scala:392)
>>> > >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>>> > >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
>>> > >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
>>> > >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>>> > >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>>> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> > >         at java.lang.Thread.run(Thread.java:748)
>>> > >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >
>>> > >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
>>> > >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >                 at java.nio.file.Files.move(Files.java:1395)
>>> > >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
>>> > >                 ... 32 more
>>> > > [2018-05-12 21:36:57,689] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaFetcherManager)
>>> > > [2018-05-12 21:36:57,689] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaAlterLogDirsManager)
>>> > > [2018-05-12 21:36:57,751] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions  because they are in the failed log directory C:\kafka1. (kafka.server.ReplicaManager)
>>> > > [2018-05-12 21:36:57,751] INFO Stopping serving logs in dir C:\kafka1 (kafka.log.LogManager)
>>> > > [2018-05-12 21:36:57,767] ERROR Shutdown broker because all log dirs in C:\kafka1 have failed (kafka.log.LogManager)
>>> > >
>>> >
>>> > Could someone please share some ideas on how to rectify this on
>>> > Windows? If Windows will never be supported, could we get some
>>> > official communication to that effect?
>>> >
>>> > Regards,
>>> >
>>>
>>
>>
>
