I am using Flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080) in a test setup that moves files from the local file system to S3. Only one Flume process (a single JVM) is running. The problem is that every time, after running for a while, a deadlock occurs between the roll-timer and PollingRunner threads. The thread dump is shown below:
"hdfs-sk-roll-timer-0": waiting to lock monitor 0x00007f46c40b5578 (object 0x00000000e002dc90, a java.lang.Object), which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor" "SinkRunner-PollingRunner-DefaultSinkProcessor": waiting to lock monitor 0x00007f4684004db8 (object 0x00000000e17b64d8, a org.apache.flume.sink.hdfs.BucketWriter), which is held by "hdfs-sk-roll-timer-0" Java stack information for the threads listed above: =================================================== "hdfs-sk-roll-timer-0": at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396) - waiting to lock <0x00000000e002dc90> (a java.lang.Object) at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447) at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408) - locked <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter) at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280) at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "SinkRunner-PollingRunner-DefaultSinkProcessor": at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304) - waiting to lock <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter) at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163) at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431) at java.util.HashMap.put(HashMap.java:505) at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407) - locked <0x00000000e002dc90> (a java.lang.Object) at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) at java.lang.Thread.run(Thread.java:745) Found 1 deadlock. The setting is below: a1.sources = src a1.sinks = sk a1.channels = ch ... a1.sinks.sk.type = hdfs a1.sinks.sk.channel = ch ... a1.sinks.sk.hdfs.fileType = DataStream ... a1.sinks.k1.hdfs.rollCount = 0 a1.sinks.k1.hdfs.rollSize = 0 a1.sinks.k1.hdfs.rollInterval = 100 ... a1.channels.ch.type = file a1.channels.ch.checkpointDir = /path/to/chechkpointDir a1.channels.ch.dataDirs = /path/to/dataDir The command to run flume is nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name a1 ... > /path/to/test.log 2 >&1 & Is this a bug or something I can tune to fix it? Thanks