After a quick test (20k records) with patch [1] applied, the deadlock
problem looks fixed. Thanks!

[1]. https://issues.apache.org/jira/browse/FLUME-2973
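
For anyone reading the archive: the thread dump below is the classic
inverted-lock-order deadlock, and the usual fix is to acquire the two
monitors in the same order in every thread. This is a minimal sketch of
that idea, not Flume's actual code; `sinkLock` and `writerLock` are
hypothetical stand-ins for the HDFSEventSink monitor and the BucketWriter
monitor from the dump:

```java
// Hedged sketch: both lock names and method names are illustrative only.
// In the dump, the roll timer holds the BucketWriter monitor and waits for
// the sink's monitor, while the polling runner holds the sink's monitor
// and waits for the BucketWriter. Taking the locks in one fixed order
// (sinkLock first, then writerLock) in both threads removes the cycle.
public class LockOrderDemo {
    private static final Object sinkLock = new Object();   // stand-in for HDFSEventSink's monitor
    private static final Object writerLock = new Object(); // stand-in for the BucketWriter monitor

    static void rollFile() {            // role of hdfs-sk-roll-timer-0
        synchronized (sinkLock) {       // same order as process() below
            synchronized (writerLock) {
                System.out.println("rolled");
            }
        }
    }

    static void process() {             // role of SinkRunner-PollingRunner
        synchronized (sinkLock) {
            synchronized (writerLock) {
                System.out.println("processed");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LockOrderDemo::rollFile);
        Thread t2 = new Thread(LockOrderDemo::process);
        t1.start();
        t2.start();
        t1.join();  // with a consistent lock order, both threads finish
        t2.join();
        System.out.println("done");
    }
}
```

With the inner `synchronized` blocks swapped in one of the two methods,
the same program can hang exactly like the agent does.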


On 9 February 2017 at 11:29, Chia-Hung Lin <cli...@googlemail.com> wrote:
> Thanks for the information. I use the default maxOpenFiles value (in
> fact, I don't touch that config value at all).
>
> On 8 February 2017 at 15:28, Denes Arvay <de...@cloudera.com> wrote:
>> Hi,
>>
>> Yes, it seems to be a bug, I also bumped into it.
>> It seems that the config file poller detects a change in the config file
>> and tries to stop the components while, at the same time, the HDFS sink
>> tries to roll a file.
>> It should be solved by https://issues.apache.org/jira/browse/FLUME-2973
>>
>> From your thread dump it seems that rolling is triggered by the
>> maxOpenFiles limit; is it overridden in your config file? A very low
>> value could increase the chances of this deadlock.
>>
>> I'd also recommend using the --no-reload-conf command line parameter if
>> the live config reload feature is not needed.
>>
>> Kind regards,
>> Denes
>>
>>
>>
>> On Mon, Feb 6, 2017 at 6:08 PM Chia-Hung Lin <cli...@googlemail.com> wrote:
>>>
>>> I use Flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080)
>>> to test moving files from the local file system to S3. Only one Flume
>>> process is launched (a single JVM process). The problem is that every
>>> time, after running for a while, a deadlock occurs between the roll
>>> timer and PollingRunner threads. A thread dump is shown below:
>>>
>>> "hdfs-sk-roll-timer-0":
>>>   waiting to lock monitor 0x00007f46c40b5578 (object 0x00000000e002dc90, a java.lang.Object),
>>>   which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor"
>>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>>   waiting to lock monitor 0x00007f4684004db8 (object 0x00000000e17b64d8, a org.apache.flume.sink.hdfs.BucketWriter),
>>>   which is held by "hdfs-sk-roll-timer-0"
>>>
>>> Java stack information for the threads listed above:
>>> ===================================================
>>> "hdfs-sk-roll-timer-0":
>>>         at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396)
>>>         - waiting to lock <0x00000000e002dc90> (a java.lang.Object)
>>>         at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447)
>>>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408)
>>>         - locked <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>>         at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280)
>>>         at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304)
>>>         - waiting to lock <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>>         at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163)
>>>         at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
>>>         at java.util.HashMap.put(HashMap.java:505)
>>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407)
>>>         - locked <0x00000000e002dc90> (a java.lang.Object)
>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> Found 1 deadlock.
>>>
>>> The configuration is below:
>>>
>>> a1.sources = src
>>> a1.sinks = sk
>>> a1.channels = ch
>>> ...
>>> a1.sinks.sk.type = hdfs
>>> a1.sinks.sk.channel = ch
>>> ...
>>> a1.sinks.sk.hdfs.fileType = DataStream
>>> ...
>>> a1.sinks.sk.hdfs.rollCount = 0
>>> a1.sinks.sk.hdfs.rollSize = 0
>>> a1.sinks.sk.hdfs.rollInterval = 100
>>> ...
>>> a1.channels.ch.type = file
>>> a1.channels.ch.checkpointDir = /path/to/checkpointDir
>>> a1.channels.ch.dataDirs = /path/to/dataDir
>>>
>>> The command to run flume is
>>>
>>> nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name
>>> a1 ... > /path/to/test.log 2>&1 &
>>>
>>> Is this a bug, or is there something I can tune to fix it?
>>>
>>> Thanks
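
For readers of the archive: a monitor cycle like the one in the thread
dump can also be confirmed programmatically with the JDK's ThreadMXBean
(the same facility jstack uses). The following is a minimal,
self-contained sketch, not Flume code; it deliberately creates a
two-thread deadlock on daemon threads and then detects it:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Sketch only: two daemon threads take two monitors in opposite order,
// which guarantees a deadlock once both are past the latch; the main
// thread then asks the JVM which threads are deadlocked.
public class DeadlockDetect {
    public static void main(String[] args) throws Exception {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch both = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            synchronized (a) {
                both.countDown();
                await(both);            // wait until t2 also holds its lock
                synchronized (b) { }    // blocks forever: b is held by t2
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (b) {
                both.countDown();
                await(both);            // wait until t1 also holds its lock
                synchronized (a) { }    // blocks forever: a is held by t1
            }
        });
        t1.setDaemon(true);             // let the JVM exit despite the hang
        t2.setDaemon(true);
        t1.start();
        t2.start();

        Thread.sleep(500);              // give both threads time to block
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();  // null if no deadlock
        System.out.println("deadlocked threads: " + (ids == null ? 0 : ids.length));
    }

    private static void await(CountDownLatch l) {
        try {
            l.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Running this on a stuck agent's JVM is not necessary, of course; `jstack
<pid>` produces the same "Found 1 deadlock" report shown above.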
