The other property you will want to look at is hdfs.maxOpenFiles, which caps the number of files/paths the sink holds open in memory at one time.
If you search for the email thread with subject "hdfs.idleTimeout ,what's it used for ?" from back in January you will find a discussion along these lines. As a quick summary, if rollInterval is not set to 0, you should avoid using idleTimeout and should set maxOpenFiles to a reasonable number (the default is 5000, which is too large; I think that default is changed for 1.4).

- Connor

On Tue, May 21, 2013 at 9:59 AM, Tim Driscoll <[email protected]> wrote:

> Hello,
>
> We have a Flume Agent (version 1.3.1) set up using the HDFSEventSink. We
> were noticing that we were running out of memory after a few days of
> running, and believe we had pinpointed it to an issue with using the
> hdfs.idleTimeout setting. I believe this is fixed in 1.4 per FLUME-1864.
>
> Our planned workaround was to just remove the idleTimeout setting, which
> worked, but brought up another issue. Since we are partitioning our data
> by timestamp, at midnight, we rolled over to a new bucket/partition, opened
> new bucket writers, and left the current bucket writers open. Ideally the
> idleTimeout would clean this up. So instead of a slow steady leak, we're
> encountering a 100MB leak every day.
>
> Short of upgrading Flume, does anyone know of a configuration workaround
> for this? Currently we just bumped up the heap memory and I'm having to
> restart our agents every few days, which obviously isn't ideal.
>
> Is anyone else seeing issues like this? Or how do others use the HDFS
> sink to continuously write large amounts of logs from multiple source
> hosts? I can get more in-depth about our setup/environment if necessary.
>
> Here's a snippet of one of our 4 HDFS Sink configs:
> agent.sinks.rest-xaction-hdfs-sink.type = hdfs
> agent.sinks.rest-xaction-hdfs-sink.channel = rest-xaction-chan
> agent.sinks.rest-xaction-hdfs-sink.hdfs.path = /user/svc-neb/rest_xaction_logs/date=%Y-%m-%d
> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollCount = 0
> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollSize = 0
> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollInterval = 3600
> agent.sinks.rest-xaction-hdfs-sink.hdfs.idleTimeout = 300
> agent.sinks.rest-xaction-hdfs-sink.hdfs.batchSize = 1000
> agent.sinks.rest-xaction-hdfs-sink.hdfs.filePrefix = %{host}
> agent.sinks.rest-xaction-hdfs-sink.hdfs.fileSuffix = .avro
> agent.sinks.rest-xaction-hdfs-sink.hdfs.fileType = DataStream
> agent.sinks.rest-xaction-hdfs-sink.serializer = avro_event
>
> -Tim
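
For reference, here is a rough sketch of how the suggested workaround might look when applied to the quoted sink. The sink name and rollInterval come from Tim's snippet. The maxOpenFiles value of 50 is only an illustrative assumption, not a number from this thread; it would need to be at least the number of files the sink writes concurrently (here, roughly one per source host per date partition), otherwise writers would be closed and reopened constantly.

```properties
# Sketch only, for a 1.3.x agent. The value 50 is an assumed placeholder;
# size it to the number of distinct host/date paths written at the same time.

# Time-based rolling stays as in the quoted config.
agent.sinks.rest-xaction-hdfs-sink.hdfs.rollInterval = 3600

# hdfs.idleTimeout is deliberately left unset (its default of 0 disables it).

# Cap the number of open bucket writers; once the cap is exceeded the sink
# closes the oldest file, so yesterday's writers get evicted after midnight.
agent.sinks.rest-xaction-hdfs-sink.hdfs.maxOpenFiles = 50
```

With a cap like this, the previous day's writers should be closed as the new day's files are opened, instead of staying on the heap until the agent is restarted.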
