Hi Chris,

Check out the hdfs.idleTimeout parameter. It closes a file once no events have been written to it for the given number of seconds. Maybe set it to 5 minutes (i.e. hdfs.idleTimeout = 300) or something.
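A minimal sketch of how that could look, assuming the agent and sink names from the config quoted below stay unchanged. The 300-second value is only a starting point, not a tuned recommendation, and the three roll settings are left at 0 as in your current setup:

```properties
# Existing roll settings stay disabled (0), as in the current config
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0

# New line: close a file after 300 seconds without any writes to it
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.idleTimeout = 300
```

Since hdfs.filePrefix includes %Y-%m-%d, events after midnight should go to a new file, and the previous day's file would then go idle and be closed about five minutes after its last write.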
http://flume.apache.org/FlumeUserGuide.html

Regards,
Mike

On Thu, Mar 21, 2013 at 1:21 PM, Chris Neal <[email protected]> wrote:

> Hi :)
>
> I have an ExecSource running a tail -F on a bunch of log files that get
> rotated nightly by log4J. I want my HDFS Sink to roll them when log4J
> rolls them. I tried setting all the "roll" parameters to 0, thinking a new
> file handle from the ExecSource would cause the current file in HDFS to be
> closed, and a new file to be created, but I'm seeing only the new file
> created, and the previous days file is still there as a .tmp file, unclosed.
>
> I was wondering what configuration would achieve the behavior I'm after?
> I was thinking a rollInterval of 24 hours, but wouldn't that cause HDFS to
> roll the file at a different time than log4J rolled it?
>
> Thanks for the time :)
>
> Here is my HDFS Sink setup currently:
>
> # hdfs-hadoopjt01_1-sink properties
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.type = hdfs
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.path = hdfs://nameservice1/%{path}
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.filePrefix = %{filename}.%Y-%m-%d_1
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.batchSize = 10000
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.threadsPoolSize = 8
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollTimerPoolSize = 5
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.codeC = GzipCodec
> hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.fileType = CompressedStream
