I remember I had a similar experience with 1.1.0. I suggest downloading 1.2.0 and trying again.
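Before upgrading, two HDFS sink parameters might also be worth a try. This is just a sketch: both exist in recent Flume NG releases, but please check the user guide for your exact version before relying on them. hdfs.idleTimeout closes a bucket writer after a period with no incoming events, which should take care of .tmp files that stay open in partitions that have gone quiet, and hdfs.round / hdfs.roundValue / hdfs.roundUnit round the event timestamp down so that files land in fixed 5-minute buckets aligned with your Hive jobs:

# close a file after 30 seconds without events (0 = disabled, the default)
agent.sinks.hdfsSinkSMP.hdfs.idleTimeout = 30
# round event timestamps down to 5-minute buckets (:00, :05, :10, ...)
agent.sinks.hdfsSinkSMP.hdfs.round = true
agent.sinks.hdfsSinkSMP.hdfs.roundValue = 5
agent.sinks.hdfsSinkSMP.hdfs.roundUnit = minute

Note that idleTimeout only helps while the agent is running; .tmp files left behind by a stop or a crash still have to be renamed or removed by hand.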
Regards,
Yongkun Wang

On 12/08/01 21:41, "Christian Schroer" <[email protected]> wrote:

>Hi,
>
>I have some trouble setting up the HDFS sink in Flume-NG (CDH3U4, 1.1.0).
>
>Here's my sink configuration:
>
>agent.sinks.hdfsSinkSMP.type = hdfs
>agent.sinks.hdfsSinkSMP.channel = memoryChannel
>agent.sinks.hdfsSinkSMP.hdfs.filePrefix = flumenode1
>agent.sinks.hdfsSinkSMP.hdfs.fileType = SequenceFile
>agent.sinks.hdfsSinkSMP.hdfs.codeC = gzip
>agent.sinks.hdfsSinkSMP.hdfs.rollCount = 0
>agent.sinks.hdfsSinkSMP.hdfs.batchSize = 1
>agent.sinks.hdfsSinkSMP.hdfs.rollInterval = 15
>agent.sinks.hdfsSinkSMP.hdfs.rollSize = 0
>agent.sinks.hdfsSinkSMP.hdfs.path =
>hdfs://namenode/user/hive/warehouse/someDatabase.db/someTable/%Y-%m-%d/%H00/%M/somePartion
>
>Events are generated by a SyslogTcp source. We write the data into Hive
>partitions. This works, but it keeps a lot of .tmp files open. I disabled
>event-count- and size-based file rolling and enabled only the interval, so
>that files are closed after 15 seconds. But Flume keeps files open much
>longer than 15 seconds (sometimes for hours, or it never closes them at
>all). Stopping Flume also leaves .tmp files in those directories.
>Sometimes it opens new files in partitions without having any data for
>them. Maybe I'm doing the file rolling completely wrong?
>
>Some Hive jobs use data that is 5 minutes old, but if Flume renames a
>file after the job has started, the job fails. That's why I want the
>files closed after 15 seconds. New files are no problem.
>
>Does anyone have an idea?
>
>Best regards,
>Christian
