Hi Roberto,

Setting the roll intervals to 0 will stop the sink from rolling the files in HDFS. Try setting hdfs.rollCount to the number of messages you want to roll the file on (i.e. the number of messages per file). Bear in mind that setting this low will result in higher HDFS overhead.
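As a concrete sketch of that advice (the agent/sink names and the HDFS path below are illustrative, not from Roberto's actual config), the sink section might look like:

```properties
# Hypothetical names "my_agent"/"my_sink"; substitute your own.
my_agent.sinks.my_sink.type = hdfs
my_agent.sinks.my_sink.hdfs.path = hdfs://namenode/flume/events

# Roll a new file after every 100 events:
my_agent.sinks.my_sink.hdfs.rollCount = 100

# Disable the other roll triggers so only rollCount applies. Note that
# hdfs.rollSize defaults to 1024 bytes and will otherwise roll files early.
my_agent.sinks.my_sink.hdfs.rollInterval = 0
my_agent.sinks.my_sink.hdfs.rollSize = 0
```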
--
Chris Horrocks

On Wed, Nov 16, 2016 at 10:35 am, Roberto Coluccio <'[email protected]'> wrote:

Hello folks,

I'm testing a Flume agent defined by the following topology:

JMS source (Tibco implementation) -> memory channel -> HDFS sink

The JMS source has:
- my_agent.sources.my_source.batchSize = 100

The memory channel has:
- my_agent.channels.my_channel.capacity = 100

The HDFS sink has:
- my_agent.sinks.my_sink.hdfs.batchSize = 100
- my_agent.sinks.my_sink.hdfs.rollCount = 0
- my_agent.sinks.my_sink.hdfs.rollInterval = 0
- my_agent.sinks.my_sink.hdfs.idleTimeout = 0

I don't understand how/why new files on HDFS are created/closed. In fact, when I:

1. launch the agent (JMS queue empty)
2. push a new text message onto the JMS queue

a new file is created by the HDFS sink, but not yet closed (as I expect). BUT, when I

3. push another new text message onto the JMS queue

then, regardless of how long I wait before performing step 3, the HDFS sink closes the previously open file, then opens a new one for the new incoming message consumed from the queue and processed through the channel. This way, files always contain one and only one message. I was expecting that number to be 100, according to the configuration mentioned above.

Any hints?

Best regards,
Roberto
