Hello folks,

I'm testing a Flume agent whose topology is:

*JMS source* (Tibco implementation) -> *memory channel* -> *hdfs sink*

The *JMS source* has:

 * my_agent.sources.my_source.batchSize = 100

The *memory channel* has:

 * my_agent.channels.my_channel.capacity = 100

The *HDFS sink* has:

 * my_agent.sinks.my_sink.hdfs.batchSize = 100
 * my_agent.sinks.my_sink.hdfs.rollCount = 0
 * my_agent.sinks.my_sink.hdfs.rollInterval = 0
 * my_agent.sinks.my_sink.hdfs.idleTimeout = 0
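
For reference, here is the full agent configuration as a single properties file. The type, channel-binding, and HDFS path lines are my reconstruction (they aren't quoted above), so treat them as placeholders:

```properties
# Sketch of the full agent config; only the batchSize/capacity/roll*
# settings below are taken verbatim from my actual config.
my_agent.sources = my_source
my_agent.channels = my_channel
my_agent.sinks = my_sink

# JMS source (Tibco implementation)
my_agent.sources.my_source.type = jms
my_agent.sources.my_source.channels = my_channel
my_agent.sources.my_source.batchSize = 100

# Memory channel
my_agent.channels.my_channel.type = memory
my_agent.channels.my_channel.capacity = 100

# HDFS sink
my_agent.sinks.my_sink.type = hdfs
my_agent.sinks.my_sink.channel = my_channel
my_agent.sinks.my_sink.hdfs.path = /some/hdfs/path
my_agent.sinks.my_sink.hdfs.batchSize = 100
my_agent.sinks.my_sink.hdfs.rollCount = 0
my_agent.sinks.my_sink.hdfs.rollInterval = 0
my_agent.sinks.my_sink.hdfs.idleTimeout = 0
```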

I don't understand how/why new files on HDFS are created/closed. In fact, when I:

1. launch the agent (JMS queue empty)
2. push a new text message on the JMS queue

a new file is created on HDFS but not yet closed (as I expect). BUT, when I

    3. push again a new text message on the JMS queue

regardless of how long I wait before performing step 3, the HDFS sink closes the previously open file and then opens a new one for the incoming message consumed from the queue and delivered through the channel.

As a result, each file always contains exactly one message. I was expecting 100 messages per file, according to the configuration above.

Any hints?

Best regards,

Roberto






