Hello folks,
I'm testing a Flume agent whose topology consists of:
*JMS source* (Tibco implementation) -> *memory channel* -> *HDFS sink*
The *JMS source* has:
* my_agent.sources.my_source.batchSize = 100
The *memory channel* has:
* my_agent.channels.my_channel.capacity = 100
The *HDFS sink* has:
* my_agent.sinks.my_sink.hdfs.batchSize = 100
* my_agent.sinks.my_sink.hdfs.rollCount = 0
* my_agent.sinks.my_sink.hdfs.rollInterval = 0
* my_agent.sinks.my_sink.hdfs.idleTimeout = 0
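For reference, here is the agent definition as a single properties file. The component names (my_agent, my_source, my_channel, my_sink) follow the keys above; the `type` and channel-wiring lines are the standard Flume boilerplate I'm assuming, and the JMS connection properties (initialContextFactory, providerURL, destinationName, connectionFactory) are omitted here:

```properties
# Component naming
my_agent.sources = my_source
my_agent.channels = my_channel
my_agent.sinks = my_sink

# JMS source (Tibco) -- connection properties omitted
my_agent.sources.my_source.type = jms
my_agent.sources.my_source.batchSize = 100
my_agent.sources.my_source.channels = my_channel

# Memory channel
my_agent.channels.my_channel.type = memory
my_agent.channels.my_channel.capacity = 100

# HDFS sink -- all roll triggers disabled
my_agent.sinks.my_sink.type = hdfs
my_agent.sinks.my_sink.channel = my_channel
my_agent.sinks.my_sink.hdfs.batchSize = 100
my_agent.sinks.my_sink.hdfs.rollCount = 0
my_agent.sinks.my_sink.hdfs.rollInterval = 0
my_agent.sinks.my_sink.hdfs.idleTimeout = 0
```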
I don't understand how/why new files on HDFS are created/closed. In
fact, when I:
1. launch the agent (JMS queue empty)
2. push a new text message on the JMS queue
a new file is created by the HDFS sink but not yet closed
(which is what I expect). BUT, when I
3. push again a new text message on the JMS queue
then regardless of how long I wait before performing step 3, the HDFS
sink closes the previously open file and opens a new one for the new
incoming message consumed from the queue and passed through the channel.
As a result, each file always contains exactly one message. I was
expecting that number to be 100, according to the configuration above.
Any hints?
Best regards,
Roberto