Hi Simone

I wonder why you’re seeing 90% CPU use with a file channel; I would expect 
high disk I/O instead.  As a counterexample, I have a single server with 4 
spool dir sources, each going to a separate file channel, also on SSD.  I do 
not see any noticeable CPU or even disk I/O utilization.  I am pushing about 
10 million events per day across all 4 sources, and it has been running 
reliably for 2 years now.
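
For illustration, one such source-to-file-channel pairing might look roughly 
like the sketch below.  This is not my actual config: the agent name, 
component names, paths, broker hosts and topic are all made up, and the Kafka 
sink is only there because that is what you said you are using.  Property 
names are as I understand them from the Flume 1.6 user guide, so please check 
them against it.

```properties
# Hypothetical agent "agent1": one spooling directory source feeding one
# durable file channel, drained by a Kafka sink.
agent1.sources = spool1
agent1.channels = fc1
agent1.sinks = kafka1

# Spooling directory source (the directory path is an assumption).
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/spool/flume/in1
agent1.sources.spool1.channels = fc1

# File channel: events are persisted to disk, so they survive an agent
# restart or a sink outage (both paths are assumptions).
agent1.channels.fc1.type = file
agent1.channels.fc1.checkpointDir = /var/lib/flume/fc1/checkpoint
agent1.channels.fc1.dataDirs = /var/lib/flume/fc1/data

# Kafka sink (broker hosts and topic are placeholders).
agent1.sinks.kafka1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka1.brokerList = kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092
agent1.sinks.kafka1.topic = events
# -1 asks for acknowledgement from all in-sync replicas before a batch
# counts as delivered.
agent1.sinks.kafka1.requiredAcks = -1
agent1.sinks.kafka1.batchSize = 100
agent1.sinks.kafka1.channel = fc1
```

With a layout like this, events should simply accumulate in the file channel 
while the sink is unreachable and drain once it comes back, which is the 
behaviour you were getting from store on failure in 0.9.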

I would always use a file channel; any memory channel runs the risk of data 
loss if the node were to fail.  I would be just as worried about the local 
node failing, seeing that a 3-node Kafka cluster would have to lose 2 nodes 
before it lost quorum.

Not sure what your data source is, but if you can add more Flume nodes, that 
would of course help.

Have you given it ample heap space?  Maybe GCs are causing the high CPU.


Phil


From: Simone Roselli [mailto:[email protected]]
Sent: Friday, October 09, 2015 12:33 AM
To: [email protected]
Subject: Flume-ng 1.6 reliable setup

Hi,

I'm currently planning to migrate from Flume 0.9 to Flume-ng 1.6, but I'm 
having trouble finding a reliable setup for it.

My sink is a 3-node Kafka cluster. I must avoid losing events in case the 
main sink is down, broken or unreachable for a while.

In Flume 0.9, I use a memory channel with the store on failure feature, which 
starts writing events to the local disk in case the target sink is not 
available.

In Flume-ng 1.6 the same behaviour could be achieved by setting up a 
Spillable memory channel, but the problem with this solution is stated at the 
end of the channel's description: "This channel is currently experimental and 
not recommended for use in production."

In Flume-ng 1.6, it's possible to set up a pool of Failover sinks. So, I was 
thinking of configuring a File Roll as the Secondary sink in case the Primary 
is down. However, once the Primary sink is back online, the data placed on the 
Secondary sink (local disk) won't be automatically pushed to the Primary one.

Another option would be setting up a file channel: write each event to disk 
and then sink it. Leaving aside that I don't love the idea of continuously 
writing/deleting every single event on an SSD, this setup is taking 90% of 
CPU. The exact same configuration using a memory channel takes 3%.

Any other solutions to evaluate?

Simone
