Flume architecture questions -- failover, load balancing, durability and scalability

Lan Jiang Fri, 27 Feb 2015 09:23:16 -0800

Hi there,

I have been researching Flume-ng for a while. I have some
architecture-related questions and hope the Flume community can help me.


1. Does Flume support agent-level failover and load balancing across
different nodes? Flume does provide sink processors that support failover
and load balancing, but the sink processor and sinks 1..n all live in one
agent. If the agent itself crashes, or the node the agent resides on goes
down, there is nothing we can do, right? In that case, existing events in
the channel are preserved if you use a file-based channel, but any new
events that have not yet entered a channel will be lost. Is this something
Flume is going to support in the future?

2. As I understand it, a Flume channel does not replicate events to a
different node. Although the file-based channel provides durability, if
the disk fails, the events stored in the channel will be lost. I guess we
can use SAN or RAID to mitigate the risk, but that is outside the Flume
architecture. Is that a correct understanding? A JDBC-based channel is
supported, but I have heard that its performance is not great.

3. How can we scale Flume when the incoming event rate is very high? In
some of the architecture diagrams I see a fan-out design, where an agent
sits at the front as the gatekeeper, routing messages to multiple other
agents. This design might not work if the gatekeeper agent cannot keep up
with the incoming event rate. Another choice is to designate web servers
1-4 to send logs to agent1 and web servers 5-8 to send logs to agent2.
However, what if there is only one event endpoint, such as a Twitter feed?
You can attach multiple agents to the same endpoint, but you end up with
duplicate events. So what is the best way to design a Flume topology that
can handle a high rate of incoming events?
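
To make questions 1 and 3 concrete, here is a minimal Python sketch of the
behaviour being asked about, placed in whatever client feeds the agents:
each event is hashed to exactly one agent (so a single feed can be split
without duplicates), and if that agent is down the event falls over to the
next one. This is not Flume code. StubAgent, AgentDown and route are
hypothetical names, and the event id is assumed to be a stable string.

```python
import zlib


class AgentDown(Exception):
    """Raised by the stub when an agent cannot accept events."""


class StubAgent:
    """Stand-in for a remote agent on another node (hypothetical, not a Flume API)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.received = []

    def send(self, event_id):
        if not self.up:
            raise AgentDown(self.name)
        self.received.append(event_id)


def route(event_id, agents):
    """Deliver one event to exactly one agent and return that agent's name.

    A CRC32 hash of the event id picks the preferred agent (load balancing
    without duplicates); on failure the remaining agents are tried in ring
    order (failover across nodes).
    """
    start = zlib.crc32(event_id.encode("utf-8")) % len(agents)
    for offset in range(len(agents)):
        agent = agents[(start + offset) % len(agents)]
        try:
            agent.send(event_id)
            return agent.name
        except AgentDown:
            continue  # try the next agent in the ring
    raise RuntimeError("no agent available; the event would be lost")
```

With three StubAgent instances, calling route("tweet-1", agents) stores the
event on a single agent; marking that agent as down (up = False) sends the
same id to a different one. The sketch does nothing for events already
sitting in a failed agent's channel, which is the concern in question 2.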

Flafka might help mitigate some of the issues mentioned above, but it is
relatively new and basically taps into the features of another framework.

Thanks for the help in advance.

Lan
