Re: Failover Processor + Load Balanced Processor?

Juhani Connolly Wed, 05 Sep 2012 21:25:01 -0700

Since there was no response to this, I set up a separate ticket athttps://issues.apache.org/jira/browse/FLUME-1541 and implemented it as aSinkSelector for the LoadBalancingSinkProcessor.


Review can be found at https://reviews.apache.org/r/6939/

Chris: if you're interested you may want to give this a poke, see if itfulfills your needs. The only change in configuration needed is tochange the selector type from "round_robin" to "round_robin_backoff"


On 09/04/2012 07:39 PM, Juhani Connolly wrote:

I'm thinking of working on this(adding backoff semantics to the loadbalancing processor)
The ticket FLUME-1488 however refers to the load balancing rpcclient(or is it just poorly worded/unclear?). If it is in fact aseparate ticket I'll file one for this
Anyway, I was interested in hearing thoughts on approach. I'd haveliked to do it within the framework of the LoadBalancingSinkProcessorby adding a new Selector, however as it is now, it the processorprovides no feedback to the selectors about whether sinks are workingor not, so this can't work.
This leaves two choices: write a new SinkProcessor or modify theSinkSelector interface to give it a couple of callbacks that theprocessor calls to inform the selector of trouble. This shouldn'treally be a problem even if people have written their own selectors solong as they are extending AbstractSinkSelector which can stub thecallbacks.
Thoughts?

On 08/18/2012 02:01 AM, Arvind Prabhakar wrote:
Hi,
FYI - the load balancing sink processor does support simple failoversemantics. The way it works is that if a sink is down, it willproceed to the next sink in the group until all sinks are exhausted.The failover sink processor on the other hand does complex failurehandling and back-off such as blacklisting sinks that repeatedly failetc. The issue [1] tracks enhancing this processor to support backoffsemantics.
The one issue with your configuration that I could spot by a quickglance is that you are adding your active sinks to both the sinkgroups. This does not really work and the configuration subsystemsimply flags the second inclusion as a problem and ignores it. Bydesign, a sink can either be on its own or in one explicit sink group.
[1] https://issues.apache.org/jira/browse/FLUME-1488

Regards,
Arvind Prabhakar
On Fri, Aug 17, 2012 at 8:59 AM, Chris Neal <[email protected]<mailto:[email protected]>> wrote:
    Hi all.

    The User Guide talks about the various types of Sink Processors,
    but doesn't say whether they can be aggregated together.  A
    Failover Processor that moves between 1..n sinks is great, as is
    a Load Balancer Processor that moves between 1..n sinks, but what
    is the best would be an agent that can utilize both a Failover
    Processor AND a Load Balancer Processor!

    I've created a configuration which I believe supports this, and
    the Agent starts up and processes events, but I wanted to ping
    this group to make sure that this configuration is really doing
    what I think it is doing behind the scenes.

    Comments?

    # Define the sources, sinks, and channels for the agent
    agent.sources = avro-instance_1-source avro-instance_2-source
    agent.channels = memory-agent-channel
    agent.sinks = avro-hdfs_1-sink avro-hdfs_2-sink
    agent.sinkgroups = failover-sink-group lb-sink-group

    # Bind sources to channels
    agent.sources.avro-instance_1-source.channels = memory-agent-channel
    agent.sources.avro-instance_2-source.channels = memory-agent-channel

    # Define sink group for failover
    agent.sinkgroups.failover-sink-group.sinks = avro-hdfs_1-sink
    avro-hdfs_2-sink
    agent.sinkgroups.failover-sink-group.processor.type = failover
    agent.sinkgroups.failover-sink-group.processor.priority.avro-hdfs_1-sink
    = 5
    agent.sinkgroups.failover-sink-group.processor.priority.avro-hdfs_2-sink
    = 10
    agent.sinkgroups.failover-sink-group.processor.maxpenalty = 10000

    # Define sink group for load balancing
    agent.sinkgroups = lb-sink-group
    agent.sinkgroups.group1.sinks = avro-hdfs_1-sink avro-hdfs_2-sink
    agent.sinkgroups.group1.processor.type = load_balance
    agent.sinkgroups.group1.processor.selector = round_robin

    # Bind sinks to channels
    agent.sinks.avro-hdfs_1-sink.channel = memory-agent-channel
    agent.sinks.avro-hdfs_2-sink.channel = memory-agent-channel

    # avro-instance_1-source properties
    agent.sources.avro-instance_1-source.type = exec
    agent.sources.avro-instance_1-source.command = tail -F
    /somedir/Trans.log
    agent.sources.avro-instance_1-source.restart = true
    agent.sources.avro-instance_1-source.batchSize = 100

    # avro-instance_2-source properties
    agent.sources.avro-instance_2-source.type = exec
    agent.sources.avro-instance_2-source.command = tail -F
    /somedir/UDXMLTrans.log
    agent.sources.avro-instance_2-source.restart = true
    agent.sources.avro-instance_2-source.batchSize = 100

    # avro-hdfs_1-sink properties
    agent.sinks.avro-hdfs_1-sink.type = avro
    agent.sinks.avro-hdfs_1-sink.hostname = hdfshost1.domin.com
    <http://hdfshost1.domin.com>
    agent.sinks.avro-hdfs_1-sink.port = 10000

    # avro-hdfs_2-sink properties
    agent.sinks.avro-hdfs_2-sink.type = avro
    agent.sinks.avro-hdfs_2-sink.hostname = hdfshost2.domain.com
    <http://hdfshost2.domain.com>
    agent.sinks.avro-hdfs_2-sink.port = 10000

    # memory-agent-channel properties
    agent.channels.memory-agent-channel.type = memory
    agent.channels.memory-agent-channel.capacity = 20000
    agent.channels.memory-agent-channel.transactionCapacity = 100

    Thanks!

Re: Failover Processor + Load Balanced Processor?

Reply via email to