Hi Gwen,

Thank you so much for the comments. I followed your suggestion and got pretty much what I wanted, except for one situation: killing an agent unexpectedly.
My test case is: 1 Kafka channel and 1 HDFS sink, 1 topic with 2 partitions, and two agents reading from the same topic with the same group id (the default, “flume”). I produced 200,000 messages evenly distributed across the 2 partitions and brought up two agents to consume them. Once agent2 started creating HDFS files (with the .tmp suffix), I killed it 2 seconds later (rollInterval is set to 20). Agent1 then consumed everything left, but not the messages that agent2 had consumed without committing the offset (at least I think agent2 didn’t commit, because the transaction didn’t finish).

On the HDFS side, the .tmp file is not renamed (as expected), but its size is not shown correctly in HDFS: it shows only a few KB, but it actually contains far more than that. The messages agent1 consumed are not all the messages in the topic, but added to the .tmp file agent2 wrote, they cover everything (plus some duplicates).

My questions: is this the right behavior? How can I get the .tmp file renamed even if an agent is killed unexpectedly? Or can I make the surviving agent consume everything after the other agent goes down? (Duplicates are fine in my case; the only requirement is not to lose data.)

Thanks for the help! My configuration is below.
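For what it's worth, here is how I picture the offset behavior in the scenario above — a stdlib-only Python toy model, not Flume or Kafka client code (the `Partition` class and all names are mine): the group only re-delivers from the last committed offset, so a crash before commit means duplicates, never loss.

```python
# Toy model (hypothetical names, not the Kafka API) of at-least-once
# delivery: a consumer that reads but dies before committing its offset
# causes the group to re-deliver those messages to the survivor.

class Partition:
    def __init__(self, messages):
        self.messages = messages   # the log
        self.committed = 0         # last committed offset for the group

    def read(self, n):
        """Read up to n messages starting at the committed offset (no commit)."""
        return self.messages[self.committed:self.committed + n]

    def commit(self, n):
        """Advance the group's committed offset by n messages."""
        self.committed += n

log = Partition(list(range(10)))

# agent2 reads 4 messages and writes them to its .tmp file, then is
# killed before the Flume transaction commits the offset.
seen_by_agent2 = log.read(4)     # these events land in agent2's .tmp file
# (no log.commit() -- the agent died)

# agent1 takes over the partition and resumes from the committed offset,
# so it re-reads what agent2 saw: duplicates, but nothing lost.
seen_by_agent1 = log.read(10)
log.commit(len(seen_by_agent1))

print(seen_by_agent2)
print(seen_by_agent1)
```

Under this model, agent1's output plus agent2's orphaned .tmp file covers every message, with the first four duplicated — which matches what I observed.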
a1.sinks = sink1
a1.channels = channel1
a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.channel1.brokerList = localhost:9093,localhost:9094
a1.channels.channel1.topic = test2
a1.channels.channel1.zookeeperConnect = localhost:2181
a1.channels.channel1.parseAsFlumeEvent = false
a1.channels.channel1.timeout = 1000
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.hdfs.path = hdfs://corehadoop/tmp/incoming-logs
a1.sinks.sink1.hdfs.filePrefix = agent1
a1.sinks.sink1.hdfs.rollInterval = 20
a1.sinks.sink1.hdfs.rollSize = 0
a1.sinks.sink1.hdfs.rollCount = 0
a1.sinks.sink1.hdfs.fileType = DataStream
a1.sinks.sink1.channel = channel1

a2.sinks = sink1
a2.channels = channel1
a2.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
a2.channels.channel1.brokerList = localhost:9093,localhost:9094
a2.channels.channel1.topic = test2
a2.channels.channel1.zookeeperConnect = localhost:2181
a2.channels.channel1.parseAsFlumeEvent = false
a2.channels.channel1.timeout = 1000
a2.sinks.sink1.type = hdfs
a2.sinks.sink1.hdfs.path = hdfs://corehadoop/tmp/incoming-logs
a2.sinks.sink1.hdfs.filePrefix = agent2
a2.sinks.sink1.hdfs.rollInterval = 20
a2.sinks.sink1.hdfs.rollSize = 0
a2.sinks.sink1.hdfs.rollCount = 0
a2.sinks.sink1.hdfs.fileType = DataStream
a2.sinks.sink1.channel = channel1

> On Jul 5, 2015, at 10:14 PM, Gwen Shapira <[email protected]> wrote:
>
> See inline :)
>
> On Sun, Jul 5, 2015 at 5:37 PM, Jun MA <[email protected]> wrote:
>> Hi Rufus,
>>
>> Thank you so much for your help, I do bypass the issue.
>>
>> Another question I have is that can I have two Flafka agents consume from
>> one topic (agents run on same machine or different machine)?
>
> Yes.
>
>> Will the two agents have exact same two copies or each agent will
>> consume one part of the topic?
>
> It depends on how you configure them.
> If you configure two different consumer group IDs, they will each get a copy.
> If they use the same consumer group ID, each agent will consume from
> different subset of partitions. They will automatically load balance
> partitions in case more agents join or if agents are stopped.
>
>> The purpose I want to run two agents on same topic is that I want to
>> have a high availability. What I want is that two agents will consume
>> different portion of the topic but if one agent down, the other one will
>> consume everything from the topic.
>> From what I tested, I bring up two agents on same machine with different
>> agent name (everything else are the same), and only the first one can
>> consume messages from topic. I’m wondering if it is the right behavior. If
>> so, is there anyway I can solve this single point of failure?
>
> If you configure both Flafkas with same consumer group ID, you should
> get the behavior you want.
> Just make sure you have at least two partitions - the load balancing
> behavior happens at the partition level, so you can only have as many
> concurrent agents as you have partitions.
>
> Gwen
>
>> Bests,
>> Jun
>>
>> On Jul 4, 2015, at 10:22 AM, Johny Rufus <[email protected]> wrote:
>>
>> Been looking into this for some time and found a couple of issues. Raised
>> Jiras for both, there are workarounds to get past these errors.
>>
>> https://issues.apache.org/jira/browse/FLUME-2734
>> https://issues.apache.org/jira/browse/FLUME-2735
>>
>> For timeout to take effect, you need to specify as -
>> agent1.channels.channel1.timeout = 1000000 (This is a temporary work around,
>> kafka.consumer.timeout.ms should work as per the guide when FLUME-2734 is
>> resolved)
>>
>> To get past the IllegalStateException, I had to download the zookeeper jar
>> from http://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper/3.3.6
>> and put it in the lib directory. This is also a workaround to get past the
>> issue, until we figure out the root cause.
>> Thanks,
>> Rufus
>>
>> On Fri, Jul 3, 2015 at 10:09 PM, Jun Ma <[email protected]> wrote:
>>>
>>> Oops, thanks for figuring that out.
>>> Any idea why unable to deliver event?
>>>
>>> On Fri, Jul 3, 2015 at 8:00 PM, Johny Rufus <[email protected]> wrote:
>>>>
>>>> There is a typo in your property "cnannel1", hence the property is not
>>>> set
>>>>
>>>> a1.channels.cnannel1.kafka.consumer.timeout.ms = 1000000
>>>>
>>>> Thanks,
>>>> Rufus
>>>>
>>>> On Fri, Jul 3, 2015 at 4:49 PM, Jun Ma <[email protected]> wrote:
>>>>>
>>>>> Thanks for your reply. But from what I read, the magic things Flafka
>>>>> does is that you don't need to have a source, you can directly move things
>>>>> from channel to sink.
>>>>>
>>>>> http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
>>>>> (see Flume's kafka channel)
>>>>>
>>>>> On Fri, Jul 3, 2015 at 1:26 PM, Foo Lim <[email protected]> wrote:
>>>>>>
>>>>>> There's an error that you didn't specify a source:
>>>>>>
>>>>>> Agent configuration for 'a1' has no sources.
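Gwen's point above about partition-level load balancing — at most as many concurrent agents as there are partitions — can be sketched with a small stdlib-only Python toy (a simple round-robin illustration, not the real Kafka assignor; `assign` and all names are mine):

```python
# Toy sketch of consumer-group partition assignment (hypothetical helper,
# not the Kafka client): partitions are dealt round-robin over the live
# group members, so extra members beyond the partition count sit idle.

def assign(partitions, members):
    """Deal partitions round-robin over members; returns {member: [partitions]}."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment

topic = ["test2-0", "test2-1"]   # 1 topic, 2 partitions, as in my test

# Two agents in group "flume": each gets one partition.
print(assign(topic, ["agent1", "agent2"]))

# agent2 dies -> rebalance; agent1 takes over both partitions.
print(assign(topic, ["agent1"]))

# A third agent would get nothing: no more active consumers than partitions.
print(assign(topic, ["agent1", "agent2", "agent3"]))
```

This is why the failover behavior I want needs at least as many partitions as agents, and why a second agent with a different group id would instead receive a full copy of the topic.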
