Great post, Mike! One question, which you could address either on the mailing list or in a future post...
I am curious about how to remove duplicate messages in this flow. For example, when I set up a switch/router to send syslog messages, I'd like it to send them to two syslog collectors or two Flume agents. In this case, the switch/router is just a dumb device that does not know how to fail over or load-balance. As a result, two copies of the same message go into Flume. I have seen people describe using HBase operations to remove duplicates, but I am wondering whether anything can be done within the Flume infrastructure itself.

Thanks.
-Simon

On Fri, Jan 11, 2013 at 3:48 PM, Mohammad Tariq <[email protected]> wrote:
> +1
>
> Thank you so much Mike, for all the good work.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
>
>
> On Sat, Jan 12, 2013 at 2:15 AM, Mike Percy <[email protected]> wrote:
>>
>> Thanks Brock! I've been working on this, off and on, for a while. :)
>>
>>
>> On Fri, Jan 11, 2013 at 12:18 PM, Brock Noland <[email protected]> wrote:
>>>
>>> Nice post!
>>>
>>> On Fri, Jan 11, 2013 at 12:13 PM, Mike Percy <[email protected]> wrote:
>>> > Hi folks,
>>> > I just posted to the Apache blog on how to do performance tuning with
>>> > Flume.
>>> > I plan on following it up with a post about using the Flume monitoring
>>> > capabilities while tuning. Feedback is welcome.
>>> >
>>> > https://blogs.apache.org/flume/entry/flume_performance_tuning_part_1
>>> >
>>> > Regards,
>>> > Mike
>>> >
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce -
>>> http://incubator.apache.org/mrunit/
>>
>>
>
