When I said process after writing, I meant outside of Flume with some kind of batch process. To do this you would, of course, have to use a serializer that writes out the headers as well as the event body.
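(For what it's worth, Flume's Avro event serializer (`avro_event`) persists both headers and body. As a rough sketch of the idea in plain Java — a hypothetical line format, not one of Flume's built-in serializers — the headers can be flattened into the output line alongside the body so a batch job can parse them back out later:)

```java
import java.util.Map;

public class HeaderAwareSerializer {
    // Hypothetical one-line format: k1=v1,k2=v2<TAB>body
    // Writing the headers next to the body is what makes the later
    // batch processing possible at all.
    public static String serialize(Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!first) sb.append(',');
            sb.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return sb.append('\t').append(body).toString();
    }
}
```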

If you want to do the processing in real time, you would probably have to take the approach of replicating events and sending them to a custom sink.
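(For example, the source's channel selector can be left as the default replicating type, so every event is copied to both the normal delivery channel and a statistics channel. All names below — the agent, channels, and the custom sink class — are placeholders:)

```properties
# Sketch: fan the same events out to the normal path and a stats path.
agent1.sources = src1
agent1.channels = mainCh statsCh
agent1.sinks = hdfsSink statsSink

# The replicating selector (the default) copies each event to every channel.
agent1.sources.src1.selector.type = replicating
agent1.sources.src1.channels = mainCh statsCh

agent1.sinks.hdfsSink.channel = mainCh
agent1.sinks.statsSink.channel = statsCh
agent1.sinks.statsSink.type = com.example.StatsSink
```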

On 08/28/2013 09:59 PM, Anat Rozenzon wrote:
Thank you for the quick answer.

How can I process events after they have been written? Is there any post-write interceptor I can code?


On Wed, Aug 28, 2013 at 11:45 AM, Juhani Connolly <[email protected]> wrote:

    The most common cause of resending events from the source would be
    failure to write to the channel. Most of the time this would be
    because the channel is full.

    An approach to collecting statistics will vary on what exactly you
    want to do, but perhaps you could write metadata to headers in the
    interceptor and then batch process the serialized headers after
    events have actually been written. Or if you need to be realtime
    you can replicate events to an additional path which leads to a
    custom sink that collects statistics. So long as the sink doesn't
    "bounce" events (roll back transactions) it shouldn't get any events
    resent.

    One thing to keep in mind though is that Flume in general only
    guarantees delivery; it doesn't guarantee that stuff will only be
    delivered once (though many components do only deliver once).
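(Since delivery is at-least-once, any downstream stats collector should be idempotent. A rough sketch, assuming each event carries a unique ID header — which you would have to add yourself, e.g. in an interceptor:)

```java
import java.util.HashSet;
import java.util.Set;

public class DedupCounter {
    private final Set<String> seen = new HashSet<>();
    private long count = 0;

    // Count an event only the first time its ID is seen, so a
    // redelivered event doesn't inflate the statistics.
    public void record(String eventId) {
        if (seen.add(eventId)) {
            count++;
        }
    }

    public long getCount() {
        return count;
    }
}
```

(In production the `seen` set would need bounding, e.g. a time-windowed cache, but the principle is the same.)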


    On 08/28/2013 04:09 PM, Anat Rozenzon wrote:

        Hi,

        I want to get some statistics out of Flume (For example, how
        many records were collected, How many files etc.).
        I've written my own interceptor that updates an MBean whenever
        records arrive.

        I've also written a MonitorServices that collects the data
        from the MBean every X minutes and send it to a database.
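(That polling loop can be as small as a scheduled task reading the counter. A sketch — not your actual MonitorServices; the MBean lookup and the database write are stood in for by an AtomicLong and a callback:)

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class MonitorSketch {
    private final AtomicLong counter;  // stands in for the MBean value
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public MonitorSketch(AtomicLong counter) {
        this.counter = counter;
    }

    // Poll at a fixed rate and hand each snapshot to `store`
    // (in the real service, that would be the database write).
    public void start(long period, TimeUnit unit, LongConsumer store) {
        scheduler.scheduleAtFixedRate(
                () -> store.accept(counter.get()),
                period, period, unit);
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```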

        My problem is that sometimes events are resent from the
        source; I saw that while debugging.
        I'm not sure why... maybe because of a timeout while sending
        to the sink?

        Anyway, if this happens in production it will corrupt my
        statistics.

        Is there any way I can know that an event has failed to reach
        the sink even though it passed the interceptor?
        Is there a better place to collect such statistics than an
        interceptor?

        Thanks
        Anat



