Josh,

I'm assuming by 'bad' event you mean one that does not have the required 
headers for tokenized paths. If that's the case there are two potential ways to 
solve this.

One way is to use multiplexing channel selectors; then you can set up a default 
path that handles any events missing the header(s). This gets unwieldy fast, 
though, if you are routing on multiple headers. I used this method for a while 
but eventually abandoned it, since I use 3 headers to route events.
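To make the default path concrete, here is a plain-Java model of what a multiplexing selector does with a header. It is not the Flume API. In the agent config this corresponds to `selector.type = multiplexing` plus `selector.header`, `selector.mapping.<value>` and `selector.default`. The header name `logType` and the channel names are hypothetical. Because a selector keys on a single header, routing on several headers is where this starts to get unwieldy.

```java
import java.util.Map;

class MultiplexingSketch {
    // Models a multiplexing selector: route on ONE header's value, and send
    // events whose header is missing or unmapped to the default channel.
    static String selectChannel(Map<String, String> headers,
                                String selectorHeader,
                                Map<String, String> mapping,
                                String defaultChannel) {
        String value = headers.get(selectorHeader);
        if (value == null) {
            // Header missing entirely: the "bad event" case.
            return defaultChannel;
        }
        // Header present but its value is not in the mapping table: also default.
        return mapping.getOrDefault(value, defaultChannel);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("web", "webChannel", "app", "appChannel");
        System.out.println(selectChannel(Map.of("logType", "web"), "logType", mapping, "badEventChannel")); // webChannel
        System.out.println(selectChannel(Map.of(), "logType", mapping, "badEventChannel"));                 // badEventChannel
        System.out.println(selectChannel(Map.of("logType", "???"), "logType", mapping, "badEventChannel")); // badEventChannel
    }
}
```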

The second way is to have a static interceptor on your first source that has 
'preserveExisting' set to true (which is the default behavior). In my case we use 
two 'type' fields, and I just have an interceptor set the value 
'MissingLogType', etc., for each possible header. Since I bucket by these header 
values, I can quickly find corrupt events this way. I use a timestamp 
interceptor in much the same way, except that in that case it stamps the event 
with the time the source first saw it. This can result in an event being 
bucketed in the wrong date/time partition, but that's better than it gumming up 
the whole data flow.
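A rough plain-Java model of the placeholder approach follows. It does not use Flume's own classes. `putIfAbsent` mimics a static interceptor with `preserveExisting = true`: real values are never overwritten, and only missing headers get a placeholder. The timestamp fallback mimics a timestamp interceptor with `preserveExisting = true`, which writes a `timestamp` header in epoch milliseconds. The header names `logType`/`subType`, the value `MissingSubType`, and passing `nowMillis` in explicitly are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PlaceholderHeaders {
    // Static-interceptor-style fill with preserveExisting = true:
    // set key=value only if the key is absent; existing values are kept.
    static void fillIfMissing(Map<String, String> headers, String key, String value) {
        headers.putIfAbsent(key, value);
    }

    // Timestamp fallback: if no timestamp header is present, stamp the event
    // with the time this stage first saw it (epoch millis as a string).
    static void stampIfMissing(Map<String, String> headers, long nowMillis) {
        headers.putIfAbsent("timestamp", Long.toString(nowMillis));
    }

    static Map<String, String> intercept(Map<String, String> original, long nowMillis) {
        Map<String, String> headers = new HashMap<>(original); // copy; don't mutate the caller's map
        fillIfMissing(headers, "logType", "MissingLogType");
        fillIfMissing(headers, "subType", "MissingSubType");
        stampIfMissing(headers, nowMillis);
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> good = Map.of("logType", "web", "subType", "access", "timestamp", "1370400000000");
        Map<String, String> bad = Map.of("logType", "web");
        System.out.println(intercept(good, System.currentTimeMillis())); // unchanged
        System.out.println(intercept(bad, System.currentTimeMillis()));  // gets placeholders + ingest time
    }
}
```

Anything that later lands in a `MissingLogType` or `MissingSubType` bucket is a corrupt event, so it is easy to find without blocking the rest of the flow.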

Hope that helps,
Paul Chavez

-----Original Message-----
From: Josh [mailto:[email protected]] 
Sent: Wednesday, June 05, 2013 3:53 AM
To: [email protected]
Subject: Get Flume 'bad' event out of channel.

Hi Guys,

I know this was covered back in May (not so long ago), but I was wondering if 
there has been any movement on this?

We have written a custom serializer to take data from an HTTP source using 
the JSON handler. The source receives JSON from our pipeline, which 
checks that all needed headers are present for serialization and raises 
exceptions if not, but we have seen a few events come in that cannot be 
serialized due to missing parts of the JSON or any number of other reasons. 
Currently I can't see a way to get these out of the channel without:

a) chucking out the whole channel and everything in it.
b) attaching a custom sink/serializer to the channel that is less fussy and 
will pass the event.

Neither of these really seems like a great option. We are using file channels, 
and all data written to disk appears to be in a binary format. If needed, as a 
last resort, could we write a tool to pull Java objects out of a channel and 
write the rest back into the channel? Are there any plans to implement anything 
of this kind already?

As previously suggested, it would be nice to be able to:

a) Dump the event to a data file and log a warning
b) Throw the event away
c) Move the event to an alternate channel where it can be handled differently
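A hypothetical sketch of the three options above follows. It applies them as a policy when serialization fails and is modelled in plain Java. Lists stand in for the real sink output, a dump data file, and an alternate channel. None of these names are Flume APIs. `requireJson` is a toy serializer that rejects anything not starting with `{`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

class BadEventPolicySketch {
    // The three options: (a) dump + warn, (b) discard, (c) divert.
    enum Policy { DUMP_AND_WARN, DISCARD, DIVERT }

    private static final Logger LOG = Logger.getLogger("BadEventPolicySketch");

    final List<String> written = new ArrayList<>();    // stands in for the real sink output
    final List<String> dumpFile = new ArrayList<>();   // stands in for a dump data file
    final List<String> altChannel = new ArrayList<>(); // stands in for an alternate channel

    // Toy serializer: rejects anything that does not look like a JSON object.
    static String requireJson(String event) {
        if (!event.startsWith("{")) {
            throw new IllegalArgumentException("not JSON: " + event);
        }
        return event;
    }

    // Serialize one event; on failure apply the policy instead of rethrowing,
    // so one bad event does not block everything queued behind it.
    void process(String event, Function<String, String> serializer, Policy policy) {
        try {
            written.add(serializer.apply(event));
        } catch (RuntimeException e) {
            switch (policy) {
                case DUMP_AND_WARN:
                    dumpFile.add(event);
                    LOG.warning("Unserializable event dumped: " + e.getMessage());
                    break;
                case DISCARD:
                    break; // drop the event
                case DIVERT:
                    altChannel.add(event);
                    break;
            }
        }
    }

    public static void main(String[] args) {
        BadEventPolicySketch sink = new BadEventPolicySketch();
        sink.process("{\"id\":1}", BadEventPolicySketch::requireJson, Policy.DIVERT);
        sink.process("truncated-json", BadEventPolicySketch::requireJson, Policy.DIVERT);
        System.out.println(sink.written);    // [{"id":1}]
        System.out.println(sink.altChannel); // [truncated-json]
    }
}
```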



Thanks,
Josh
--
www.mydrivesolutions.com

This email and any attachments is private and confidential. If you have 
received this message in error please remove it from your systems and notify 
the author.
MyDrive Solutions Limited is registered in England and Wales, No 07330334. 
Registered office: Surrey Technology Centre, 40 Occam Road, Guildford GU2 7YG, 
UK
