Hi
Thanks Jeff and Connor. That was helpful.
@Connor :
Scenario:
=======
We receive logs in which each record contains around 100 fields. We
want to send every field of every record, unchanged, to HDFS, and send
only a subset of the fields of every record to another sink (say, Storm).
So we are not filtering records: we drop some fields of every record on
one channel and none on the other.
Hence the channel selector solution will not work here, and a custom
event serializer may need to be written.
As you mentioned, if you have any directions for writing one, they
would be of great help.
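To make the requirement concrete, here is a minimal plain-Java sketch of the per-record projection such a serializer would need to perform. It does not use any Flume API. The class name FieldProjector, the comma delimiter, and the field indices are all assumptions for illustration; our real records may be delimited differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only selected fields of a delimited record.
// Plain Java only; a real custom serializer would call logic like this
// on the event body before writing it out.
final class FieldProjector {
    private FieldProjector() {}

    /** Returns the fields at the given zero-based indices, joined by the delimiter. */
    static String project(String record, String delimiter, int[] keep) {
        // Quote the delimiter so characters like '|' are not treated as regex;
        // limit -1 keeps trailing empty fields so indices stay stable.
        String[] fields = record.split(Pattern.quote(delimiter), -1);
        List<String> out = new ArrayList<>();
        for (int i : keep) {
            // Indices outside the record are skipped rather than failing.
            if (i >= 0 && i < fields.length) {
                out.add(fields[i]);
            }
        }
        return String.join(delimiter, out);
    }

    public static void main(String[] args) {
        // Keep fields 0 and 2 of a four-field record; prints "ts,level".
        System.out.println(project("ts,host,level,msg", ",", new int[] {0, 2}));
    }
}
```

The HDFS path would write the record untouched, and only the second sink would apply the projection.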
Regards,
Jagadish
On 04/23/2013 12:22 PM, Connor Woodson wrote:
Some more thoughts on this:
The way Interceptors currently work is that they are applied to an
event as it is received by the source. There are good uses for this:
for instance, it lets you configure a single Timestamp interceptor
that stamps every event the source receives, so even if you have
multiple sinks/channels responding to an event, you only need that
one interceptor. Interceptors in this sense serve to add data to
event headers, and as such it makes sense to have them applied only
once by the source instead of letting the channels change header data.
If you wish to use an interceptor in the above way, to modify header
data, and still want it to apply to only a single channel, could you
elaborate on what you are trying to do? I haven't been able to come
up with a situation like that. The solution here would be to do as
Jeff suggested and use a serializer; if you want more in-depth
instructions on how to build one, please ask; I have a set of
directions lying around somewhere that I'll find for you.
However, I have myself run into a situation where I would like
interceptors to be per-channel: when I want to use an Interceptor to
filter events, sending an event to some subset of channels based on
the contents of its data. Here is how you can do this in the current
setup (where Interceptors are applied at the source instead of
per-channel):
Using the Multiplexing Channel Selector, you can choose which
channels an event is written to based on the value of a specified
header (it is documented in the Flume User Guide). There are some more
features of the selector that aren't documented, called Optional
Channels or something, but I don't know very much about them; I just
figured I would point out that they exist. Digging through the source
should provide some more insight.
So here is how you want to set your system up. Create an Interceptor
that will define a certain header value based off of the event's
contents. For instance, if you want all events containing exactly 1
character to be sent to a channel, you could create an Interceptor
that counts the characters in the event. Then that Interceptor will
set a certain header value to "SINGLE" if there is just one character,
or "MULTIPLE" if there are more.
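As a rough illustration of the tagging logic just described, here is a plain-Java sketch. It deliberately avoids the Flume Interceptor interface so that it stands alone; the class name LengthTagger and the header key "header" are assumptions, and leaving the header unset for an empty body (so the selector default applies) is a choice of this sketch, not something Flume requires.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the body of a custom interceptor's intercept step:
// it looks at the event body and sets one routing header.
final class LengthTagger {
    static final String HEADER_KEY = "header"; // assumed header name

    private LengthTagger() {}

    /** Returns a copy of the headers with the routing header set from the body length. */
    static Map<String, String> tag(Map<String, String> headers, byte[] body) {
        Map<String, String> out = new HashMap<>(headers);
        String text = new String(body, StandardCharsets.UTF_8);
        // Count code points so a single non-BMP character still counts as one.
        int chars = text.codePointCount(0, text.length());
        if (chars == 1) {
            out.put(HEADER_KEY, "SINGLE");
        } else if (chars > 1) {
            out.put(HEADER_KEY, "MULTIPLE");
        }
        // Empty body: header left unset (assumption of this sketch).
        return out;
    }

    public static void main(String[] args) {
        byte[] body = "x".getBytes(StandardCharsets.UTF_8);
        // Prints "SINGLE".
        System.out.println(tag(new HashMap<>(), body).get(HEADER_KEY));
    }
}
```

In a real interceptor the same decision would be written into the event's header map before the event reaches the channel selector.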
Then you can create your channel selector like this (modified from the
documentation example):
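a1.sources = r1
a1.channels = all_events single_events multiple_events
# The source must also be attached to every channel the selector can pick
a1.sources.r1.channels = all_events single_events multiple_events
a1.sources.r1.interceptors = your_interceptor
# A custom interceptor is loaded through its Builder; this class name is a placeholder
a1.sources.r1.interceptors.your_interceptor.type = com.example.YourInterceptor$Builder
a1.sources.r1.interceptors.your_interceptor.header = header
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = header
a1.sources.r1.selector.mapping.SINGLE = all_events single_events
a1.sources.r1.selector.mapping.MULTIPLE = all_events multiple_events
a1.sources.r1.selector.default = all_events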
The result is that you now have a way to filter which channels a
given event is sent to. Note that a channel can appear in more than
one mapping; for instance, all_events will get all events. And so the
trick is just to define the right interceptor, which is much simpler
to code than a serializer (which itself is fairly easy).
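If it helps to see the routing spelled out, the small plain-Java sketch below mimics what the mappings above do: look up the header value, and fall back to the default channel list when the value is missing or unmapped. This is only a simulation of the configured behaviour, not Flume's actual selector code, and the class name SelectorSketch is made up for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the multiplexing mappings from the config above.
final class SelectorSketch {
    private static final Map<String, List<String>> MAPPING = new HashMap<>();
    private static final List<String> DEFAULT = Arrays.asList("all_events");

    static {
        MAPPING.put("SINGLE", Arrays.asList("all_events", "single_events"));
        MAPPING.put("MULTIPLE", Arrays.asList("all_events", "multiple_events"));
    }

    private SelectorSketch() {}

    /** Returns the channel names an event with these headers would be written to. */
    static List<String> channelsFor(Map<String, String> headers) {
        String value = headers.get("header");
        List<String> mapped = (value == null) ? null : MAPPING.get(value);
        // Missing or unmapped header values go to the default channels.
        return (mapped != null) ? mapped : DEFAULT;
    }

    public static void main(String[] args) {
        // Prints "[all_events, single_events]".
        System.out.println(channelsFor(Collections.singletonMap("header", "SINGLE")));
    }
}
```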
Hopefully that was clear. Feel free to ask more questions,
- Connor
On Fri, Apr 19, 2013 at 11:14 AM, Jeff Lord wrote:
Jagadish,
Here is an example of how to write a custom serializer.
https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java
-Jeff
On Fri, Apr 19, 2013 at 9:34 AM, Jeff Lord wrote:
Hi Jagadish,
Have you considered using a custom event serializer to modify your
event?
It's possible to replicate your flow using two channels and then have
one sink that implements a custom serializer to modify the event.
-Jeff
On Tue, Apr 16, 2013 at 11:12 PM, Jagadish Bihani wrote:
Hi
If anybody has any input on this, it would surely help.
Regards,
Jagadish
On 04/16/2013 12:06 PM, Jagadish Bihani wrote:
Hi
We have a use case in which
1. A spooling source reads data.
2. It needs to write events into multiple channels. It should apply an
   interceptor only when putting into one channel and should put the
   event as is into the other channel.
Possible approach we have thought of:
1. Create 2 different sources and apply the interceptor on one and not
   on the other. But that duplicates reads and increases I/O.
Is there any better way of achieving this use case?
Regards,
Jagadish