Steven,
Any reason you are not using interceptors for that? Can you provide more detail on what you are doing?
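In case a concrete example helps, below is a minimal sketch (untested, and every class, package and property name in it is made up for illustration) of what a small filtering/enrichment interceptor might look like against the Flume 1.x Interceptor API:

package com.example.flume;   // hypothetical package name

import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/**
 * Sketch of a filtering/enrichment interceptor: drops empty events and
 * stamps the rest with a header that selectors or sinks can route on.
 */
public class EnrichmentInterceptor implements Interceptor {

  private final String headerName;
  private final String headerValue;

  private EnrichmentInterceptor(String headerName, String headerValue) {
    this.headerName = headerName;
    this.headerValue = headerValue;
  }

  @Override
  public void initialize() {
    // Open lookup connections, load reference data, etc. here if needed.
  }

  @Override
  public Event intercept(Event event) {
    // Filter: returning null drops the event from the flow.
    byte[] body = event.getBody();
    if (body == null || body.length == 0) {
      return null;
    }
    // Enrich: add a header for downstream routing/inspection.
    Map<String, String> headers = event.getHeaders();
    headers.put(headerName, headerValue);
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    // Process the batch in place, removing any events filtered out above.
    Iterator<Event> it = events.iterator();
    while (it.hasNext()) {
      if (intercept(it.next()) == null) {
        it.remove();
      }
    }
    return events;
  }

  @Override
  public void close() {
    // Release anything opened in initialize().
  }

  /** Builder referenced from the agent configuration. */
  public static class Builder implements Interceptor.Builder {
    private String headerName;
    private String headerValue;

    @Override
    public void configure(Context context) {
      // Both property names are made up; use whatever your logic needs.
      headerName = context.getString("headerName", "datasource");
      headerValue = context.getString("headerValue", "default");
    }

    @Override
    public Interceptor build() {
      return new EnrichmentInterceptor(headerName, headerValue);
    }
  }
}

It would be attached to a source roughly like this in the agent config (agent and source names assumed):

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.flume.EnrichmentInterceptor$Builder
a1.sources.r1.interceptors.i1.headerName = datasource
a1.sources.r1.interceptors.i1.headerValue = web

I have also appended, below the quoted thread, a rough config sketch for the fan-out/enrichment topology discussed further down.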
See more about Interceptors here: http://flume.apache.org/FlumeUserGuide.html#flume-interceptors

Regards,
Mike

On Fri, Feb 8, 2013 at 3:34 AM, <[email protected]> wrote:

> Hi Nitin,
>
> Would it be feasible to consider the addition of another extension point
> within Flume for the purposes of custom filtering, enrichment, routing,
> etc., without turning Flume into something it was never designed for
> (i.e. without going overboard)? The concept of some sort of intermediate
> processing unit is quite attractive to me personally: I have dedicated
> AvroSources purely for aggregating data, but in the interest of
> modularisation I may want to perform some enrichment/filtering exercise
> before I dump the events on my durable channel. I guess this also opens
> the conversation about flows and some sort of declarative way of
> configuring the ordering of the processing units, etc. Just thinking out
> loud.
>
> @Nitin/Mike, your experience in the field will assist in validating this
> further.
>
> -Steve
>
> Quoting Nitin Pawar <[email protected]>:
>
>> Mike, Yes
>>
>> I am not against the approach of Flume doing it. I would love to see it
>> as part of Flume (it of course helps to remove the overload of one
>> processing engine). As Flume already supports the grouping of agents,
>> the normal route of acquisition and sink can continue.
>>
>> On another route, we can have it sink to a processor source of Flume,
>> which then converts the data, runs quick analysis on the data in
>> memory, and updates global counters and the like, which can then be
>> sunk to live reporting systems.
>>
>> Thanks,
>> Nitin
>>
>> On Fri, Feb 8, 2013 at 2:26 PM, Mike Percy <[email protected]> wrote:
>>
>>> Nitin,
>>> Good to hear more of your thoughts. Please see inline.
>>>
>>> On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <[email protected]> wrote:
>>>
>>>> I can understand the idea of having data processed inside Flume by
>>>> streaming it to another Flume agent. But do we really need to
>>>> re-engineer something inside Flume, is what I am thinking? The core
>>>> Flume dev team may have better ideas on this, but currently for
>>>> streaming data processing Storm is a huge candidate.
>>>> Flume does have an open JIRA on this integration: FLUME-1286
>>>> <https://issues.apache.org/jira/browse/FLUME-1286>
>>>>
>>> Yes, a Storm sink could be useful. But that wouldn't preclude us from
>>> taking a hard look at what may be missing in Flume itself, right?
>>>
>>>> It will be interesting to draw up the comparisons in performance if
>>>> the data processing logic is added to Flume. We do currently see
>>>> people doing a little bit of pre-processing of their data (they have
>>>> their own custom channel types where they modify the data and sink
>>>> it).
>>>>
>>> It sounds like you have some experience with Flume. Are you guys using
>>> it at Rightster?
>>>
>>> I work with a lot of folks to set up and deploy Flume, many of whom do
>>> lookups / joins with other systems, transformations, etc. in real time
>>> along their data ingest pipeline before writing the data to HDFS or
>>> HBase for further processing and archival. I wouldn't say these are
>>> really heavy number-crunching implementations in Flume, but certainly
>>> I see a lot of inline parsing, inspection, enrichment, routing, and
>>> the like going on. I think Flume could do a lot more, given the right
>>> abstractions.
>>>
>>> Regards,
>>> Mike
>>
>> --
>> Nitin Pawar
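
P.S. Regarding the "enrich/filter before the durable channel" and "second route to a processor" ideas in the thread above, here is a very rough sketch of one way that kind of topology could be wired up today. It is only an illustration under my own assumptions: every name in it (agent, hosts, ports, paths, the interceptor class) is a placeholder, and none of it has been tested.

# Hypothetical collector agent: aggregate over Avro, enrich/filter at the
# source, then fan out to a durable archive route and a processing route.
collector.sources  = avroIn
collector.channels = durableCh processCh
collector.sinks    = hdfsOut toProcessor

collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141
collector.sources.avroIn.channels = durableCh processCh

# Enrichment/filtering happens before the events hit either channel.
collector.sources.avroIn.interceptors = i1
collector.sources.avroIn.interceptors.i1.type = com.example.flume.EnrichmentInterceptor$Builder

# Copy every event to both routes.
collector.sources.avroIn.selector.type = replicating

# Route 1: durable file channel -> HDFS for archival.
collector.channels.durableCh.type = file
collector.channels.durableCh.checkpointDir = /var/flume/checkpoint
collector.channels.durableCh.dataDirs = /var/flume/data
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.channel = durableCh
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events

# Route 2: memory channel -> Avro sink feeding a downstream "processor"
# agent (or whatever does the in-memory analysis / live reporting).
collector.channels.processCh.type = memory
collector.channels.processCh.capacity = 10000
collector.sinks.toProcessor.type = avro
collector.sinks.toProcessor.channel = processCh
collector.sinks.toProcessor.hostname = processor-host
collector.sinks.toProcessor.port = 4545

A multiplexing channel selector keyed on the header the interceptor sets (selector.type = multiplexing plus selector.header and selector.mapping.*) would give per-event routing instead of copying everything to both routes.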
