Nitin,

Good to hear more of your thoughts. Please see inline.

On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <[email protected]> wrote:
> I can understand the idea of having data processed inside flume by
> streaming it to another flume agent. But do we really need to re-engineer
> something inside flume is what I am thinking? Core flume dev team may have
> better ideas on this but currently for streaming data processing storm is a
> huge candidate.
> flume does have an open jira on this integration
> FLUME-1286 <https://issues.apache.org/jira/browse/FLUME-1286>

Yes, a Storm sink could be useful. But that wouldn't preclude us from taking a hard look at what may be missing in Flume itself, right?

> It will be interesting to draw up the comparisons in performance if the
> data processing logic is added to flume. We do see currently people
> having a little bit of pre-processing of their data (they have their own
> custom channel types where they modify the data and sink it)

It sounds like you have some experience with Flume. Are you guys using it at Rightster?

I work with a lot of folks to set up and deploy Flume, many of whom do lookups / joins with other systems, transformations, etc. in real time along their data ingest pipeline before writing the data to HDFS or HBase for further processing and archival. I wouldn't say these are really heavy number-crunching implementations in Flume, but I certainly see a lot of inline parsing, inspection, enrichment, routing, and the like going on. I think Flume could do a lot more, given the right abstractions.

Regards,
Mike
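
P.S. For concreteness, here is a minimal sketch of the kind of inline enrichment I'm describing, written against Flume's Interceptor API. The class name, the com.example package, and the "datacenter" header are placeholders I'm making up for illustration, not anything shipping in Flume.

import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class EnrichingInterceptor implements Interceptor {

  // Hypothetical config value stamped onto every event.
  private final String datacenter;

  private EnrichingInterceptor(String datacenter) {
    this.datacenter = datacenter;
  }

  @Override
  public void initialize() {
    // e.g. open a connection to an external lookup/join service here
  }

  @Override
  public Event intercept(Event event) {
    // Inline enrichment: tag each event with a header that downstream
    // channel selectors or sinks (HDFS/HBase) can route or partition on.
    Map<String, String> headers = event.getHeaders();
    headers.put("datacenter", datacenter);
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    for (Event e : events) {
      intercept(e);
    }
    return events;
  }

  @Override
  public void close() {
    // release any lookup-service resources
  }

  public static class Builder implements Interceptor.Builder {
    private String datacenter;

    @Override
    public void configure(Context context) {
      datacenter = context.getString("datacenter", "unknown");
    }

    @Override
    public Interceptor build() {
      return new EnrichingInterceptor(datacenter);
    }
  }
}

Wiring it into an agent would look roughly like this (agent and source names are again hypothetical):

agent.sources.src1.interceptors = enrich
agent.sources.src1.interceptors.enrich.type = com.example.EnrichingInterceptor$Builder
agent.sources.src1.interceptors.enrich.datacenter = dc1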
