The only missing link in the Flume architecture I see in this conversation is the channels and brokers themselves, which orchestrate this lovely undertaking of data collection. One opportunity I do see (and I may be wrong) is for the data to be offloaded into a system such as Apache Mahout before being sent to the sink. Perhaps the concept of a ChannelAdapter of sorts, i.e., a MahoutAdapter? Just thinking out loud; it may well be out of the question.
Thanks,
Steve

From: Nitin Pawar <[email protected]>
Reply-To: <[email protected]>
Date: Thu, 7 Feb 2013 16:52:12 +0530
To: <[email protected]>
Subject: Re: Analysis of Data

1) Flume is an isolated distributed system, in the sense that one agent has no idea about any other agent.
2) When Flume needs to collect data from multiple sources and work across different data sets, it may not have the entire data set needed.
3) Let us assume we have the required data on agents for processing in batches: do we really want to put pressure on a live production server for data processing that can be done by systems like Storm or Hadoop?

These are my ideas; I could be totally wrong, but from a systems point of view it looks like a good option to keep data acquisition separate from data processing, and then store the processed data for further data serving.

On Thu, Feb 7, 2013 at 4:29 PM, Mike Percy <[email protected]> wrote:
> Let's take this conversation further. What is missing?
>
> On Thu, Feb 7, 2013 at 2:39 AM, Inder Pall <[email protected]> wrote:
>> Flume is a platform to get events to the right sink (HDFS, local file, ...);
>> analytics is not something that falls in its territory.
>>
>> - Inder
>>
>> On Thu, Feb 7, 2013 at 3:22 PM, Surindhar <[email protected]> wrote:
>>> Hi,
>>>
>>> Does Flume support analysis of data?
>>>
>>> Br,
>>
>> --
>> - Inder
>> "You are the average of the 5 people you spend the most time with"

--
Nitin Pawar
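For what it's worth, Steve's ChannelAdapter idea at the top of the thread maps fairly closely onto an extension point Flume already has: interceptors (org.apache.flume.interceptor.Interceptor), which sit between a source and a channel and can inspect or tag each event in flight. Below is a minimal, self-contained Java sketch of that pattern. To keep it compilable without Flume on the classpath, the `Event` interface here is a stand-in for Flume's real one, and `MahoutTaggingInterceptor` plus the `analytics.route` header are made-up names for illustration; the adapter does not run Mahout itself, it only tags events so a downstream processing system (Mahout, Storm, Hadoop) can pick them up, in keeping with Nitin's point about separating acquisition from processing.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for org.apache.flume.Event, so this sketch
// compiles without Flume on the classpath.
interface Event {
    Map<String, String> getHeaders();
    byte[] getBody();
}

class SimpleEvent implements Event {
    private final Map<String, String> headers = new HashMap<>();
    private final byte[] body;
    SimpleEvent(byte[] body) { this.body = body; }
    public Map<String, String> getHeaders() { return headers; }
    public byte[] getBody() { return body; }
}

// Hypothetical "MahoutAdapter" in the shape of a Flume interceptor:
// it tags matching events with a routing header rather than doing any
// analytics inline, leaving the heavy processing to a downstream system.
class MahoutTaggingInterceptor {
    public Event intercept(Event event) {
        String text = new String(event.getBody(), StandardCharsets.UTF_8);
        // Illustrative rule: route anything mentioning "purchase" to analytics.
        if (text.contains("purchase")) {
            event.getHeaders().put("analytics.route", "mahout");
        }
        return event;
    }
}

public class Demo {
    public static void main(String[] args) {
        MahoutTaggingInterceptor interceptor = new MahoutTaggingInterceptor();
        Event e = interceptor.intercept(
            new SimpleEvent("purchase id=42".getBytes(StandardCharsets.UTF_8)));
        System.out.println(e.getHeaders().get("analytics.route")); // prints "mahout"
    }
}
```

The design point is that the channel stays a dumb, reliable buffer; the per-event hook only annotates, and a multiplexing channel selector (or the eventual sink) decides where tagged events go.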
