Thank you, Mike, you've been a great help. I have conducted additional tests and verified that event data is not lost, as you stated in your prior comment.
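For anyone who finds this thread later, here is a minimal sketch of the kind of two-tier setup discussed below, with durable file channels on both tiers. All agent names, hostnames, ports, and directory paths are placeholders made up for illustration, not my actual configuration:

# Tier 1: "load balancer" agent -- reads files from a spool directory and
# fans events out to two downstream Avro sinks through a durable file channel.
lb.sources = spool-src
lb.channels = fc1
lb.sinks = avro-sink1 avro-sink2
lb.sinkgroups = lb-group

# Spooling directory source: each input file is renamed with the .COMPLETED
# suffix once the source has committed its events to the local channel,
# regardless of what happens downstream.
lb.sources.spool-src.type = spooldir
lb.sources.spool-src.spoolDir = /var/flume/spool
lb.sources.spool-src.channels = fc1

# File channel: events are persisted to disk, so they survive agent restarts.
lb.channels.fc1.type = file
lb.channels.fc1.checkpointDir = /var/flume/lb/checkpoint
lb.channels.fc1.dataDirs = /var/flume/lb/data

lb.sinks.avro-sink1.type = avro
lb.sinks.avro-sink1.hostname = consumer1.example.com
lb.sinks.avro-sink1.port = 4141
lb.sinks.avro-sink1.channel = fc1

lb.sinks.avro-sink2.type = avro
lb.sinks.avro-sink2.hostname = consumer2.example.com
lb.sinks.avro-sink2.port = 4141
lb.sinks.avro-sink2.channel = fc1

# Load-balancing sink group over the two Avro sinks.
lb.sinkgroups.lb-group.sinks = avro-sink1 avro-sink2
lb.sinkgroups.lb-group.processor.type = load_balance

# Tier 2: consumer agent -- receives Avro events and writes them to HDFS,
# again buffered through a durable file channel.
consumer.sources = avro-src
consumer.channels = fc1
consumer.sinks = hdfs-sink

consumer.sources.avro-src.type = avro
consumer.sources.avro-src.bind = 0.0.0.0
consumer.sources.avro-src.port = 4141
consumer.sources.avro-src.channels = fc1

consumer.channels.fc1.type = file
consumer.channels.fc1.checkpointDir = /var/flume/consumer/checkpoint
consumer.channels.fc1.dataDirs = /var/flume/consumer/data

consumer.sinks.hdfs-sink.type = hdfs
consumer.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events
consumer.sinks.hdfs-sink.hdfs.fileType = DataStream
consumer.sinks.hdfs-sink.channel = fc1

Because both tiers use file channels, any events already committed to a channel persist across an agent restart and are drained by the sink once it comes back up; the .COMPLETED rename only tells you the first hop (source to channel) succeeded, as Mike explains below.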
I appreciate it.

Kind Regards,
Tzur

On Tue, Feb 5, 2013 at 3:31 AM, Mike Percy <[email protected]> wrote:

> Hmm, in case I didn't answer the whole question:
>
> Yes, the file channel is durable and the data will persist across restarts.
>
> Any data written by the sink will be removed from the channel. Since Flume
> is event oriented, the remaining events in the channel will be drained
> when they are taken by the sink at the next opportunity.
>
> Regards,
> Mike
>
>
> On Tuesday, February 5, 2013, Mike Percy wrote:
>
>> Tzur,
>> The source and sink are decoupled completely. The source will fill the
>> channel until there is no more work or the channel is full. So the data
>> sits buffered in the channel until the sink removes it.
>>
>> Hope that explains things. Let me know if anything is unclear.
>>
>> Regards,
>> Mike
>>
>> On Friday, February 1, 2013, Tzur Turkenitz wrote:
>>
>>> Mike, so when the data is committed to the channel, and the channel is
>>> of type "File", then when the agent is restarted the data will continue
>>> to flow to the sink?
>>> And if only 20% of the data passed to the sink before it crashed, will
>>> a replay be done to resend all of the data?
>>>
>>> Just trying to grasp the basics....
>>>
>>> On Fri, Feb 1, 2013 at 4:56 AM, Mike Percy <[email protected]> wrote:
>>>
>>>> Tzur, that is expected, because the data is committed by the source
>>>> onto the channel. Sources and sinks are decoupled; they only interact
>>>> via the channel, which buffers the data and serves to mitigate
>>>> impedance mismatches.
>>>>
>>>> On Thu, Jan 31, 2013 at 2:35 PM, Tzur Turkenitz <[email protected]> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am running HDP 1.2 and Flume 1.3. My Flume setup includes:
>>>>> (1) a Load Balancer that uses the SpoolDir adapter and sends events
>>>>> to Avro sinks, and
>>>>> (2) Agents which consume the data using an Avro source and write
>>>>> to HDFS.
>>>>>
>>>>> During testing I noticed a dissonance between the Load Balancer and
>>>>> the Consumers: when the Load Balancer processes a file it marks it as
>>>>> COMPLETED, even if the consumer has crashed while writing to HDFS.
>>>>>
>>>>> A preferred behavior would be for the Load Balancer to wait until the
>>>>> consumer commits its transaction and reports success before the file
>>>>> is marked as COMPLETED. The current behavior does not allow me to
>>>>> verify which files have been loaded successfully if an agent has
>>>>> crashed and recovery is in progress.
>>>>>
>>>>> Have I misconfigured my Agents, or is this actually the desired
>>>>> behavior?
>>>>>
>>>>> Kind Regards,
>>>>> Tzur
>>>>>
>>>>
>>>
>>> --
>>> Regards,
>>> Tzur Turkenitz
>>> Vision.BI
>>> http://www.vision.bi/
>>>
>>> "*Facts are stubborn things, but statistics are more pliable*"
>>> -Mark Twain
>>
--
Regards,
Tzur Turkenitz
Vision.BI
http://www.vision.bi/

"*Facts are stubborn things, but statistics are more pliable*"
-Mark Twain
