A, B and C are all using 1.2.0. D is using 1.1.0 (because D has an HDFS sink and I am using an old version of Hadoop).

________________________________________
From: Jarek Jarcec Cecho [[email protected]]
Sent: Thursday, July 26, 2012 6:15 PM
To: [email protected]
Subject: Re: flume non-duplication guarantees?
What version of Flume were you using, Mark? Based on the "end-to-end configuration", I would say that you're using the old Flume (version 0.9.x). If that is true, then the duplication is unfortunately a known flaw. We've significantly redesigned Flume in 1.x (known as flume-ng) to avoid such issues.

Jarcec

On Jul 26, 2012, at 7:51 AM, Stern, Mark wrote:

> I was testing Flume in an end-to-end configuration where A can send to D
> via B or C. A, B, C and D are all Flume agents with file channels. In
> the course of the test, I killed and restarted B and C. At the end of
> the test, I found that all the events reached D, but 100 events (that is
> my batch size on the Avro sinks) were duplicated.
>
> Is this expected (or at least accepted) behaviour?
>
> Thanks,
>
> Mark Stern
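For context, a minimal sketch of what one of the intermediate agents (B or C) in this topology might look like in Flume 1.x's properties format. The agent name, host names, ports, and directory paths below are hypothetical; batch-size is set to the 100 mentioned in the test.

    # Hypothetical Flume 1.x (NG) config for intermediate agent B:
    # receives events from A over Avro, buffers them durably in a
    # file channel, and forwards them to D in batches of 100.

    agentB.sources  = avro-in
    agentB.channels = file-ch
    agentB.sinks    = avro-out

    # Avro source: listens for events sent by agent A
    agentB.sources.avro-in.type     = avro
    agentB.sources.avro-in.bind     = 0.0.0.0
    agentB.sources.avro-in.port     = 4141
    agentB.sources.avro-in.channels = file-ch

    # File channel: survives an agent restart, which is what makes
    # at-least-once (rather than exactly-once) delivery possible
    agentB.channels.file-ch.type          = file
    agentB.channels.file-ch.checkpointDir = /var/flume/checkpoint
    agentB.channels.file-ch.dataDirs      = /var/flume/data

    # Avro sink: forwards to agent D; batch-size matches the 100
    # duplicated events observed in the test
    agentB.sinks.avro-out.type       = avro
    agentB.sinks.avro-out.hostname   = agent-d.example.com
    agentB.sinks.avro-out.port       = 4141
    agentB.sinks.avro-out.batch-size = 100
    agentB.sinks.avro-out.channel    = file-ch

This setup gives at-least-once rather than exactly-once delivery: if B is killed after D has accepted a batch but before B commits its file-channel transaction, the channel replays that batch when B restarts, so up to one batch-size worth of events (here, 100) can reach D twice.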
