Hi,

Yes, if you use tail you will eventually both lose data and get duplicates. It's better to send the events to Flume directly from the application generating them. Flume has a Java client which can do this, as well as a log4j appender.
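As a rough illustration of the client approach, here is a minimal sketch using the Flume SDK's RpcClient to push log lines to an agent running an Avro source. The class name, host, and port below are placeholders, not anything from your setup:

```java
import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

// Hypothetical wrapper the application could use instead of writing to a file
// and tailing it.
public class FlumeLogSender {

  private final RpcClient client;

  public FlumeLogSender(String host, int port) {
    // Connects to a Flume agent whose source is of type "avro",
    // bound to host:port.
    this.client = RpcClientFactory.getDefaultInstance(host, port);
  }

  public void send(String line) throws EventDeliveryException {
    // Each log line becomes one Flume event. Delivery problems surface as
    // exceptions the application can handle, instead of the silent loss or
    // re-reading you get with tail -F.
    Event event = EventBuilder.withBody(line, Charset.forName("UTF-8"));
    client.append(event);
  }

  public void close() {
    client.close();
  }
}
```

Alternatively, if the application already logs through log4j, the log4j appender (org.apache.flume.clients.log4jappender.Log4jAppender, configured with the agent's Hostname and Port) sends events to the same kind of Avro source without any code changes.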
Brock

On Fri, Jul 27, 2012 at 11:20 PM, Jagadish Bihani <[email protected]> wrote:
> Hi
>
> In Flume-ng, is there any way using exec (tail -F) as the source to get
> only the new lines which are being added to the log file?
> (i.e. there is a growing log file and we want to transfer all the logs
> using Flume without duplication of logs)
>
> I understand that if something fails, since tail doesn't maintain state, we
> will have duplicates. But we are not considering failovers as of now.
>
> So I think "tail -F" is useful only in scenarios where the sink or an
> intermediate agent can remove duplicates. Is that correct?
>
> But as tail looks like quite a popular source in Flume, I thought I might
> be missing something...
>
> Presently, using "tail -F <file>" as the source to read from the log file
> leads to scenarios like this:
>
> 1. If the file has not changed for a while, tail still tails the file every
> second and prints the same lines again (depending upon the -n option).
> 2. Even if the file grows, using tail we can't quite control which lines
> we want.
>
> Regards,
> Jagadish

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
