Yes, it is very similar. The spool directory will keep getting new files. We need to scan through the directory, send the data from the existing files to HDFS, clean up the files (delete, move, rename, etc.), and then scan for new files again. The spooling directory source is not available yet, right?
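For reference, once the spooling directory source from FLUME-1425 ships, I'd expect the agent configuration to look roughly like the sketch below. This is only a guess at the eventual property names (spooldir type, spoolDir, fileSuffix); the directory and HDFS paths are placeholders for our setup:

```properties
# Hypothetical agent config - property names assumed from the
# FLUME-1425 proposal, paths are placeholders for our environment.
agent.sources = spool1
agent.channels = ch1
agent.sinks = hdfs1

# Spooling directory source: watches the directory the app writes to
agent.sources.spool1.type = spooldir
agent.sources.spool1.spoolDir = /var/app/xml-out
# Processed files get renamed (not deleted), so they are not re-read
agent.sources.spool1.fileSuffix = .COMPLETED
agent.sources.spool1.channels = ch1

# Durable channel so events survive an agent restart
agent.channels.ch1.type = file

# HDFS sink on the Hadoop cluster
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/flume/xml
agent.sinks.hdfs1.channel = ch1
```

One agent like this on each of the 4 application servers would cover the "new files in the same directory" pattern without needing tail -F.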
Thanks,
Sadu

On Tue, Oct 16, 2012 at 10:11 AM, Brock Noland <[email protected]> wrote:
> Sounds like https://issues.apache.org/jira/browse/FLUME-1425 ?
>
> Brock
>
> On Mon, Oct 15, 2012 at 11:37 PM, Sadananda Hegde <[email protected]> wrote:
> > Hello,
> >
> > I have a scenario wherein the client application is continuously pushing
> > XML messages. The application is writing these messages to files
> > (new files; same directory), so we will keep getting new files throughout
> > the day. I am trying to configure Flume agents on these application
> > servers (4 of them) to pick up the new data and transfer it to HDFS on a
> > Hadoop cluster. How should I configure my source to pick up new files
> > (and exclude the files that have already been processed)? I don't think
> > an Exec source with tail -F will work in this scenario because data is
> > not getting added to existing files; rather, new files get created.
> >
> > Thank you very much for your time and support.
> >
> > Sadu
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
