If the files are continually written to, I don't think there is a good option. Can new files be written at each time interval?
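
If the writer does roll over to new files at each interval, one possible approach is to move files that are no longer being written into a directory watched by the Spooling Directory Source. Below is a rough Python sketch of that idea, not a tested production tool. The function name `move_idle_files`, the directory arguments, and the 360-second idle threshold are all assumptions made for illustration; it also assumes that a file whose modification time is older than the threshold has been closed by the writer, and that both directories are on the same filesystem.

```python
import os
import time


def move_idle_files(src_dir, spool_dir, idle_seconds=360, now=None):
    """Move files not modified for idle_seconds from src_dir into spool_dir.

    Returns the sorted list of file names that were moved.
    """
    now = time.time() if now is None else now
    os.makedirs(spool_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        # Assumption: a file untouched for idle_seconds is no longer being written.
        if now - os.path.getmtime(src) < idle_seconds:
            continue
        # os.replace is an atomic rename when both paths are on the same
        # filesystem, so the watcher never sees a half-copied file.
        os.replace(src, os.path.join(spool_dir, name))
        moved.append(name)
    return moved
```

Something like this could be run from cron every minute, with `spool_dir` set to the same path as the source's `spoolDir` property. It relies on the timestamped file names being unique, since the Spooling Directory Source expects files to be immutable and uniquely named once they land in the directory.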

On Wed, Oct 9, 2013 at 11:09 PM, Abhijeet Shipure <[email protected]> wrote:

> Hi Steve,
>
> Thanks for quick reply, as you pointed out Exec Source does not provide
> reliability, which is required in my case, and hence it is not suitable.
>
> So which other inbuilt source could be used to read from many files? Just
> one other requirement is file names are also dynamically generated using
> time stamp after every 5 mins.
>
> Regards
> Abhijeet
>
> On Thu, Oct 10, 2013 at 11:22 AM, Steve Morin <[email protected]> wrote:
>
>> If you read the Flume manual it doesn't support a tail source
>>
>> http://flume.apache.org/FlumeUserGuide.html#exec-source
>>
>> Warning
>>
>> The problem with ExecSource and other asynchronous sources is that the
>> source can not guarantee that if there is a failure to put the event into
>> the Channel the client knows about it. In such cases, the data will be
>> lost. As a for instance, one of the most commonly requested features is the
>> tail -F [file]-like use case where an application writes to a log file
>> on disk and Flume tails the file, sending each line as an event. While this
>> is possible, there’s an obvious problem; what happens if the channel fills
>> up and Flume can’t send an event? Flume has no way of indicating to the
>> application writing the log file that it needs to retain the log or that
>> the event hasn’t been sent, for some reason. If this doesn’t make sense,
>> you need only know this: Your application can never guarantee data has been
>> received when using a unidirectional asynchronous interface such as
>> ExecSource! As an extension of this warning - and to be completely clear -
>> there is absolutely zero guarantee of event delivery when using this
>> source. For stronger reliability guarantees, consider the Spooling
>> Directory Source or direct integration with Flume via the SDK.
>>
>> On Wed, Oct 9, 2013 at 10:33 PM, Abhijeet Shipure <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am looking for Flume NG source that can be used for reading many files
>>> which are getting continuously updated.
>>> I tried Spool Dir source but it does not work if file to be read gets
>>> modified.
>>>
>>> Here is the scenario:
>>> 100 files are getting generated at one time and these files
>>> are continuously updated for fixed interval say 5 mins, after 5 mins new
>>> 100 files get generated and being written again for 5 mins.
>>>
>>> Which flume source is most suitable and how it should be used
>>> effectively without any data loss.
>>>
>>> Any help is greatly appreciated.
>>>
>>> Thanks
>>> Abhijeet Shipure
