Yep my bad, typo :)
On Wed, Jan 23, 2013 at 1:04 PM, Roshan Naik <[email protected]> wrote:

That's SpoolDirectorySource.java ... I thought you referred to SpoolingFileSource earlier. I assume that was a typo?

On Wed, Jan 23, 2013 at 11:53 AM, Mike Percy <[email protected]> wrote:

https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java

On Tue, Jan 22, 2013 at 9:23 PM, Roshan Naik <[email protected]> wrote:

Mike,
Where is the SpoolingFileSource that you are referring to?

-roshan

On Tue, Jan 22, 2013 at 6:39 PM, Mike Percy <[email protected]> wrote:

Hi Roshan,
Yep, in general I'd have concerns w.r.t. capacity planning and garbage collector behavior for large events. Flume holds at least one event batch in memory at a time, depending on the number of sources and sinks, and even with a batch size of 1, if you have unpredictably large events there is nothing preventing an OutOfMemoryError in extreme cases. But if you plan for capacity and test thoroughly, it can be made to work.

Regards,
Mike

On Tue, Jan 22, 2013 at 3:38 PM, Roshan Naik <[email protected]> wrote:

I recall some discussion about being cautious with the size of events (in this case, the file being moved), since Flume is not really intended for large events. Mike, perhaps you can throw some light on that aspect?

On Tue, Jan 22, 2013 at 12:17 AM, Mike Percy <[email protected]> wrote:

Check out the latest changes to SpoolingFileSource w.r.t. EventDeserializers on trunk. You can deserialize a whole file that way if you want. Whether that is a good idea depends on your use case, though.

It's on trunk, lacking user docs for the latest changes, but I will try to hammer out updated docs soon.
In the meantime, you can just look at the code and read the comments.

Regards,
Mike

On Monday, January 21, 2013, Nitin Pawar wrote:

You can't configure it to send the entire file in one event unless you have a fixed number of events in each of the files. Basically it reads the entire file into a channel and then starts writing.

So as long as you can limit the events in the file, I think you can send the entire file as a transaction, but not as a single event. As far as I understand, Flume treats each individual line in the file as an event.

If you want to pull the entire file, then you may want to implement that with messaging queues, where you send an event to an ActiveMQ queue and your consumer then pulls the file in one transaction with some other mechanism like FTP or SCP.

Others will have better ideas; I am just suggesting a crude way to get the entire file as a single event.

On Tue, Jan 22, 2013 at 12:19 PM, Henry Ma <[email protected]> wrote:

As far as I know, the Spooling Directory Source will send the file line by line, each line as an event, and the File Roll Sink will receive these lines and roll them up into a big file at a fixed interval. Is that right, and can we configure it to send the whole file as a single event?

On Tue, Jan 22, 2013 at 1:22 PM, Nitin Pawar <[email protected]> wrote:

Why don't you use directory spooling?
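[To make the directory-spooling suggestion concrete, here is a minimal agent configuration sketch. All names (agent1, spool-src, the paths) are illustrative, and the property names are taken from the Flume 1.x spooling directory source and file roll sink; verify them against your Flume version. The whole-file EventDeserializer setting Mike mentions was on trunk without user docs at the time of this thread, so no property name is guessed for it here.]

```properties
# Minimal sketch: one agent wiring a spooling directory source
# through a memory channel to a file roll sink.
agent1.sources  = spool-src
agent1.channels = mem-ch
agent1.sinks    = roll-sink

# Spooling Directory Source: ingests files dropped into spoolDir;
# completed files are renamed with the .COMPLETED suffix by default.
agent1.sources.spool-src.type     = spooldir
agent1.sources.spool-src.spoolDir = /var/log/flume-spool
agent1.sources.spool-src.channels = mem-ch

# Memory channel: capacity is the max number of buffered events.
agent1.channels.mem-ch.type     = memory
agent1.channels.mem-ch.capacity = 10000

# File Roll Sink: writes events to local files, rolling to a new
# file every 300 seconds (the behavior Henry describes below).
agent1.sinks.roll-sink.type                  = file_roll
agent1.sinks.roll-sink.sink.directory        = /data/flume-out
agent1.sinks.roll-sink.sink.rollInterval     = 300
agent1.sinks.roll-sink.channel               = mem-ch
```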
On Tue, Jan 22, 2013 at 7:15 AM, Henry Ma <[email protected]> wrote:

Hi,

When using Flume to collect log files, we want to just COPY the original files from several servers to a central storage (a Unix file system), not roll them up into a big file, because we must record some metadata of each original file such as name, host, path, timestamp, etc. Besides, we want to guarantee total reliability: no file missed, no file duplicated.

It seems that, in the Source, we must put a whole file (whose size may be between 100KB and 100MB) into a Flume event; and in the Sink, we must write each event to a single file.

Is it practicable? Thanks!

--
Best Regards,
Henry Ma
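[Henry's requirement to record the original file name, host, and timestamp can be partly addressed with event headers rather than whole-file events. A sketch, assuming the Flume 1.x spooling directory source's fileHeader option and the stock host and timestamp interceptors; check the property names against your version's user guide:]

```properties
# Sketch: attach provenance metadata as event headers.
# fileHeader adds a header (key "file" by default) holding the
# absolute path of the file the event came from.
agent1.sources.spool-src.fileHeader    = true
agent1.sources.spool-src.fileHeaderKey = file

# Interceptors stamp each event with the agent hostname and the
# ingest time in milliseconds.
agent1.sources.spool-src.interceptors         = i1 i2
agent1.sources.spool-src.interceptors.i1.type = host
agent1.sources.spool-src.interceptors.i2.type = timestamp
```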

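[A final practical note on Mike's memory caveat above: if whole files become single events, the agent's JVM heap has to be sized for the worst-case event size multiplied by the batch and transaction capacities in play. The knob is the agent's Java options, conventionally set in conf/flume-env.sh; the values below are illustrative only:]

```shell
# conf/flume-env.sh: give the agent enough heap for the largest
# expected event times the configured batch sizes (values illustrative).
JAVA_OPTS="-Xms512m -Xmx2048m"
```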