Yes, I understand. I was thinking initially that the buffer was on the tail side as most command line programs buffer data when not writing to a terminal. After further review, I am not sure tail behaves that way.
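To make the kind of buffering I had in mind concrete, here is a minimal Python sketch. It is not tail's or Flume's actual code; `RecordingSink` is a made-up stand-in for a pipe, and the 4096-byte buffer size is an arbitrary assumption. It only shows that a few short lines can sit in a writer's buffer and never reach the other end until the buffer fills or something calls flush.

```python
import io


class RecordingSink(io.RawIOBase):
    """Hypothetical stand-in for a pipe: records each write that actually reaches it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, data):
        # Called only when the buffered layer decides to push bytes down.
        self.chunks.append(bytes(data))
        return len(data)


sink = RecordingSink()
# Assumed buffer size; real programs pick their own.
out = io.BufferedWriter(sink, buffer_size=4096)

# Ten short log lines (90 bytes) are far below the buffer size,
# so they stay in the buffer and the sink sees nothing yet.
out.write(b"log line\n" * 10)
reached_before_flush = list(sink.chunks)

# An explicit flush pushes the pending bytes through.
out.flush()
reached_after_flush = list(sink.chunks)

print(len(reached_before_flush), len(reached_after_flush))
```

If the missing lines are held somewhere like this, they only show up once more data arrives or something forces a flush, which would match what you are seeing.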
Can you get a jstack <pid> of the flume agent while it's waiting? What version of flume are you running? Depending on the version, the data is probably in one of these places:

1) In ExecSource's BufferedReader
2) In ExecSource's batch ( > 1.2)
3) In RollingFileSink's batch ( > 1.2)

Ultimately, if you are concerned with data loss, tailing files is not a good option. The communication from tail is one way. Beyond that, there is no guarantee that tail has started reading the file at the appropriate location, meaning that more than 10 lines of data could have been written before it starts reading.

Options with no data loss would include:

1) Waiting until the file is rotated and then copying it whole
2) Modifying the application using the SDK to write to, say, an AvroSink
3) Syslog would be more reliable than tail as well

Brock

On Fri, Sep 21, 2012 at 2:46 PM, Cochran, David M (Contractor) <[email protected]> wrote:
> Perhaps my explanation was unclear. Flume is tailing a log file on the
> app server (sinking to another box (FILE_ROLL)). I'm manually tailing
> both the log file on the app server and the output file on the sink
> server. The App server log has 10 lines of entries that have yet to be
> written at the sink side, 3+ hours has elapsed since the source log was
> updated. Now, if I echo another dozen or so lines to the end of the
> source log, all the lines that were waiting and (some or all) newly
> added lines will appear at the sink. Wash, Rinse, Repeat.
>
> I'm not sure where the last few lines are sitting that need to be
> sent/written out, but in limbo seems bad (at least from my perspective).
>
> I perhaps wrongly assumed they are sitting in some sort of buffer/bucket
> that is waiting to be full before sending. If this is the case, then
> would periodically checking to see if there is data waiting to be
> committed even if the bucket is not full seem like a good idea?
>
> Dave
>
> -----Original Message-----
> From: Brock Noland [mailto:[email protected]]
> Sent: Friday, September 21, 2012 2:19 PM
> To: [email protected]
> Subject: Re: Event flushing?
>
> If you are sure the lines are in the tail buffer, what you probably want
> is this:
>
> http://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html
>
> Which does look to, finally, be available in the latest distros like
> RHEL 6.3.
>
> Brock
>
> On Fri, Sep 21, 2012 at 1:52 PM, Cochran, David M (Contractor)
> <[email protected]> wrote:
>>
>> Is there a way to automatically flush an agent/source tailing a file
>> to the sink even if the buffer is not yet full every xx seconds?
>>
>> Maybe that's not worded quite right... example
>>
>> Tailing a log file sending to File_roll sink... works like a charm,
>> however if activity stops, there are still a number of lines not sent
>> to the sink, apparently waiting for the buffer to fill up. This can
>> be an issue for me as I want to have a script reviewing the logs on
>> the sink for errors and such... but if something goes sideways and is
>> recorded in the last xx lines not yet sent they could go undetected
>> for a long period of time if not written to the sink.
>>
>> In the case I'm looking at right now, the log I'm tailing has 10 lines
>> that have not been sent to the sink, since it's Friday afternoon there
>> is little activity, actually none in the last 3 hours. Taken a step
>> further if the app crashed and only wrote out 5 lines calling for
>> help, they could go undetected for a long time. Anyway to flush any
>> standing events to sink every 30 seconds or so?
>>
>> Thanks,
>> Dave
>>
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
