Hi,
I ran a simple test of the exec source and found that it does not flush the
last batch of data. Here are the steps:
*a. create the source file 1.test, which contains the sequence of numbers
1 to 15, like this:*
----------
1
2
...
15
----------
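(For reference, the test file can be generated with seq, assuming a GNU userland:)

```shell
# create 1.test with the numbers 1..15, one per line
seq 1 15 > 1.test
# sanity check: should report 15 lines
wc -l < 1.test
```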
*b. create the configuration file flume_simple.conf like this:*
-------------------------
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -n +0 -F
/opt/scripts/tvhadoop/flume/flume-1.3.0/source/1.test
a1.sources.r1.channels = c1
a1.sources.r1.batchSize = 10
a1.channels.c1.type = memory
a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /opt/scripts/tvhadoop/flume/flume-1.3.0/sink
---------------------
*c. run Flume with this command:*
bin/flume-ng agent --conf conf -f conf/flume_simple.conf
-Dflume.root.logger=DEBUG,console -n a1
After more than one minute (the file roll interval), I checked the output
directory: there are 2 files, one with the numbers 1 to 10, and the other
empty.
*I think this is because batchSize was set to 10, so the last 5
numbers were never flushed and got lost.* Even if I apply the patch from
'https://issues.apache.org/jira/browse/FLUME-1819', nothing changes. When
I debug into the code, *I find that the flush after the while loop (the
last if block below) never gets executed* -- with tail -F the stream stays
open, so reader.readLine() blocks instead of returning null.
----------------
while ((line = reader.readLine()) != null) {
    counterGroup.incrementAndGet("exec.lines.read");
    eventList.add(EventBuilder.withBody(line.getBytes()));
    if (eventList.size() >= bufferCount) {
        channelProcessor.processEventBatch(eventList);
        eventList.clear();
    }
}
// never reached while tail -F keeps the stream open:
if (!eventList.isEmpty()) {
    channelProcessor.processEventBatch(eventList);
}
--------------
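A timed flush would work around this: flush the buffer not only when it
reaches bufferCount, but also whenever some timeout elapses, so a partial
batch cannot get stuck behind a blocked readLine(). (I believe newer Flume
releases add a batchTimeout setting to the exec source for this reason.)
Here is a minimal, self-contained sketch of the idea -- the names
flushBatch, BATCH_TIMEOUT_MS, and the delivered list are mine, not from the
Flume source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimedFlush {
    static final int BUFFER_COUNT = 10;        // corresponds to batchSize
    static final long BATCH_TIMEOUT_MS = 100;  // hypothetical flush interval

    final List<String> eventList = new ArrayList<>();
    // stand-in for what channelProcessor.processEventBatch() would receive
    final List<List<String>> delivered = new ArrayList<>();

    synchronized void flushBatch() {
        if (!eventList.isEmpty()) {
            delivered.add(new ArrayList<>(eventList));
            eventList.clear();
        }
    }

    synchronized void addLine(String line) {
        eventList.add(line);
        if (eventList.size() >= BUFFER_COUNT) {
            flushBatch();
        }
    }

    static List<List<String>> runDemo() throws InterruptedException {
        TimedFlush source = new TimedFlush();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // flush any partial batch periodically, even while readLine() would block
        timer.scheduleAtFixedRate(source::flushBatch,
                BATCH_TIMEOUT_MS, BATCH_TIMEOUT_MS, TimeUnit.MILLISECONDS);

        for (int i = 1; i <= 15; i++) {        // simulate tail emitting 1..15
            source.addLine(Integer.toString(i));
        }
        Thread.sleep(3 * BATCH_TIMEOUT_MS);    // stream still "open"; timer fires
        timer.shutdown();
        return source.delivered;
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> batches = runDemo();
        // first batch of 10 is flushed on size; the trailing 5 by the timer
        System.out.println(batches.size() + " batches, last batch has "
                + batches.get(batches.size() - 1).size() + " events");
    }
}
```

With 15 input lines and a batch size of 10, this delivers two batches: the
first on size, the trailing 5 events via the timer instead of being lost.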
In my scenario, the source log files are rotated hourly, so I need
to change the file name in the Flume configuration file. Because of the
above bug, I can only set the execSource batchSize to 1, which
significantly slows down throughput. I wonder how to solve this
problem. Any suggestions are most welcome.
Best Regards,
larry