I've set up something similar with the spooling directory source. I have a 
script scheduled on the app server that creates an incremental file every 
minute and drops it in the spool directory for processing. The use case is 
web logs that roll over daily, but we want events in 'near' real time. We 
didn't want to use the exec source because it gives no delivery guarantee. 
With a spooling source, at least, if the Flume agent stops processing, the 
incremental files stay in the spool dir until it's back up.
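
For illustration, here is a minimal sketch of what such an incremental script 
could look like. It is not the actual script described above. The function 
name, the file naming scheme, the staging directory and the offset file are 
all assumptions made for the example. It remembers how many bytes of the log 
it has already shipped, copies only the complete lines added since then into 
a new file, and renames that file into the spool directory so the agent never 
picks up a half-written file.

```python
#!/usr/bin/python
import os
import time


def spool_increment(log_path, spool_dir, staging_dir, offset_path):
    """Copy bytes appended to log_path since the last run into a new spool file.

    Hypothetical helper: returns the path of the file created in spool_dir,
    or None if there was no complete new line to ship.
    """
    # Load the byte offset reached by the previous run (0 on the first run).
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)

    # If the log is now shorter than the saved offset, assume it rolled over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Ship only complete lines; a partial last line waits for the next run.
    end = data.rfind(b"\n") + 1
    if end == 0:
        return None
    data = data[:end]

    # Write outside the spool directory, then rename into it. staging_dir is
    # assumed to be on the same filesystem so the rename is atomic.
    name = "increment-%d-%d.log" % (int(time.time()), offset)
    staging_path = os.path.join(staging_dir, name)
    with open(staging_path, "wb") as out:
        out.write(data)
    final_path = os.path.join(spool_dir, name)
    os.rename(staging_path, final_path)

    # Record the new offset only after the file is in place.
    with open(offset_path, "w") as f:
        f.write(str(offset + end))
    return final_path
```

Something like this could be run from cron once a minute, with the agent's 
source set to type spooldir and its spoolDir property pointing at the same 
spool directory. The rollover check is deliberately simple: lines written to 
the old log between the last run and the rollover would be missed, so a real 
script would need to handle that case.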

Hope that helps,
Paul Chavez

From: Wang, Yongkun | Yongkun | BDD [mailto:[email protected]]
Sent: Sunday, August 18, 2013 7:30 PM
To: [email protected]
Subject: sleep() in script doesn't work when called by exec Source

Hi,

I am testing with apache-flume-1.4.0-bin.
I wrote a naive Python script for the exec source that throttles output by 
calling the sleep() function.
But sleep() doesn't work when the script is called by the exec source.
Any ideas about this, or do you have a simple solution for throttling that 
doesn't need a custom source?

Flume config:


agent.sources = src1
agent.sources.src1.type = exec
agent.sources.src1.command = read-file-throttle.py

read-file-throttle.py:


#!/usr/bin/python

import time

count=0
pre_time=time.time()
with open("apache.log") as infile:
    for line in infile:
        line = line.strip()
        print line
        count += 1
        if count % 50000 == 0:
            now_time = time.time()
            diff = now_time - pre_time
            if diff < 10:
                #print "sleeping %s seconds ..." % (diff)
                time.sleep(diff)
                pre_time = now_time


Thank you very much.

Best Regards,
Yongkun Wang
