In your case I would look at the spooling directory source (type = spooldir) instead of the exec source.
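If you do want to stay with the exec source, here is a minimal sketch of a throttled reader. It rests on two assumptions I have not verified against your setup: that stdout is block-buffered when the script writes to a pipe (so the pauses are not visible downstream unless you flush), and that the intent is to sleep for the remainder of the 10-second window rather than for the time already elapsed. The function name throttle_lines and its parameters are hypothetical, and the sketch is written for Python 3 rather than the Python 2 of the original script.

```python
#!/usr/bin/env python3
"""Throttled line reader: an illustrative sketch, not the original script."""
import sys
import time


def throttle_lines(lines, batch_size, window_seconds, out=sys.stdout,
                   clock=time.monotonic, sleep=time.sleep):
    """Write lines to `out`, emitting at most `batch_size` lines per window.

    `clock` and `sleep` are injectable so the pacing can be tested
    without actually waiting.
    """
    window_start = clock()
    count = 0
    for line in lines:
        out.write(line.rstrip("\n") + "\n")
        count += 1
        if count % batch_size == 0:
            # Push the finished batch through the pipe before pausing.
            out.flush()
            elapsed = clock() - window_start
            if elapsed < window_seconds:
                # Sleep for what is left of the window, not the elapsed time.
                sleep(window_seconds - elapsed)
            window_start = clock()
    out.flush()
    return count


# Assumed usage: the log path is passed as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as infile:
        throttle_lines(infile, batch_size=50000, window_seconds=10)
```

With this variant the exec command would need the log path as an argument, for example read-file-throttle.py apache.log.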

On Sun, Aug 18, 2013 at 9:29 PM, Wang, Yongkun | Yongkun | BDD <[email protected]> wrote:

> Hi,
>
> I am testing with apache-flume-1.4.0-bin.
> I made a naive python script for exec source to do throttling by calling
> sleep() function.
> But the sleep() doesn't work when called by exec source.
> Any ideas about this or do you have some simply solution for throttling
> instead of a custom source?
>
> Flume config:
>
> agent.sources = src1
> agent.sources.src1.type = exec
> agent.sources.src1.command = read-file-throttle.py
>
> read-file-throttle.py:
>
> #!/usr/bin/python
> import time
>
> count=0
> pre_time=time.time()
> with open("apache.log") as infile:
>     for line in infile:
>         line = line.strip()
>         print line
>         count += 1
>         if count % 50000 == 0:
>             now_time = time.time()
>             diff = now_time - pre_time
>             if diff < 10:
>                 #print "sleeping %s seconds ..." % (diff)
>                 time.sleep(diff)
>             pre_time = now_time
>
> Thank you very much.
>
> Best Regards,
> Yongkun Wang

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
