Luu, Have you tried using the spooling directory source?
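For a local directory of files like yours, something along these lines might work — a minimal sketch only, with hypothetical names (agent, spoolSource, /local-dir) standing in for your own; check the Flume user guide for the full set of spooldir properties:

agent.sources = spoolSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# spooldir watches the directory and ingests each file exactly once,
# renaming it with a .COMPLETED suffix when done
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.batchSize = 1000
agent.sources.spoolSource.channels = fileChannel

Note that files must not be written to while they sit in the spool directory, and file names must not be reused — move files in atomically once they are complete.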
-Jeff

On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[email protected]> wrote:
> Hi all,
>
> I need to copy data from a local directory (on the Hadoop server) into HDFS
> regularly and automatically. This is my Flume config:
>
> agent.sources = execSource
> agent.channels = fileChannel
> agent.sinks = hdfsSink
>
> agent.sources.execSource.type = exec
> agent.sources.execSource.shell = /bin/bash -c
> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
> agent.sources.execSource.restart = true
> agent.sources.execSource.restartThrottle = 3600000
> agent.sources.execSource.batchSize = 100
>
> ...
> agent.sinks.hdfsSink.hdfs.rollInterval = 0
> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
> agent.sinks.hdfsSink.hdfs.rollCount = 0
> agent.sinks.hdfsSink.batchsize = 100000
> ...
> agent.channels.fileChannel.type = FILE
> agent.channels.fileChannel.capacity = 100000
> ...
>
> While the hadoop command takes 30 seconds, Flume takes around 4 minutes to
> copy a 1 GB text file into HDFS. I am worried that the config is not good,
> or that I shouldn't use Flume in this case.
>
> What is your opinion?
