Luu, Have you tried using the spooling directory source?
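For a local directory of files like yours, something along these lines might work — a minimal sketch only, with hypothetical names (agent, spoolSource, /local-dir) standing in for your own; check the Flume user guide for the full set of spooldir properties:

agent.sources = spoolSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# spooldir watches the directory and ingests each file exactly once,
# renaming it with a .COMPLETED suffix when done
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.batchSize = 1000
agent.sources.spoolSource.channels = fileChannel

Note that files must not be written to while they sit in the spool directory, and file names must not be reused — move files in atomically once they are complete.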
-Jeff

On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[email protected]> wrote:
> Hi all,
>
> I need to copy data from a local directory (on the Hadoop server) into HDFS
> regularly and automatically. This is my Flume config:
>
> agent.sources = execSource
> agent.channels = fileChannel
> agent.sinks = hdfsSink
>
> agent.sources.execSource.type = exec
> agent.sources.execSource.shell = /bin/bash -c
> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
> agent.sources.execSource.restart = true
> agent.sources.execSource.restartThrottle = 3600000
> agent.sources.execSource.batchSize = 100
>
> ...
> agent.sinks.hdfsSink.hdfs.rollInterval = 0
> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
> agent.sinks.hdfsSink.hdfs.rollCount = 0
> agent.sinks.hdfsSink.batchsize = 100000
> ...
> agent.channels.fileChannel.type = FILE
> agent.channels.fileChannel.capacity = 100000
> ...
>
> While the hadoop command takes 30 seconds, Flume takes around 4 minutes to
> copy a 1 GB text file into HDFS. I am worried that the config is not good,
> or that I shouldn't use Flume in this case.
>
> What is your opinion?
