Re: Recommendation of parameters for better performance with File Channel

Juhani Connolly Wed, 19 Dec 2012 01:24:36 -0800

Hi Jagadish,

You may want to check out the mails "Re: Flume 1.3.0 - NFS + FileChannel Performance"

It turns out the changes in FLUME-1609 affect FileChannel performance a fair bit (even on normal non-NFS file systems). We ran a version of 1.3 from an earlier trunk, and took a big performance hit when we switched to the 1.3 release. I isolated it to the FLUME-1609 patch. After building the 1.4 trunk and installing, performance was back to normal.


On 12/18/2012 08:05 PM, Jagadish Bihani wrote:

Hi

Thanks for the inputs Hari and Brock.

I tried a batch size of 10000, and throughput increased from 1.5 to 1.8 MB/sec. Then I used multiple HDFS sinks reading from the same channel, and I could get around 2.3 MB/sec.

Regards,
Jagadish



On 12/13/2012 03:14 AM, Hari Shreedharan wrote:

Yep, each sink with a different prefix will work fine too. My suggestion was just meant to avoid collisions - file prefixes are good enough for that.


--
Hari Shreedharan

On Wednesday, December 12, 2012 at 1:13 PM, Bhaskar V. Karambelkar wrote:

Hari,
If each sink uses a different file prefix, what's the need to write to
multiple HDFS directories?
All our sinks write to the same HDFS directory and each uses a unique
file prefix, and it seems to work fine.
Also, I haven't found anything in the Flume code or HDFS APIs which suggests
that two sinks can't write to the same directory.
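
A minimal sketch of the setup described above, for illustration only. The sink names (hdfsSink1, hdfsSink2) and prefix values are hypothetical, and the path is simply the one from the configuration quoted later in this thread; the actual configuration behind this message was not posted.

```properties
# Both sinks share one HDFS directory; the file prefix keeps their output files apart.
# Sink names and prefix values are hypothetical.
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
```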

Just curious.
thanks


On Wed, Dec 12, 2012 at 12:53 PM, Hari Shreedharan
<[email protected] <mailto:[email protected]>> wrote:

Also note that having multiple sinks often improves performance - though you should have each sink write to a different directory on HDFS. Since each sink really uses only one thread at a time to write, having multiple sinks allows multiple threads to write to HDFS. Also, if you can spare additional disks on your Flume agent machine for file channel data directories, that will also improve performance.
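
A rough sketch of both suggestions, assuming the agent and channel names from the configuration quoted further down in this thread. The sink names, the sink1/sink2 subdirectories, and the /disk1 and /disk2 mount points are hypothetical and stand in for whatever layout is actually available.

```properties
# Two HDFS sinks draining the same file channel; each sink writes with its own thread.
agent.sinks = hdfsSink1 hdfsSink2

agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test/sink1
agent.sinks.hdfsSink1.hdfs.fileType = DataStream

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test/sink2
agent.sinks.hdfsSink2.hdfs.fileType = DataStream

# File channel data directories spread over separate disks (hypothetical mount points).
agent.channels.fileChannel.dataDirs = /disk1/flume_channel/data,/disk2/flume_channel/data
```

The remaining sink settings (roll and batch parameters) would be repeated per sink in the same way.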



Hari

--
Hari Shreedharan

On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:

Hi,

Why not try increasing the batch size on the source and sink to 10,000?
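
As a rough sketch, this suggestion applied to the agent configuration quoted below in this thread would look like the following. Only the two batchSize values come from the suggestion itself. The transactionCapacity line is an assumption added here: a file channel transaction has to be able to hold a full batch, and the default differs between Flume releases, so check the user guide for the version in use.

```properties
# Source and sink batch sizes raised from 1000 to 10000.
agent.sources.spooler.batchSize = 10000
agent.sinks.hdfsSink.hdfs.batchSize = 10000

# Assumption (not from the thread): let one channel transaction hold a full batch.
agent.channels.fileChannel.transactionCapacity = 10000
```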

Brock

On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani

<[email protected]<mailto:[email protected]>> wrote:



I am using the latest release of Flume (1.3.0) and Hadoop 1.0.3.


On 12/12/2012 03:35 PM, Jagadish Bihani wrote:


Hi

I am able to write a maximum of 1.5 MB/sec of data to HDFS (without compression)
using File Channel. Are there any recommendations to improve the
performance?
Has anybody achieved around 10 MB/sec with File Channel? If yes, please
share the configuration (hardware used, RAM allocated, and batch sizes of
source, sink and channels).

Following are the configuration details :
========================

I am using a machine with reasonable hardware configuration:
Quadcore 2.00 GHz processors and 4 GB RAM.

Command line options passed to flume agent :
-DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
-XX:MaxDirectMemorySize=2g"

Agent Configuration:
=============
agent.sources = avro-collection-source spooler
agent.channels = fileChannel
agent.sinks = hdfsSink fileSink

# For each one of the sources, the type is defined

agent.sources.spooler.type = spooldir
agent.sources.spooler.spoolDir =/root/test_data
agent.sources.spooler.batchSize = 1000
agent.sources.spooler.channels = fileChannel

# Each sink's type must be defined
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path=hdfs://mltest2001/flume/release3Test

agent.sinks.hdfsSink.hdfs.fileType =DataStream
agent.sinks.hdfsSink.hdfs.rollSize=0
agent.sinks.hdfsSink.hdfs.rollCount=0
agent.sinks.hdfsSink.hdfs.batchSize=1000
agent.sinks.hdfsSink.hdfs.rollInterval=60

agent.sinks.hdfsSink.channel= fileChannel

agent.channels.fileChannel.type=file
agent.channels.fileChannel.dataDirs=/root/flume_channel/dataDir13

agent.channels.fileChannel.checkpointDir=/root/flume_channel/checkpointDir13

Regards,
Jagadish




--

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
