Hi, what version of NG are you running? Comments inline below.
On Tue, Nov 6, 2012 at 8:10 PM, Cameron Gandevia <[email protected]> wrote:

> Hi
>
> I am trying to transition some flume nodes running FlumeOG to FlumeNG but
> am running into a few difficulties. We are writing around 16,000 events/s
> from a bunch of FlumeOG agents to a FlumeNG agent, but we can't seem to get
> the FlumeNG agent to drain the memory channel fast enough. At first I
> thought maybe we were reaching the limit of a single Flume agent, but I get
> similar performance using a file channel, which doesn't make sense.
>
> I have tried configuring anywhere from a single HDFS sink up to twenty of
> them, and I have also tried changing the batch sizes from 1,000 up to
> 100,000, but no matter what I do the channel fills fairly quickly.
>
> I am running a single flow using the configuration below:
>
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000
>
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host = 0.0.0.0
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating
>
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000

I think this should be:

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000

It is spelled wrong (it is missing the "hdfs." prefix), and it should be equal to your batch size. I believe we removed that parameter in trunk.

> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
>
> Thanks
>
> Cameron Gandevia

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
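P.S. To make that concrete, here is a sketch of the two sink lines once the key is fixed, assuming you stay on a release that still honors hdfs.txnEventMax (the 50000 is simply chosen to match your hdfs.batchSize):

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000

If I remember right, with the unprefixed key the sink never sees the setting and falls back to its much smaller default transaction size, so each take against the memory channel moves far fewer than 50,000 events, which would line up with the channel filling faster than the sinks can drain it.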
