Hi Lohit,

HDFS sinks (in fact, most sinks) are single-threaded by design. This
keeps the sinks easier to write, but every channel can handle multiple
sinks reading from it. So to improve throughput, configure several
sinks that read off the same channel. Make sure, though, that each sink
writes to a different HDFS path or uses a different file prefix
(otherwise the HDFS client API will complain about leases).
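
Based on your config, that could look something like the rough sketch
below. The sink names (hdfsSink1, hdfsSink2) and the prefixes are just
placeholders I made up, and the rest of your hdfs.* settings (codeC,
fileType, roll and batch settings) would be repeated for each sink:

# Two sinks draining the same channel (names are hypothetical)
agent.sinks = hdfsSink1 hdfsSink2

agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
# Distinct prefix so the two sinks never open the same file
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2

How many sinks you need depends on your hardware and event size, so I
would add them one at a time and measure.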


Thanks,
Hari

On Wed, Jul 15, 2015 at 9:10 AM, lohit <[email protected]> wrote:

> Hello,
>
> Does anyone have numbers they can share on HDFS sink performance? In our
> testing, a single sink writing to HDFS (CompressedStream) and reading from
> a MemoryChannel can only do about 35000 events per second (each event is
> about 1K in size). After compression this comes to a ~10MB/s write stream
> to the HDFS file, which is pretty low. Our configuration looks like this:
>
> agent.sinks.hdfsSink.type = hdfs
> agent.sinks.hdfsSink.channel = memoryChannel
> agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
> agent.sinks.hdfsSink.hdfs.codeC = lzo
> agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
> agent.sinks.hdfsSink.hdfs.writeFormat = Writable
> agent.sinks.hdfsSink.hdfs.rollInterval = 3600
> agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
> agent.sinks.hdfsSink.hdfs.rollCount = 0
> agent.sinks.hdfsSink.hdfs.batchSize = 10000
> agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>
> agent.channels.memoryChannel.type = memory
>
> agent.channels.memoryChannel.capacity = 3000000
> agent.channels.memoryChannel.transactionCapacity = 10000
>
> --
> Have a Nice Day!
> Lohit
>
