Roshan - how about posting that on the Flume wiki?
Thanks,
Hari

On Wed, Jul 15, 2015 at 1:07 PM, Roshan Naik <[email protected]> wrote:

> Lohit,
> You may want to search the mailing list for 'Flume perf measurements'.
> You should find the recent measurements I posted.
> -roshan
>
> From: lohit <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, July 15, 2015 11:19 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: HDFS Sink performance
>
> Thanks for the reply, Hari. Multiple sinks make sense, but this would
> also mean there are a lot more files on HDFS. I will try multiple sinks
> and see how fast this can go.
> Given that a single HDFS stream can do much higher throughput, maybe there
> is a way to have a thread pool for SinkRunner-PollingRunner-DefaultSinkProcessor
> instead of a single thread per sink.
>
> 2015-07-15 11:11 GMT-07:00 Hari Shreedharan <[email protected]>:
>
>> Hi Lohit,
>>
>> HDFS sinks (in fact, most sinks) are single-threaded by design. This is
>> meant to make writing the sinks easier, but all channels can handle
>> multiple sinks reading from them. So to improve the efficiency, you
>> basically configure several sinks which read off the same channel. Make
>> sure, though, that each sink writes to files with different HDFS paths or
>> different file prefixes (else the HDFS client API will complain about leases).
>>
>>
>> Thanks,
>> Hari
>>
>> On Wed, Jul 15, 2015 at 9:10 AM, lohit <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> Does anyone have some numbers they can share on HDFS sink
>>> performance? In our testing, a single sink writing to HDFS
>>> (CompressedStream) and reading from a MemoryChannel can only do about
>>> 35000 events per second (each event is about 1K in size). After
>>> compression this turns out to be a ~10MB/s write stream to the HDFS
>>> file, which is pretty low. Our configuration looks like this:
>>>
>>> agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.channel = memoryChannel
>>> agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>> agent.sinks.hdfsSink.hdfs.codeC = lzo
>>> agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>> agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>> agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>> agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>>> agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>> agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>>
>>> agent.channels.memoryChannel.type = memory
>>>
>>> agent.channels.memoryChannel.capacity = 3000000
>>> agent.channels.memoryChannel.transactionCapacity = 10000
>>>
>>> --
>>> Have a Nice Day!
>>> Lohit
>>>
>>
>>
>
>
> --
> Have a Nice Day!
> Lohit
>
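
For readers of the archive, the suggestion quoted above (several sinks reading off the same channel, each with its own file prefix) might look roughly like the sketch below. It is only an illustration, not a tested or tuned setup: the sink names hdfsSink1 and hdfsSink2, the choice of two sinks, and the filePrefix values are assumptions made here, and the remaining values are copied from Lohit's configuration. The source definition is omitted, as it was in the original message.

```
# Sketch only: sink names and filePrefix values below are hypothetical.
agent.channels = memoryChannel
agent.sinks = hdfsSink1 hdfsSink2

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 3000000
agent.channels.memoryChannel.transactionCapacity = 10000

# First sink: drains memoryChannel and writes files prefixed "sink1".
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
agent.sinks.hdfsSink1.hdfs.codeC = lzo
agent.sinks.hdfsSink1.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink1.hdfs.rollInterval = 3600
agent.sinks.hdfsSink1.hdfs.rollSize = 1073741824
agent.sinks.hdfsSink1.hdfs.rollCount = 0
agent.sinks.hdfsSink1.hdfs.batchSize = 10000

# Second sink: same channel and path, but a different prefix so the two
# sinks never try to write the same HDFS file.
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
agent.sinks.hdfsSink2.hdfs.codeC = lzo
agent.sinks.hdfsSink2.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink2.hdfs.rollInterval = 3600
agent.sinks.hdfsSink2.hdfs.rollSize = 1073741824
agent.sinks.hdfsSink2.hdfs.rollCount = 0
agent.sinks.hdfsSink2.hdfs.batchSize = 10000
```

Each sink runs on its own thread and rolls its own files, so, as Lohit notes in the thread, the number of files on HDFS grows with the number of sinks.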
