Roshan - how about posting that on the Flume wiki?

Thanks,
Hari

On Wed, Jul 15, 2015 at 1:07 PM, Roshan Naik <[email protected]> wrote:

>  Lohit,
> You may want to search the mailing list for 'Flume perf measurements'.
> You should find the recent measurements I posted.
> -roshan
>
>   From: lohit <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, July 15, 2015 11:19 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: HDFS Sink performance
>
>   Thanks for the reply, Hari. Multiple sinks make sense, but this would
> also mean there are a lot more files on HDFS. I will try multiple sinks and
> see how fast this can go.
> Given that a single HDFS stream can do much higher throughput, maybe there
> is a way to have a thread pool for SinkRunner-PollingRunner-DefaultSinkProcessor
> instead of a single thread per sink.
>
> 2015-07-15 11:11 GMT-07:00 Hari Shreedharan <[email protected]>:
>
>> Hi Lohit,
>>
>>  HDFS sinks (in fact, most sinks) are single-threaded by design. This is
>> meant to make writing the sinks easier, but all channels can handle
>> multiple sinks reading from them. So to improve throughput, you
>> configure several sinks which read off the same channel. Make sure,
>> though, that each sink writes to files with different HDFS paths or
>> different file prefixes (else the HDFS client API will complain about leases).
>>
>>
>> Thanks,
>> Hari
>>
>> On Wed, Jul 15, 2015 at 9:10 AM, lohit <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>>  Does anyone have some numbers they can share on HDFS sink
>>> performance? In our testing, a single sink writing to HDFS
>>> (CompressedStream) and reading from a MemoryChannel can only do about 35000
>>> events per second (each event is about 1K in size). After compression this
>>> turns out to be a ~10MB/s write stream to the HDFS file, which is pretty low.
>>> Our configuration looks like this:
>>>
>>>  agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.channel = memoryChannel
>>> agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>> agent.sinks.hdfsSink.hdfs.codeC = lzo
>>> agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>> agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>> agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>> agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>>> agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>> agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>>
>>>  agent.channels.memoryChannel.type = memory
>>>
>>>  agent.channels.memoryChannel.capacity = 3000000
>>> agent.channels.memoryChannel.transactionCapacity = 10000
>>>
>>>  --
>>> Have a Nice Day!
>>> Lohit
>>>
>>
>>
>
>
>  --
> Have a Nice Day!
> Lohit
>
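
For reference, the multi-sink setup described in the quoted reply above could
look roughly like the sketch below. This is only an illustration, not a tested
configuration: the sink names hdfsSink1 and hdfsSink2 and the prefix values
are made up for the example, and the path and channel name are carried over
from Lohit's config. Both sinks drain the same channel, and each uses its own
hdfs.filePrefix so the two sinks never write to the same file.

```properties
# Hypothetical sketch: two HDFS sinks reading from one memory channel.
agent.channels = memoryChannel
agent.sinks = hdfsSink1 hdfsSink2

agent.channels.memoryChannel.type = memory

# First sink (name and prefix are assumptions for illustration)
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

# Second sink: same channel and path, different file prefix
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
```

The remaining hdfs.* settings (codec, file type, roll and batch settings)
would be repeated per sink. As Lohit notes, each extra sink also means more
files on HDFS.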
