This is interesting. I believe Johny is actually looking into this performance issue.
Thanks,
Hari

On Thu, Jul 23, 2015 at 9:27 AM, lohit <[email protected]> wrote:

> Majority of messages need not be persisted to disk for us, so we are
> interested in MemoryChannel.
> There has been a gradual performance degradation from 1.3.1 -> 1.4.0 ->
> 1.6.0.
> See the graph below, where I have a constant stream of messages (blue
> line). While this is happening I swap in different versions of Flume for
> the agent.
> The orange line shows messages dropped. (A flat line is when data is being
> streamed to HDFS.) I have marked the flat lines with the different versions.
>
>
>
> 2015-07-22 19:48 GMT-07:00 Roshan Naik <[email protected]>:
>
>>
>> My guess is that most of you will probably use the File channel in
>> production with the HDFS sink? In that scenario the common observation
>> seems to be that the File channel becomes the primary bottleneck. Going by
>> Robert's observations, its throughput also seems to have dropped since v1.3.
>>
>> Robert, can you confirm how many data dirs were used for your readings
>> with the file channel?
>>
>> -roshan
>>
>>
>>
>> From: lohit <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, July 22, 2015 3:01 PM
>> To: "[email protected]" <[email protected]>
>>
>> Subject: Re: HDFS Sink performance
>>
>> Thanks for sharing these numbers, Robert. Curious, I ran the same
>> experiment.
>> Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get
>> sustained 60MB/s with Flume 1.3.1).
>> No config or setup change; just changing the Flume version shows this
>> difference. We should probably look at the changeset between 1.3.1 and 1.5
>> to see if there were any obvious changes.
>>
>> 2015-07-22 14:00 GMT-07:00 Robert B Hamilton <[email protected]>:
>>
>>> Here is a comparison between versions 1.3, 1.5, and 1.6.
>>> I would estimate that the error bars are plus or minus 15%.
>>>
>>> All parameters are identical; between runs the only thing I change is
>>> the version of Flume.
>>> Lohit's numbers are fairly consistent with this: if we double the sinks
>>> from my 4 to his 8 and assume linear scalability, we would expect to get
>>> somewhere close to 30-40MB/s.
>>>
>>> It looks like the drop-off is more pronounced for the larger event
>>> size. This is of concern to us because we are looking at this for a
>>> high-volume feed with message sizes up to 80 kB.
>>>
>>> ------------------------------------------
>>> HDFS sink x4, Memory channel
>>> ------------------------------------------
>>> Payload   v1.3    v1.5    v1.6
>>> (kB)      (MB/s)  (MB/s)  (MB/s)
>>> --------  ------  ------  ------
>>> 1         27      17      20
>>> 25        56      15      15
>>>
>>>
>>>
>>> From: Hari Shreedharan [mailto:[email protected]]
>>> Sent: Wednesday, July 22, 2015 1:27 PM
>>> To: [email protected]
>>> Subject: Re: HDFS Sink performance
>>>
>>> That is a bit disconcerting. Are you using the same HDFS setup and the
>>> same config for both tests? Would it be possible for you to take a look at
>>> Flume 1.6.0? Such drops in performance should be taken care of.
>>>
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton <
>>> [email protected]> wrote:
>>> My mailer totally scrambled the numbers, probably by inserting special
>>> characters.
>>> Sorry, here are the actual results....
>>>
>>> All rates in MB/s
>>> Payload in kB
>>>
>>> Flume 1.3.1
>>> Payload   Memory channel  File channel
>>> (kB)      rate (MB/s)     rate (MB/s)
>>> --------  --------------  ------------
>>> 25        34              29
>>> 25        31              27.6
>>> 25        50              23.3
>>> 25        46.5            27.2
>>> 50        31.3            23.8
>>> 50        37.4            31.3
>>> 50        32.3            31.8
>>> 80        30.5            25.8
>>> 80        46.2            25.2
>>> 80        39.1            25.8
>>> 80        56.5            25.1
>>>
>>> Flume 1.5
>>> Payload   Memory channel  File channel
>>> (kB)      rate (MB/s)     rate (MB/s)
>>> --------  --------------  ------------
>>> 25        18.7            15.6
>>> 50        18.3            17.3
>>> 80        18.4            15.6
>>>
>>> -----Original Message-----
>>> From: Robert B Hamilton [mailto:[email protected]]
>>> Sent: Wednesday, July 22, 2015 11:00 AM
>>> To: [email protected]
>>> Subject: RE: HDFS Sink performance
>>>
>>> I only see that kind of throughput for event sizes of 25kB to 50kB or
>>> larger.
>>>
>>> These particular tests were done on Flume version 1.3.1.
>>> But since you asked, I did a few quick runs on 1.5.0.1 and added those
>>> results below. The results are significantly different for 1.5 and I
>>> wonder if this is a cause for concern.
>>>
>>> None of this has been peer reviewed, so it should be considered
>>> tentative.
>>>
>>> As to the HDD, here is the result of a quick and dirty dd test.
>>>
>>> dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
>>> 104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>>>
>>>
>>> Source data: each record consists of random ASCII strings of constant
>>> length (25k, 50k, or 80k depending on the run).
>>> Source: spooldir
>>> Channel: file channel with a single dataDir, or memory channel.
>>> Sink: four HDFS sinks, SequenceFile, Text, batch size = 10,
>>> rollInterval = 20 seconds.
>>>
>>> Batch size was kept small because of the memory channel capacity.
>>> Increasing the batch size for the file channel did not improve
>>> performance, so I kept it at 10.
>>>
>>> Here are numbers for some runs where the payload is varied across 25K,
>>> 50K, and 80K. I include the memory channel for comparison.
>>>
>>> Multiple runs were performed for each event size. As you can see, the
>>> throughput can vary from run to run because these particular measurements
>>> were done in an environment that is not tightly controlled. Think of them
>>> as "in situ" measurements :)
>>>
>>> Flume 1.3.1 memory channel and file channel
>>> -------------------------------------------------------
>>> Payload   Memory channel  File channel
>>> (kB)      rate (MB/s)     rate (MB/s)
>>> --------  --------------  ------------
>>> 25        34              29
>>> 25        31              27.6
>>> 25        50              23.3
>>> 25        46.5            27.2
>>> 50        31.2            23.8
>>> 50        37.4            31.3
>>> 50        32.3            31.8
>>> 80        30.5            25.8
>>> 80        46.2            25.2
>>> 80        39.1            25.8
>>> 80        56.5            25.1
>>>
>>>
>>> Flume 1.5 file channel and memory channel
>>> ---------------------------------------------------
>>> Event size  Memory channel  File channel
>>> (kB)        rate (MB/s)     rate (MB/s)
>>> ----------  --------------  ------------
>>> 25          18.7            15.6
>>> 50          18.3            17.3
>>> 80          18.4            15.6
>>>
>>> -----Original Message-----
>>> From: Roshan Naik [mailto:[email protected]]
>>> Sent: Friday, July 17, 2015 6:21 PM
>>> To: [email protected]
>>> Subject: Re: HDFS Sink performance
>>>
>>> I updated the Flume wiki with my measurements. I also added a section
>>> with Hive sink measurements.
>>>
>>> https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>>>
>>>
>>> @Robert:
>>> What sort of HDD are you using?
>>> What is the event size?
>>> Which version of Flume?
>>>
>>> -roshan
>>>
>>>
>>>
>>>
>>> On 7/17/15 12:51 PM, "Robert B Hamilton" <[email protected]> wrote:
>>>
>>> >Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
>>> >sinks per agent, and with a file channel with a single dataDir.
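A note on the dataDirs question raised above: a file channel spread over more
than one data directory is configured roughly as in the sketch below. The
agent and channel names, the directory paths, and the two-disk layout are
placeholders for illustration; they are not the settings used in any of the
tests reported in this thread.

  # Illustrative sketch only; channel name and paths are hypothetical.
  agent.channels = fileChannel
  agent.channels.fileChannel.type = file
  agent.channels.fileChannel.checkpointDir = /mnt/disk0/flume/checkpoint
  # dataDirs takes a comma-separated list; putting each directory on its
  # own physical disk lets the channel spread its write load.
  agent.channels.fileChannel.dataDirs = /mnt/disk1/flume/data,/mnt/disk2/flume/data
  agent.channels.fileChannel.capacity = 1000000
  agent.channels.fileChannel.transactionCapacity = 10000

Listing several directories on the same physical disk buys nothing; the
benefit comes from spreading the channel's writes across separate disks, which
is why the single-dataDir results above are worth calling out.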
>>> >
>>> >
>>> >From: lohit [mailto:[email protected]]
>>> >Sent: Wednesday, July 15, 2015 11:11 AM
>>> >To: [email protected]
>>> >Subject: HDFS Sink performance
>>> >
>>> >Hello,
>>> >
>>> >Does anyone have numbers they can share around HDFS sink performance?
>>> >From our testing, a single sink writing to HDFS (CompressedStream) and
>>> >reading from a MemoryChannel can only do about 35000 events per second
>>> >(each event is about 1K in size). After compression this turns out to be
>>> >a ~10MB/s write stream to the HDFS file, which is pretty low. Our
>>> >configuration looks like this:
>>> >
>>> >agent.sinks.hdfsSink.type = hdfs
>>> >agent.sinks.hdfsSink.channel = memoryChannel
>>> >agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>> >agent.sinks.hdfsSink.hdfs.codeC = lzo
>>> >agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>> >agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>> >agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>> >agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>> >agent.sinks.hdfsSink.hdfs.rollCount = 0
>>> >agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>> >agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>> >
>>> >agent.channels.memoryChannel.type = memory
>>> >
>>> >agent.channels.memoryChannel.capacity = 3000000
>>> >agent.channels.memoryChannel.transactionCapacity = 10000
>>> >
>>> >--
>>> >Have a Nice Day!
>>> >Lohit
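Lohit's configuration above drives a single HDFS sink. Since the results
higher up in the thread suggest throughput grows with the number of sinks
draining the channel (roughly 60MB/s with 8 to 10 sinks), a sketch of the same
agent extended to four sinks sharing one memory channel might look like the
following. The sink names, file prefixes, and the choice of four sinks are
placeholders, not a tested configuration.

  # Illustrative sketch only; sink names and file prefixes are hypothetical.
  agent.sinks = hdfsSink1 hdfsSink2 hdfsSink3 hdfsSink4

  agent.sinks.hdfsSink1.type = hdfs
  agent.sinks.hdfsSink1.channel = memoryChannel
  agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
  # A distinct prefix per sink keeps the sinks from colliding on file names.
  agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
  agent.sinks.hdfsSink1.hdfs.codeC = lzo
  agent.sinks.hdfsSink1.hdfs.fileType = CompressedStream
  agent.sinks.hdfsSink1.hdfs.writeFormat = Writable
  agent.sinks.hdfsSink1.hdfs.rollInterval = 3600
  agent.sinks.hdfsSink1.hdfs.rollSize = 1073741824
  agent.sinks.hdfsSink1.hdfs.rollCount = 0
  agent.sinks.hdfsSink1.hdfs.batchSize = 10000

  # hdfsSink2 through hdfsSink4 repeat the block above with their own
  # hdfs.filePrefix (sink2, sink3, sink4); all four drain the same channel.

Each sink pulls its own transactions from the channel, so the channel's
transactionCapacity needs to stay at or above the per-sink batchSize (both
10000 in the original config).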
>>
>>
>> --
>> Have a Nice Day!
>> Lohit
>
>
>
> --
> Have a Nice Day!
> Lohit
>
