Robert: Are you saying that MemCh performance with a Null sink also shows the same degradation?
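For anyone wanting to repeat that comparison, a null-sink agent of the kind being discussed might look roughly like the sketch below. The component names (spoolSrc, memCh, nullSink), the spool directory path and the capacity values are placeholders chosen for illustration, not the settings behind any numbers in this thread.

```
# Sketch only: names, path and sizes are illustrative assumptions.
agent.sources = spoolSrc
agent.channels = memCh
agent.sinks = nullSink

# Spooling directory source, the source type used in Robert's runs
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /tmp/flume-spool
agent.sources.spoolSrc.channels = memCh

# Memory channel under test
agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 10000
agent.channels.memCh.transactionCapacity = 100

# Null sink discards every event it takes, so HDFS is out of the picture
agent.sinks.nullSink.type = null
agent.sinks.nullSink.channel = memCh
agent.sinks.nullSink.batchSize = 10
```

Running the same file against each Flume version, and then again with only the channel type changed from memory to file, should show whether the slowdown follows the channel rather than the sink.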
A side note: The Spillable channel has a faster-performing memory channel (and spilling to disk can be disabled), but unfortunately there is an issue with its metrics publishing which is kind of hard to fix.

-roshan

On 7/23/15 12:00 PM, "Robert B Hamilton" <[email protected]> wrote:

>I now believe that Roshan is correct that the channel may be the place to
>look.
>
>With tests using null sinks I had found that the channel was not much of
>a factor with 1.3, but now that I check 1.5 and 1.6 with null sinks, they
>still show the same pattern of performance degradation. The interesting
>thing is that I find similar performance hits both when using file
>channel AND when using memory channel. Looking forward to Johny's
>findings.
>
>
>From: Hari Shreedharan [mailto:[email protected]]
>Sent: Thursday, July 23, 2015 12:33 PM
>To: [email protected]
>Subject: Re: HDFS Sink performance
>
>This is interesting. I believe Johny is actually looking into this
>performance issue.
>
>Thanks,
>Hari
>
>On Thu, Jul 23, 2015 at 9:27 AM, lohit <[email protected]> wrote:
>The majority of messages need not be persisted to disk for us, so we are
>interested in MemoryChannel.
>There has been gradual performance degradation from 1.3.1 -> 1.4.0 ->
>1.6.0.
>See this graph below, where I have a constant stream of messages (blue
>line). While this is happening I swap different versions of Flume for the
>agent.
>The orange line shows messages dropped (a flat line is when data is
>streamed to HDFS), and I have marked the flat lines with the different
>versions.
>
>2015-07-22 19:48 GMT-07:00 Roshan Naik <[email protected]>:
>
>My guess is that most of you will probably use File channel in production
>with HDFS sink? In that scenario the common observation seems to be that
>the File channel becomes the primary bottleneck. Going by Robert's
>observations, its performance also seems to have dropped since v1.3.
>
>Robert, can you confirm how many data dirs were used for your readings
>with FCh?
>
>-roshan
>
>
>From: lohit <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Wednesday, July 22, 2015 3:01 PM
>To: "[email protected]" <[email protected]>
>Subject: Re: HDFS Sink performance
>
>Thanks for sharing these numbers, Robert. Curious, I did the same
>experiment.
>Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get a
>sustained 60MB/s with Flume 1.3.1).
>No config or setup change; just changing the Flume version shows this
>difference. We should probably look at the change set between 1.3.1 and
>1.5 to see if there were any obvious changes.
>
>2015-07-22 14:00 GMT-07:00 Robert B Hamilton <[email protected]>:
>Here is a comparison between versions 1.3, 1.5, and 1.6.
>I would estimate that error bars are plus or minus 15%.
>
>All parameters are identical; between runs all I change is the version
>of Flume.
>Lohit's numbers are fairly consistent with this, because if we double the
>sinks from my 4 to his 8 and assume linear scalability we would expect
>to get somewhere close to 30-40MB/s.
>
>It looks like the drop-off is more pronounced for the larger event size.
>This is of concern to us because we are looking at this for a high-volume
>feed with message sizes up to 80 kB.
>
>------------------------------------------
>HDFSx4 sink, Memory channel
>------------------------------------------
>Payload     v1.3    v1.5    v1.6
>(kB)        (MB/s)  (MB/s)  (MB/s)
>----------  -----   -----   -----
>1           27      17      20
>25          56      15      15
>
>
>From: Hari Shreedharan [mailto:[email protected]]
>Sent: Wednesday, July 22, 2015 1:27 PM
>To: [email protected]
>Subject: Re: HDFS Sink performance
>
>That is a bit disconcerting. Are you using the same HDFS setup and same
>config for both tests? Would it be possible for you to take a look at
>Flume 1.6.0? Such drops in performance should be taken care of.
>
>Thanks,
>Hari
>
>On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton
><[email protected]> wrote:
>My mailer totally scrambled the numbers, probably by inserting special
>characters.
>Sorry, here are the actual results....
>
>All rates in MB/s
>Payload in KB
>
>Flume 1.3.1
>Payload   Rate memch   Rate Fch
>25        34           29
>25        31           27.6
>25        50           23.3
>25        46.5         27.2
>50        31.3         23.8
>50        37.4         31.3
>50        32.3         31.8
>80        30.5         25.8
>80        46.2         25.2
>80        39.1         25.8
>80        56.5         25.1
>
>Flume 1.5
>Payload   Rate memch   Rate Fch
>25        18.7         15.6
>50        18.3         17.3
>80        18.4         15.6
>
>-----Original Message-----
>From: Robert B Hamilton [mailto:[email protected]]
>Sent: Wednesday, July 22, 2015 11:00 AM
>To: [email protected]
>Subject: RE: HDFS Sink performance
>
>I only see that kind of throughput for event sizes of 25kB to 50kB or
>larger.
>
>These particular tests were done on Flume version 1.3.1.
>But because you asked, I thought to do a few quick runs on 1.5.0.1 and
>added those results below. The results are significantly different for
>1.5 and I wonder if this is a cause for concern.
>
>None of this has been peer reviewed, so it should be considered
>tentative.
>
>As to the HDD, here is the result of a quick and dirty dd test.
>
> dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
> 104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>
>
>Source data: each record consists of random ASCII strings of constant
>length (25k, 50k, or 80k depending on the run).
>Source: spooldir
>Channel: file channel with a single dataDir, or memory channel.
>Sink: four HDFS, SequenceFile, Text, Batch size=10, rollInterval=20
>seconds.
>
>Batch size was kept small because of memory channel capacity. Increasing
>batch size for file channel did not improve performance so I kept it at
>10.
>
>Here I have numbers for some runs where the payload is varied between
>25K, 50K, and 80K. I include memory channel for comparison.
>
>Multiple runs were performed for each event size.
>As you can see, the
>throughput can vary from run to run because these particular measurements
>were done on an environment that is not tightly controlled. Think of
>them as "in situ" measurements :)
>
>Flume 1.3.1 memory channel and file channel
>-------------------------------------------------------
>Payload   Rate memch   Rate filech
>(kB)      (MB/s)       (MB/s)
>-------------------------------------------------------
>25        34           29
>25        31           27.6
>25        50           23.3
>25        46.5         27.2
>50        31.2         23.8
>50        37.4         31.3
>50        32.3         31.8
>80        30.5         25.8
>80        46.2         25.2
>80        39.1         25.8
>80        56.5         25.1
>
>
>Flume 1.5 File Channel and Memory Channel
>---------------------------------------------------
>Event size   Rate memch   Rate filech
>(KB)         (MB/s)       (MB/s)
>---------------------------------------------------
>25           18.7         15.6
>50           18.3         17.3
>80           18.4         15.6
>
>-----Original Message-----
>From: Roshan Naik [mailto:[email protected]]
>Sent: Friday, July 17, 2015 6:21 PM
>To: [email protected]
>Subject: Re: HDFS Sink performance
>
>I updated the Flume wiki with my measurements. Also added a section with
>Hive sink measurements.
>
>https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>
>
>@Robert:
> What sort of HDD are you using?
> What is the event size?
> Which version of Flume?
>
>-roshan
>
>
>On 7/17/15 12:51 PM, "Robert B Hamilton" <[email protected]> wrote:
>
>>Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
>>sinks per agent, and with a file channel with a single dataDir.
>>
>>
>>From: lohit [mailto:[email protected]]
>>Sent: Wednesday, July 15, 2015 11:11 AM
>>To: [email protected]
>>Subject: HDFS Sink performance
>>
>>Hello,
>>
>>Does anyone have some numbers they can share on HDFS sink
>>performance? From our testing, a single sink writing to HDFS
>>(CompressedStream) and reading from MemoryChannel can only do about
>>35000 events per second (each event is about 1K in size). After
>>compression this turns out to be a ~10MB/s write stream to the HDFS file,
>>which is pretty low.
>>Our configuration looks like this:
>>
>>agent.sinks.hdfsSink.type = hdfs
>>agent.sinks.hdfsSink.channel = memoryChannel
>>agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>agent.sinks.hdfsSink.hdfs.codeC = lzo
>>agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>agent.sinks.hdfsSink.hdfs.rollCount = 0
>>agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>
>>agent.channels.memoryChannel.type = memory
>>
>>agent.channels.memoryChannel.capacity = 3000000
>>agent.channels.memoryChannel.transactionCapacity = 10000
>>
>>--
>>Have a Nice Day!
>>Lohit
>>
>>
>>Nothing in this message is intended to constitute an electronic
>>signature unless a specific statement to the contrary is included in
>>this message.
>>
>>Confidentiality Note: This message is intended only for the person or
>>entity to which it is addressed. It may contain confidential and/or
>>privileged material. Any review, transmission, dissemination or other
>>use, or taking of any action in reliance upon this message by persons
>>or entities other than the intended recipient is prohibited and may be
>>unlawful. If you received this message in error, please contact the
>>sender and delete it from your computer.
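For reference, the multi-sink setup Robert describes (several HDFS sinks draining one channel) could be sketched as below, starting from the single-sink configuration in the original post. The sink names hdfs1 and hdfs2 and the filePrefix values are illustrative assumptions, not settings anyone in the thread reported; distinct prefixes are used here on the assumption that sinks writing to the same path should not collide on file names.

```
# Sketch only: two sinks shown; repeat the pattern for 4, 8 or 10 sinks.
agent.sinks = hdfs1 hdfs2

# First sink, same settings as the single-sink config apart from the prefix
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.channel = memoryChannel
agent.sinks.hdfs1.hdfs.path = /tmp/lohit
agent.sinks.hdfs1.hdfs.filePrefix = sink1
agent.sinks.hdfs1.hdfs.fileType = CompressedStream
agent.sinks.hdfs1.hdfs.codeC = lzo
agent.sinks.hdfs1.hdfs.batchSize = 10000

# Second sink drains the same channel in parallel
agent.sinks.hdfs2.type = hdfs
agent.sinks.hdfs2.channel = memoryChannel
agent.sinks.hdfs2.hdfs.path = /tmp/lohit
agent.sinks.hdfs2.hdfs.filePrefix = sink2
agent.sinks.hdfs2.hdfs.fileType = CompressedStream
agent.sinks.hdfs2.hdfs.codeC = lzo
agent.sinks.hdfs2.hdfs.batchSize = 10000
```

Whether throughput scales linearly with the number of sinks, as assumed in the 4-versus-8 comparison above, would still need to be measured on the setup in question.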
