That is a bit disconcerting. Are you using the same HDFS setup and same config for both tests? Would it be possible for you to take a look at Flume 1.6.0? Such drops in performance should be taken care of.
Thanks,
Hari

On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton <[email protected]> wrote:

> My mailer totally scrambled the numbers, probably by inserting special
> characters.
> Sorry, here are the actual results....
>
> All rates in MB/s
> Payload in KB
>
> Flume 1.3.1
> Payload  Rate memch  Rate Fch
> 25       34          29
> 25       31          27.6
> 25       50          23.3
> 25       46.5        27.2
> 50       31.3        23.8
> 50       37.4        31.3
> 50       32.3        31.8
> 80       30.5        25.8
> 80       46.2        25.2
> 80       39.1        25.8
> 80       56.5        25.1
>
> Flume 1.5.
> Payload  Rate memch  Rate Fch
> 25       18.7        15.6
> 50       18.3        17.3
> 80       18.4        15.6
>
> -----Original Message-----
> From: Robert B Hamilton [mailto:[email protected]]
> Sent: Wednesday, July 22, 2015 11:00 AM
> To: [email protected]
> Subject: RE: HDFS Sink performance
>
> I only see that kind of throughput for event sizes of 25kB to 50kB or
> larger.
>
> These particular tests are done on flume version 1.3.1.
> But because you asked, I thought to do a few quick runs on 1.5.0.1 and
> added those results below. The results are significantly different for 1.5
> and I wonder if this is a cause for concern.
>
> None of this has been peer reviewed so it should be considered as
> tentative.
>
> As to the HDD, here is the result of a quick and dirty dd test.
>
> dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
> 104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>
> Source data: each record consists of random ascii strings of constant
> length (25k, 50k, or 80k depending on the run).
> Source: spooldir
> Channel: file channel single dataDir, or memory channel.
> Sink: four HDFS, SequenceFile, Text, Batch size=10, rollInterval=20
> seconds.
>
> Batch size was kept small because of memory channel capacity. Increasing
> batch size for file channel did not improve performance so I kept it at 10.
>
> Here I have numbers for some runs where the payload is varied from
> 25K, 50K, and 80K. I include memory channel for comparison.
>
> Multiple runs were performed for each event size.
> As you can see the throughput can vary from run to run because these
> particular measurements were done on an environment that is not tightly
> controlled. Think of them as "in situ" measurements :)
>
> Flume 1.3.1 memory channel and file channel
> -------------------------------------------------------
> Payload Rate memch Rate(filechl)
> (kB)(MB/s) (MB/s)
> -----------------------------------------------------
> 253429
> 253127.6
> 255023.3
> 2546.527.2
> 5031.223.8
> 5037.431.3
> 5032.331.8
> 8030.525.8
> 8046.225.2
> 8039.125.8
> 8056.525.1
>
> Flume 1.5 File Channel and Memory Channel
> ---------------------------------------------------
> Event size Rate memch Rate filech
> (KB) (MB/s) (MB/s)
> ---------------------------------------------------
> 2518.715.6
> 5018.317.3
> 8018.415.6
>
> -----Original Message-----
> From: Roshan Naik [mailto:[email protected]]
> Sent: Friday, July 17, 2015 6:21 PM
> To: [email protected]
> Subject: Re: HDFS Sink performance
>
> I Updated the Flume wiki with my measurements. Also added section with
> Hive sink measurements.
>
> https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>
> @Robert:
> What sort of a HDD are you using ?
> What is event size ?
> Which version of flume ?
>
> -roshan
>
> On 7/17/15 12:51 PM, "Robert B Hamilton" <[email protected]> wrote:
>
> >Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
> >sinks per agent, and with a file channel with a single dataDir.
> >
> >From: lohit [mailto:[email protected]]
> >Sent: Wednesday, July 15, 2015 11:11 AM
> >To: [email protected]
> >Subject: HDFS Sink performance
> >
> >Hello,
> >
> >Does anyone have some numbers which they can share around HDFS sink
> >performance. From our testing, for single sink writing to HDFS
> >(CompressedStream) and reading from MemoryChannel can only do about
> >35000 events per second (each event is about 1K) in size.
> >After compression this turns out to be ~10MB/s write stream to HDFS file.
> >Which is pretty low. Our configuration looks like this
> >
> >agent.sinks.hdfsSink.type = hdfs
> >agent.sinks.hdfsSink.channel = memoryChannel
> >agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
> >agent.sinks.hdfsSink.hdfs.codeC = lzo
> >agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
> >agent.sinks.hdfsSink.hdfs.writeFormat = Writable
> >agent.sinks.hdfsSink.hdfs.rollInterval = 3600
> >agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
> >agent.sinks.hdfsSink.hdfs.rollCount = 0
> >agent.sinks.hdfsSink.hdfs.batchSize = 10000
> >agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
> >
> >agent.channels.memoryChannel.type = memory
> >
> >agent.channels.memoryChannel.capacity = 3000000
> >agent.channels.memoryChannel.transactionCapacity = 10000
> >
> >--
> >Have a Nice Day!
> >Lohit
> >
> >Nothing in this message is intended to constitute an electronic
> >signature unless a specific statement to the contrary is included in this
> >message.
> >
> >Confidentiality Note: This message is intended only for the person or
> >entity to which it is addressed. It may contain confidential and/or
> >privileged material. Any review, transmission, dissemination or other
> >use, or taking of any action in reliance upon this message by persons
> >or entities other than the intended recipient is prohibited and may be
> >unlawful. If you received this message in error, please contact the
> >sender and delete it from your computer.
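
For anyone who wants to compare the two versions at a glance, here is a rough sketch that averages the runs from the corrected tables in this thread and prints the 1.5 rate as a fraction of the 1.3.1 rate for each payload size. The numbers are copied from the tables; the names (`FLUME_1_3_1`, `FLUME_1_5`, `mean_rates`) are made up for illustration. Since the runs were not tightly controlled and 1.5 has only one run per payload, the ratios should be read as indicative only.

```python
from statistics import mean

# (payload_kb, memory_channel_mb_s, file_channel_mb_s), copied from the
# corrected tables in the thread. Variable names are illustrative only.
FLUME_1_3_1 = [
    (25, 34.0, 29.0), (25, 31.0, 27.6), (25, 50.0, 23.3), (25, 46.5, 27.2),
    (50, 31.3, 23.8), (50, 37.4, 31.3), (50, 32.3, 31.8),
    (80, 30.5, 25.8), (80, 46.2, 25.2), (80, 39.1, 25.8), (80, 56.5, 25.1),
]
FLUME_1_5 = [(25, 18.7, 15.6), (50, 18.3, 17.3), (80, 18.4, 15.6)]


def mean_rates(runs):
    """Return {payload_kb: (mean memch MB/s, mean filech MB/s)}."""
    by_payload = {}
    for payload, mem, fch in runs:
        by_payload.setdefault(payload, []).append((mem, fch))
    return {
        payload: (mean(m for m, _ in rows), mean(f for _, f in rows))
        for payload, rows in by_payload.items()
    }


old = mean_rates(FLUME_1_3_1)
new = mean_rates(FLUME_1_5)

# Print the per-payload averages and the 1.5 rate relative to 1.3.1.
for payload in sorted(old):
    old_mem, old_fch = old[payload]
    new_mem, new_fch = new[payload]
    print(
        f"{payload} KB: memch {old_mem:.1f} -> {new_mem:.1f} MB/s "
        f"({new_mem / old_mem:.0%}), "
        f"filech {old_fch:.1f} -> {new_fch:.1f} MB/s "
        f"({new_fch / old_fch:.0%})"
    )
```

Averaging hides the run-to-run spread, which is large for the 1.3.1 memory channel numbers, so a fairer comparison would repeat the 1.5 runs several times on the same HDFS setup and config.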
