OK, the disk that is giving you 40 KB/second is telling you the truth, and the faster disk is lying to you about when your data is actually written. Search for "fsync lies" to see what I am referring to.
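
If you want to check how many fsyncs per second a given disk really does, something like the following minimal sketch may help. It is plain Python, not part of Flume; the function name `fsyncs_per_second` and its defaults are made up for illustration. It appends a small payload and calls fsync after every write, which roughly mimics a channel that syncs once per tiny batch.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, writes=50, payload=b"x" * 500):
    """Write `payload` and fsync after each write; return fsyncs per second.

    Hypothetical helper for illustration only. `directory` should live on
    the disk you want to measure.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this file to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    # Guard against a zero reading on very coarse clocks.
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Replace the temp directory with the file channel's data directory.
    rate = fsyncs_per_second(tempfile.gettempdir())
    print(f"{rate:.0f} fsyncs/second")
```

A single spinning disk that reports a rate far above roughly 100 per second is probably acknowledging writes from a cache before they reach the platters. Note that a temp directory backed by memory (tmpfs) will also report a very high rate, so point the script at the real data directory.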
A spinning disk can do about 100 fsync operations per second, and an fsync is done at the end of every batch. That is how I estimated your event size: 40 KB/second at 100 fsyncs/second is 40 KB / 100 = 409 bytes per fsync, that is, roughly one event per batch. Once again, if you want increased performance, you should increase the batch size.

Brock

On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani <[email protected]> wrote:
> Hi
>
> Yes. It is around 480 - 500 bytes.
>
>
> On 10/10/2012 09:24 PM, Brock Noland wrote:
>>
>> How big are your events? Average about 400 bytes?
>>
>> Brock
>>
>> On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
>> <[email protected]> wrote:
>>>
>>> Hi
>>>
>>> Thanks for the inputs Brock. After doing several experiments
>>> eventually problem boiled down to disks.
>>>
>>> -- But I had used the same configuration (so all software components
>>> are
>>> same in all 3 machines)
>>> on all 3 machines.
>>> -- In User guide it is written that if multiple file channel instances
>>> are
>>> active on the same agent then
>>> different disks are preferable. But in my case only one file channel is
>>> active per agent.
>>> -- Only one pattern I observed that on the machines where I got better
>>> performance have multiple disks.
>>> But I don't understand how that will help if I have only 1 active file
>>> channel.
>>> -- What is the impact of the type of disk/disk device driver on
>>> performance?
>>> I mean I don't understand
>>> with 1 disk I am getting 40 KB/sec and with other 2 MB/sec.
>>>
>>> Could you please elaborate on File channel and disks correlation.
>>>
>>> Regards,
>>> Jagadish
>>>
>>>
>>> On 10/09/2012 08:01 PM, Brock Noland wrote:
>>>
>>> Hi,
>>>
>>> Using file channel, in terms of performance, the number and type of
>>> disks is going to be much more predictive of performance than CPU or
>>> RAM. Note that consumer level drives/controllers will give you much
>>> "better" performance because they lie to you about when your data is
>>> actually written to the drive. If you search for "fsync lies" you'll
>>> find more information on this.
>>>
>>> You probably want to increase the batch size to get better performance.
>>>
>>> Brock
>>>
>>> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
>>> <[email protected]> wrote:
>>>
>>> Hi
>>>
>>> My flume setup is:
>>>
>>> Source Agent : cat source - File Channel - Avro Sink
>>> Dest Agent : avro source - File Channel - HDFS Sink.
>>>
>>> There is only 1 source agent and 1 destination agent.
>>>
>>> I measure throughput as amount of data written to HDFS per second.
>>> ( I have rolling interval 30 sec; so If 60 MB file is generated in 30 sec
>>> the
>>> throughput is : -- 2 MB/sec ).
>>>
>>> I have run source agent on various machines with different hardware
>>> configurations :
>>> (In all cases I run flume agent with JAVA OPTIONS as
>>> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>>> -XX:MaxDirectMemorySize=2g")
>>>
>>> JDK is 32 bit.
>>>
>>> Experiment 1:
>>> =====
>>> RAM : 16 GB
>>> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>>> 64 bit Processor with 64 bit Kernel.
>>> Throughput: 2 MB/sec
>>>
>>> Experiment 2:
>>> ======
>>> RAM : 4 GB
>>> Processor: Intel Xeon E5504 @ 2.00GHz (4 cores). 32 bit Processor
>>> 64 bit Processor with 32 bit Kernel.
>>> Throughput : 30 KB/sec
>>>
>>> Experiment 3:
>>> ======
>>> RAM : 8 GB
>>> Processor:Intel Xeon E5520 @ 2.27 GHz (16 cores).32 bit Processor
>>> 64 bit Processor with 32 bit Kernel.
>>> Throughput : 80 KB/sec
>>>
>>> -- So as can be seen there is huge difference in the throughput with
>>> same
>>> configuration but
>>> different hardware.
>>> -- In the first case where throughput is more RES is around 160 MB in
>>> other
>>> cases it is in
>>> the range of 40 MB - 50 MB.
>>>
>>> Can anybody please give insights that why there is this huge difference
>>> in
>>> the throughput?
>>> What is the correlation between RAM and filechannel/HDFS sink performance
>>> and also
>>> with 32-bit/64 bit kernel?
>>>
>>> Regards,
>>> Jagadish
>>>
>>>
>>>
>>
>>
>

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
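
The back-of-the-envelope estimate in the reply at the top of this thread can be sketched as a few lines of Python. This is an illustration only: the function names are hypothetical, the 100 fsyncs/second figure is the rough number quoted in the reply, and the model assumes one fsync per batch and ignores every other cost (network, HDFS, serialization), so treat the results as an upper bound for a disk that honors fsync.

```python
def bytes_per_fsync(throughput_bytes_per_sec, fsyncs_per_sec):
    """Bytes written per fsync, given an observed throughput."""
    return throughput_bytes_per_sec / fsyncs_per_sec


def estimated_throughput(fsyncs_per_sec, batch_size, event_bytes):
    """Upper-bound throughput in bytes/second with one fsync per batch."""
    return fsyncs_per_sec * batch_size * event_bytes


FSYNCS_PER_SEC = 100  # rough figure for a spinning disk, from the thread
EVENT_BYTES = 500     # event size reported in the thread (480 - 500 bytes)

if __name__ == "__main__":
    # 40 KB/second observed, so about 409.6 bytes land on disk per fsync.
    print(bytes_per_fsync(40 * 1024, FSYNCS_PER_SEC))

    # Larger batches amortize each fsync over more events.
    for batch_size in (1, 10, 100, 1000):
        rate = estimated_throughput(FSYNCS_PER_SEC, batch_size, EVENT_BYTES)
        print(f"batch size {batch_size:>4}: {rate / 1024:.0f} KB/second")
```

With a batch size of 1 the model gives about 49 KB/second, which is in the same range as the 30 to 80 KB/second seen on the slower machines; the estimate grows linearly with the batch size until some other component becomes the bottleneck.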
