Re: tuning performance
Yes, I am referring to HDFS taking multiple mount points and automatically round-robining block allocation across them. A single file block will only exist on a single disk, but the extra speed you can get with RAID-0 within a block can't be used effectively by almost any mapper or reducer anyway. Perhaps an identity mapper can read faster than a single disk, but certainly not if the content is compressed. RAID-0 may be more useful for local temp space.

In effect, you can say that HDFS data nodes already do RAID-0, but with a very large block size, and where the failure of a disk reduces redundancy only minimally and temporarily.

For reference, today's Intel / AMD CPUs can usually decompress a gzip stream at less than 30MB/sec of compressed input (50MB to 100MB/sec of uncompressed output).

On 3/14/09 1:53 AM, Vadim Zaliva kroko...@gmail.com wrote:

Scott, thanks for the interesting information. By JBOD, I assume you mean just listing multiple partition mount points in the hadoop config? Vadim

On Fri, Mar 13, 2009 at 12:48, Scott Carey sc...@richrelevance.com wrote:

On 3/13/09 11:56 AM, Allen Wittenauer a...@yahoo-inc.com wrote:

On 3/13/09 11:25 AM, Vadim Zaliva kroko...@gmail.com wrote:

When you stripe, you automatically make every disk in the system have the same speed as the slowest disk. In our experience, systems are more likely to have a 'slow' disk than a dead one, and detecting that is really, really hard. In a distributed system, that multiplier effect can have significant consequences on the whole grid's performance.

All disks are the same, so there is no speed difference.

There will be when they start to fail. :)

This has been discussed before: http://www.nabble.com/RAID-vs.-JBOD-td21404366.html

JBOD is going to be better. The only benefit of RAID-0 is slightly easier management in the hadoop config, but it is harder to manage at the OS level. When a single JBOD drive dies, you only lose that set of data; the datanode goes down, but a restart brings back up the parts that still exist. Then you can leave it be while the replacement is procured. With RAID-0 the whole node is down until you get the new drive and recreate the RAID.

With JBOD, don't forget to set the Linux readahead for the drives to a decent level; you'll gain up to 25% more sequential read throughput depending on your kernel version (blockdev --setra 8192 /dev/device). I also see good gains by using xfs instead of ext3. For a big shocker, check out the difference in time to delete a bunch of large files with ext3 (a long time) versus xfs (almost instant).

Newer drives can do about 120MB/sec at the front of the drive when tuned (xfs, readahead 4096), while the back of the drive does about 60MB/sec. If you are not going to use 100% of the drive for HDFS, use this knowledge and place the partitions appropriately: the last 20% or so of the drive is a lot slower than the front 60%. Here is a typical sequential transfer rate chart for a SATA drive as a function of LBA: http://www.tomshardware.com/reviews/Seagate-Barracuda-1.5-TB,2032-5.html (graphs are about 3/4 of the way down the page, before the comments).
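The readahead advice above can be scripted across a JBOD set. A minimal sketch, assuming the data disks are /dev/sdb through /dev/sdd (adjust to your layout); it prints the tuning commands rather than running them, since applying them requires root:

```shell
# Sketch: generate the readahead-tuning command for each assumed HDFS data disk.
# 8192 is in 512-byte sectors, i.e. a 4 MB readahead per drive.
# Pipe the output to "sudo sh" (or drop the echo) to actually apply it.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
  echo "blockdev --setra 8192 $dev"
done
```

`blockdev --getra /dev/sdX` shows the current value; recent kernels also expose it as /sys/block/sdX/queue/read_ahead_kb (in kilobytes rather than sectors). Note the setting does not survive a reboot, so it belongs in an init script.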
Re: tuning performance
Scott,

Thanks for the interesting information. By JBOD, I assume you mean just listing multiple partition mount points in the hadoop config?

Vadim
Re: tuning performance
On 3/13/09 11:25 AM, Vadim Zaliva kroko...@gmail.com wrote:

When you stripe, you automatically make every disk in the system have the same speed as the slowest disk. In our experience, systems are more likely to have a 'slow' disk than a dead one, and detecting that is really, really hard. In a distributed system, that multiplier effect can have significant consequences on the whole grid's performance.

All disks are the same, so there is no speed difference.

There will be when they start to fail. :)
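The slowest-disk effect can be made concrete with some back-of-the-envelope arithmetic (the throughput numbers here are made up for illustration, not measurements):

```shell
# Hypothetical 4-disk node: three healthy disks at 100 MB/s, one degraded to 20 MB/s.
# RAID-0 stripes every read across all disks, so the slow disk gates all of them;
# JBOD lets each disk run at its own speed, so the aggregate only loses one disk's worth.
healthy=100; degraded=20; disks=4
raid0=$((disks * degraded))                  # 4 * 20       = 80 MB/s total
jbod=$(( (disks - 1) * healthy + degraded )) # 3 * 100 + 20 = 320 MB/s aggregate
echo "RAID-0: ${raid0} MB/s, JBOD aggregate: ${jbod} MB/s"
```

Under these assumed numbers, one degraded disk costs the striped node three quarters of its throughput, while the JBOD node loses only a quarter of one disk's share.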
tuning performance
Hi!

I have a question about fine-tuning hadoop performance on 8-core machines. I have 2 machines I am testing: one is an 8-core Xeon and the other is an 8-core Opteron, with 16GB RAM each. They both run mapreduce and dfs nodes. Currently I've set up each of them to run 32 map and 8 reduce tasks, with HADOOP_HEAPSIZE=2048. I see the CPU is under-utilized. Is there a guideline for how I can find the optimal number of tasks and memory settings for this kind of hardware?

Also, since we are going to buy more machines like this, I need to decide whether to buy Xeons or Opterons. Any advice on that?

Sincerely,
Vadim

P.S. I am using Hadoop 19 and java version 1.6.0_12:
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
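The per-node task counts are set in conf/hadoop-site.xml in Hadoop 0.19, and per-task heap is controlled there too (HADOOP_HEAPSIZE in hadoop-env.sh sizes the daemon JVMs, not the task JVMs). A sketch of the relevant properties; the values are illustrative starting points for an 8-core/16GB node, not recommendations:

```xml
<!-- Sketch for conf/hadoop-site.xml (Hadoop 0.19); values are illustrative.
     Start near the core count and measure, rather than oversubscribing. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>  <!-- concurrent map slots per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>  <!-- concurrent reduce slots per TaskTracker -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>  <!-- heap per task JVM; slots * heap must fit in 16GB -->
</property>
```

A rough sanity check: (map slots + reduce slots) * per-task heap, plus the datanode and tasktracker daemons, should stay well under physical RAM, or the node will swap and throughput will collapse.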