Re: tuning performance

2009-03-16 Thread Scott Carey
Yes, I am referring to HDFS taking multiple mount points and automatically 
round-robining block allocation across them.
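
Concretely, that is just one directory per physical disk in dfs.data.dir (a 
minimal sketch - the mount points here are placeholders for your own layout):

  <!-- hadoop-site.xml: one entry per disk; HDFS round-robins new blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
  </property>
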
A single file block will only exist on a single disk, but the extra speed you 
can get with raid-0 within a block can't be used effectively by most mappers 
or reducers anyway.  Perhaps an identity mapper can read faster than a 
single disk - but certainly not if the content is compressed.

RAID-0 may be more useful for local temp space.
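
Temp space here means mapred.local.dir; a sketch, if you do build one RAID-0 
array for it (the path is a placeholder):

  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/raid0/mapred/local</value>
    <!-- a comma-separated list also works, spreading temp across JBOD disks -->
  </property>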

In effect, you can say that HDFS data nodes already do RAID-0, but with a very 
large block size, and where failure of a disk reduces the redundancy minimally 
and temporarily.

For reference, today's Intel / AMD CPUs can usually decompress a gzip stream 
at less than 30MB/sec of compressed input (50MB to 100MB of uncompressed 
output per second).
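
A quick way to check that on your own hardware (a rough sketch; any large 
.gz file will do):

  $ ls -l data.gz                  # compressed size in bytes
  $ time gzip -dc data.gz | wc -c  # elapsed time and uncompressed bytes
  # compressed MB/sec   = compressed size / elapsed seconds
  # uncompressed MB/sec = wc -c result    / elapsed seconds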


On 3/14/09 1:53 AM, Vadim Zaliva kroko...@gmail.com wrote:

Scott,

Thanks for the interesting information. By JBOD, I assume you mean just
listing multiple partition mount points in the hadoop config?

Vadim

On Fri, Mar 13, 2009 at 12:48, Scott Carey sc...@richrelevance.com wrote:
 On 3/13/09 11:56 AM, Allen Wittenauer a...@yahoo-inc.com wrote:

 On 3/13/09 11:25 AM, Vadim Zaliva kroko...@gmail.com wrote:
 >> When you stripe you automatically make every disk in the system have
 >> the same speed as the slowest disk.  In our experience, systems are
 >> more likely to have a 'slow' disk than a dead one, and detecting that
 >> is really, really hard.  In a distributed system, that multiplier
 >> effect can have significant consequences on the whole grid's
 >> performance.
 > All disks are the same, so there is no speed difference.

 There will be when they start to fail. :)



 This has been discussed before:
 http://www.nabble.com/RAID-vs.-JBOD-td21404366.html

 JBOD is going to be better; the only benefit of RAID-0 is slightly easier 
 management in the hadoop config, at the cost of harder management at the OS 
 level.
 When a single JBOD drive dies, you only lose that one disk's data.  The 
 datanode goes down, but a restart brings back up the parts that still exist.  
 Then you can leave it be while the replacement is procured... With RAID-0 the 
 whole node is down until you get the new drive and recreate the RAID.
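
 In practice that restart is just a matter of dropping the dead disk's 
 directory from dfs.data.dir and bouncing the datanode (a sketch; script 
 paths depend on your install):

   $ bin/hadoop-daemon.sh stop datanode
   # edit hadoop-site.xml: remove the failed mount from dfs.data.dir
   $ bin/hadoop-daemon.sh start datanode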

 With JBOD, don't forget to set the linux readahead for the drives to a decent 
 level - you'll gain up to 25% more sequential read throughput depending on 
 your kernel version (blockdev --setra 8192 /dev/<device>).  I also see good 
 gains by using xfs instead of ext3.  For a big shocker, check out the 
 difference in time to delete a bunch of large files with ext3 (a long time) 
 versus xfs (almost instant).
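
 Roughly, for each data disk (a sketch - device and mount point are 
 examples, and the readahead setting does not persist across reboots, so 
 reapply it at boot, e.g. from rc.local):

   $ blockdev --getra /dev/sdb        # current readahead, in 512-byte sectors
   $ blockdev --setra 8192 /dev/sdb   # 8192 sectors = 4MB readahead
   $ mkfs.xfs /dev/sdb1               # format the data partition as xfs
   $ mount -o noatime /dev/sdb1 /mnt/disk2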

 Newer drives can do about 120MB/sec at the front of the drive when tuned 
 (xfs, readahead 4096), and about 60MB/sec at the back.  If you are not going 
 to use 100% of the drive for HDFS, use this knowledge and place the 
 partitions appropriately: the last 20% or so of the drive is a lot slower 
 than the front 60%.  Here is a typical sequential transfer rate chart for a 
 SATA drive as a function of LBA:
 http://www.tomshardware.com/reviews/Seagate-Barracuda-1.5-TB,2032-5.html
 (the graphs are about 3/4 of the way down the page, before the comments).
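
 You can measure the falloff on your own drives with dd (a sketch; the 
 device name is an example, and the reads are harmless to live data):

   $ SIZE_MB=$(( $(blockdev --getsize64 /dev/sdb) / 1024 / 1024 ))
   $ for pct in 0 50 90; do
   >   dd if=/dev/sdb of=/dev/null bs=1M count=256 \
   >      skip=$(( SIZE_MB * pct / 100 )) iflag=direct
   > done   # dd reports MB/s at the front, middle, and end zones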




tuning performance

2009-03-11 Thread Vadim Zaliva
Hi!

I have a question about fine-tuning hadoop performance on 8-core machines.
I have 2 machines I am testing. One is an 8-core Xeon and the other an 8-core
Opteron, with 16GB RAM each. They both run mapreduce and dfs nodes. Currently
I've set up each of them to run 32 map and 8 reduce tasks.
Also, HADOOP_HEAPSIZE=2048.
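
For concreteness, here is a sketch of where those settings live (assuming 
the task counts are set as the tasktracker slot maximums):

  <!-- hadoop-site.xml on each node -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>32</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
  </property>

  # hadoop-env.sh - heap for the hadoop daemons, in MB
  export HADOOP_HEAPSIZE=2048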

I see the CPU is underutilized. Is there a guideline for finding the optimal
number of tasks and memory settings for this kind of hardware?

Also, since we are going to buy more machines like this, I need to decide
whether to buy Xeons or Opterons. Any advice on that?

Sincerely,
Vadim

P.S. I am using Hadoop 0.19 and java version 1.6.0_12:
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)