Re: MapReduce tuning

2012-02-25 Thread Jie Li
Hello Mohit,

I am looking at some Hadoop tuning parameters like io.sort.mb,
 mapred.child.java.opts etc.

- My question was where to look for the current settings


The default settings, along with their documentation, can be found in the
Hadoop directory:

src/mapred/mapred-default.xml
src/core/core-default.xml
src/hdfs/hdfs-default.xml
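
As a rough illustration of reading these files, here is a minimal Python sketch. It assumes the usual Hadoop layout of <property> elements with <name> and <value> children, and it assumes that a site file (for example conf/mapred-site.xml) is layered on top of the defaults. The two XML strings are small stand-ins written for this example, not verbatim copies of the shipped files, and the helper names are hypothetical. Job-level overrides and properties marked final are not handled.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for mapred-default.xml (not a verbatim copy).
DEFAULT_XML = """<configuration>
  <property>
    <name>io.sort.mb</name>
    <value>100</value>
    <description>Illustrative description text.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>
  </property>
</configuration>"""

# Illustrative stand-in for a site file that overrides one default.
SITE_XML = """<configuration>
  <property>
    <name>io.sort.mb</name>
    <value>256</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_settings(default_xml, site_xml):
    """Layer the site overrides on top of the shipped defaults."""
    settings = read_properties(default_xml)
    settings.update(read_properties(site_xml))
    return settings


settings = effective_settings(DEFAULT_XML, SITE_XML)
for key in ("io.sort.mb", "mapred.child.java.opts"):
    print(key, "=", settings[key])
```

To try this against a real installation, read the file contents from disk and pass them to read_properties() in place of the sample strings.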


 - Are these settings configured cluster wide or per job?


Some settings are configured cluster-wide, e.g. the number of map/reduce
slots per node, while other settings are configured per job, e.g.
io.sort.mb. It depends on what the specific parameter controls.
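
For a per-job parameter, one common way to override it for a single run is a -D option on the command line. The Python sketch below only assembles such a command. The jar name, class name, and paths are hypothetical, and the override is only honored if the job's driver parses Hadoop's generic options (for example by running through ToolRunner), which is an assumption about your job.

```python
def build_job_command(jar, main_class, overrides, job_args):
    """Build an argv list for `hadoop jar` with per-job -D overrides."""
    command = ["hadoop", "jar", jar, main_class]
    for name, value in sorted(overrides.items()):
        # Generic options go before the job's own arguments.
        command += ["-D", f"{name}={value}"]
    return command + list(job_args)


# Hypothetical jar, class, and paths, used only for illustration.
command = build_job_command(
    "wordcount.jar", "WordCount", {"io.sort.mb": 200}, ["in", "out"]
)
print(" ".join(command))
```

Cluster-wide settings such as the slot counts cannot be changed this way; they are read from the node's configuration when the daemons start.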


 - What's the best way to find the reasons for slow performance?


Well, I want to introduce Starfish to you. Starfish is a self-tuning system
built on Hadoop to provide good performance automatically, without any need
for users to understand and manipulate the many tuning knobs in Hadoop.

With Starfish, you can analyze the performance of your Hadoop job at a
fine-grained level, e.g. the time for map processing, spilling, merging,
shuffling, sorting, and reduce processing. That way you can see which
part is the performance bottleneck.
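
To make the idea of a per-phase breakdown concrete, here is a small Python sketch. It is not Starfish's API or output; the phase names follow the list above, the timings are made-up numbers, and the function names are hypothetical.

```python
def phase_shares(timings):
    """Return each phase's fraction of the total measured time."""
    total = sum(timings.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {phase: seconds / total for phase, seconds in timings.items()}


def bottleneck(timings):
    """Return the phase that took the most time."""
    return max(timings, key=timings.get)


# Made-up per-phase timings in seconds, for illustration only.
timings = {
    "map": 40.0,
    "spill": 25.0,
    "merge": 10.0,
    "shuffle": 70.0,
    "sort": 5.0,
    "reduce": 50.0,
}

shares = phase_shares(timings)
slowest = bottleneck(timings)
print(f"bottleneck: {slowest} ({shares[slowest]:.0%} of total time)")
```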

You can also ask what-if questions, e.g. "What if I double io.sort.mb?",
and Starfish will predict the new behaviour of the job, so you can better
understand how these parameters work. In addition, you can simply let
Starfish find the optimal configuration for you to achieve the best
performance.

You are welcome to join our Google Group to discuss Starfish further; any
feedback will be appreciated. If you run into any problems, please don't
hesitate to let us know. The Group address is
http://groups.google.com/group/hadoop-starfish.

Thanks,
Jie
-
Starfish Group, Duke University
Starfish Homepage: www.cs.duke.edu/starfish/
Starfish Google Group: http://groups.google.com/group/hadoop-starfish


Re: MapReduce tuning

2012-02-25 Thread sriramsrao
Use a search engine to find the Hadoop best practices blog by Arun Murthy.

Sriram

On Feb 24, 2012, at 10:36 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 I am looking at some Hadoop tuning parameters like io.sort.mb,
 mapred.child.java.opts etc.
 
 - My question was where to look for the current settings
 - Are these settings configured cluster wide or per job?
 - What's the best way to find the reasons for slow performance?


MapReduce tuning

2012-02-24 Thread Mohit Anchlia
I am looking at some Hadoop tuning parameters like io.sort.mb,
mapred.child.java.opts etc.

- My question was where to look for the current settings
- Are these settings configured cluster wide or per job?
- What's the best way to find the reasons for slow performance?