I had asked a similar question recently:
First, follow the instructions in BUILDING.txt. It is a bit tedious, but if
you are careful and get all of the exact versions of everything installed
(don't rely on yum to get the right version), you will have a built hadoop.
Second, there's the question of how to run it in "single node local" mode for
debugging and development, and how to edit source in Eclipse. I have not done
that yet but got the advice below when I asked.
It would be really nice to find a "Hadoop internals for newbies" FAQ, but I've
not found such a thing yet, and web searches turn up a lot of outdated
information relating to 0.1x versions that is no longer useful.
John
----------------------------------------------------------------------
Hi John,
Here's how I deploy/debug Hadoop locally:
To build and tar Hadoop:
mvn clean package -Pdist -Dtar -DskipTests=true
The tar will be located in the project directory under hadoop-dist/target/. I
untar it into my deploy directory.
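A minimal sketch of the untar step, in case it helps. The helper name
unpack_dist and both of its arguments are made up for illustration, and it
assumes you want bin/, sbin/ and etc/ directly under the deploy directory
rather than under a hadoop-<version>/ subdirectory:

```shell
#!/bin/bash
# unpack_dist TARBALL DEPLOY_DIR
# Hypothetical helper (not part of Hadoop): unpack a dist tarball so that
# bin/, sbin/ and etc/ land directly in DEPLOY_DIR.
unpack_dist() {
  local tarball="$1" deploy="$2"
  mkdir -p "$deploy"
  # --strip-components=1 drops the top-level hadoop-<version>/ directory.
  tar -xzf "$tarball" -C "$deploy" --strip-components=1
}
```

After a build it would be called with the tarball under hadoop-dist/target/ as
the first argument and the deploy directory as the second.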
I then copy these scripts into the same directory:
hadoop-dev-env.sh:
---
#!/bin/bash
export HADOOP_DEV_HOME=`pwd`
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
hadoop-dev-setup.sh:
---
#!/bin/bash
source ./hadoop-dev-env.sh
bin/hadoop namenode -format
hadoop-dev.sh:
---
#!/bin/bash
source ./hadoop-dev-env.sh
sbin/hadoop-daemon.sh $1 namenode
sbin/hadoop-daemon.sh $1 datanode
sbin/yarn-daemon.sh $1 resourcemanager
sbin/yarn-daemon.sh $1 nodemanager
sbin/mr-jobhistory-daemon.sh $1 historyserver
sbin/httpfs.sh $1
I copy all the files in <deploy directory>/conf into my conf directory, <deploy
directory>/etc/hadoop. The advantage of using a directory that's not the /conf
directory is that it won't be overwritten the next time you untar a new build.
Lastly, I copy the minimal site configuration into the conf files. For the sake
of brevity, I won't include the properties in full XML format, but here are the
ones I set:
yarn-site.xml:
yarn.nodemanager.aux-services = mapreduce.shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class =
org.apache.hadoop.mapred.ShuffleHandler
yarn.resourcemanager.scheduler.class =
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
mapred-site.xml:
mapreduce.framework.name = yarn
core-site.xml:
fs.default.name = hdfs://localhost:9000
hdfs-site.xml:
dfs.replication = 1
dfs.permissions = false
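For anyone who wants the full XML, here is a rough sketch of a script that
writes the properties listed above into the four site files. The write_site
helper and the CONF_DIR variable are made up for illustration, and the default
directory is an assumption taken from hadoop-dev-env.sh:

```shell
#!/bin/bash
# Sketch: write the minimal site configuration as full XML files.
# CONF_DIR is an assumed name; it defaults to ./etc/hadoop if
# HADOOP_CONF_DIR is not set.
CONF_DIR="${HADOOP_CONF_DIR:-./etc/hadoop}"
mkdir -p "$CONF_DIR"

# write_site FILE NAME VALUE [NAME VALUE ...]
# Hypothetical helper (not part of Hadoop): emits one <property> element
# per NAME/VALUE pair and overwrites FILE in CONF_DIR.
write_site() {
  local file="$1"
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      echo '  <property>'
      echo "    <name>$1</name>"
      echo "    <value>$2</value>"
      echo '  </property>'
      shift 2
    done
    echo '</configuration>'
  } > "$CONF_DIR/$file"
}

write_site yarn-site.xml \
  yarn.nodemanager.aux-services mapreduce.shuffle \
  yarn.nodemanager.aux-services.mapreduce.shuffle.class \
    org.apache.hadoop.mapred.ShuffleHandler \
  yarn.resourcemanager.scheduler.class \
    org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
write_site mapred-site.xml mapreduce.framework.name yarn
write_site core-site.xml fs.default.name hdfs://localhost:9000
write_site hdfs-site.xml dfs.replication 1 dfs.permissions false
```

Note that this overwrites any existing site files in that directory, so it is
only suitable for a throwaway development setup.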
Then, to format HDFS and start our cluster, we can simply do:
./hadoop-dev-setup.sh
./hadoop-dev.sh start
To stop it:
./hadoop-dev.sh stop
Once I have this set up, for quicker iteration, I have some scripts that build
submodules (sometimes all of mapreduce, sometimes just the resourcemanager) and
copy the updated jars into my setup.
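A rough sketch of what such a script might look like, not the exact one I use.
SRC_DIR, DEPLOY_DIR, DRY_RUN and the run/redeploy helpers are all made-up names,
and it assumes the other Hadoop modules are already available in your local
Maven repository (for example from an earlier mvn install), since a single
module cannot resolve its sibling modules otherwise:

```shell
#!/bin/bash
# Sketch: rebuild one Maven module and copy its jars into the deploy tree.
# SRC_DIR and DEPLOY_DIR are assumed locations; adjust them to your setup.
SRC_DIR="${SRC_DIR:-$HOME/hadoop-src}"
DEPLOY_DIR="${DEPLOY_DIR:-$HOME/hadoop-deploy}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# redeploy MODULE_DIR TARGET_SUBDIR
# Hypothetical helper: MODULE_DIR is relative to SRC_DIR, TARGET_SUBDIR is
# relative to DEPLOY_DIR.
redeploy() {
  local module="$1" target="$2"
  run mvn -f "$SRC_DIR/$module/pom.xml" package -DskipTests=true
  # Copies every jar in target/, which may include test or source jars.
  run cp "$SRC_DIR/$module"/target/*.jar "$DEPLOY_DIR/$target/"
}

# Example: just the resourcemanager.
redeploy \
  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager \
  share/hadoop/yarn
```

With the default DRY_RUN=1 it only prints the mvn and cp commands, so you can
check the paths before running it with DRY_RUN=0. Restart the affected daemon
afterwards so it picks up the new jar.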
Regarding your last question, are you saying that you were able to load it into
Eclipse already, and want tips on the best way to browse within it? Or that
you're trying to get the source loaded into Eclipse?
Hope that helps!
Sandy
On Thu, May 30, 2013 at 9:32 AM, John Lilley
<[email protected]> wrote:
Thanks for helping me build Hadoop! I'm through compiling and installing the
Maven plugins into Eclipse. I could use some pointers for the next steps I want
to take, which are:
* Deploy the simplest "development only" cluster (single node?) and
learn how to debug within it. I read about the "local runner" configuration
here (http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms), does that
still apply to MR2/YARN? It seems like an old page; perhaps there is a newer
FAQ?
* Build and run the ApplicationMaster "shell" sample, and use that as a
starting point for a custom AM. I would much appreciate any advice on
getting the edit/build/debug cycle ironed out for an AM.
* Set up Hadoop source for easier browsing and learning (Eclipse load?).
What is typically done to make for easy browsing of referenced classes/methods
by name?
Thanks
John
From: Lokesh Basu [mailto:[email protected]]
Sent: Sunday, June 02, 2013 11:18 PM
To: [email protected]; [email protected]
Subject: How to start developing!
Hi,
I'm new to Hadoop and want to be a part of this community and also be able to
build/develop. I have already made myself familiar with cluster setup and
MapReduce, but I am finding it difficult to get into building the source code
and making changes, patches, etc. In short, I'm unable to get started on the
source code development side. So it would be very helpful if I could get any
kind of help with my initial setup, and then I can dig deeper into it.
Thanking you all in anticipation
Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India (GMT +5hr 30min)