Re: How to debug Hadoop (or YARN) locally?

Chris Nauroth Tue, 22 Dec 2015 09:50:08 -0800

If you want to run a live pseudo-distributed cluster and deploy code
changes without doing a full distro tarball build, then you can control
the classpath by setting a few more environment variables in
hadoop-env.sh.  Here is an example of what I'm doing in one of my dev
environments.


export HADOOP_USER_CLASSPATH_FIRST=1
HADOOP_REPO=~/git/hadoop
export HADOOP_CLASSPATH=$HADOOP_REPO/hadoop-common-project/hadoop-common/target/classes:\
$HADOOP_REPO/hadoop-hdfs-project/hadoop-hdfs-client/target/classes:\
$HADOOP_REPO/hadoop-hdfs-project/hadoop-hdfs/target/classes


Setting HADOOP_CLASSPATH adds extra paths to the classpath before the
shell script launches the JVM.  In my case, I have the source checked out
to ~/git/hadoop, and I point to the target/classes subdirectories of the
individual submodules that I want to override and test.  Then I can make
code changes, run "mvn compile" in the submodule directory, and restart
the daemons.

By default, the HADOOP_CLASSPATH entries are added at the end of the
standard classpath.  Setting HADOOP_USER_CLASSPATH_FIRST=1 changes that
behavior so that the custom entries are first.  This way, my built code
changes override the classes that were bundled in the tarball distro.

--Chris Nauroth
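The same hadoop-env.sh settings can also be generated from a list of module directories, which keeps the list easy to edit when you override different submodules. This is a minimal sketch, assuming bash and a checkout at ~/git/hadoop; the helper name hadoop_dev_classpath is hypothetical and not part of Hadoop's own scripts.

```shell
#!/usr/bin/env bash
# Sketch of the hadoop-env.sh settings above, built from a module list.
# hadoop_dev_classpath is a hypothetical helper, not part of Hadoop.

# Print "<repo>/<module>/target/classes" for each module, joined with ':'.
hadoop_dev_classpath() {
  local repo="$1" cp="" module
  shift
  for module in "$@"; do
    cp="${cp:+$cp:}$repo/$module/target/classes"
  done
  printf '%s\n' "$cp"
}

HADOOP_REPO="$HOME/git/hadoop"

# Put the custom entries ahead of the jars bundled in the distro tarball.
export HADOOP_USER_CLASSPATH_FIRST=1

HADOOP_CLASSPATH="$(hadoop_dev_classpath "$HADOOP_REPO" \
  hadoop-common-project/hadoop-common \
  hadoop-hdfs-project/hadoop-hdfs-client \
  hadoop-hdfs-project/hadoop-hdfs)"
export HADOOP_CLASSPATH
```

After restarting the daemons, running "hadoop classpath" should print the target/classes entries at the front of the classpath if the override took effect.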




On 12/21/15, 7:29 PM, "Allen Zhang" <[email protected]> wrote:

>
>Oh, so cool.  Awesome.  Thanks!
>
>At 2015-12-22 11:01:55, "Jeff Zhang" <[email protected]> wrote:
>>If you want to change YARN's internal code, you can use MiniYARNCluster
>>for testing.
>>
>>https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
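To pick a starting point, one option is to search the checkout for test sources that already reference MiniYARNCluster. This is a sketch assuming bash, a grep that supports -r and --include (GNU and BSD grep both do), and a checkout at ~/git/hadoop; the function name find_miniyarn_tests is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list test sources that reference MiniYARNCluster, as candidates
# to run under a debugger. find_miniyarn_tests is a hypothetical helper.

# Print .java files under src/test/ in the given checkout that mention
# MiniYARNCluster, sorted. Returns 1 if the directory does not exist.
find_miniyarn_tests() {
  local repo="$1"
  [ -d "$repo" ] || return 1
  grep -rl --include='*.java' 'MiniYARNCluster' "$repo" \
    | grep '/src/test/' \
    | sort
}

# Assumed checkout location; skipped if it is not there.
if [ -d "$HOME/git/hadoop" ]; then
  find_miniyarn_tests "$HOME/git/hadoop"
fi
```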
>>
>>On Tue, Dec 22, 2015 at 10:00 AM, Allen Zhang <[email protected]> wrote:
>>
>>>
>>>
>>> So does that mean that if I change or add some code, I have to
>>> re-tarball the whole project using
>>> "mvn clean package -Pdist -DskipTests -Dtar" and then deploy it
>>> somewhere to remote debug?  If yes, I think that is very
>>> inconvenient.  If no, can you explain more about this?
>>>
>>>
>>> Thanks,
>>> Allen
>>>
>>>
>>> At 2015-12-22 08:55:01, "Jeff Zhang" <[email protected]> wrote:
>>> >+1 for Chris; remote debugging will help you.
>>> >
>>> >On Tue, Dec 22, 2015 at 1:54 AM, Chris Nauroth <[email protected]> wrote:
>>> >
>>> >> If you're running the Hadoop daemons in pseudo-distributed mode (all
>>> >> the daemons running as separate processes, but on a single dev host),
>>> >> then another option is to launch the daemon's JVM with the JDWP
>>> >> arguments and attach a "remote" debugger.  This can be either the jdb
>>> >> CLI debugger that ships with the JDK or a fancier IDE like Eclipse or
>>> >> IntelliJ.
>>> >>
>>> >> Each daemon's JVM arguments are controlled with an environment
>>> >> variable suffixed with "_OPTS" defined in files named *-env.sh.  For
>>> >> example, in hadoop-env.sh, you could set something like this to enable
>>> >> remote debugging for the NameNode process:
>>> >>
>>> >> export HADOOP_NAMENODE_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=n $HADOOP_NAMENODE_OPTS"
>>> >>
>>> >>
>>> >> Then, you can run "jdb -attach localhost:8000" to attach the debugger,
>>> >> or do the equivalent in your IDE of choice.
>>> >>
>>> >> --Chris Nauroth
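Because all the daemons share one host in pseudo-distributed mode, each one needs its own JDWP port. This sketch extends the NameNode example to the other HDFS and YARN daemons. It assumes bash and Hadoop 2.x variable names (HADOOP_DATANODE_OPTS in hadoop-env.sh, YARN_RESOURCEMANAGER_OPTS and YARN_NODEMANAGER_OPTS in yarn-env.sh; later releases rename some of these). Ports 8000 to 8003 are arbitrary choices, and the helper jdwp_opts is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one JDWP port per daemon so a debugger can attach to any of them.
# Port numbers are arbitrary assumptions; they only need to be distinct and free.
# Variable names follow Hadoop 2.x hadoop-env.sh / yarn-env.sh.

# Print the JDWP agent option for the given port (hypothetical helper).
# suspend=n lets the daemon start without waiting for a debugger.
jdwp_opts() {
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,address=$1,suspend=n"
}

# HDFS daemons (hadoop-env.sh)
export HADOOP_NAMENODE_OPTS="$(jdwp_opts 8000) $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="$(jdwp_opts 8001) $HADOOP_DATANODE_OPTS"

# YARN daemons (yarn-env.sh)
export YARN_RESOURCEMANAGER_OPTS="$(jdwp_opts 8002) $YARN_RESOURCEMANAGER_OPTS"
export YARN_NODEMANAGER_OPTS="$(jdwp_opts 8003) $YARN_NODEMANAGER_OPTS"
```

With these settings, "jdb -attach localhost:8002" would attach to the ResourceManager.  Changing suspend=n to suspend=y makes the JVM wait for the debugger before running, which helps when the code you want to trace runs during daemon startup.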
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On 12/21/15, 7:25 AM, "Daniel Templeton" <[email protected]> wrote:
>>> >>
>>> >> >Your best bet is to find a test that includes all the bits you want
>>> >> >and execute that test in debug mode.  (You can also change an
>>> >> >existing test to include what you want, but in most cases it is
>>> >> >easier to start with an existing test than to start from scratch.)
>>> >> >
>>> >> >Daniel
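With Maven, one way to run a single test in debug mode is the Surefire plugin's maven.surefire.debug property, which makes the forked test JVM listen on port 5005 and wait for a debugger before running. This is a sketch assuming bash; the helper debug_test_cmd, the test class TestYourFeature, and the module path are hypothetical placeholders. The helper only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: build the command that debugs one test class in one module.
# debug_test_cmd and TestYourFeature are hypothetical placeholders.

# Print (do not run) the command. -Dtest selects the test class;
# -Dmaven.surefire.debug makes the forked test JVM wait on port 5005.
debug_test_cmd() {
  local module_dir="$1" test_class="$2"
  printf 'cd %q && mvn test -Dtest=%q -Dmaven.surefire.debug\n' \
    "$module_dir" "$test_class"
}

debug_test_cmd \
  "$HOME/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager" \
  TestYourFeature
```

Running eval "$(debug_test_cmd ...)" would then start the test, and "jdb -attach localhost:5005" or an IDE remote-debug configuration can attach to it.  Depending on your local Maven repository, you may first need "mvn install -DskipTests" from the top of the source tree so that sibling modules resolve.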
>>> >> >
>>> >> >On 12/20/15 6:01 PM, Allen Zhang wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> I am reading the hadoop-2.6.0 source code, mainly focusing on
>>> >> >> Hadoop YARN.  However, I have some problems reading or debugging
>>> >> >> the source code.  Can I debug it locally (I mean on my laptop, with
>>> >> >> the source code I've downloaded, not remote debugging)?  I need to
>>> >> >> track its execution flow step by step, and then I want to add a new
>>> >> >> feature or enhancement.
>>> >> >>
>>> >> >>
>>> >> >> So can anyone give some good suggestions or share your method or
>>> >> >> any wiki page?  Really appreciate it!
>>> >> >>
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Allen
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >
>>> >
>>> >--
>>> >Best Regards
>>> >
>>> >Jeff Zhang
>>>
>>
>>
>>
>>-- 
>>Best Regards
>>
>>Jeff Zhang
