Getting started with Apache HTrace development
A few people have asked how to get started with HTrace development. It's a good question, and we don't have a great README up about it, so I thought I would write something.

HTrace is all about tracing distributed systems, so the best way to get started is to plug HTrace into your favorite distributed system and see what cool things happen or what bugs pop up. Since I'm an HDFS developer, that's the distributed system I'm most familiar with, so I will do a quick writeup about how to use HTrace + HDFS. (HBase + HTrace is another very important use case that I would like to write about later, but one step at a time.)

Just a quick note: a lot of this software is relatively new, so there may be bugs or integration pain points that you encounter. There has not yet been a stable release of Hadoop that contained Apache HTrace. There have been releases that contained the pre-Apache version of HTrace, but that's no fun. If we want to do development, we want to be able to run the latest version of the code, so we will have to build it ourselves.

Building HTrace is not too bad. First we install the dependencies:

  cmccabe@keter:~/ apt-get install java javac google-go leveldb-devel

If you have a different Linux distro this command will vary slightly, of course. On Macs, brew is a good option.

Next we use Maven to build the source:

  cmccabe@keter:~/ git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
  cmccabe@keter:~/ cd incubator-htrace
  cmccabe@keter:~/ git checkout master
  cmccabe@keter:~/ mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip

OK. So HTrace is built and installed to the local ~/.m2 directory. We should see it under .m2:

  cmccabe@keter:~/ find ~/.m2 | grep htrace-core
  ...
  /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
  /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
  /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
  ...

The version you built should be 3.2.0-SNAPSHOT.

Next, we check out Hadoop:

  cmccabe@keter:~/ git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
  cmccabe@keter:~/ cd hadoop
  cmccabe@keter:~/ git checkout branch-2

So we are basically building a pre-release version of Hadoop 2.7, currently known as branch-2. We will need to modify Hadoop to use 3.2.0-SNAPSHOT rather than the stable 3.1.0 release which it would ordinarily use in branch-2. I applied this diff to hadoop-project/pom.xml:

  diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
  index 569b292..5b7e466 100644
  --- a/hadoop-project/pom.xml
  +++ b/hadoop-project/pom.xml
  @@ -785,7 +785,7 @@
         <dependency>
           <groupId>org.apache.htrace</groupId>
           <artifactId>htrace-core</artifactId>
  -        <version>3.1.0-incubating</version>
  +        <version>3.2.0-incubating-SNAPSHOT</version>
         </dependency>
         <dependency>
           <groupId>org.jdom</groupId>

Next, I built Hadoop:

  cmccabe@keter:~/ mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

  ...
  ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
  ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
  ...

This package should also contain an htrace-3.2.0-SNAPSHOT jar.

OK, so how can we start seeing some trace spans? The easiest way is to configure LocalFileSpanReceiver.
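(An aside before moving on: the two hand-done steps above -- checking that the snapshot jar landed in the local repository, and editing the version in hadoop-project/pom.xml -- can be scripted. This is just a sketch; the helper names are mine, and the layout assumed is the standard Maven local-repository layout:)

```shell
# Hypothetical helpers for the build steps above; the function names
# are illustrative, not part of HTrace or Hadoop.

# Look for the htrace-core jar under a Maven repository directory
# (e.g. "$HOME/.m2/repository").
check_htrace_jar() {
  local repo="$1" version="$2"
  local jar="$repo/org/apache/htrace/htrace-core/$version/htrace-core-$version.jar"
  if [ -f "$jar" ]; then
    echo "found: $jar"
  else
    echo "missing: $jar" >&2
    return 1
  fi
}

# Bump the htrace-core version in a pom.xml, instead of patching by hand.
# Assumes the old version string appears only in the htrace-core dependency.
bump_htrace_version() {
  sed -i 's|<version>3.1.0-incubating</version>|<version>3.2.0-incubating-SNAPSHOT</version>|' "$1"
}
```

Usage would be something like `check_htrace_jar "$HOME/.m2/repository" 3.2.0-SNAPSHOT` and `bump_htrace_version hadoop-project/pom.xml`.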
Add this to your hdfs-site.xml:

  <property>
    <name>hadoop.htrace.spanreceiver.classes</name>
    <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
  </property>
  <property>
    <name>hadoop.htrace.sampler</name>
    <value>AlwaysSampler</value>
  </property>

When you run the Hadoop daemons, you should see them writing to files named /tmp/${PROCESS_ID} (one for each different process). If this doesn't happen, try cranking up your log4j level to TRACE to see why the SpanReceiver could not be created. You should see something like this in the log4j logs:

  13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
    at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
    at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
    at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
    at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)

Running htraced is easy. You simply run the binary:

  cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear

You should see messages like this:

  cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
  2015-03-02T19:08:33-08:00 D:
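Once htraced is running, a small loop like the one below can confirm it is actually answering HTTP before you start poking at it. This is only a sketch: the default URL (localhost:9095) is an assumption on my part, so substitute whatever your web.address setting actually points at.

```shell
# Poll a web endpoint until it answers or we give up.
# The default URL below is an assumption; match it to your web.address.
wait_for_htraced() {
  local url="${1:-http://localhost:9095/}" tries="${2:-5}" i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fs -o /dev/null "$url"; then
      echo "htraced is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "htraced not reachable at $url" >&2
  return 1
}
```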
Re: Getting started with Apache HTrace development
This is dynamite, and I think it would be very helpful to have it linked from the website. Although the install and config doesn't appear too bulky, there are a number of steps, and this would be non-trivial for someone who is not familiar with Hadoop's XML-based runtime configuration.

I'm finishing off a patch for Chukwa right now; then I will be building HTrace into my Nutch 2.x search stack. My aim is to write something similar for that deployment, as it would also be very helpful to see tracing for Gora data stores.

On Monday, March 2, 2015, Colin P. McCabe cmcc...@apache.org wrote: