Re: Getting started with Apache HTrace development

2015-03-09 Thread Stack
On Mon, Mar 9, 2015 at 2:34 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 On Fri, Mar 6, 2015 at 9:21 AM, Stack st...@duboce.net wrote:
  On Thu, Mar 5, 2015 at 5:41 PM, Lewis John Mcgibbney 
  lewis.mcgibb...@gmail.com wrote:
 
  Is the website on CMS?
 
 
  No sir. svnpubsub.
 
  We could add a page under incubator wiki or add a page here
  http://wiki.apache.org/hadoop/htrace or ask infra for
  http://wiki.apache.org/htrace or not use the wiki at all but add the
  setup instructions as a site page (I prefer the latter; wiki rots... doc
  rots too but there is more obligation to keep it current).
 
  St.Ack

 Can't we have both?  Good docs AND wiki?


No. Smile. Doc it once only.



 It just feels a little weird adding text about how to build Hadoop to
 a document inside HTrace.  They're separate projects, after all.  I
 would feel a lot more comfortable with that stuff on the wiki, like it
 is in Hadoop.


Are you actually citing Hadoop doc as an example to follow (smile)?



 I guess I don't feel super-strongly about it but I feel like there
 would be a lot more intro material if we could just throw things up on
 a wiki.  Then again, we could have a section of the project docs
 dedicated to getting started and/or integrating with other projects?
 markdown supports links as well.


Yeah. Let me make site building turnkey so it's not onerous adding doc.

St.Ack


Re: Getting started with Apache HTrace development

2015-03-06 Thread Abraham Elmahrek
+1 as well to Stack's suggestion.

On Fri, Mar 6, 2015 at 12:08 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Afternoon,

 On Fri, Mar 6, 2015 at 9:21 AM, Stack st...@duboce.net wrote:

  ...or not use the wiki at all but add the setup
  instructions as a site page (I prefer the latter; wiki rots... doc rots
  too but there is more obligation to keep it current).
 
 
  +1 I agree



Re: Getting started with Apache HTrace development

2015-03-05 Thread Colin P. McCabe
Can we set up a wiki?  Stuff like this needs to be updated
periodically and it would be nice to have something like the hadoop
wiki.  Of course there may be some out of date stuff from time to
time, but it's better than nothing...

On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
 This is dynamite and I think it would be very helpful to have it linked to
 from the website.
 Although the install and config doesn't appear too bulky, there are a
 number of steps and this would be non-trivial for someone who is not
 familiar with Hadoop's XML-based runtime configuration.
 I'm finishing off a patch for Chukwa right now; then I will be building
 HTrace into my Nutch 2.x search stack. My aim is to write something similar
 for that deployment, as it would also be very helpful to see tracing for
 Gora data stores as well.

Awesome.

best,
Colin


Getting started with Apache HTrace development

2015-03-02 Thread Colin P. McCabe
A few people have asked how to get started with HTrace development.  It's a
good question and we don't have a great README up about it so I thought I
would write something.

HTrace is all about tracing distributed systems.  So the best way to get
started is to plug htrace into your favorite distributed system and see what
cool things happen or what bugs pop up.  Since I'm an HDFS developer, that's
the distributed system that I'm most familiar with.  So I will do a quick
writeup about how to use HTrace + HDFS.  (HBase + HTrace is another very
important use-case that I would like to write about later, but one step at a
time.)

Just a quick note: a lot of this software is relatively new.  So there may be
bugs or integration pain points that you encounter.

There has not yet been a stable release of Hadoop that contained Apache HTrace.
There have been releases that contained the pre-Apache version of HTrace, but
that's no fun.  If we want to do development, we want to be able to run the
latest version of the code.  So we will have to build it ourselves.

Building HTrace is not too bad.  First we install the dependencies:

cmccabe@keter:~/ apt-get install java javac google-go leveldb-devel

If you have a different Linux distro this command will vary slightly, of
course.  On Macs, brew is a good option.
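
On a Mac, for instance, something along these lines should pull in the build
tools (the Homebrew formula names here are my guess, and you will also need a
JDK installed):

  brew install maven go leveldb
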
Next we use Maven to build the source:

  cmccabe@keter:~/ git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
  cmccabe@keter:~/ cd incubator-htrace
  cmccabe@keter:~/ git checkout master
  cmccabe@keter:~/ mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip

OK.  So htrace is built and installed to the local ~/.m2 directory.

We should see it under the .m2:
cmccabe@keter:~/ find ~/.m2 | grep htrace-core
...
  /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
  /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
  /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
...

The version you built should be 3.2.0-SNAPSHOT.

Next, we check out Hadoop:

  cmccabe@keter:~/ git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
  cmccabe@keter:~/ cd hadoop
  cmccabe@keter:~/ git checkout branch-2

So we are basically building a pre-release version of Hadoop 2.7, currently
known as branch-2.  We will need to modify Hadoop to use 3.2.0-SNAPSHOT rather
than the stable 3.1.0 release which it would ordinarily use in branch-2.  I
applied this diff to hadoop-project/pom.xml:

  diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
  index 569b292..5b7e466 100644
  --- a/hadoop-project/pom.xml
  +++ b/hadoop-project/pom.xml
  @@ -785,7 +785,7 @@
       <dependency>
         <groupId>org.apache.htrace</groupId>
         <artifactId>htrace-core</artifactId>
  -      <version>3.1.0-incubating</version>
  +      <version>3.2.0-incubating-SNAPSHOT</version>
       </dependency>
       <dependency>
         <groupId>org.jdom</groupId>
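
If you would rather script that change than edit the pom by hand, a one-liner
along these lines should produce the same diff (GNU sed syntax; it assumes
htrace-core is the only dependency pinned at 3.1.0-incubating):

  sed -i 's|<version>3.1.0-incubating</version>|<version>3.2.0-incubating-SNAPSHOT</version>|' hadoop-project/pom.xml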

Next, I built Hadoop:

cmccabe@keter:~/ mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

...
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
...

This package should also contain an htrace-3.2.0-SNAPSHOT jar.
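
To double-check that the freshly built htrace jar (rather than the 3.1.0
release) actually made it into the distribution, a find from the Hadoop source
root should turn it up; the pattern allows for either the -SNAPSHOT or
-incubating-SNAPSHOT suffix, depending on which version string you ended up
with:

  find . -name 'htrace-core-3.2.0*.jar'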

OK, so how can we start seeing some trace spans?  The easiest way is to
configure LocalFileSpanReceiver.

Add this to your hdfs-site.xml:

  <property>
    <name>hadoop.htrace.spanreceiver.classes</name>
    <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
  </property>
  <property>
    <name>hadoop.htrace.sampler</name>
    <value>AlwaysSampler</value>
  </property>

When you run the Hadoop daemons, you should see them writing to files named
/tmp/${PROCESS_ID} (for each different process).  If this doesn't happen, try
cranking up your log4j level to TRACE to see why the SpanReceiver could not be
created.
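
A couple of quick checks, sketched below.  The numeric-filename assumption and
the logger names (taken from the packages visible in the log excerpt that
follows) are mine, and the log4j.properties path will depend on where your
daemons read their configuration:

  # LocalFileSpanReceiver writes one file per traced daemon, named after its PID
  ls /tmp | grep -E '^[0-9]+$'

  # If no span files appear, turn up logging for the tracing code paths and
  # restart the daemons; adjust the path to your actual log4j.properties.
  echo 'log4j.logger.org.apache.htrace=TRACE' >> /path/to/hadoop/etc/hadoop/log4j.properties
  echo 'log4j.logger.org.apache.hadoop.tracing=TRACE' >> /path/to/hadoop/etc/hadoop/log4j.properties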

You should see something like this in the log4j logs:

  13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
      at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
      at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
      at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
      at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)

Running htraced is easy.  You simply run the binary:

  cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear

You should see messages like this:

  cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
  2015-03-02T19:08:33-08:00 D: 

Re: Getting started with Apache HTrace development

2015-03-02 Thread Lewis John Mcgibbney
This is dynamite and I think it would be very helpful to have it linked to
from the website.
Although the install and config doesn't appear too bulky, there are a
number of steps and this would be non-trivial for someone who is not
familiar with Hadoop's XML-based runtime configuration.
I'm finishing off a patch for Chukwa right now; then I will be building
HTrace into my Nutch 2.x search stack. My aim is to write something similar
for that deployment, as it would also be very helpful to see tracing for
Gora data stores as well.
