from:"Colin P. McCabe"

Re: Gsoc 2016 with apache HTrace

2016-03-19 Thread Colin P. McCabe

Hi Madhawa,

s3 and other alternate FS connector support for HTrace is a really
great project, I think.  The connector code is getting a lot of
real-world use, and it would be interesting to get performance numbers
here.  It would be especially interesting to see some graphs of
latencies for various operations.  I'll ask some folks I know who are
working on the connector code if they have any suggestions.

best,
Colin
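
As a rough illustration of the per-operation latency numbers mentioned
above, here is a minimal Python sketch that groups timing samples by
operation and reports a median and 95th percentile. The sample format
(operation name, milliseconds) and the function names are assumptions made
for this example; they are not part of HTrace or the Hadoop connectors.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_latencies(samples):
    """Group (operation, millis) samples and report count, p50 and p95."""
    by_op = defaultdict(list)
    for op, millis in samples:
        by_op[op].append(millis)

    summary = {}
    for op, values in by_op.items():
        values.sort()
        summary[op] = {
            "count": len(values),
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
        }
    return summary
```

A summary like this could be fed to any plotting tool to produce the
latency graphs; real span data would come from a span receiver rather than
from a hand-built list.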

On Wed, Mar 16, 2016 at 10:21 PM, Madhawa Kasun Gunasekara
 wrote:
> Hi,
>
> I am a final-year student at IESL College of Engineering, Sri Lanka. I am
> interested in working with Apache HTrace, specifically the "Add HTrace
> distributed tracing for s3 and other alternative Hadoop FS implementations"
> project [1]. Please kindly give me further information on how I could
> proceed.
>
> [1] https://issues.apache.org/jira/browse/COMDEV-191
>
> Thanks
>
> Madhawa

Re: Gsoc 2016 with apache HTrace

2016-03-19 Thread Colin P. McCabe

Thanks for your interest, Madhawa!

As a next step, I suggest filing a JIRA on the HADOOP bug tracker.
This bug tracker is used for things in the "Hadoop common" subproject.
This is where the code for the s3 connector and the other filesystem
connectors lives.  Just add a few paragraphs describing what you'd
like to do.

Note that Hadoop actually has three different s3 connectors at the
moment: "s3", "s3n", and "s3a".  s3a is the most modern connector and
probably the one that you should focus on.  The other connectors are
older and we don't recommend using them.

The code is here:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java

Once you feel confident in the scope of your proposal, you can fill
out an application as described here:
http://community.apache.org/gsoc.html
See the "Application template" section.

There's a timeline here:
https://developers.google.com/open-source/gsoc/timeline.  I believe
the application deadline is March 25.

best,
Colin
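
To make the idea of tracing a connector more concrete, the sketch below
shows the general shape in Python: each filesystem-level operation opens a
span, and the remote call inside it opens a child span. SpanRecorder and
traced_read are hypothetical stand-ins written for this example; they are
not the HTrace API and not code from S3AFileSystem.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Hypothetical in-memory stand-in for a tracer; not the HTrace API."""

    def __init__(self):
        self.finished = []   # completed spans, in the order they ended
        self._stack = []     # currently open spans, innermost last

    @contextmanager
    def span(self, description):
        parent = self._stack[-1]["description"] if self._stack else None
        record = {"description": description, "parent": parent}
        self._stack.append(record)
        start = time.monotonic()
        try:
            yield record
        finally:
            # Record the elapsed time even if the wrapped call raised.
            record["millis"] = (time.monotonic() - start) * 1000.0
            self._stack.pop()
            self.finished.append(record)


def traced_read(recorder, fetch, key):
    """Wrap a connector-style read so the remote call gets its own child span."""
    with recorder.span("read"):
        with recorder.span("remote_get"):
            return fetch(key)
```

Here fetch is whatever callable performs the remote request. In a real
connector the spans would be created through the tracing library and
handed to a span receiver instead of being kept in a list.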

On Thu, Mar 17, 2016 at 12:38 PM, Madhawa Kasun Gunasekara
<madhaw...@gmail.com> wrote:
> Hi Colin,
>
> Thanks for the response. Yes, I see the value of this project.
> I would like to hear the suggestions before drafting the project proposal.
>
> Please kindly give me further information on how I could proceed.
>
> Thanks,
> Madhawa
>
> On Thu, Mar 17, 2016 at 11:54 PM, Colin P. McCabe <cmcc...@apache.org>
> wrote:
>>
>> Hi Madhawa,
>>
>> s3 and other alternate FS connector support for HTrace is a really
>> great project, I think.  The connector code is getting a lot of
>> real-world use, and it would be interesting to get performance numbers
>> here.  It would be especially interesting to see some graphs of
>> latencies for various operations.  I'll ask some folks I know who are
>> working on the connector code if they have any suggestions.
>>
>> best,
>> Colin
>>
>> On Wed, Mar 16, 2016 at 10:21 PM, Madhawa Kasun Gunasekara
>> <madhaw...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am a final-year student at IESL College of Engineering, Sri Lanka. I am
>> > interested in working with Apache HTrace, specifically the "Add HTrace
>> > distributed tracing for s3 and other alternative Hadoop FS implementations"
>> > project [1]. Please kindly give me further information on how I could
>> > proceed.
>> >
>> > [1] https://issues.apache.org/jira/browse/COMDEV-191
>> >
>> > Thanks
>> >
>> > Madhawa
>
>

Re: Experiences Using Apache HTrace (Incubating) in Distributed Web Search

2016-03-04 Thread Colin P. McCabe

Awesome!  Looking forward to checking out the slides.

best,
Colin

On Thu, Mar 3, 2016 at 5:00 PM, Lewis John Mcgibbney
 wrote:
> Hi Folks,
> A heads up that I sent in a proposal to Apache Big Data on the above topic.
> Very pleased that it was accepted and I hope to be in Vancouver to share
> experiences.
> This is based on my ongoing work on
> https://issues.apache.org/jira/browse/NUTCH-2005
> I would like to share slides and get feedback closer to the event prior to
> me submitting the slides so I will update this thread nearer the time.
> Thanks for now.
> Lewis
>
> --
> *Lewis*

Re: HTrace 4.1 release candidate 2

2016-02-24 Thread Colin P. McCabe

Thanks, guys.  I will post to the incubator list in a sec.

best,
Colin

On Wed, Feb 24, 2016 at 9:56 AM, Elliott Clark <ecl...@apache.org> wrote:
> +1 on the RC as well.
>
> Checked the hash.
> Created a test app using the api.
> Built from src.
>
>
> On Mon, Feb 22, 2016 at 8:37 PM, Stack <st...@duboce.net> wrote:
>
>> On Mon, Feb 22, 2016 at 3:03 PM, Colin P. McCabe <cmcc...@apache.org>
>> wrote:
>>
>> > There are at least 3 RPC compatibility-breaking changes in
>> > htrace-htraced between 4.0.1 and 4.1.0:
>> >
>> > HTRACE-315 changed the default port for htraced's HTTP interface from
>> > 9095 to 9096.
>> > HTRACE-237 changes the HTTP wire format slightly for htraced.
>> > Previous to this, we just sent a whitespace-separated list of trace
>> > spans.  After this, we send an actual JSON object.
>> > HTRACE-308 (Deserialize WriteSpans requests incrementally rather than
>> > all at once) changes the field "Spans" in the HRPC header to
>> > "NumSpans".  It also decreases the maximum RPC size,
>> > MAX_HRPC_BODY_LENGTH, from 64 MB to 32 MB.
>> >
>> >
>> Thanks Colin.
>>
>>
>> > To be honest, the main point of 4.0.1 was to stabilize the
>> > htrace-core4 API and make a first release with the GUI.  There was a
>> > lot of unfinished business in htraced-- things that we only got to in
>> > 4.1.  htraced is way more stable in 4.1 since we dealt with things
>> > like GC pressure, the client side, and so forth.  And also just fixing
>> > bugs.  I think we should accept the compatibility break with 4.0.1.
>> > However, I agree that it would be nice to support the "old client, new
>> > server" case.  I filed HTRACE-344 to add a mechanism that makes it
>> > easier to detect the case where the client is too new, and give back a
>> > reasonable response.  I don't think we should adopt a formal
>> > compatibility policy-- htraced is not mature enough for that.  But in
>> > 4.2 (or whatever the next release ends up being) we can strive harder
>> > to maintain compatibility than we have so far.
>> >
>> >
>> Ok. That's good enough I'd say.
>>
>> +1 on the 4.1 RC.
>>
>> I checked hash and signature. Built from src and all unit tests passed.
>> Started up htraced and put in a few spans with the client.
>>
>> St.Ack
>>
>>
>>
>>
>>
>>
>> > best,
>> > Colin
>> >
>> >
>> > On Mon, Feb 22, 2016 at 10:37 AM, Stack <st...@duboce.net> wrote:
>> > > On Mon, Feb 22, 2016 at 9:51 AM, Colin P. McCabe <cmcc...@apache.org>
>> > wrote:
>> > >
>> > >> On Sun, Feb 21, 2016 at 8:10 PM, Stack <st...@duboce.net> wrote:
>> > >> > "The rationale for this limitation is that tracing can simply be
>> > disabled
>> > >> > for a brief period during the rolling upgrade process."
>> > >> >
>> > >> > The second time an operator has to do this, they'll just throw away
>> > >> tracing
>> > >> > as a PITA.
>> > >>
>> > >> Let's put this in perspective.  Apache Spark recently transitioned
>> > >> from Scala 2.10 to 2.11.  Those Scala releases aren't binary
>> > >> compatible.  They aren't even source-code compatible, which means that
>> > >> people potentially had to rewrite their Spark jobs just to perform an
>> > >> upgrade... let alone have things continue to work during the upgrade.
>> > >> And people didn't throw away Spark; it's more popular than ever.
>> > >>
>> > >>
>> > > By 'perspective', you mean others made a mess so we can too?
>> > >
>> > >
>> > >
>> > >> Now: It makes sense for a storage system (particularly a mature and
>> > >> widely-deployed one) to bend over backwards to stay up during
>> > >> upgrades.  That's why HDFS is so strict about this, and HBase as well.
>> > >> But they weren't always that strict; we used to break RPC
>> > >> compatibility with every release in the earlier days.  Also, HTrace is
>> > >> not a storage system!  It's a tracing system.  It can be unavailable
>> > >> for a few hours.  It will be OK.
>> > >>
>> > >> What's not OK is for us to have CLASSPATH conflicts within a minor
>> > >> version of htrace-core4 th
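
The HTRACE-308 change described earlier in this thread (the "Spans" header
field renamed to "NumSpans", and MAX_HRPC_BODY_LENGTH lowered from 64 MB to
32 MB) is the kind of break a server can detect and report clearly. The
Python sketch below is only an illustration: the header is modelled as a
plain dict, and check_write_spans_header is a hypothetical helper, since
the real HRPC encoding is not shown in this thread.

```python
# 32 MB, the 4.1 limit named in the thread.
MAX_HRPC_BODY_LENGTH = 32 * 1024 * 1024


def check_write_spans_header(header, body_length):
    """Return a rejection reason, or None when the request looks acceptable.

    `header` is a plain dict standing in for the decoded HRPC header.
    """
    if "NumSpans" not in header:
        if "Spans" in header:
            # Shape used before HTRACE-308, i.e. an older client.
            return "old-style header: found 'Spans', expected 'NumSpans'"
        return "missing 'NumSpans'"
    if body_length > MAX_HRPC_BODY_LENGTH:
        return "body too large: %d > %d" % (body_length, MAX_HRPC_BODY_LENGTH)
    return None
```

Returning a reason string instead of silently dropping the request is one
way to give a mismatched client "a reasonable response".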

Re: PRs submitted over public Github repo

2016-02-22 Thread Colin P. McCabe

Thanks, Jake.  That will be helpful.

Best,
Colin
On Feb 22, 2016 8:27 AM, "Jake Farrell" <jfarr...@apache.org> wrote:

> If the github issue has the jira ticket id in it then we can enable the
> github webhook to comment on the jira issue each time the pr is updated,
> this is part of our typical github integrations setup
>
> -Jake
>
> On Fri, Feb 19, 2016 at 8:25 PM, Colin P. McCabe <cmcc...@apache.org>
> wrote:
>
> > I think it would be more appropriate to shadow the pull request to
> > JIRA, since in order to go in, all contributions need a JIRA + review
> > from committers.  Let's continue this discussion on INFRA-11298.
> >
> > best,
> > Colin
> >
> > On Fri, Feb 19, 2016 at 3:27 PM, Lewis John Mcgibbney
> > <lewis.mcgibb...@gmail.com> wrote:
> > > I am of the opinion that we should definitely have the PR's being
> > shadowed
> > > to this list.
> > > It provides us with context about new contributions and more
> importantly
> > > new community members.
> > > Thanks Andrey for hitting the lists and letting us know about that.
> > > I filed https://issues.apache.org/jira/browse/INFRA-11298 to address
> > this.
> > > Thanks
> > >
> > > On Fri, Feb 19, 2016 at 3:22 PM, <
> > > dev-digest-h...@htrace.incubator.apache.org> wrote:
> > >
> > >>
> > >> -- Forwarded message --
> > >> From: Andrey Redko <drr...@gmail.com>
> > >> To: dev@htrace.incubator.apache.org
> > >> Cc:
> > >> Date: Thu, 18 Feb 2016 07:47:50 -0500
> > >> Subject: PRs submitted over public Github repo
> > >> Hey Devs,
> > >>
> > >> I am wondering if anyone is watching the PRs submitted over public
> > Github
> > >> repo  (http://github.com/apache/incubator-htrace/)?
> > >> Would be great to have some feedback on those.
> > >> Thank you.
> > >>
> > >> Best Regards,
> > >> Andriy Redko
> > >>
> > >>
> > >>
> >
>
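
Jake's point above depends on the pull request carrying the JIRA ticket id.
As a small illustration, this Python sketch pulls JIRA-style keys out of a
PR title. The regular expression is an assumption about what such keys look
like (an upper-case project name, a hyphen, a number); the matching rules
of the actual ASF webhook are not described in this thread.

```python
import re

# Upper-case project key, hyphen, issue number, e.g. HTRACE-334.
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def find_jira_keys(pr_title):
    """Return every JIRA-style key found in a PR title, in order."""
    return JIRA_KEY.findall(pr_title)
```

A title with no key, such as "Fix typo in README", yields an empty list,
which is the case where no JIRA comment could be attached.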

Re: PRs submitted over public Github repo

2016-02-19 Thread Colin P. McCabe

I think it would be more appropriate to shadow the pull request to
JIRA, since in order to go in, all contributions need a JIRA + review
from committers.  Let's continue this discussion on INFRA-11298.

best,
Colin

On Fri, Feb 19, 2016 at 3:27 PM, Lewis John Mcgibbney
 wrote:
> I am of the opinion that we should definitely have the PR's being shadowed
> to this list.
> It provides us with context about new contributions and more importantly
> new community members.
> Thanks Andrey for hitting the lists and letting us know about that.
> I filed https://issues.apache.org/jira/browse/INFRA-11298 to address this.
> Thanks
>
> On Fri, Feb 19, 2016 at 3:22 PM, <
> dev-digest-h...@htrace.incubator.apache.org> wrote:
>
>>
>> -- Forwarded message --
>> From: Andrey Redko 
>> To: dev@htrace.incubator.apache.org
>> Cc:
>> Date: Thu, 18 Feb 2016 07:47:50 -0500
>> Subject: PRs submitted over public Github repo
>> Hey Devs,
>>
>> I am wondering if anyone is watching the PRs submitted over public Github
>> repo  (http://github.com/apache/incubator-htrace/)?
>> Would be great to have some feedback on those.
>> Thank you.
>>
>> Best Regards,
>> Andriy Redko
>>
>>
>>

Re: HTrace 4.1 release candidate 2

2016-02-19 Thread Colin P. McCabe

Our compatibility policy (see
http://mail-archives.apache.org/mod_mbox/htrace-dev/201509.mbox/%3c55f8badc.4050...@oss.nttdata.co.jp%3E
) only covers the htrace-core4 API right now.  So we can guarantee
that any projects using htrace-core 4.0.1 can upgrade to htrace-core
4.1.0 without breaking anything.  (This is a more painful guarantee
than it sounds since it means we can't remove functions, only
deprecate them...  And so forth.)  But it's a very useful guarantee
for our downstream projects.

However, we don't support mixing and matching versions of the
SpanReceiver client and server components.  The admin has to roll out
a uniform version of those components-- for example, using htraced
4.0.1 with htrace-htraced.jar 4.1.0 is not supported.  The rationale
for this limitation is that tracing can simply be disabled for a brief
period during the rolling upgrade process.  Also, the different
SpanReceiver subprojects are at different levels of maturity, and
imposing heavy compatibility guarantees would slow down development
for no real gain.

best,
Colin
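
The "uniform version" rule above can be expressed as a simple check. This
is a minimal Python sketch, assuming versions are dotted numbers with an
optional suffix such as "-incubating" that is ignored for comparison; both
function names are made up for the example and neither exists in HTrace.

```python
def parse_version(text):
    """Turn '4.1.0' or '4.1.0-incubating' into a tuple of ints: (4, 1, 0)."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def describe_pairing(client_version, server_version):
    """Classify a SpanReceiver client/server pairing under the uniform-version rule."""
    client = parse_version(client_version)
    server = parse_version(server_version)
    if client == server:
        return "uniform (supported)"
    if client < server:
        return "client older than server (unsupported)"
    return "client newer than server (unsupported)"
```

For the example in the message, describe_pairing("4.1.0", "4.0.1") covers
htrace-htraced.jar 4.1.0 talking to htraced 4.0.1 and reports the client as
newer than the server.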


On Fri, Feb 19, 2016 at 4:32 PM, Stack <st...@duboce.net> wrote:
> Can a 4.0.1 client talk to a 4.1.0 htrace? Has it been tested?
> St.Ack
>
> On Tue, Feb 9, 2016 at 7:00 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
>
>> Hi all,
>>
>> I've posted the second release candidate for HTrace 4.1 here:
>>
>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc2/
>>
>> The jars have been staged here:
>>
>> https://repository.apache.org/content/repositories/orgapachehtrace-1022
>>
>> Compared to rc1, this rc includes HTRACE-334 and HTRACE-342.
>>
>> HTrace 4.1 brings a lot of robustness improvements.  There were major
>> improvements to htraced and the web UI, as well as new metrics added.
>> There were numerous build fixups, and we added Docker support, to
>> ensure a repeatable build.
>>
>> Check it out.  The vote will run for 5 days.
>>
>> cheers,
>> Colin
>>
>>
>> Release Notes - HTrace - Version 4.1
>> ** Bug
>> * [HTRACE-114] - Fix compilation error of htrace-hbase against
>> hbase-1.0.0
>> * [HTRACE-238] - Change maven compiler source level to 1.7 to
>> match targetJdk
>> * [HTRACE-243] - Remove duplicate maven-assembly-plugin
>> configuration section in htrace-htraced/pom.xml
>> * [HTRACE-245] - NOTICE.txt: change "developed by The Apache
>> Software..." to "developed at The Apache Software..."
>> * [HTRACE-246] - HTrace WebApp not properly defined and therefore
>> not packaged into .war
>> * [HTRACE-248] - HTraced should gracefully shutdown if stopped
>> * [HTRACE-249] - Script and doc on how to publish website
>> * [HTRACE-251] - Fix "mvn clean" target
>> * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs
>> are too chatty
>> * [HTRACE-256] - Change the artifactId for htrace-core in branch
>> 4.0 to be htrace-core4
>> * [HTRACE-257] - htrace-htraced: add web symlink rather than
>> generating programmatically
>> * [HTRACE-262] - Temporarily suppress doclint for Java 8 to
>> prevent build failure
>> * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY
>> config key more consistent with other configs
>> * [HTRACE-267] - Move owl logo licensing information from NOTICE to
>> LICENSE
>> * [HTRACE-268] - Remove Units and go-codec from LICENSE since they
>> are not contained in the source release
>> * [HTRACE-272] - TracerPool must not load multiple instances of
>> same receiver class when a simple classname is given
>> * [HTRACE-279] - Fix issues where the HTracedSpanReceiver was
>> using the wrong JSON serialization for spans and add validation to
>> htraced REST ingest path
>> * [HTRACE-280] - htraced: add metrics about total spans added and
>> dropped per address
>> * [HTRACE-281] - htraced: add example/htraced-conf.xml
>> * [HTRACE-282] - htraced: reap spans which are older than a
>> configurable interval
>> * [HTRACE-283] - Heartbeater should wait for goroutine to finish on
>> close
>> * [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the
>> shaded version of commons-logging as provided
>> * [HTRACE-285] - htraced tool: fix query parsing and add query_test
>> * [HTRACE-289] - Fix TraceEnabled, etc. logger methods for
>> conditional logging
>> * [HTRACE-294] - htraced: fix some metrics issues
>> * [HTRACE-297] - htraced: avoid serializing spans to json unless
>> TRACE logging is enabled
>> * [HTRACE-300] - Reaper should be initialized

Re: PRs submitted over public Github repo

2016-02-18 Thread Colin P. McCabe

Hi Andrey,

Please file a JIRA for any issues you find.  It's at
https://issues.apache.org/jira/browse/htrace

We should probably put this in the README.md so that it shows up on github.

We were talking about enabling gerrit a while back, but unfortunately
there needs to be some more work on the ASF infrastructure side to
make that work.

best,
Colin

On Thu, Feb 18, 2016 at 4:47 AM, Andrey Redko  wrote:
> Hey Devs,
>
> I am wondering if anyone is watching the PRs submitted over public Github
> repo  (http://github.com/apache/incubator-htrace/)?
> Would be great to have some feedback on those.
> Thank you.
>
> Best Regards,
> Andriy Redko

Re: HTrace 4.1 release candidate 1

2016-02-09 Thread Colin P. McCabe

I have sunk this RC, and posted RC2.  Thanks for the feedback, all!

best,
Colin

On Mon, Feb 8, 2016 at 2:37 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> Thanks, guys.  I will spin another RC with HTRACE-334 added.  Can I
> get a review of https://issues.apache.org/jira/browse/HTRACE-342 so
> that I can add that as well?  It's a very simple docs change.
>
> best,
> Colin
>
> On Sat, Feb 6, 2016 at 4:04 PM, Masatake Iwasaki
> <iwasak...@oss.nttdata.co.jp> wrote:
>> Thanks for putting this up, Colin.
>>
>>> * [HTRACE-334] - htrace-web: Make limit of search and children API
>>> configurable
>>
>> This seemed not to be cherry-picked to branch-4.1.
>> I do not think this is critical but would like it to be in.
>>
>> Except for this, the RC is good.
>>
>> I built Hadoop against 4.1.0-incubating,
>> ran HDFS operations with tracing enabled,
>> and saw the traces in the web UI of htraced.
>> It worked fine.
>>
>> Masatake Iwasaki
>>
>>
>> On 2/3/16 09:50, Colin P. McCabe wrote:
>>>
>>> Hi all,
>>>
>>> I've posted the first release candidate for HTrace 4.1 here:
>>>
>>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/
>>>
>>> The jars have been staged here:
>>>
>>> https://repository.apache.org/content/repositories/orgapachehtrace-1021
>>>
>>> HTrace 4.1 brings a lot of robustness improvements.  There were major
>>> improvements to htraced and the web UI, as well as new metrics added.
>>> There were numerous build fixups, and we added Docker support, to
>>> ensure a repeatable build.
>>>
>>> Check it out.  The vote will run for 5 days.
>>>
>>> cheers,
>>> Colin
>>>
>>> Release Notes - HTrace - Version 4.1
>>> ** Bug
>>>  * [HTRACE-114] - Fix compilation error of htrace-hbase against
>>> hbase-1.0.0
>>>  * [HTRACE-238] - Change maven compiler source level to 1.7 to
>>> match targetJdk
>>>  * [HTRACE-243] - Remove duplicate maven-assembly-plugin
>>> configuration section in htrace-htraced/pom.xml
>>>  * [HTRACE-245] - NOTICE.txt: change "developed by The Apache
>>> Software..." to "developed at The Apache Software..."
>>>  * [HTRACE-246] - HTrace WebApp not properly defined and therefore
>>> not packaged into .war
>>>  * [HTRACE-248] - HTraced should gracefully shutdown if stopped
>>>  * [HTRACE-249] - Script and doc on how to publish website
>>>  * [HTRACE-251] - Fix "mvn clean" target
>>>  * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs
>>> are too chatty
>>>  * [HTRACE-256] - Change the artifactId for htrace-core in branch
>>> 4.0 to be htrace-core4
>>>  * [HTRACE-257] - htrace-htraced: add web symlink rather than
>>> generating programmatically
>>>  * [HTRACE-262] - Temporarily suppress doclint for Java 8 to
>>> prevent build failure
>>>  * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY
>>> config key more consistent with other configs
>>>  * [HTRACE-267] - Move owl logo licensing information from NOTICE to
>>> LICENSE
>>>  * [HTRACE-268] - Remove Units and go-codec from LICENSE since they
>>> are not contained in the source release
>>>  * [HTRACE-272] - TracerPool must not load multiple instances of
>>> same receiver class when a simple classname is given
>>>  * [HTRACE-279] - Fix issues where the HTracedSpanReceiver was
>>> using the wrong JSON serialization for spans and add validation to
>>> htraced REST ingest path
>>>  * [HTRACE-280] - htraced: add metrics about total spans added and
>>> dropped per address
>>>  * [HTRACE-281] - htraced: add example/htraced-conf.xml
>>>  * [HTRACE-282] - htraced: reap spans which are older than a
>>> configurable interval
>>>  * [HTRACE-283] - Heartbeater should wait for goroutine to finish on
>>> close
>>>  * [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the
>>> shaded version of commons-logging as provided
>>>  * [HTRACE-285] - htraced tool: fix query parsing and add query_test
>>>  * [HTRACE-289] - Fix TraceEnabled, etc. logger methods for
>>> conditional logging
>>>  * [HTRACE-294] - htraced: fix some metrics issues
>>>  * [HTRACE-297] - htraced: avoid serializing spans to json unless
>>> TRACE logging is enabled
>>

HTrace 4.1 release candidate 2

2016-02-09 Thread Colin P. McCabe

Hi all,

I've posted the second release candidate for HTrace 4.1 here:

http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc2/

The jars have been staged here:

https://repository.apache.org/content/repositories/orgapachehtrace-1022

Compared to rc1, this rc includes HTRACE-334 and HTRACE-342.

HTrace 4.1 brings a lot of robustness improvements.  There were major
improvements to htraced and the web UI, as well as new metrics added.
There were numerous build fixups, and we added Docker support, to
ensure a repeatable build.

Check it out.  The vote will run for 5 days.

cheers,
Colin
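
Voters on a release candidate typically check the published hash of each
artifact before testing it. The Python sketch below shows one way to do
that check. The choice of SHA-512 as the default and the helper names are
assumptions for illustration, since the thread does not say which digest
files accompany the RC. Verifying the signature is a separate step that
this sketch does not cover.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large release tarballs need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex, algorithm="sha512"):
    """Compare a file against a published digest, ignoring case and whitespace."""
    # Published digests are often wrapped across lines or written in upper case.
    normalized = "".join(published_hex.split()).lower()
    return file_digest(path, algorithm) == normalized
```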


Release Notes - HTrace - Version 4.1
** Bug
* [HTRACE-114] - Fix compilation error of htrace-hbase against hbase-1.0.0
* [HTRACE-238] - Change maven compiler source level to 1.7 to
match targetJdk
* [HTRACE-243] - Remove duplicate maven-assembly-plugin
configuration section in htrace-htraced/pom.xml
* [HTRACE-245] - NOTICE.txt: change "developed by The Apache
Software..." to "developed at The Apache Software..."
* [HTRACE-246] - HTrace WebApp not properly defined and therefore
not packaged into .war
* [HTRACE-248] - HTraced should gracefully shutdown if stopped
* [HTRACE-249] - Script and doc on how to publish website
* [HTRACE-251] - Fix "mvn clean" target
* [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs
are too chatty
* [HTRACE-256] - Change the artifactId for htrace-core in branch
4.0 to be htrace-core4
* [HTRACE-257] - htrace-htraced: add web symlink rather than
generating programmatically
* [HTRACE-262] - Temporarily suppress doclint for Java 8 to
prevent build failure
* [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY
config key more consistent with other configs
* [HTRACE-267] - Move owl logo licensing information from NOTICE to LICENSE
* [HTRACE-268] - Remove Units and go-codec from LICENSE since they
are not contained in the source release
* [HTRACE-272] - TracerPool must not load multiple instances of
same receiver class when a simple classname is given
* [HTRACE-279] - Fix issues where the HTracedSpanReceiver was
using the wrong JSON serialization for spans and add validation to
htraced REST ingest path
* [HTRACE-280] - htraced: add metrics about total spans added and
dropped per address
* [HTRACE-281] - htraced: add example/htraced-conf.xml
* [HTRACE-282] - htraced: reap spans which are older than a
configurable interval
* [HTRACE-283] - Heartbeater should wait for goroutine to finish on close
* [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the
shaded version of commons-logging as provided
* [HTRACE-285] - htraced tool: fix query parsing and add query_test
* [HTRACE-289] - Fix TraceEnabled, etc. logger methods for
conditional logging
* [HTRACE-294] - htraced: fix some metrics issues
* [HTRACE-297] - htraced: avoid serializing spans to json unless
TRACE logging is enabled
* [HTRACE-300] - Reaper should be initialized before shards are activated
* [HTRACE-301] - htraced: fix unit tests that aren't waiting for
spans to be written, use semaphore for WrittenSpans
* [HTRACE-302] - htraced: Add admissions control to HRPC to limit
the number of incoming messages
* [HTRACE-304] - htraced: fix bug with GREATER_THAN queries
* [HTRACE-307] - htraced: queries sometimes return no results even
when many results exist due to confusion in iterator usage
* [HTRACE-311] - htraced: Fix logging to stdout via -Dlog.path=
* [HTRACE-316] - htrace-web: span.js issue: span ID string length
is 32, not 36
* [HTRACE-317] - Fix the documentation for adding tracing to an
application to reflect HTrace 4.x API changes
* [HTRACE-328] - htraced continues scanning in some cases even
when no more results are possible

** Improvement
* [HTRACE-342] - centralize building instructions in BUILDING.txt
* [HTRACE-334] - htrace-web: Make limit of search and children API
configurable
* [HTRACE-129] - htraced: add /server/stats REST endpoint
* [HTRACE-156] - HTrace GUI: add about view
* [HTRACE-181] - gui: Split "about" screen
* [HTRACE-237] - Optimize htraced span receiver
* [HTRACE-239] - Add htrace/impl/TestZipkinSpanReceiver.java
* [HTRACE-260] - htrace-zipkin should not set the obsolete
duration field in thrift
* [HTRACE-271] - Add log4j.properties to all submodule tests
* [HTRACE-276] - Shade classes into org.apache.htrace.shaded
rather than org.apache.htrace
* [HTRACE-286] - htraced: improvements to logging, daemon startup,
and configuration
* [HTRACE-290] - htraced: Fix per-faculty log level settings and
add unit tests for conditional logging
* [HTRACE-291] - rename bin/htrace to bin/htracedTool
* [HTRACE-292] - "htracedTool version" should display the git
hash, and -Dgit.version option should be available for build
* [HTRACE-295] - htraced: setting span.expiry.ms to 0 should
disable span expiry
*

Re: HTrace 4.1 release candidate 1

2016-02-08 Thread Colin P. McCabe

Thanks, guys.  I will spin another RC with HTRACE-334 added.  Can I
get a review of https://issues.apache.org/jira/browse/HTRACE-342 so
that I can add that as well?  It's a very simple docs change.

best,
Colin

On Sat, Feb 6, 2016 at 4:04 PM, Masatake Iwasaki
<iwasak...@oss.nttdata.co.jp> wrote:
> Thanks for putting this up, Colin.
>
>> * [HTRACE-334] - htrace-web: Make limit of search and children API
>> configurable
>
> This seemed not to be cherry-picked to branch-4.1.
> I do not think this is critical but would like it to be in.
>
> Except for this, the RC is good.
>
> I built Hadoop against 4.1.0-incubating,
> ran HDFS operations with tracing enabled,
> and saw the traces in the web UI of htraced.
> It worked fine.
>
> Masatake Iwasaki
>
>
> On 2/3/16 09:50, Colin P. McCabe wrote:
>>
>> Hi all,
>>
>> I've posted the first release candidate for HTrace 4.1 here:
>>
>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/
>>
>> The jars have been staged here:
>>
>> https://repository.apache.org/content/repositories/orgapachehtrace-1021
>>
>> HTrace 4.1 brings a lot of robustness improvements.  There were major
>> improvements to htraced and the web UI, as well as new metrics added.
>> There were numerous build fixups, and we added Docker support, to
>> ensure a repeatable build.
>>
>> Check it out.  The vote will run for 5 days.
>>
>> cheers,
>> Colin
>>
>> Release Notes - HTrace - Version 4.1
>> ** Bug
>>  * [HTRACE-114] - Fix compilation error of htrace-hbase against
>> hbase-1.0.0
>>  * [HTRACE-238] - Change maven compiler source level to 1.7 to
>> match targetJdk
>>  * [HTRACE-243] - Remove duplicate maven-assembly-plugin
>> configuration section in htrace-htraced/pom.xml
>>  * [HTRACE-245] - NOTICE.txt: change "developed by The Apache
>> Software..." to "developed at The Apache Software..."
>>  * [HTRACE-246] - HTrace WebApp not properly defined and therefore
>> not packaged into .war
>>  * [HTRACE-248] - HTraced should gracefully shutdown if stopped
>>  * [HTRACE-249] - Script and doc on how to publish website
>>  * [HTRACE-251] - Fix "mvn clean" target
>>  * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs
>> are too chatty
>>  * [HTRACE-256] - Change the artifactId for htrace-core in branch
>> 4.0 to be htrace-core4
>>  * [HTRACE-257] - htrace-htraced: add web symlink rather than
>> generating programmatically
>>  * [HTRACE-262] - Temporarily suppress doclint for Java 8 to
>> prevent build failure
>>  * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY
>> config key more consistent with other configs
>>  * [HTRACE-267] - Move owl logo licensing information from NOTICE to
>> LICENSE
>>  * [HTRACE-268] - Remove Units and go-codec from LICENSE since they
>> are not contained in the source release
>>  * [HTRACE-272] - TracerPool must not load multiple instances of
>> same receiver class when a simple classname is given
>>  * [HTRACE-279] - Fix issues where the HTracedSpanReceiver was
>> using the wrong JSON serialization for spans and add validation to
>> htraced REST ingest path
>>  * [HTRACE-280] - htraced: add metrics about total spans added and
>> dropped per address
>>  * [HTRACE-281] - htraced: add example/htraced-conf.xml
>>  * [HTRACE-282] - htraced: reap spans which are older than a
>> configurable interval
>>  * [HTRACE-283] - Heartbeater should wait for goroutine to finish on
>> close
>>  * [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the
>> shaded version of commons-logging as provided
>>  * [HTRACE-285] - htraced tool: fix query parsing and add query_test
>>  * [HTRACE-289] - Fix TraceEnabled, etc. logger methods for
>> conditional logging
>>  * [HTRACE-294] - htraced: fix some metrics issues
>>  * [HTRACE-297] - htraced: avoid serializing spans to json unless
>> TRACE logging is enabled
>>  * [HTRACE-300] - Reaper should be initialized before shards are
>> activated
>>  * [HTRACE-301] - htraced: fix unit tests that aren't waiting for
>> spans to be written, use semaphore for WrittenSpans
>>  * [HTRACE-302] - htraced: Add admissions control to HRPC to limit
>> the number of incoming messages
>>  * [HTRACE-304] - htraced: fix bug with GREATER_THAN queries
>>  * [HTRACE-307] - htraced: queries sometimes return no results even
>> when many results exist due to confusion in iter

HTrace 4.1 release candidate 1

2016-02-02 Thread Colin P. McCabe

Hi all,

I've posted the first release candidate for HTrace 4.1 here:

http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/

The jars have been staged here:

https://repository.apache.org/content/repositories/orgapachehtrace-1021

HTrace 4.1 brings a lot of robustness improvements.  There were major
improvements to htraced and the web UI, as well as new metrics added.
There were numerous build fixups, and we added Docker support, to
ensure a repeatable build.

Check it out.  The vote will run for 5 days.

cheers,
Colin

Release Notes - HTrace - Version 4.1
** Bug
* [HTRACE-114] - Fix compilation error of htrace-hbase against hbase-1.0.0
* [HTRACE-238] - Change maven compiler source level to 1.7 to
match targetJdk
* [HTRACE-243] - Remove duplicate maven-assembly-plugin
configuration section in htrace-htraced/pom.xml
* [HTRACE-245] - NOTICE.txt: change "developed by The Apache
Software..." to "developed at The Apache Software..."
* [HTRACE-246] - HTrace WebApp not properly defined and therefore
not packaged into .war
* [HTRACE-248] - HTraced should gracefully shutdown if stopped
* [HTRACE-249] - Script and doc on how to publish website
* [HTRACE-251] - Fix "mvn clean" target
* [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs
are too chatty
* [HTRACE-256] - Change the artifactId for htrace-core in branch
4.0 to be htrace-core4
* [HTRACE-257] - htrace-htraced: add web symlink rather than
generating programmatically
* [HTRACE-262] - Temporarily suppress doclint for Java 8 to
prevent build failure
* [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY
config key more consistent with other configs
* [HTRACE-267] - Move owl logo licensing information from NOTICE to LICENSE
* [HTRACE-268] - Remove Units and go-codec from LICENSE since they
are not contained in the source release
* [HTRACE-272] - TracerPool must not load multiple instances of
same receiver class when a simple classname is given
* [HTRACE-279] - Fix issues where the HTracedSpanReceiver was
using the wrong JSON serialization for spans and add validation to
htraced REST ingest path
* [HTRACE-280] - htraced: add metrics about total spans added and
dropped per address
* [HTRACE-281] - htraced: add example/htraced-conf.xml
* [HTRACE-282] - htraced: reap spans which are older than a
configurable interval
* [HTRACE-283] - Heartbeater should wait for goroutine to finish on close
* [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the
shaded version of commons-logging as provided
* [HTRACE-285] - htraced tool: fix query parsing and add query_test
* [HTRACE-289] - Fix TraceEnabled, etc. logger methods for
conditional logging
* [HTRACE-294] - htraced: fix some metrics issues
* [HTRACE-297] - htraced: avoid serializing spans to json unless
TRACE logging is enabled
* [HTRACE-300] - Reaper should be initialized before shards are activated
* [HTRACE-301] - htraced: fix unit tests that aren't waiting for
spans to be written, use semaphore for WrittenSpans
* [HTRACE-302] - htraced: Add admissions control to HRPC to limit
the number of incoming messages
* [HTRACE-304] - htraced: fix bug with GREATER_THAN queries
* [HTRACE-307] - htraced: queries sometimes return no results even
when many results exist due to confusion in iterator usage
* [HTRACE-311] - htraced: Fix logging to stdout via -Dlog.path=
* [HTRACE-316] - htrace-web: span.js issue: span ID string length
is 32, not 36
* [HTRACE-317] - Fix the documentation for adding tracing to an
application to reflect HTrace 4.x API changes
* [HTRACE-328] - htraced continues scanning in some cases even
when no more results are possible

** Improvement
* [HTRACE-129] - htraced: add /server/stats REST endpoint
* [HTRACE-156] - HTrace GUI: add about view
* [HTRACE-181] - gui: Split "about" screen
* [HTRACE-237] - Optimize htraced span receiver
* [HTRACE-239] - Add htrace/impl/TestZipkinSpanReceiver.java
* [HTRACE-260] - htrace-zipkin should not set the obsolete
duration field in thrift
* [HTRACE-271] - Add log4j.properties to all submodule tests
* [HTRACE-276] - Shade classes into org.apache.htrace.shaded
rather than org.apache.htrace
* [HTRACE-286] - htraced: improvements to logging, daemon startup,
and configuration
* [HTRACE-290] - htraced: Fix per-faculty log level settings and
add unit tests for conditional logging
* [HTRACE-291] - rename bin/htrace to bin/htracedTool
* [HTRACE-292] - "htracedTool version" should display the git
hash, and -Dgit.version option should be available for build
* [HTRACE-295] - htraced: setting span.expiry.ms to 0 should
disable span expiry
* [HTRACE-296] - htraced tests: make sure local settings for
HTRACED_WEB_DIR and HTRACE_CONF_DIR don't affect unit tests
* [HTRACE-298] - htraced: improve datastore serialization and metrics
* [HTRACE-303] - Add

Re: HTrace 4.1 release candidate 1

2016-02-02 Thread Colin P. McCabe

Thanks for looking at this, Lewis.

On Tue, Feb 2, 2016 at 6:25 PM, Lewis John Mcgibbney
 wrote:
> Hi Colin,
>
> Signatures Good
> Aggregated results of running DRAT over the release candidate
>
> Notes  Binaries  Archives  Standards  Apache  Generated  Unknown
> 0      0         0         142        118     0          15
>
> Unapproved licenses include
>
>   /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap-theme.css
>   /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap-theme.min.css
>   /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap.css
>   /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap.min.css
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/backbone-1.1.2.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/bootstrap.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/bootstrap.min.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/d3.min.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/jquery-2.1.4.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/moment-2.10.3.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/npm.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/underscore-1.7.0.js
>   /usr/local/drat/deploy/data/jobs/rat/1454465688785/input/SpanProtos.java
>   /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml
>   /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml_02022016_1814
>
> I understand that the .css and .js files above are covered in LICENSE at
> the bottom; however, we need to address the following files:
>
>   /usr/local/drat/deploy/data/jobs/rat/1454465688785/input/SpanProtos.java
>   /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml
>   /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml_02022016_1814

Hmm.  I think we talked about SpanProtos.java,
dependency-reduced-pom.xml, etc. during the previous release and
concluded that they are generated files, and hence exempt from the
license requirement according to
http://incubator.apache.org/guides/releasemanagement.html#notes-license-headers

>
> NOTICE includes
> Copyright 2015 The Apache Software Foundation
> This should be
> Copyright 2016 The Apache Software Foundation
>

OK

> There seems to be a bit of confusion between instructions for Building the
> code. We have the note in README.md and then a separate note within
> BUILDING.txt. We should probably resolve this and include them both in
> README.md
>

OK, I created HTRACE-342 to fix this.

> Build and tests pass fine.
>
> Typically the absence of the license header in the above files would be a
> -1 from me. I will wait to see how others review the candidate before
> VOTE'ing.
> Good job putting this together.

Thanks

best,
Colin

>
>
> On Tue, Feb 2, 2016 at 4:51 PM, > wrote:
>
>>
>> Hi all,
>>
>> I've posted the first release candidate for HTrace 4.1 here:
>>
>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/
>>
>> The jars have been staged here:
>>
>> https://repository.apache.org/content/repositories/orgapachehtrace-1021
>>
>> HTrace 4.1 brings a lot of robustness improvements.  There were major
>> improvements to htraced and the web UI, as well as new metrics added.
>> There were numerous build fixups, and we added Docker support, to
>> ensure a repeatable build.
>>
>> Check it out.  The vote will run for 5 days.
>>
>> cheers,
>> Colin
>>
>> Release Notes - HTrace - Version 4.1
>> ** Bug
>> * [HTRACE-114] - Fix compilation error of htrace-hbase against
>> hbase-1.0.0
>> * [HTRACE-238] - Change maven compiler source level to 1.7 to
>> match targetJdk
>> * [HTRACE-243] - Remove duplicate maven-assembly-plugin
>> configuration section in htrace-htraced/pom.xml
>> * [HTRACE-245] - NOTICE.txt: change "developed by The Apache
>> Software..." to "developed at The Apache Software..."
>> * [HTRACE-246] - HTrace WebApp not properly defined and therefore
>> not packaged into .war
>> * [HTRACE-248] - HTraced should gracefully shutdown if stopped
>> * [HTRACE-249] - Script and doc on how to publish website
>> * [HTRACE-251] - Fix "mvn clean" target
>> * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs
>> are too chatty
>> * [HTRACE-256] - Change the artifactId for htrace-core in branch
>> 4.0 to be htrace-core4
>> * [HTRACE-257] - htrace-htraced: add web symlink rather than
>> generating programmatically
>> * [HTRACE-262] - Temporarily suppress doclint for Java 8 to
>> prevent build failure
>> * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY
>> config key more consistent with other configs
>> * [HTRACE-267] - Move owl logo licensing information from NOTICE to
>> LICENSE
>> * [HTRACE-268] - Remove Units and go-codec from LICENSE since they
>> are not contained in

Re: git tag and branch naming

2016-01-27 Thread Colin P. McCabe

Thanks, Sean.  That sounds like a good idea.  I guess we can drop the
"-release" suffix then.  "rel/4.0" and "rel/4.0.1", etc. seem pretty
self-explanatory.  My main goal was just to make branches look
different than tags.  I would prefer to keep the "-branch" suffix on
branches just to make that clear as well.

Sean, would RCs also receive the "rel/" prefix, or not?  I'm guessing
not, since we don't need to preserve them forever.

best,
Colin


On Tue, Jan 26, 2016 at 9:26 PM, Sean Busbey <bus...@cloudera.com> wrote:
> with the new ASF release tag policy, this would make our release tags look
> like 'rel/4.0-release' and 'rel/4.0.1-release'.
>
> the 'rel' prefix makes the distinction between branches and tagged releases
> clear to me. what do others think?
>
> On Tue, Jan 26, 2016 at 10:41 PM, Masatake Iwasaki <
> iwasak...@oss.nttdata.co.jp> wrote:
>
>> Sorry for late reply.
>>
>> I agree with the proposed naming convention for branches and tags.
>> If there are no further objections, we should close HTRACE-331 and
>> prepare for the next release.
>>
>> Thanks,
>> Masatake Iwasaki
>>
>>
>> On 12/15/15 04:53, Colin P. McCabe wrote:
>>
>>> As part of our release process, we create git tags for each release
>>> candidate (RC)... for example, 3.1.0RC9 and 4.0.1RC1.  We also often
>>> use release branches-- for example, the "4.0" branch.
>>>
>>> As Sean Busbey pointed out, we should also be creating "release" tags,
>>> so that people who want to check out the release can do so without
>>> having to figure out which RC was anointed as the release.  I also
>>> think we should adopt a naming convention for release branches and
>>> tags so that people attempting to check out tags don't accidentally
>>> check out branches, and vice versa.
>>>
>>> The branch and tag naming is confusing right now.  For example,
>>> someone running "git checkout 4.0" might be surprised to learn that
>>> this checks out a branch currently containing 4.0.1, not the git tag
>>> for the 4.0 release.
>>>
>>> I'm thinking we should adopt the following convention:
>>> * release tags should have "release" in the name. So the tag for
>>> htrace 4.1 should be "4.1-release"
>>> * RC tags continue to be "4.1-RC1" and so forth.
>>> * release branches should have "branch" in the name. So the branch for
>>> 4.1 should be "branch-4.1".  In general, branches should not include
>>> "RC[0-9]" or "release" in the names, to avoid confusion with the tags.
>>>
>>> Let me know what you think.  If you guys agree, I will also create
>>> 4.0-release and 4.0.1-release tags corresponding to those releases.
>>>
>>> best,
>>> Colin
>>>
>>
>>
>
>
> --
> Sean

Re: Time for a new release?

2016-01-27 Thread Colin P. McCabe

Ah, sorry.  I did some preliminary work (like cutting the branch) but
I haven't made the RC yet.  I'll try to get it out there soon.

best,
Colin

On Tue, Jan 26, 2016 at 8:47 PM, Masatake Iwasaki
<iwasak...@oss.nttdata.co.jp> wrote:
> Hi Colin,
>
> I can see branch-4.1 created in the repo.
> Have you already started to create RC?
>
> Regards,
> Masatake Iwasaki
>
> On 1/15/16 10:18, Masatake Iwasaki wrote:
>>>
>>> I think it's time to cut the 4.1 release.
>>
>>
>> +1.
>>
>> I volunteer to do release management work, if you like.
>>
>> Thanks,
>> Masatake Iwasaki
>>
>>
>> On 1/14/16 07:52, Colin P. McCabe wrote:
>>>
>>> Hi all,
>>>
>>> Happy new year!
>>>
>>> I think it's time to cut the 4.1 release.  We've fixed up a lot of
>>> bugs, and made a lot of progress since 4.0.1.
>>>
>>> If everyone agrees, I'll post something in the next few days.
>>>
>>> best,
>>> Colin
>>
>>
>

Time for a new release?

2016-01-13 Thread Colin P. McCabe

Hi all,

Happy new year!

I think it's time to cut the 4.1 release.  We've fixed up a lot of
bugs, and made a lot of progress since 4.0.1.

If everyone agrees, I'll post something in the next few days.

best,
Colin

git tag and branch naming

2015-12-14 Thread Colin P. McCabe

As part of our release process, we create git tags for each release
candidate (RC)... for example, 3.1.0RC9 and 4.0.1RC1.  We also often
use release branches-- for example, the "4.0" branch.

As Sean Busbey pointed out, we should also be creating "release" tags,
so that people who want to check out the release can do so without
having to figure out which RC was anointed as the release.  I also
think we should adopt a naming convention for release branches and
tags so that people attempting to check out tags don't accidentally
check out branches, and vice versa.

The branch and tag naming is confusing right now.  For example,
someone running "git checkout 4.0" might be surprised to learn that
this checks out a branch currently containing 4.0.1, not the git tag
for the 4.0 release.

I'm thinking we should adopt the following convention:
* release tags should have "release" in the name. So the tag for
htrace 4.1 should be "4.1-release"
* RC tags continue to be "4.1-RC1" and so forth.
* release branches should have "branch" in the name. So the branch for
4.1 should be "branch-4.1".  In general, branches should not include
"RC[0-9]" or "release" in the names, to avoid confusion with the tags.

Let me know what you think.  If you guys agree, I will also create
4.0-release and 4.0.1-release tags corresponding to those releases.
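
The three naming rules proposed above can be sketched as a small classifier. This is only an illustration in Python: the regular expressions and the function name `classify_ref` are assumptions derived from the examples in this message, not part of any HTrace tooling.

```python
import re

# Patterns are assumptions based on the examples given in the message.
RELEASE_TAG = re.compile(r"^\d+(\.\d+)+-release$")    # e.g. "4.1-release"
RC_TAG = re.compile(r"^\d+(\.\d+)+-RC\d+$")           # e.g. "4.1-RC1"
RELEASE_BRANCH = re.compile(r"^branch-\d+(\.\d+)+$")  # e.g. "branch-4.1"


def classify_ref(name):
    """Return the kind of ref a name denotes under the proposed convention."""
    if RELEASE_TAG.match(name):
        return "release tag"
    if RC_TAG.match(name):
        return "RC tag"
    if RELEASE_BRANCH.match(name):
        return "release branch"
    return "unclassified"
```

A bare name such as "4.0" falls through as "unclassified", which is the ambiguity the proposal is meant to remove.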

best,
Colin

htrace jenkins build failures

2015-11-02 Thread Colin P. McCabe

Does anyone know what's up with the 30+ Jenkins failure emails we got
this weekend?  It looks like "mvn clean" was failing with an
AccessDeniedException... what could have caused that?  Perhaps somehow
we created a directory without "execute" permission?  I remember we
had problems with Maven clean being unable to delete those in the
Hadoop build.  In any case, the permission denied exception is gone
now.
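
If the guess above is right, the offending directories could be found by looking for ones whose owner-execute (search) bit is unset, since entries inside such a directory cannot be reached and so cannot be deleted. A minimal sketch in Python; the function name is hypothetical and nothing here is taken from the actual Jenkins job.

```python
import os
import stat


def dirs_missing_owner_execute(root):
    """List directories under root whose owner-execute (search) bit is unset."""
    missing = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                # The parent is itself not searchable, so this entry
                # cannot be inspected; the parent is already reported.
                continue
            if not mode & stat.S_IXUSR:
                missing.append(path)
    return missing
```

Running this over the workspace before "mvn clean" would show whether a permission problem of this kind is present.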

best,
Colin

Re: [DISCUSS] Release for HTrace/Drive Towards Graduation

2015-10-31 Thread Colin P. McCabe

Thanks, Adrian.  I've been talking to some folks who might be
interested in setting up gerrit for htrace.  Stay tuned.

cheers,
Colin

On Fri, Oct 30, 2015 at 1:25 PM, Adrian Cole  wrote:
> OK, well do announce once you've decided how to do a CI pipeline that
> includes pre-commit testing and context-sensitive review comments. I look
> forward to it.

[ANNOUNCE] Apache HTrace 4.0.1 (Incubating) released

2015-09-26 Thread Colin P. McCabe

The Apache HTrace (Incubating) team is pleased to announce the release
of HTrace 4.0.1.

HTrace is a tracing framework for use with distributed systems.

This dot release fixes some build issues, including the generation of
war files for the webapp, the naming of the htrace-core4 artifact, the
"mvn clean" target, and the go build on Mac OS X.

The release is available in maven:
https://repo1.maven.org/maven2/org/apache/htrace/

The full change log is available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315924=12333550

Your help and feedback are more than welcome. For more information on how to
report problems and to get involved, visit the project website at
http://incubator.apache.org/projects/htrace.html.

cheers,
The Apache HTrace (Incubating) Team

Re: Making Website Update Part of Release Process

2015-09-16 Thread Colin P. McCabe

Thanks, stack!  That's awesome, will be great for remembering the
steps in making a release.

Colin

On Tue, Sep 15, 2015 at 9:40 PM, Stack  wrote:
> On Mon, Sep 14, 2015 at 10:14 PM, Stack  wrote:
>
>> They are up now looking at latest mirror:
>> http://mirrors.gigenet.com/apache/incubator/htrace/
>>
>> Doc on how to release is coming.
>>
>>
> A start on RM doc can be found here:
> http://htrace.incubator.apache.org/building.html
>
>
>
>> On how to update the website, currently best doc is in HTRACE-19. I can
>> update in morning unless you beat me to it.
>>
>>
> Here is how to update website:
> http://htrace.incubator.apache.org/building.html#Publishing_htraceincubatorapacheorg_website
>
> St.Ack
>
>
>
>> Thanks Lewis,
>> St.Ack
>>
>>
>>
>>
>>
>> On Mon, Sep 14, 2015 at 9:41 PM, Lewis John Mcgibbney <
>> lewis.mcgibb...@gmail.com> wrote:
>>
>>> Hi Folks,
>>> Excellent work on getting recent 4.0.0-incubating release out the door.
>>> I wonder if there is documentation available for a release manager? If so
>>> does it currently contain advice on how to update the HTrace website with
>>> the info for the release?
>>> Right now I don't even see the HTrace artifacts available
>>> http://www.apache.org/dyn/closer.cgi/incubator/htrace/
>>> I am assuming they are not available for public consumption yet.
>>> Is this the case?
>>> Thanks
>>> Lewis
>>>
>>> --
>>> *Lewis*
>>>
>>
>>

Re: htrace-core compatibility policy for htrace 4.x

2015-09-16 Thread Colin P. McCabe

On Tue, Sep 15, 2015 at 5:42 PM, Masatake Iwasaki
 wrote:
> I agree with the policy.
>
>
>> Major releases should change the namespace of htrace-core classes so that
>> both a 4.x and a 5.x jar can reside on the same CLASSPATH
>
> Should we have a namespace convention for this?
> We moved classes from org.apache.htrace to org.apache.htrace.core in 4.0
> but need a new word other than "core" for 5.0.
> A simple way to do this would be to put the version number in the package
> name, like "org.apache.htrace.5".

Hmm, interesting idea.  org.apache.htrace5 might be nice too.

We can probably wait a bit before deciding on the namespace name...
hopefully we won't need 5.x for a while :)
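
One thing to keep in mind when comparing the two names: each dot-separated component of a Java package name has to be a valid identifier, and identifiers cannot begin with a digit. A rough check in Python follows; `str.isidentifier` is used only as an approximation of the Java rule (it knows nothing about Java's reserved words), and the helper name is hypothetical.

```python
def looks_like_valid_package(name):
    """Approximate check: every dot-separated component must be an identifier."""
    parts = name.split(".")
    return all(part.isidentifier() for part in parts)
```

Under this check "org.apache.htrace5" passes, while "org.apache.htrace.5" does not because its last component is a bare digit.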

Colin

>
>
>> Let's focus on just compatibility rules for htrace-core right now,
>> since that's where our integration issues are. The other subprojects
>> of htrace generally don't have the same integration issues.
>
> I think this is reasonable.
> Any version of a receiver with the same major version keeps working
> as long as htrace-core stays compatible.
>
>
> Thanks,
> Masatake Iwasaki
>
>
>
> On 9/15/15 09:13, Colin McCabe wrote:
>>
>> Hi all,
>>
>> In the recent 4.0 release, we changed the htrace-core API. The API
>> that programs use to create traces, annotations, etc. (aka the "Java
>> client API") went through some changes. This was necessary to clean up
>> some core architectural issues (such as the use of overly short 64 bit
>> IDs that will collide in a real-world deployment, or the overuse of
>> globals.)
>>
>> Since we want to make it easy for projects to integrate with HTrace, I
>> think we should have some compatibility rules for htrace-core for the
>> future.
>>
>> Specifically, I think that we should include only backwards-compatible
>> changes to the htrace-core API in HTrace 4.x. So, for example, adding a
>> new function is OK. Deleting an existing function or altering it in an
>> incompatible way is not. It is OK to add a new function to a public
>> abstract base class (provided you also add a default implementation in
>> the base), but not to add a new function to a public interface, since
>> that would break compilation.
>>
>> We should save incompatible changes for HTrace 5.x. In general, each
>> "major release" such as 4.x or 5.x should contain only compatible
>> changes to htrace-core. There should be no guarantees between 4.x and
>> 5.x, or between any major releases-- this is the time to address
>> architectural debt that can't be resolved any other way. Major
>> releases should change the namespace of htrace-core classes so that
>> both a 4.x and a 5.x jar can reside on the same CLASSPATH, similar to
>> how we did with 3.x and 4.x. This is important because it will require
>> some time for downstream projects to upgrade from 4.x to 5.x, and in
>> the meantime we must avoid CLASSPATH conflict issues. There is no
>> requirement that tracing work when the major version of the client and
>> the span receivers are different. However, the programs themselves
>> should function.
>>
>> Let's focus on just compatibility rules for htrace-core right now,
>> since that's where our integration issues are. The other subprojects
>> of htrace generally don't have the same integration issues. For
>> example, it is easy for an admin to standardize on a single version of
>> htrace-hbase or htrace-htraced across the entire cluster. They simply
>> install the jars for the version they want. It is not easy for that
>> same admin to standardize htrace-core, since they might have Hadoop
>> pulling in 4.1 and HBase pulling in 4.0. The different subprojects are
>> also at different levels of maturity. For example, htrace-flume is
>> still very immature, whereas htrace-htraced is starting to get more
>> mature. So I think the subprojects should come up with their own
>> compatibility policies rather than trying to be one-size-fits-all.
>>
>> This policy should only apply to publicly visible symbols in
>> htrace-core, not to private or package-private symbols.  Test
>> functions should also not be covered, since they don't appear in the
>> final htrace-core jar.
>>
>> I think having a compatibility policy for htrace-core will be very
>> nice for the users of our core APIs. Let me know what you think.
>>
>> best,
>> Colin
>
>
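
The rule in the quoted policy about abstract base classes versus interfaces can be illustrated with an analogy. The sketch below is Python, not Java, and every class name in it is hypothetical. In Java, adding a method without a default implementation breaks downstream code when it is compiled; in this analogy the same mistake surfaces as a TypeError when the old downstream class is instantiated.

```python
from abc import ABC, abstractmethod


class CompatibleBase(ABC):
    """Hypothetical base class after a compatible change."""

    @abstractmethod
    def receive(self, span):
        """Present since the first release."""

    def flush(self):
        """Newly added, with a default implementation (a no-op)."""
        return None


class IncompatibleBase(ABC):
    """Hypothetical base class after an incompatible change."""

    @abstractmethod
    def receive(self, span):
        """Present since the first release."""

    @abstractmethod
    def flush(self):
        """Newly added with no default: every implementer must now supply it."""


def make_downstream(base):
    """Build a downstream class that only knows about receive()."""

    class Downstream(base):
        def __init__(self):
            self.spans = []

        def receive(self, span):
            self.spans.append(span)

    return Downstream


def still_works(base):
    """Return True if the old downstream class can still be instantiated."""
    try:
        make_downstream(base)()
        return True
    except TypeError:
        return False
```

Here `still_works(CompatibleBase)` succeeds because the new method has a default, while `still_works(IncompatibleBase)` fails because the downstream class never implemented `flush`.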

Re: [VOTE] HTrace 4.0 Release Candidate 0

2015-09-09 Thread Colin P. McCabe

Quick reminder that the vote closes in 4 hours.  Check it out.

Lewis, check out my NOTICE.txt explanation as well.

You can take a look at my public key at
http://people.apache.org/~cmccabe/htrace/releases/KEYS.  Stack signed when
we met up in San Francisco.

I also checked the key into
https://dist.apache.org/repos/dist/release/incubator/htrace


r10450 | cmccabe | 2015-09-08 18:41:03 + (Tue, 08 Sep 2015) | 1 line

Add public key for cmcc...@apache.org


I'm not sure why it hasn't shown up on
http://archive.apache.org/dist/incubator/htrace/KEYS ... is there something
I need to do to "publish" it from the svn repo?

best,
Colin

On Fri, Sep 4, 2015 at 4:02 PM, Colin P. McCabe <cmcc...@apache.org> wrote:

> I've posted the first release candidate here:
>
> http://people.apache.org/~cmccabe/htrace/releases/4.0.0/rc0
>
> The jars have been staged here:
>
> https://repository.apache.org/content/repositories/orgapachehtrace-1017
>
> There's a lot of great stuff in this release, including a new web UI,
> many bug fixes, API improvements and enlargement of span IDs to 128
> bits to avoid conflicts.
>
> The vote will run for 5 days.
>
> cheers,
> Colin
>
> Release Notes - HTrace - Version 4.0
>
> ** Sub-task
> * [HTRACE-208] - Remove deprecated addKVAnnotation(byte[], byte[])
> method
> * [HTRACE-209] - Make span ID 128 bit to avoid collisions
> * [HTRACE-210] - Remove TrueIfTracingSampler
> * [HTRACE-211] - Move htrace-core classes to the
> org.apache.htrace.core namespace
> * [HTRACE-212] - Change version to 4.0
> * [HTRACE-214] - De-globalize Tracer.java
> * [HTRACE-215] - Simplify the Sampler type
> * [HTRACE-216] - SpanReceivers should not fill in ProcessId
> * [HTRACE-217] - Rename ProcessId to TracerId
> * [HTRACE-222] - Add SpanReceiverPool
> * [HTRACE-228] - Fix subprojects to refer to new
> org.apache.htrace.core namespace
> * [HTRACE-229] - htrace-webapp needs to be updated to refer to
> "tracerid" not "processid"
> ** Bug
> * [HTRACE-159] - libhtrace.so: use HRPC endpoint of htraced
> * [HTRACE-164] - htrace hrpc: use msgpack for serialization
> * [HTRACE-166] - Add tabbed view
> * [HTRACE-167] - Update go build instructions in BUILDING.txt
> * [HTRACE-171] - htraced godeps should use
> github.com/ugorji/go/codec rather than github.com/ugorji/go
> * [HTRACE-174] - Refactor GUI
> * [HTRACE-177] - htrace-zipkin: shade all dependencies
> * [HTRACE-182] - htraced: add rpm build via -Prpm
> * [HTRACE-189] - gui: fix error handling in a few places
> * [HTRACE-190] - htraced: allow querying by process ID
> * [HTRACE-191] - gui: add "duration" to span details, filter out
> "selected"
> * [HTRACE-192] - gui: when expanding parents or children, sort the
> spans by begin time
> * [HTRACE-193] - gui: avoid doing multiple redraws when
> spanResults is updated
> * [HTRACE-196] - gui: add scrolling for spans view
> * [HTRACE-201] - htrace-web: URL-Encode query JSON
> * [HTRACE-202] - htrace-web: fix "converting circular object to
> JSON" error when pressing "clear" button
> * [HTRACE-218] - Fix issues with finding json-c includes and librt
> in the native library
> * [HTRACE-219] - Add -Dleveldb.prefix and -Djsonc.prefix build options
> * [HTRACE-220] - htraced: should be able to set log.path to the
> empty string via "-Dlog.path=" on the command line
> * [HTRACE-223] - gobuild.sh: fix issue where maven succeeds if go
> build fails
> * [HTRACE-224] - htrace C client: htrace_conf_get_u64,
> htrace_conf_get_double can't handle spaces at the end of strings
> * [HTRACE-230] - Make TracerBuilder like all other Builders; an
> internal rather than adjacent class
> * [HTRACE-233] - htrace-zipkin should explicitly include slf4j-api
> to avoid ClassNotFoundException
> * [HTRACE-234] - Add workaround to prevent htrace-hbase from
> getting in an infinite loop while creating the dependency-reduced pom
> ** Improvement
> * [HTRACE-29] - add javascript web UI for htraced
> * [HTRACE-160] - htraced: support continuing a query from where
> the client left it off by sending a previous span
> * [HTRACE-162] - htraced hrpc: some logging improvements
> * [HTRACE-170] - Optimize use of Random in htrace-core by using
> ThreadLocalRandom
> * [HTRACE-172] - Move minJdk to 1.7 (JDK 7)
> * [HTRACE-175] - Add Trace#addKVAnnotation convenience method
> * [HTRACE-176] - Expose ZipkinSpanReceiver c
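
HTRACE-209 in the notes above widens span IDs to 128 bits to avoid collisions. A rough sense of the difference comes from the birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). The sketch below is Python and assumes IDs are drawn uniformly at random; the function name is hypothetical and the span count in the usage note is illustrative, not a figure from the HTrace project.

```python
import math


def collision_probability(num_ids, bits):
    """Approximate chance that at least two of num_ids random IDs collide."""
    space = 2 ** bits
    exponent = num_ids * (num_ids - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)
```

For ten billion spans, `collision_probability(10**10, 64)` comes out above 0.9, while `collision_probability(10**10, 128)` is many orders of magnitude below one in a billion.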

Re: Dependency Request

2015-09-09 Thread Colin P. McCabe

Lewis has been looking into creating a Docker image with all our build
dependencies.  Then we can just use Docker to build and run our unit
tests inside that container.

I think this will be a better solution than installing things on the
build machines since with Docker we don't have to keep bugging INFRA
about installing new dependencies.  It also makes it easy to move the
build from one machine to another, and control the versions of our
dependencies.  Finally, we can just point new contributors at the
Docker image and get them compiling things in seconds rather than
hunting around for dependencies on their local system. There's more
discussion on HTRACE-241 and HTRACE-157.

best,
Colin

On Wed, Sep 9, 2015 at 10:09 AM, Andrew Bayer  wrote:
> I think BUILDS is still there?
>
> On Wed, Sep 9, 2015 at 9:46 AM, Stack  wrote:
>
>> I opened INFRA-10401
>>
>> (Andrew, is the special 'BUILDS' project gone now and we should just file
>> against INFRA for all build issues going forward?)
>>
>> Thanks,
>> St.Ack
>>
>> On Tue, Sep 8, 2015 at 7:41 PM, Andrew Bayer 
>> wrote:
>>
>> > Please open a JIRA?
>> > On Sep 8, 2015 18:39, "lewis john mcgibbney"  wrote:
>> >
>> > > Hi builds@,
>> > > The Apache HTrace (incubating) project recently established a build [0]
>> > > for our codebase on b.a.o.
>> > > In order to build and test we require the leveldb-devel package to be
>> > > installed and libleveldb.so to be available on the PATH.
>> > > Is it possible for someone to install this on one or more of the build
>> > > slaves?
>> > > Thanks in advance.
>> > > Lewis
>> > >
>> > > [0] https://builds.apache.org/view/All/job/HTrace-Master
>> > >
>> > >
>> > > --
>> > > http://people.apache.org/~lewismc || @hectorMcSpector ||
>> > > http://www.linkedin.com/in/lmcgibbney
>> > >
>> > > Apache Gora V.P || Apache Nutch PMC || Apache Any23 V.P || Apache OODT PMC
>> > > Apache Open Climate Workbench PMC || Apache Tika PMC || Apache TAC
>> > > Apache Usergrid || Apache HTrace (incubating) || Apache CommonsRDF (incubating)
>> > >
>> >
>>

Re: [VOTE] HTrace 4.0 Release Candidate 0

2015-09-08 Thread Colin P. McCabe

On Mon, Sep 7, 2015 at 8:47 AM, Lewis John Mcgibbney
 wrote:
> Hi Colin,
> Nice work in getting the release candidate prepared and available for
> review.
> Where is the KEYS file?
> I tried the one here
>

Good call.  I'll append my key to the KEYS file.  My public key is
here: http://pgp.mit.edu/pks/lookup?op=get=0xDE78987A9CD4D9D3

> It does not contain your sig so I cannot verify the release sigs.
> Moving on,
>
> Notes  Binaries  Archives  Standards  Apache  Generated  Unknown
> 0      0         0         116        92      0          15
> Unapproved licenses are as follows
>
> bootstrap-theme.css
> bootstrap-theme.min.css
> bootstrap.css
> bootstrap.min.css
> backbone-1.1.2.js
> bootstrap.js
> bootstrap.min.js
> d3.min.js
> jquery-2.1.4.js
> moment-2.10.3.js
> npm.js
> underscore-1.7.0.js
> SpanProtos.java
> dependency-reduced-pom.xml
>
> None of the above libraries are declared within NOTICE.txt

bootstrap-theme.css, bootstrap-theme.min.css, bootstrap.css,
bootstrap.min.css, bootstrap.js, bootstrap.min.js, and
bootstrap-3.3.1/js/npm.js are described in LICENSE.txt as:

> Bootstrap, an html, css, and javascript framework, is
> Copyright (c) 2011-2015 Twitter, Inc and MIT licensed:
> https://github.com/twbs/bootstrap/blob/master/LICENSE

backbone-1.1.2.js is described in LICENSE.txt as:

> backbone, is a javascript library, that is Copyright (c) 2010-2014
> Jeremy Ashkenas, DocumentCloud. It is MIT licensed:
> https://github.com/jashkenas/backbone/blob/master/LICENSE

d3.min.js is described in LICENSE.txt as:

> D3, a javascript library for manipulating data, used by htrace-hbase
> is Copyright 2010-2014, Michael Bostock and BSD licensed:
> https://github.com/mbostock/d3/blob/master/LICENSE

jquery-2.1.4.js is described in LICENSE.txt as:

> jquery, a javascript library, is Copyright jQuery Foundation and other
> contributors, https://jquery.org/. The software consists of
> voluntary contributions made by many individuals. For exact
> contribution history, see the revision history
> available at https://github.com/jquery/jquery
> It is MIT licensed:
> https://github.com/jquery/jquery/blob/master/LICENSE.txt

moment-2.10.3.js is described in LICENSE.txt as:

> moment.js is a front end time conversion project.
> It is (c) 2011-2014 Tim Wood, Iskren Chernev, Moment.js contributors
> and shared under the MIT license:
> https://github.com/moment/moment/blob/develop/LICENSE

underscore-1.7.0.js is described in LICENSE.txt as:

> underscore, a javascript library of functional programming helpers, is
> (c) 2009-2014 Jeremy Ashkenas, DocumentCloud and Investigative Reporters
> & Editors and an MIT license:
> https://github.com/jashkenas/underscore/blob/master/LICENSE

According to http://www.apache.org/dev/licensing-howto.html:

> Bundling a dependency which is issued under one of the following
> licenses is straightforward, assuming that said license applies
> uniformly to all files within the dependency:
> 1. BSD (without advertising clause)
> 2. MIT/X11
> In LICENSE, add a pointer to the dependency's license within the source tree
> and a short note summarizing its licensing...
> Under normal circumstances, there is no need to modify NOTICE.

Since these are all BSD or MIT licensed, I would interpret this to
mean there is no need for us to modify NOTICE.  Does that make sense?

SpanProtos.java and dependency-reduced-pom.xml are generated files.
According to
http://incubator.apache.org/guides/releasemanagement.html#notes-license-headers:

> The issue of licenses on generated documentation is a little controversial.
> Copyright may not subsist in a document which is generated by an
> transformation from an original. In which case, the license header may be
> unnecessary. License headers should always be present in the original.
> Where it is reasonable to do so, the templates should also add the license
> header to the generated documents.

I looked at how Apache Hadoop is handling this, and they do not have
license headers on their protobuf-generated files.  So I think this is
fine.  (From a technical point of view, I think the PB compiler also
provides no way to do this, as far as I know.)  The situation for
dependency-reduced-pom.xml is the same-- it is a file generated by
Maven.  This is similar to jar files, which are also generated by the
build process, and do not contain a license header.

Also, I believe all these library and generated files were in the
last release we did (although they moved around a bit).

>
> I navigate to the decompressed RC directory and try mvn clean install... it
> enters into a loop!

As Masatake commented, this is HTRACE-236.  There is a note about it
in the README.md.  For a workaround, you can use the solution
described in HTRACE-234 to get the build to run on your version of
Maven (which I assume is 3.3)...

Thanks for trying out the release-- I will take a look at updating
KEYS tomorrow.  So far I haven't seen anything that would need a
respin (please let me know if I missed

Re: HTrace on Jira

2015-09-08 Thread Colin P. McCabe

Hi Lewis,

At this point, nothing else is going into 4.0 unless it's a blocker
for the release.  The master branch version is now at 4.1.  Pretty
much all open JIRAs should be targeted at 4.1.

You can see the JIRAs fixed in 4.0 here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315924&version=12333022

I'm not sure what you mean when you say that "most issues are not
assigned to any release at all."  Pretty much every issue has a "fix
version" which is the release it was fixed in, and an "affects
version", which is the first release it appeared in.  Do you see any
issues without those fields set?  If so, let's fill them in.

One thing that frustrates me is that there is no "Target version:"
field in our JIRA, like there is on Hadoop's JIRA.  I'm not sure what
we have to configure to get a field like that.

best,
Colin

On Tue, Sep 8, 2015 at 4:46 PM, Lewis John Mcgibbney
 wrote:
> Hi Folks,
> What is happening with versioning on Jira?
> Master branch runs off of 4.0.0-incubating-SNAPSHOT, the only version which
> exists within Jira is 4.1.
> Most issues are not assigned to any release at all.
> Can someone sort it out and we could potentially set a roadmap for
> releasing HTrace 4.0/4.0.0-incubating?
> Thanks folks
> lewis
>
> --
> *Lewis*

Re: HTrace on Jira

2015-09-08 Thread Colin P. McCabe

Good idea... I created some components:

build: The Maven build system, pom.xml, Makefile, CMake, Docker, etc.
flume: The htrace-flume span receiver
zipkin: The htrace-zipkin span receiver
hbase: The htrace-hbase span receiver
core: The htrace-core subproject which contains the main htrace
library used by applications to initiate tracing
htraced: The htraced daemon and span receiver
ui: htrace-web graphical user interface

I think it probably makes sense to combine docs with the website since
we generate our website from the docs via Maven.  Feel free to add
more stuff, of course.

cheers,
Colin

On Tue, Sep 8, 2015 at 6:40 PM, Lewis John Mcgibbney
 wrote:
> Hi Folks,
> Further to this, is it possible for someone to go through the components
> and create components for each module as well as one for website?
> Thanks
> Lewis
>
> On Tue, Sep 8, 2015 at 4:46 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Folks,
>> What is happening with versioning on Jira?
>> Master branch runs off of 4.0.0-incubating-SNAPSHOT, the only version
>> which exists within Jira is 4.1.
>> Most issues are not assigned to any release at all.
>> Can someone sort it out and we could potentially set a roadmap for
>> releasing HTrace 4.0/4.0.0-incubating?
>> Thanks folks
>> lewis
>>
>> --
>> *Lewis*
>>
>
>
>
> --
> *Lewis*

Re: [VOTE] HTrace 4.0 Release Candidate 0

2015-09-08 Thread Colin P. McCabe

Thanks for looking at this, Masatake and Lewis!  Lewis, is my
explanation of NOTICE.txt correct, or did I miss something?

Remember to vote... the vote will be open until tomorrow.

Here is my +1. ;)

cheers,
Colin

On Tue, Sep 8, 2015 at 6:32 PM, Masatake Iwasaki
<iwasak...@oss.nttdata.co.jp> wrote:
> Thanks for putting this up, Colin.
>
> - verified mds and signature
> - ran "mvn install" without test failure
> - ran "mvn package -Pnative" without test failure in htrace-c
> - built src tarball by running "mvn clean install -DskipTests
> assembly:single -Pdist"
> - launched htraced and sent test tracing spans by HTracedRESTReceiver and
> checked the spans by Web-UI
>
> I'm +1 if the issue about NOTICE.txt Lewis pointed out is not critical for
> release.
>
> Masatake Iwasaki
>
>
> On 9/5/15 08:02, Colin P. McCabe wrote:
>>
>> I've posted the first release candidate here:
>>
>> http://people.apache.org/~cmccabe/htrace/releases/4.0.0/rc0
>>
>> The jars have been staged here:
>>
>> https://repository.apache.org/content/repositories/orgapachehtrace-1017
>>
>> There's a lot of great stuff in this release, including a new web UI,
>> many bug fixes, API improvements and enlargement of span IDs to 128
>> bits to avoid conflicts.
>>
>> The vote will run for 5 days.
>>
>> cheers,
>> Colin
>>
>> Release Notes - HTrace - Version 4.0
>>
>> ** Sub-task
>>  * [HTRACE-208] - Remove deprecated addKVAnnotation(byte[], byte[])
>> method
>>  * [HTRACE-209] - Make span ID 128 bit to avoid collisions
>>  * [HTRACE-210] - Remove TrueIfTracingSampler
>>  * [HTRACE-211] - Move htrace-core classes to the
>> org.apache.htrace.core namespace
>>  * [HTRACE-212] - Change version to 4.0
>>  * [HTRACE-214] - De-globalize Tracer.java
>>  * [HTRACE-215] - Simplify the Sampler type
>>  * [HTRACE-216] - SpanReceivers should not fill in ProcessId
>>  * [HTRACE-217] - Rename ProcessId to TracerId
>>  * [HTRACE-222] - Add SpanReceiverPool
>>  * [HTRACE-228] - Fix subprojects to refer to new
>> org.apache.htrace.core namespace
>>  * [HTRACE-229] - htrace-webapp needs to be updated to refer to
>> "tracerid" not "processid"
>> ** Bug
>>  * [HTRACE-159] - libhtrace.so: use HRPC endpoint of htraced
>>  * [HTRACE-164] - htrace hrpc: use msgpack for serialization
>>  * [HTRACE-166] - Add tabbed view
>>  * [HTRACE-167] - Update go build instructions in BUILDING.txt
>>  * [HTRACE-171] - htraced godeps should use
>> github.com/ugorji/go/codec rather than github.com/ugorji/go
>>  * [HTRACE-174] - Refactor GUI
>>  * [HTRACE-177] - htrace-zipkin: shade all dependencies
>>  * [HTRACE-182] - htraced: add rpm build via -Prpm
>>  * [HTRACE-189] - gui: fix error handling in a few places
>>  * [HTRACE-190] - htraced: allow querying by process ID
>>  * [HTRACE-191] - gui: add "duration" to span details, filter out
>> "selected"
>>  * [HTRACE-192] - gui: when expanding parents or children, sort the
>> spans by begin time
>>  * [HTRACE-193] - gui: avoid doing multiple redraws when
>> spanResults is updated
>>  * [HTRACE-196] - gui: add scrolling for spans view
>>  * [HTRACE-201] - htrace-web: URL-Encode query JSON
>>  * [HTRACE-202] - htrace-web: fix "converting circular object to
>> JSON" error when pressing "clear" button
>>  * [HTRACE-218] - Fix issues with finding json-c includes and librt
>> in the native library
>>  * [HTRACE-219] - Add -Dleveldb.prefix and -Djsonc.prefix build
>> options
>>  * [HTRACE-220] - htraced: should be able to set log.path to the
>> empty string via "-Dlog.path=" on the command line
>>  * [HTRACE-223] - gobuild.sh: fix issue where maven succeeds if go
>> build fails
>>  * [HTRACE-224] - htrace C client: htrace_conf_get_u64,
>> htrace_conf_get_double can't handle spaces at the end of strings
>>  * [HTRACE-230] - Make TracerBuilder like all other Builders; an
>> internal rather than adjacent class
>>  * [HTRACE-233] - htrace-zipkin should explicitly include slf4j-api
>> to avoid ClassNotFoundException
>>  * [HTRACE-234] - Add workaround to prevent htrace-hbase from
>> getting in an infinite loop while creating the dependency-reduced pom
>> ** Improvement
>>  * [HTRACE-29] - add javascript web UI for htraced
>>  * [HTRACE-160] - htraced: support continuing a query fr

[VOTE] HTrace 4.0 Release Candidate 0

2015-09-04 Thread Colin P. McCabe

I've posted the first release candidate here:

http://people.apache.org/~cmccabe/htrace/releases/4.0.0/rc0

The jars have been staged here:

https://repository.apache.org/content/repositories/orgapachehtrace-1017

There's a lot of great stuff in this release, including a new web UI,
many bug fixes, API improvements and enlargement of span IDs to 128
bits to avoid conflicts.
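
To put a rough number on why wider span IDs help, here is a small sketch
using the standard birthday approximation.  It assumes span IDs are drawn
uniformly at random, which is an assumption for illustration and not a
statement about how HTrace actually generates them; the class and method
names are made up.

```java
// Hypothetical illustration only; not part of HTrace.
final class SpanIdCollisionDemo {
    public static void main(String[] args) {
        double spans = 1e9; // one billion spans, an arbitrary example workload
        System.out.printf("64-bit IDs:  %.3e%n", collisionProbability(spans, 64));
        System.out.printf("128-bit IDs: %.3e%n", collisionProbability(spans, 128));
    }

    // Birthday approximation: p ~= 1 - exp(-n^2 / 2^(bits + 1)),
    // assuming n IDs drawn uniformly at random from 2^bits values.
    static double collisionProbability(double n, int bits) {
        double x = n * n / Math.pow(2.0, bits + 1);
        // expm1 keeps precision when x is tiny (the 128-bit case).
        return -Math.expm1(-x);
    }
}
```

Under these assumptions, a billion 64-bit IDs have a collision chance of a
few percent, while with 128-bit IDs it is negligible.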

The vote will run for 5 days.

cheers,
Colin

Release Notes - HTrace - Version 4.0

** Sub-task
* [HTRACE-208] - Remove deprecated addKVAnnotation(byte[], byte[]) method
* [HTRACE-209] - Make span ID 128 bit to avoid collisions
* [HTRACE-210] - Remove TrueIfTracingSampler
* [HTRACE-211] - Move htrace-core classes to the
org.apache.htrace.core namespace
* [HTRACE-212] - Change version to 4.0
* [HTRACE-214] - De-globalize Tracer.java
* [HTRACE-215] - Simplify the Sampler type
* [HTRACE-216] - SpanReceivers should not fill in ProcessId
* [HTRACE-217] - Rename ProcessId to TracerId
* [HTRACE-222] - Add SpanReceiverPool
* [HTRACE-228] - Fix subprojects to refer to new
org.apache.htrace.core namespace
* [HTRACE-229] - htrace-webapp needs to be updated to refer to
"tracerid" not "processid"
** Bug
* [HTRACE-159] - libhtrace.so: use HRPC endpoint of htraced
* [HTRACE-164] - htrace hrpc: use msgpack for serialization
* [HTRACE-166] - Add tabbed view
* [HTRACE-167] - Update go build instructions in BUILDING.txt
* [HTRACE-171] - htraced godeps should use
github.com/ugorji/go/codec rather than github.com/ugorji/go
* [HTRACE-174] - Refactor GUI
* [HTRACE-177] - htrace-zipkin: shade all dependencies
* [HTRACE-182] - htraced: add rpm build via -Prpm
* [HTRACE-189] - gui: fix error handling in a few places
* [HTRACE-190] - htraced: allow querying by process ID
* [HTRACE-191] - gui: add "duration" to span details, filter out "selected"
* [HTRACE-192] - gui: when expanding parents or children, sort the
spans by begin time
* [HTRACE-193] - gui: avoid doing multiple redraws when
spanResults is updated
* [HTRACE-196] - gui: add scrolling for spans view
* [HTRACE-201] - htrace-web: URL-Encode query JSON
* [HTRACE-202] - htrace-web: fix "converting circular object to
JSON" error when pressing "clear" button
* [HTRACE-218] - Fix issues with finding json-c includes and librt
in the native library
* [HTRACE-219] - Add -Dleveldb.prefix and -Djsonc.prefix build options
* [HTRACE-220] - htraced: should be able to set log.path to the
empty string via "-Dlog.path=" on the command line
* [HTRACE-223] - gobuild.sh: fix issue where maven succeeds if go
build fails
* [HTRACE-224] - htrace C client: htrace_conf_get_u64,
htrace_conf_get_double can't handle spaces at the end of strings
* [HTRACE-230] - Make TracerBuilder like all other Builders; an
internal rather than adjacent class
* [HTRACE-233] - htrace-zipkin should explicitly include slf4j-api
to avoid ClassNotFoundException
* [HTRACE-234] - Add workaround to prevent htrace-hbase from
getting in an infinite loop while creating the dependency-reduced pom
** Improvement
* [HTRACE-29] - add javascript web UI for htraced
* [HTRACE-160] - htraced: support continuing a query from where
the client left it off by sending a previous span
* [HTRACE-162] - htraced hrpc: some logging improvements
* [HTRACE-170] - Optimize use of Random in htrace-core by using
ThreadLocalRandom
* [HTRACE-172] - Move minJdk to 1.7 (JDK 7)
* [HTRACE-175] - Add Trace#addKVAnnotation convenience method
* [HTRACE-176] - Expose ZipkinSpanReceiver configuration keys externally
* [HTRACE-180] - Move the GUI to a top-level subproject
* [HTRACE-184] - Expose PROCESS_ID_KEY configuration key
* [HTRACE-186] - gui: support finding the parents and children of
spans, add owl
* [HTRACE-194] - gui: support multiple selections, zooming to fit
a group of spans, deleting a group of spans
* [HTRACE-197] - htraced build: set RUNPATH if possible
* [HTRACE-199] - gui: Double clicking on spans should bring up span details
* [HTRACE-203] - htrace-web: pressing enter should dismiss the
modal dialog box
* [HTRACE-204] - htrace-web: add draggable bar which allows more
or less visual space for process name in search view
* [HTRACE-205] - htrace-web: Width of SearchResultsView should be
updated along with resizing of browser window
* [HTRACE-206] - htrace-web: when the canvas has focus, the delete
key should clear, z key should zoom
* [HTRACE-221] - htraced: search /etc/htraced/conf for the htraced
configuration by default
* [HTRACE-227] - Remove dependency to non-public API of
hadoop-common from htrace-hbase
** New Feature
* [HTRACE-143] - htraced search GUI enhancements
** Task
* [HTRACE-183] - htraced: move src/go directory to go
** Test
* [HTRACE-213] - Add test for ZipkinSpanReceiver

Preparing for HTrace 4.0

2015-09-03 Thread Colin P. McCabe

Hi guys,

I think it's time for a new release of HTrace.  We've resolved 58
JIRAs since our last release!  I can RM.

cheers,
Colin

Re: Build instructions for Htrace

2015-09-02 Thread Colin P. McCabe

On Wed, Sep 2, 2015 at 9:32 AM, Jake Farrell  wrote:

> mvnw looks to be gradle training wheels, not sure that its needed. perhaps
> a dockerfile would help here and also give us a build env we could run
> tests within jenkins/travis with, just a thought
>
>
+1.  Creating a docker build environment for Jenkins has been on our to-do
list for a while.  It would be a great contribution for someone.  It seems
like it would also help anyone who wanted to build the project as well.

> On Wed, Sep 2, 2015 at 12:20 PM, Adrian Cole 
> wrote:
>
> > What do you think about moving to maven wrapper (ex ./mvnw install as
> > opposed to maven install). This sorts out the maven version issues by
> > baking them into the wrapper.
> >
> > I can raise a patch if you think this would help.
> >
>

Thanks for the offer, but I'm not sure this is a good direction for us to
go in.  If we start putting shell scripts and other wrappers around Maven,
things get even more complex.  I've been down the autotools / virtualenv
road, and it's not pretty.  I have a suspicion that we can make later
versions of maven work if we fiddle with our pom.xml dependency lists a bit.

best,
Colin

Re: Introduction

2015-08-30 Thread Colin P. McCabe

Hi Arun,

Thanks for trying out HTrace! If you want, just file a JIRA for the
javadoc issue and we can fix that up. It sounds like it might be something
that only shows up on jdk8.

Since you mentioned C and C++, you might also be interested in our C
client. It's somewhat alpha right now but we hope more folks will try it
out soon.

Cheers,
Colin
On Aug 30, 2015 3:21 AM, Arun Khetarpal akhet...@gmail.com wrote:

Super! Sounds like a plan.

The first thing I tried to do was actually get the source code and try
compiling it.
-- I wasn't able to find any section on build tools mentioned anywhere.

-- On my box - mac + maven3 + java1.8, the code did not compile because
of javadoc comments.
*Example error: *
[ERROR]

apache/htrace/incubator-htrace/htrace-core/src/main/java/org/apache/htrace/core/Tracer.java:264:
warning: no @return

I was able to fix them by disabling doc lint checks, but we can fix
them as well. Any thoughts?

Regards,
Arun

On 30 August 2015 at 00:16, Stack st...@duboce.net wrote:

Welcome Arun.

Glad of the offer of help; especially from someone with experience tracing
already (Brave is cool). If starting out, best thing you could do (IMO)
would be to try and use the new 4.0 APIs to tell a trace story in an
application that you are familiar with. File issues against our doc where
it is unclear and ditto on difficulty using new APIs and getting a trace
rig running (We've not pushed the website in a while so look at doc in the
repo for the moment -- it was updated recently).

Thank you,

St.Ack

On Sat, Aug 29, 2015 at 9:32 AM, Arun Khetarpal akhet...@gmail.com
wrote:

Hi Team,

My name is Arun. I found this project to be very appealing and would like
to contribute to it. I have some experience with Java and with C and C++.

I had faced a similar challenge of tracing in my organisation where I ended
up integrating it with Brave (https://github.com/openzipkin/brave).

I was wondering on how to participate in this project. Any pointers would
be highly appreciated.

Thanks,
Arun

Enabling the release-notes field on JIRA?

2015-08-24 Thread Colin P. McCabe

Sorry if this is a dumb question, but does anyone know how to enable
the release-notes field on JIRA?  It seems like we don't have it on
our issues.

If you hit edit on a Hadoop JIRA issue, you see multiple fields--
including a text box for release-notes.  It seems to be missing from
the HTRACE JIRA.  Anyone know how to enable this?  I took a glance at
the admin console but I don't see anything there.

best,
Colin

Re: Tracing a chain of iterators -- htrace 2.04

2015-08-19 Thread Colin P. McCabe

Hi Andrew,

Thanks for posting!

Is the concern just that there would be too many spans if each call to
next() created a span?  Does sampling help address this concern?

You said, "what we'd really want, is to aggregate the time spent in
each call to next for each iterator, and then send the spans at the
end."  But HTrace already does this, right?  Most span receivers will
batch up the spans they receive and send them all in one big batch,
probably at the end.  What am I missing?
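
For reference, the aggregation idea quoted above can be sketched without
touching any HTrace API: wrap each iterator so that it accumulates the
time spent in next(), and report one total per stage when the request is
done.  TimedIterator and MappingIterator below are hypothetical names for
illustration; how the totals would be attached to a span is left open.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch; none of these classes are part of HTrace.
final class IteratorTimingDemo {
    public static void main(String[] args) {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4).iterator();
        TimedIterator<Integer> doubled = new TimedIterator<>(
            "double", new MappingIterator<Integer, Integer>(source, x -> x * 2));
        TimedIterator<Integer> plusOne = new TimedIterator<>(
            "plusOne", new MappingIterator<Integer, Integer>(doubled, x -> x + 1));

        int sum = 0;
        while (plusOne.hasNext()) {
            sum += plusOne.next();
        }
        System.out.println("sum = " + sum);
        // One summary per stage, emitted once at the end of the request.
        System.out.println(doubled.summary());
        System.out.println(plusOne.summary());
    }
}

// A simple transformation stage: applies fn to every upstream element.
final class MappingIterator<A, B> implements Iterator<B> {
    private final Iterator<A> in;
    private final Function<A, B> fn;

    MappingIterator(Iterator<A> in, Function<A, B> fn) {
        this.in = in;
        this.fn = fn;
    }

    @Override public boolean hasNext() { return in.hasNext(); }
    @Override public B next() { return fn.apply(in.next()); }
}

// Accumulates wall-clock time spent inside next() instead of
// creating one span per row.
final class TimedIterator<T> implements Iterator<T> {
    private final String name;
    private final Iterator<T> delegate;
    private long totalNanos;
    private long calls;

    TimedIterator(String name, Iterator<T> delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }

    @Override public T next() {
        long start = System.nanoTime();
        try {
            return delegate.next();
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    long calls() { return calls; }
    long totalNanos() { return totalNanos; }

    String summary() {
        return name + ": " + calls + " calls, "
            + TimeUnit.NANOSECONDS.toMicros(totalNanos) + " us total";
    }
}
```

Note that each total is inclusive: the time recorded for a stage also
covers the upstream stages it pulls from, so per-stage cost would need to
be derived by subtraction.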

cheers,
Colin


On Tue, Aug 18, 2015 at 3:03 PM, Andrew Mains
andrew.ma...@kontagent.com wrote:
 Hi all,

 This is really more of a user question than a dev question, but I'm
 posting here since I was unable to find a user list for the project; hope
 that's alright.

 I was hoping to get some input on the best way to trace execution through a
 chain of iterators. Specifically, we have a database-like application which
 pipes data through multiple iterators, performing some transformation at
 each step. We'd like insight into how long each step is taking in total for
 a particular request. That is, for a chain of iterators iter_1... iter_i, we
 want the total time spent in each iter_i for that request.

 The naive implementation would be to start a span in each call to next, but
 that's far too fine grained, given that we'd be starting a new span for each
 row. What we'd really want, is to aggregate the time spent in each call to
 next for each iterator, and then send the spans at the end. This would
 require implementing a new span subclass, which is a bit tricky to integrate
 at the moment (since it prevents us from using the static helpers in Trace).

 Any thoughts on the best way to approach this issue? Is there something I'm
 missing, or some way that we can reframe the problem such that it makes
 sense with what's currently in htrace?

 Let me know if there's anything that's unclear, or any further info I can
 provide about our use case.

 Thanks for the help!

 Andrew

Re: HTRACE-215 Simplify the Sampler type - discussion

2015-07-27 Thread Colin P. McCabe

Hi Daniel,

The problem with the T in Sampler<T> is that it's
application-specific.  The code for each application needs to be
modified specifically to make use of a different T.  Ideally, Samplers
should be pluggable, so that you can use any sampler with any HTraced
code.  For example, I might run a test application with sampling set
to always but in production, I would run with a probability sampler
with some specific sampling rate.  But you can't do that when your
sampler depends on being passed some application-specific data.
You're stuck with only samplers that can work with that specific T.

Consider a specific example: tracing Hadoop.  I'd like to be able to
turn on tracing in Hadoop just by changing a config key.  But if I'm
using a Sampler<T> with a non-trivial T, I can't do that.  I have to
tell the customer, "first apply this patch to your Hadoop code to add
the Ts, do a full build, and then put it into live production..."  The
customer won't even follow me to step #1, let alone deploying the
patched code in production.  It totally wrecks the usefulness of
HTrace if you need to rebuild your code to use it.

Another thing to think about is that we'd like to reduce the
boilerplate code needed to add HTrace to an application.  Ideally
the system would create the samplers you need from your
HTraceConfiguration, rather than requiring the application to create
and manage them manually.  Of course, applications should be able to
programmatically add and remove Samplers as well, but only if they
have a specific need to do that.
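
As a rough sketch of what "create the samplers from configuration" could
look like, the following builds a sampler from a plain key/value map.  The
keys "demo.sampler" and "demo.sampler.fraction" and the SamplerFactory
class are invented for this example and are not HTrace configuration keys
or classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the HTrace configuration mechanism.
final class ConfiguredSamplerDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.sampler", "probability");     // made-up key
        conf.put("demo.sampler.fraction", "0.25");   // made-up key

        BooleanSupplier sampler = SamplerFactory.fromConf(conf);
        int hits = 0;
        for (int i = 0; i < 10_000; i++) {
            if (sampler.getAsBoolean()) {
                hits++;
            }
        }
        System.out.println("sampled " + hits + " of 10000");
    }
}

final class SamplerFactory {
    // Chooses a sampler purely from configuration, so switching from
    // "always" in testing to "probability" in production needs no rebuild.
    static BooleanSupplier fromConf(Map<String, String> conf) {
        String kind = conf.getOrDefault("demo.sampler", "never");
        switch (kind) {
            case "always":
                return () -> true;
            case "never":
                return () -> false;
            case "probability":
                double fraction = Double.parseDouble(
                    conf.getOrDefault("demo.sampler.fraction", "0.01"));
                return () -> ThreadLocalRandom.current().nextDouble() < fraction;
            default:
                throw new IllegalArgumentException("unknown sampler: " + kind);
        }
    }
}
```

The application code only ever asks the sampler for a decision; which
sampler it gets is decided entirely by the configuration.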

I think that tracing different events with different probabilities is
a nice feature.  There is a way to do that through the new API that I
think is cleaner.  You would create multiple Tracer objects (Tracer
will no longer be a singleton).  Each tracer would be configured with
ProbabilitySampler, but they would have a different sampling rate set.
For the Foo code, you would call fooTracer.newTopLevelSpan(...), for
the Bar code, you would call barTracer.newTopLevelSpan(...), and so
forth.  In the new API, spans are always created from a specific
Tracer and use the Samplers associated with that Tracer.
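
Roughly, the shape is something like the sketch below.  To be clear,
DemoTracer and RateSampler are stand-ins written for this example and are
not the actual 4.0 classes or signatures; the point is only that each
subsystem owns a tracer with its own sampling rate.

```java
import java.util.Random;

// Hypothetical sketch; DemoTracer and RateSampler are not HTrace classes.
final class PerSubsystemSamplingDemo {
    public static void main(String[] args) {
        Random rng = new Random(42);
        // One tracer per subsystem, each with its own sampling rate.
        DemoTracer fooTracer = new DemoTracer("foo", new RateSampler(0.01, rng));
        DemoTracer barTracer = new DemoTracer("bar", new RateSampler(0.5, rng));

        int foo = 0;
        int bar = 0;
        for (int i = 0; i < 10_000; i++) {
            if (fooTracer.startTopLevel("fooOp") != null) foo++;
            if (barTracer.startTopLevel("barOp") != null) bar++;
        }
        System.out.println("foo sampled " + foo + ", bar sampled " + bar);
    }
}

// Samples each request independently with a fixed probability.
final class RateSampler {
    private final double rate;
    private final Random rng;

    RateSampler(double rate, Random rng) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        this.rate = rate;
        this.rng = rng;
    }

    boolean next() {
        return rng.nextDouble() < rate;
    }
}

// Decides per top-level operation whether to trace, using its own sampler.
final class DemoTracer {
    private final String name;
    private final RateSampler sampler;

    DemoTracer(String name, RateSampler sampler) {
        this.name = name;
        this.sampler = sampler;
    }

    // Returns a label standing in for a span, or null if not sampled.
    String startTopLevel(String description) {
        return sampler.next() ? name + "/" + description : null;
    }
}
```

Changing the rate for one subsystem then means changing one tracer's
configuration, without touching the code that creates spans.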

This is similar to having different Log objects in log4j.  Perhaps you
think the Foo system is not that interesting most of the time, so its
log level defaults to WARN.  But if you think you're having a problem
in the Foo system, you can set its log level to TRACE and then you see
all the log messages that the Foo system has.  Same thing here, except
that instead of Log objects, we have Tracer objects.  Instead of log
messages, we have trace spans.  But we still have a lot of flexibility
at runtime as a result of this.  And we don't need to recompile to
trace.

regards,
Colin

On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee dan...@slice.com wrote:
 RE: https://issues.apache.org/jira/browse/HTRACE-215

 I was previously making use of this feature. I was using it to trace
 different types of inputs with different probabilities. It looks like
 now I'll either have move all tracing logic completely outside of
 htrace related classes and only use Always and Never sampler which
 seems weird? Why even bother with providing ProbabilitySampler when
 (rand.nextDouble() < X ? AlwaysSampler.INSTANCE :
 NeverSampler.INSTANCE) is available.

 Daniel

Re: HTRACE-215 Simplify the Sampler type - discussion

2015-07-27 Thread Colin P. McCabe

On Mon, Jul 27, 2015 at 2:32 PM, Daniel Lee dan...@slice.com wrote:
 Hi Colin,

 I'm not sure how Hadoop tracing is setup but I also enable tracing via
 a config setting.

 I'm not sure I agree creating multiple new Tracer objects each with
 their own Probability samplers is an acceptable solution from a
 usability standpoint. Consider an application that receives messages
 from clients and wants to trace different message and client types
 with different probabilities. Now, for every tuple of (message,
 client) type there has to be a new Tracer and Sampler created so this
 gets ugly quickly. It also sounds like having multiple tracers could
 get confusing quickly under this scenario. I'm just going to wrap
 everything in a custom class that includes the logic I used to have in
 the Sampler.

Hi Daniel,

Tracer objects are not that big, and you can create a lot of them if
you like.  Again, it's very similar to log4j Log objects.  If you
want to have one for each message type, or even two for each message
type, it's not more than a kilobyte or two of memory.

Another way to pass information to Sampler objects is by using
thread-local data.  You can just set the thread-local data before
calling Tracer#newSpan... if your custom sampler is called, it will
access this thread local data, which can be anything you want.
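
A minimal sketch of the thread-local approach, with the caveat that
SamplingContext and ContextSampler are names invented for this example
and the sampler here does not implement any real HTrace interface:

```java
// Hypothetical sketch; not tied to the real HTrace Sampler interface.
final class ThreadLocalSamplerDemo {
    public static void main(String[] args) {
        ContextSampler sampler = new ContextSampler();

        // The application sets the context just before starting a span.
        SamplingContext.set("heartbeat");
        System.out.println("heartbeat sampled: " + sampler.next());

        SamplingContext.set("payment");
        System.out.println("payment sampled: " + sampler.next());

        // Clear the slot so stale data does not leak to later work
        // on the same (possibly pooled) thread.
        SamplingContext.clear();
    }
}

// Per-thread slot holding whatever the custom sampler needs to know.
final class SamplingContext {
    private static final ThreadLocal<String> MESSAGE_TYPE = new ThreadLocal<>();

    static void set(String messageType) { MESSAGE_TYPE.set(messageType); }
    static String get() { return MESSAGE_TYPE.get(); }
    static void clear() { MESSAGE_TYPE.remove(); }
}

// A custom sampler that takes no arguments but consults the thread-local.
final class ContextSampler {
    boolean next() {
        // Example policy: trace every "payment" message and nothing else.
        return "payment".equals(SamplingContext.get());
    }
}
```

The sampler keeps a generic, argument-free signature, so it stays
pluggable, while the application-specific data travels out of band.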

Maybe at some point we can have a span importance field which could
be supplied by the programmer to the Tracer#newSpan function and then
used by Samplers.  That would allow anyone to write a sampler which
took advantage of this, rather than coupling it tightly to one sampler
implementation.  But I'd rather put that off until after 4.0, since we
have so many other things to do.

Out of curiosity, what projects are you using HTrace for?  Is it
something you can share?

best,
Colin


 Thanks,
 Daniel

 On Mon, Jul 27, 2015 at 12:00 PM, Colin P. McCabe cmcc...@apache.org wrote:
 Hi Daniel,

 The problem with the T in Sampler<T> is that it's
 application-specific.  The code for each application needs to be
 modified specifically to make use of a different T.  Ideally, Samplers
 should be pluggable, so that you can use any sampler with any HTraced
 code.  For example, I might run a test application with sampling set
 to always but in production, I would run with a probability sampler
 with some specific sampling rate.  But you can't do that when your
 sampler depends on being passed some application-specific data.
 You're stuck with only samplers that can work with that specific T.

 Consider a specific example: tracing Hadoop.  I'd like to be able to
 turn on tracing in Hadoop just by changing a config key.  But if I'm
 using a Sampler<T> with a non-trivial T, I can't do that.  I have to
 tell the customer, first apply this patch to your Hadoop code to add
 the Ts, do a full build, and then put it into live production...  The
 customer won't even follow me to step #1, let alone deploying the
 patched code in production.  It totally wrecks the usefulness of
 HTrace if you need to rebuild your code to use it.

 Another thing to think about is that we'd like to reduce the
 boilerplate code needed to add HTrace to an application.  Ideally
 the system would create the samplers you need from your
 HTraceConfiguration, rather than requiring the application to create
 and manage them manually.  Of course, applications should be able to
 programmatically add and remove Samplers as well, but only if they
 have a specific need to do that.

 I think that tracing different events with different probabilities is
 a nice feature.  There is a way to do that through the new API that I
 think is cleaner.  You would create multiple Tracer objects (Tracer
 will no longer be a singleton).  Each tracer would be configured with
 ProbabilitySampler, but they would have a different sampling rate set.
 For the Foo code, you would call fooTracer.newTopLevelSpan(...), for
 the Bar code, you would call barTracer.newTopLevelSpan(...), and so
 forth.  In the new API, spans are always created from a specific
 Tracer and use the Samplers associated with that Tracer.

 This is similar to having different Log objects in log4j.  Perhaps you
 think the Foo system is not that interesting most of the time, so its
 log level defaults to WARN.  But if you think you're having a problem
 in the Foo system, you can set its log level to TRACE and then you see
 all the log messages that the Foo system has.  Same thing here, except
 that instead of Log objects, we have Tracer objects.  Instead of log
 messages, we have trace spans.  But we still have a lot of flexibility
 at runtime as a result of this.  And we don't need to recompile to
 trace.

 regards,
 Colin

 On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee dan...@slice.com wrote:
 RE: https://issues.apache.org/jira/browse/HTRACE-215

 I was previously making use of this feature. I was using it to trace
 different types of inputs with different probabilities. It looks like
 now I'll either have

Re: Accumulo SpanReceiver and related questions

2015-06-08 Thread Colin P. McCabe

On Thu, Jun 4, 2015 at 12:04 PM, Young, Philip ptyo...@tycho.ncsc.mil wrote:
Hi All,

I have only just started using Apache Htrace (so bear with me) in one of my
projects and I would like to send all the traces that are created to an
Apache Accumulo to store, similar to how the HBaseSpanReceiver currently
does it. I have developed an initial version of an AccumuloSpanReceiver and
as such, I have a few questions:

1. How do I contribute my new AccumuloSpanReceiver back into the HTrace
codebase? Is this by
creating a Jira ticket and attaching a patch to it or would you
rather a pull request in Github?

The Accumulo project is maintaining its own SpanReceiver in the
Accumulo code base. Take a look at:
https://github.com/apache/accumulo/blob/3a99300a2b897f9f8740a3efa0271c43536ef0cc/core/src/main/java/org/apache/accumulo/core/trace/DistributedTrace.java

So if storing spans in Accumulo is a requirement for you, you could
check out what they've done and contribute there.

You could also check out htraced, which stores spans directly in
leveldb. See htrace-htraced.

> 2. I see that there are currently a couple of different GUIs for the
> display of traces, e.g. one in htrace-hbase and another in htrace-htraced.
> Is there going to be some consolidation of the GUIs for the visualisation
> of traces, with plugins for connecting to different data stores, e.g.
> HBase, Accumulo, htraced, etc., or is it up to each sub-module to provide
> its own GUI implementation?

Yes, I think some consolidation on GUIs would be nice. One thing
we've talked about in the past is having htraced interface with these
other data stores.

> In the interim, do you think that it would be a good approach for me to
> adapt the HBase GUI to also be able to display traces that are stored in
> Accumulo? If I were to do this, I think that pulling the webapp out of the
> htrace-hbase module into something like htrace-webapp would be better,
> along with all the protobuf code?

I think the best approach would be to interface htraced with accumulo
so that you could re-use all the work we've been doing on the htraced
GUI. Similarly with HBase, Cassandra, and other data stores we might
like to support.

best,
Colin

> If I could get some advice before I get too far into this, that would be
> fantastic!
>
> Cheers
> Phil Young

Re: HTrace development environment CI

2015-05-13 Thread Colin P. McCabe

Hi Vladimir,

Welcome. Great to see you're interested, and HTRACE-170 is a nice improvement!

I agree that it is a little tough to kick the tires on the project
right now, just because you have to build all the various downstream
projects against HTrace. And you have the usual "build project X
against project Y" problems. The situation is improving over time as
we get more downstream projects (such as Hadoop) now supporting the
Apache version of HTrace. You should be able to build Hadoop 2.7 or
HBase 1.0.0 against your version of htrace just by doing a mvn
install of your htrace, and then modifying the Hadoop pom.xml
slightly to build against your new version.

We've been talking about setting up Docker images for a while. Maybe
I'm getting this wrong, but Vagrant seems kind of similar... except
that it uses VM images instead of Linux containers. Vagrant also
seems to be focused on setting up installation scripts, similar to how
people use Chef or Puppet, whereas I believe Docker mostly leaves that
up to you. Does that make any sense, or did I misunderstand?

I'm curious if these tools could make it easier for people to test out
a Hadoop+HTrace setup, or create a development environment. That
would be really cool. We also need some kind of environment to run
Jenkins jobs in, and all the people who know about such things tell me
it ought to be a Docker image or VM, for better control over our
environment.

cheers,
Colin

On Wed, May 13, 2015 at 1:25 PM, Vladimir Sitnikov
sitnikov.vladi...@gmail.com wrote:
Hi,

I somehow noticed HTrace, and I might start using it in the near future.
My main areas of interest are performance and concurrency.

While I lurk around the code, I am a bit puzzled about how you create a
development environment.

I saw this thread:
http://mail-archives.apache.org/mod_mbox/htrace-dev/201503.mbox/%3CCA+qbEUOMgw1OZP=achvf5y8ha1wy75kz_isprsvmdfhudve...@mail.gmail.com%3E

It looks like there are quite a few steps to complete in order to
launch HTrace+HDFS (or whatever else is in trend).

For Apache Calcite (a framework to translate SQL queries to different
storage engines) I created a Vagrant virtual machine that provisions
third party tools: https://github.com/vlsi/calcite-test-dataset

Although it looks like a maven project, it allows you to provision a
test machine with all the stuff installed.
From the user's perspective, you run mvn install and it provisions a VM for you.

Do you think it makes sense to implement a similar test VM for HTrace
integration testing?
While I can help with the Vagrant stuff, I am not sure where to start from
the HTrace point of view. I have only a very brief understanding of
connectors/etc.

For Calcite we host the test VM in another repository to avoid bloating
the main repository.

--
Regards,
Vladimir Sitnikov

Re: [VOTE] HTrace 3.2.0 - Release Candidate 2

2015-05-12 Thread Colin P. McCabe

Thanks, Abe!

It's a good thing we skipped from 0 to 2... saved a lot of time.

:)
C.

On Sat, May 9, 2015 at 11:45 PM, Abraham Elmahrek a...@cloudera.com wrote:
 Heads up folks. I've started the vote in gene...@incubator.apache.org.

 On Fri, May 8, 2015 at 11:29 AM, Abraham Elmahrek a...@cloudera.com wrote:

 Closing the vote:

- +1s - 6
- 0s - 0
- -1s - 0

 Thanks everyone for helping out and verifying this release.

 -Abe

 On Thu, May 7, 2015 at 5:57 AM, Jake Farrell jfarr...@apache.org wrote:

 agree, there are no binary files, so the extra artifacts are not a release
 blocker, would remove for the next release.

 +1 for this rc from me

 -Jake

 On Thu, May 7, 2015 at 1:33 AM, Lewis John Mcgibbney 
 lewis.mcgibb...@gmail.com wrote:

  I would say no, don't roll a new RC.
  If there is a way to ensure that generated files have ALv2.0 headers moving
  forward and committed to trunk then that would be my advice.
  Good job with RC.
  +1 from me
 
  On Wednesday, May 6, 2015, Abraham Elmahrek a...@cloudera.com wrote:
 
   Lewis,

   These are third party packages and a generated file. The JS dependencies
   are listed in LICENSE.txt. I don't see licenses for
   dependency-reduced-pom.xml in general... but I think it might be generated
   by the maven shading plugin. It looks like some of these generated files
   made it into the source tarball. Do you guys think it's worth spinning a
   new RC for this?
  
   -Abe
  
   On Wed, May 6, 2015 at 6:05 PM, Lewis John Mcgibbney
   lewis.mcgibb...@gmail.com wrote:
  
Hi Folks,
I ran DRAT over the codebase
   
Notes  Binaries  Archives  Standards  Apache  Generated  Unknown
0      0         0         120        83      0          28

28 unknown licenses flagged up

Upon further investigation these were

Unapproved licenses:

/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backbone-1.1.2.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backbone.marionette-2.4.1.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backbone.paginator-2.0.2.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backgrid-0.3.5.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backgrid-paginator-0.3.5.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/bootstrap.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/bootstrap.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/d3-3.5.5.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/d3.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/jquery-2.1.3.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/moment-2.9.0.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/npm.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.standalone.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.standalone.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/underscore-1.7.0.js

/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/backgrid-0.3.5.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/backgrid-paginator-0.3.5.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap-theme.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap-theme.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/rome.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/rome.min.css

/usr/local/drat/deploy/data/jobs/rat/1430960456828/input/SpanProtos.java

/usr/local/drat/deploy/data/jobs/rat/1430960457162/input/dependency-reduced-pom.xml
/usr/local/drat/deploy/data/jobs/rat/1430960457162/input/dependency-reduced-pom.xml_05062015_1800
   
If we can clarify the above then I am +1 to the release.
   
There's nothing more I can add over and above what has been stated by
others.
SIGS check
Builds and Tests in native check
Nice release candidate.
Thanks
Lewis
   
   
On Thu, Apr 30, 2015 at 9:16 PM, Abraham Elmahrek a...@apache.org
wrote:
   
 I've put the second release candidate here:

 http://people.apache.org/~abe/htrace/releases/3.2.0/rc1/

 The jars have been staged here:

 https://repository.apache.org/content/repositories/orgapachehtrace-1016

Moving to JDK7?

2015-05-12 Thread Colin P. McCabe

Hi all,

What do y'all think about moving our minJdk version to JDK7?

It is set at JDK6 right now mostly because that was where Hadoop was
stuck for a long time.  But now Hadoop is on JDK7, and so is HBase.
Is there any reason to keep supporting JDK6 in our next release? There
are some nice things in JDK7 like ThreadLocalRandom,
try-with-resources, etc. and I would hate to ugly up the code for no
reason.
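
To make that concrete, here is a minimal sketch of the two JDK7 features
mentioned above.  The class and method names (Jdk7Sketch, sampleBucket,
firstLine) are made up for illustration and are not HTrace code; only
ThreadLocalRandom and try-with-resources themselves come from JDK7.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: these helpers are hypothetical, not part of HTrace.
class Jdk7Sketch {
    // ThreadLocalRandom gives each thread its own generator, so concurrent
    // callers do not contend on one shared java.util.Random instance.
    static int sampleBucket(int buckets) {
        return ThreadLocalRandom.current().nextInt(0, buckets);
    }

    // try-with-resources closes the reader automatically, even when
    // readLine() throws, so no explicit finally block is needed.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("span-a\nspan-b"));
        System.out.println(sampleBucket(10));
    }
}
```

On JDK6 the same code would need a shared or hand-rolled per-thread Random
and a try/finally block around the reader, which is the kind of clutter I
would like to avoid.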

best,
Colin

Re: Moving to JDK7?

2015-05-12 Thread Colin P. McCabe

Thanks, all.  Filed https://issues.apache.org/jira/browse/HTRACE-172 for this.

Colin

On Tue, May 12, 2015 at 12:50 PM, Abraham Elmahrek a...@cloudera.com wrote:
 +1 as well. Plenty of performance improvements in JDK7.

 On Tue, May 12, 2015 at 12:23 PM, Jake Farrell jfarr...@apache.org wrote:

 +1, would be nice to jump to 8 since 7 is now eol, but deprecating 6
 support is a good start

 -Jake

 On Tue, May 12, 2015 at 3:11 PM, Colin P. McCabe cmcc...@apache.org
 wrote:

  Hi all,
 
  What do y'all think about moving our minJdk version to JDK7?
 
  It is set at JDK6 right now mostly because that was where Hadoop was
  stuck for a long time.  But now Hadoop is on JDK7, and so is HBase.
  Is there any reason to keep supporting JDK6 in our next release? There
  are some nice things in JDK7 like ThreadLocalRandom,
  try-with-resources, etc. and I would hate to ugly up the code for no
  reason.
 
  best,
  Colin

Re: [DISCUSS] Github integration

2015-04-26 Thread Colin P. McCabe

When I was first getting started with open source development, the
de-facto standard hosting place was SourceForge.  It was never
perfect, but it provided basic web hosting, source control (in those
days, everything was svn or cvs), wiki, bug tracker, etc.

SourceForge had a lot of momentum in those days.  Somehow, though,
over the course of 10 or 15 years, things just went wrong for the
site.  It started requiring more and more clicks to download anything,
and more and more ads started popping up.  Now it's pushing malware
installers, at least according to this article on gluster's blog.
http://blog.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/

Other competitors popped up to try to take on the mantle of best open
source hosting site.  Google Code (now shutting down), BitBucket, now
Github.  I've used 'em all.  I have repos hosted on SourceForge,
BitBucket, Github, and Google Code.  (Which reminds me, I have to
migrate the projects I still have on Google Code soon...)

What I found is that putting stuff up on github doesn't guarantee pull
requests, or really any kind of community engagement.  I think I've
gotten a grand total of 3 pull requests for my github repos over the
last 5 years.  And the repos that I got pull requests for were ones
that I gave talks on at conferences, or talked to co-workers about.
HTrace's own experience was similar... we had very few contributors
back in the github days.  In my experience, a quiet fade into
obscurity is the fate of most github projects.  Getting people
interested and contributing to projects requires stepping away from
the computer and interacting with them on a personal level, not
whiz-bang software.

I think the most important thing that github provides projects with is
a standard landing page, a standard way to contact the author, and a
standard way to send in patches.  If you are just perusing someone's
private website, it may not be obvious how to contact them or what
format to send the patches in.  Github alleviates that problem.  For
HTrace, we already have our own landing page, our own mailing list and
bug tracker, and our own conventions about how to send in patches.
Github adds a lot less value for us.

The idea that projects get popular because of where they host their
code or what software they use to do reviews seems really questionable
to me.  Some of the most popular projects, like the Linux kernel, have
really esoteric review systems (i.e. sending in carefully formatted
patches to a mailing list).  Hadoop is another example... we have a
lot of antiquated practices like CHANGES.txt, but still get a lot of
contributors.

JIRA adds a lot of value for us because it lets us search discussions
that go back potentially years.  On Hadoop, I often find myself
referring to old discussions to see why something was done one way or
another.  I also use it to see what is in one release or another
release (it can search by version).  It can do searches across
multiple projects (in JQL terms, (project = HADOOP OR project = HTRACE)
AND ...).  JIRA is also a place where people can comment even if they
are not developers.  If they are users who just want to see something
fixed, they can comment on the jira asking what is going on with the
issue.  Or they can open a JIRA to suggest a feature, link one JIRA to
another, etc.  Yes, JIRA comments are mirrored to email, but the
mirroring is a pretty lossy process.  You can't search emails by
version, or across projects, or in any structured way.
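
As a rough illustration of the structured search described above, here is
a minimal sketch that builds a cross-project JQL query and URL-encodes it.
The class and method names and the search term are hypothetical, and the
/rest/api/2/search path is an assumption about the JIRA instance's REST
search resource rather than something configured for HTrace.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: names and the search term are made up for this sketch.
class JqlSketch {
    // The parentheses matter: in JQL, AND binds more tightly than OR.
    static String crossProjectQuery(String term) {
        return "(project = HADOOP OR project = HTRACE) AND text ~ \"" + term + "\"";
    }

    // Assumes the JIRA instance exposes a search resource at this path.
    static String searchUrl(String baseUrl, String jql) {
        try {
            return baseUrl + "/rest/api/2/search?jql=" + URLEncoder.encode(jql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String jql = crossProjectQuery("span");
        System.out.println(jql);
        System.out.println(searchUrl("https://issues.apache.org/jira", jql));
    }
}
```

Nothing comparable exists for a folder of mirrored emails, which is the
point about the mirroring being lossy.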

If we can have github comments mirrored to JIRA as you describe, maybe
it's not so bad.  But it's not so good either.  Looking at that JIRA
conversation, I find it a bit harder to follow than a standard one.
The software seems to be quoting huge amounts of code context, making
it a little bit tougher to follow.  A human would have only quoted the
lines he was interested in.  And I wonder whether, if I made a comment
on the JIRA, the person would see it, or whether he'd only be
following the github.  In other words, is the mirroring two-way?

I don't know.  I have to think about this more.  If there is really a
huge demand by people outside the project to use github, and we have
JIRA integration, then maybe it could work.  So far all of the demand
seems to be from people who are already contributors.  This makes me
think that something like Crucible would actually be a better fit,
since it fits into our existing workflow rather than grafting a
3rd-party website to it.

best,
Colin


On Sun, Apr 26, 2015 at 12:36 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 For reference, here's a ticket [0] from Phoenix which makes use of the
 Github PR integration. As you can see, the PR comments are posted on the
 JIRA. In this regard, it's actually easier to track patch comments than in
 RB, by simply looking at the JIRA comments.

 [0] https://issues.apache.org/jira/browse/PHOENIX-628

 On Fri, Apr 24, 2015 at 11:38 AM, Roman Shaposhnik ro...@shaposhnik.org
 wrote:

 On Tue,

releasing native build artifacts

2015-04-23 Thread Colin P. McCabe

Hi all,

Based on the responses in the earlier thread, it sounds like it's
getting to be about that time again... time for a new HTrace release
:)

I've been thinking about what we should do with our native build
artifacts.  With Java, we only ever have to build one version of
everything, since the jar files can run on any architecture and
supported OS.  But we now have a C client (which is only at an alpha
level of stability, but still...) and the htraced server, which is
also not Java.

Do we want to do builds of this stuff for the common OS/arch
configurations?  If we did a RHEL6 / x86_64 build and an Ubuntu 12.04
/ x86_64 build, that would cover most of our users, I think.  We could
also do 32-bit builds, but I don't think anyone is actually using
32-bit any more in big data land (and if they are, they need to stop
:)

This also means we would have 3 downloads available in the next release:
* Just jar files + source
* Jar files + source + RHEL6/x86_64 libhtrace.so + htraced
* Jar files + source + Ubuntu 12.04/x86_64 libhtrace.so + htraced

thoughts?

Colin

Re: [DISCUSS] Github integration

2015-04-21 Thread Colin P. McCabe

Hi guys,

I have an objection.  In the past, I've found it frustrating to search
through github pull requests.  There is no interface (like there is on
JIRA) to search using any kind of structured query language, and we
don't have the tools to track things by release, contributor, etc.

If we start having some of our patch discussions on github, JIRA will
become a lot less useful.  We might run into a situation like on Spark
where people open multiple pull requests for the same thing, not
knowing about each other.  Or people have a discussion on JIRA, not
aware that a parallel discussion is going on on github.

I think we should take more time to think this through.  1 hour is not
enough time to decide to switch away from JIRA :)

best,
Colin

On Tue, Apr 21, 2015 at 10:21 AM, Jake Farrell jfarr...@apache.org wrote:
 Once Github picks up the mirror I'll enable the remaining integration
 steps so we will start getting notices on our dev@ list and can close our
 pull requests through commits. Here are some docs I did for Thrift that
 would be good to adopt or change for how people contribute or commit to
 HTrace.

 http://thrift.apache.org/docs/HowToContribute
 http://thrift.apache.org/docs/committers/HowToCommit

 If you have any other questions let me know

 -Jake





 On Tue, Apr 21, 2015 at 12:57 PM, Nick Dimiduk ndimi...@apache.org wrote:

 That was fast, thanks Jake :)

 What else do we need to do to get the fancy PR integration I've seen in
 other projects? I see there's a specific task type for that on INFRA Jira.
 Is there a doc for Apacheer-but-not-githubbers on what the workflow looks
 like? Or is it just read the Github docs on PR's?

 Thanks,
 Nick

 On Tue, Apr 21, 2015 at 9:54 AM, Jake Farrell jfarr...@apache.org wrote:

 Hey Elliott
 Great idea, I have set up the git.a.o mirror for us and will enable the
 Github integrations as soon as Github picks up the repo from git.a.o
 (usually within 24 hours)

 -Jake

 On Tue, Apr 21, 2015 at 12:46 PM, Elliott Clark ecl...@apache.org
 wrote:

  That would be great.
 
  On Tue, Apr 21, 2015 at 9:34 AM, Nick Dimiduk ndimi...@apache.org
 wrote:
 
   Do we have any kind of github integration setup? I can't even find a mirror
   of HTrace on the apache account. I think we'll make it easier for folks to
   contribute if they can send PR's.

   I'd like to open an INFRA ticket to get us setup with this integration. Are
   there any objections?
  
   Thanks,
   Nick

Re: [DISCUSS] Github integration

2015-04-21 Thread Colin P. McCabe

 out, it's hard to track
 everything in split systems

 -Jake

 On Tue, Apr 21, 2015 at 1:53 PM, Colin P. McCabe cmcc...@apache.org
 wrote:

 Hi guys,

 I have an objection.  In the past, I've found it frustrating to search
 through github pull requests.  There is no interface (like there is on
 JIRA) to search using any kind of structured query language, and we
 don't have the tools to track things by release, contributor, etc.

 If we start having some of our patch discussions on github, JIRA will
 become a lot less useful.  We might run into a situation like on Spark
 where people open multiple pull requests for the same thing, not
 knowing about each other.  Or people have a discussion on JIRA, not
 aware that a parallel discussion is going on on github.

 I think we should take more time to think this through.  1 hour is not
 enough time to decide to switch away from JIRA :)

 best,
 Colin

 On Tue, Apr 21, 2015 at 10:21 AM, Jake Farrell jfarr...@apache.org
 wrote:
  Once Github picks up the mirror I'll enable the remaining integration
  steps so we will start getting notices on our dev@ list and can close our
  pull requests through commits. Here are some docs I did for Thrift that
  would be good to adopt or change for how people contribute or commit to
  HTrace.
 
  http://thrift.apache.org/docs/HowToContribute
  http://thrift.apache.org/docs/committers/HowToCommit
 
  If you have any other questions let me know
 
  -Jake
 
 
 
 
 
  On Tue, Apr 21, 2015 at 12:57 PM, Nick Dimiduk ndimi...@apache.org
  wrote:
 
  That was fast, thanks Jake :)
 
  What else do we need to do to get the fancy PR integration I've seen in
  other projects? I see there's a specific task type for that on INFRA Jira.
  Is there a doc for Apacheer-but-not-githubbers on what the workflow looks
  like? Or is it just read the Github docs on PR's?
 
  Thanks,
  Nick
 
  On Tue, Apr 21, 2015 at 9:54 AM, Jake Farrell jfarr...@apache.org
  wrote:
 
  Hey Elliott
  Great idea, I have set up the git.a.o mirror for us and will enable the
  Github integrations as soon as Github picks up the repo from git.a.o
  (usually within 24 hours)
 
  -Jake
 
  On Tue, Apr 21, 2015 at 12:46 PM, Elliott Clark ecl...@apache.org
  wrote:
 
   That would be great.
  
   On Tue, Apr 21, 2015 at 9:34 AM, Nick Dimiduk ndimi...@apache.org wrote:

   Do we have any kind of github integration setup? I can't even find a mirror
   of HTrace on the apache account. I think we'll make it easier for folks to
   contribute if they can send PR's.

   I'd like to open an INFRA ticket to get us setup with this integration. Are
   there any objections?
   
Thanks,
Nick

Re: [DISCUSS] Github integration

2015-04-21 Thread Colin P. McCabe

The argument keeps getting made that we have to be on github to make
it easy for outsiders to contribute but I don't see any evidence to
back that up.  Quite the contrary, during the time HTrace was a github
project, the number of contributions and contributors were much
smaller than now.

Objectively, the JIRA workflow is not difficult to learn.  The number
of new and recent contributors that Hadoop has is a testament to that.
And many other very successful projects use the same model.  I would
argue that to the average developer, attaching a text file to a JIRA
is easier to understand than creating a branch and a pull request in
github.  It's certainly easier for a first-timer than the upload
process of reviewboard or gerrit.

I think if we are being honest with ourselves, the only valid reason
to switch away from patch attachments on JIRA is the convenience of
developers.  Elliott has said that he doesn't like having to click on
"attach patch".  Some things that haven't been brought up, but which
ought to be, are that reviews in JIRA require some cut-n-paste, and
that you need to install a Google Chrome extension to see side-by-side
diffs.

My opinion is that while these things are kind of annoying, they're
really not that bad.  Having to explain what the difference is in my
latest patch versus the previous one takes much more time and mental
effort than clicking on attach patch.  There are even scripts out
there to automatically attach patches.  Copying a few lines to the
clipboard to suggest changes during a review isn't bad... in some ways
I prefer it to clicking all those expand discussion arrows in other
code review tools.

Colin

On Tue, Apr 21, 2015 at 6:18 PM, Nick Dimiduk ndimi...@apache.org wrote:
 There's a joke here about N devs in a room and N opinions that are all right
 (and all wrong)!

 All I'm asking for here is to make it easy for outsiders to contribute.
 Having HTrace show up in the mirror is a big step. The next logical thing is
 folks will click the fork button. We should be ready to receive the incoming
 help; the details of that implementation are less important to me.

 Whatever our individual opinions, GH is a de facto place for developers these
 days -- their tools are extremely well socialized. It's a shame to cut
 ourselves off from users of that community. I happen to share Colin's
 opinions about the inferiority of GH's interface for historical comments (I
 personally like gerrit the best of the tools I've used), but that doesn't
 mean we should shun it. (I also generally loathe JIRA, on par with Elliott's
 thoughts).

 I think the Apache infra allows comments on PRs that are tied to a JIRA to
 land in the comments on the associated JIRA. Is that right Jake? It doesn't
 prevent the patch from disappearing from github, but at least the trail of
 discussion is preserved and the single page scroll down consumption is
 still possible. I think we as a project can make it a policy that a patch
 must be attached to the JIRA, not just living in a PR (we'll want that for
 pre-commit build bot support anyway, right?) Use the PR as another means of
 review, not the source of truth on the change itself. Would that be
 enough for you Colin?

 On the topic of Gerrit, there was a discussion about bringing it about for
 Apache projects. It's been raised and died and raised a number of times.
 Gerrit for reviews and push gating + github style build hook detection would
 be a great setup for me as well. Maybe we should investigate that as a
 separate thread?

 -n

 On Tue, Apr 21, 2015 at 6:07 PM, Elliott Clark ecl...@apache.org wrote:

 For me pull requests show great history for the issue if things don't get
 bounced around too many different creators. Github really struggles when
 there are issues that hang around for a long time, either because they don't
 have patches yet, or because lots of different people are creating candidate
 patches. However for me email copies of everything that's from github
 provide all the search-ability that I would need to just use github.

 However for me Jira is just so disconnected from the code that it's a
 total time sink. I want to create code, look at code, and have my code
 tested.  Every time I have to create a patch and attach it it's a total
 context switch (better than RB but that's not saying much). The integration
 of jira and jenkins just feels like duct-tape and hope when compared to the
 hooks provided by github. So for me jira seems bad at creating patches,
 reviewing patches, and testing patches.

 I've used gerrit before and it's awesome. Just a joy to use once things
 are set up and moving. However I don't think that it will work since it's
 not supported by infra and it needs to be the source of truth for a git
 repo.

 My preferences, in order, would be

 * Gerrit
 * Github only
 * Github with Jira integration
 * Phabricator with jira
 * Review board
 * Jira only

 On Tue, Apr 21, 2015 at 5:19 PM, Colin P. McCabe cmcc...@apache.org

Re: [DISCUSS] Github integration

2015-04-21 Thread Colin P. McCabe

On Tue, Apr 21, 2015 at 7:08 PM, Nick Dimiduk ndimi...@apache.org wrote:
 I would associate the upswing in introductions to increased marketing from
 joining incubator; orthogonal to moving out of github.

 No one has suggested moving away from patches attached to JIRA. As I said,
 patch on JIRA is what we'll eventually need for pre-commit checking anyway.

 I'd like the github mirror to be activated, which Jake has done. I'd also
 like PR's to show up as a mail to the dev list and, if possible, also land
 on the associated JIRA as a comment. I maintain that this will make it
 easier for non-Apache folks who fork-and-PR to get our attention without
 much fuss on either end.

 Does your -1 apply to PRs resulting in a mail on the dev list?

I think the minimum it would need to be usable to me would be some
kind of integration with JIRA, so that I could review the patch there.
I suppose we could set up some kind of system whereby comments made on
github were mirrored to JIRA.  I also don't think we should activate
any of this stuff before we have consensus on all the issues involved.

Colin



 -n


 On Tuesday, April 21, 2015, Colin P. McCabe cmcc...@apache.org wrote:

 The argument keeps getting made that we have to be on github to make
 it easy for outsiders to contribute but I don't see any evidence to
 back that up.  Quite the contrary, during the time HTrace was a github
 project, the number of contributions and contributors were much
 smaller than now.

 Objectively, the JIRA workflow is not difficult to learn.  The number
 of new and recent contributors that Hadoop has is a testament to that.
 And many other very successful projects use the same model.  I would
 argue that to the average developer, attaching a text file to a JIRA
 is easier to understand than creating a branch and a pull request in
 github.  It's certainly easier for a first-timer than the upload
 process of reviewboard or gerrit.

 I think if we are being honest with ourselves, the only valid reason
 to switch away from patch attachments on JIRA is the convenience of
 developers.  Elliot has said that he doesn't like having to click on
 attach patch.  Some things that haven't been brought up, but which
 ought to be, are that reviews in JIRA require some cut-n-paste, and
 that you need to install a Google Chrome extension to see side-by-side
 diffs.

 My opinion is that while these things are kind of annoying, they're
 really not that bad.  Having to explain what the difference is in my
 latest patch versus the previous one takes much more time and mental
 effort than clicking on attach patch.  There are even scripts out
 there to automatically attach patches.  Copying a few lines to the
 clipboard to suggest changes during a review isn't bad... in some ways
 I prefer it to clicking all those expand discussion arrows in other
 code review tools.

 Colin

 On Tue, Apr 21, 2015 at 6:18 PM, Nick Dimiduk ndimi...@apache.org wrote:
  There's a joke here about N devs in a room and N opinions that are all right
  (and all wrong)!

  All I'm asking for here is to make it easy for outsiders to contribute.
  Having HTrace show up in the mirror is a big step. The next logical thing is
  folks will click the fork button. We should be ready to receive the incoming
  help; the details of that implementation are less important to me.

  Whatever our individual opinions, GH is a de facto place for developers these
  days -- their tools are extremely well socialized. It's a shame to cut
  ourselves off from users of that community. I happen to share Colin's
  opinions about the inferiority of GH's interface for historical comments (I
  personally like gerrit the best of the tools I've used), but that doesn't
  mean we should shun it. (I also generally loathe JIRA, on par with Elliott's
  thoughts).

  I think the Apache infra allows comments on PRs that are tied to a JIRA to
  land in the comments on the associated JIRA. Is that right Jake? It doesn't
  prevent the patch from disappearing from github, but at least the trail of
  discussion is preserved and the single page scroll down consumption is
  still possible. I think we as a project can make it a policy that a patch
  must be attached to the JIRA, not just living in a PR (we'll want that for
  pre-commit build bot support anyway, right?) Use the PR as another means of
  review, not the source of truth on the change itself. Would that be
  enough for you Colin?

  On the topic of Gerrit, there was a discussion about bringing it about for
  Apache projects. It's been raised and died and raised a number of times.
  Gerrit for reviews and push gating + github style build hook detection would
  be a great setup for me as well. Maybe we should investigate that as a
  separate thread?
 
  -n
 
  On Tue, Apr 21, 2015 at 6:07 PM, Elliott Clark ecl...@apache.org
  wrote:
 
  For me pull requests show great history for the issue if things don't get
  bounced around

Re: HTrace GUI meetup

2015-03-27 Thread Colin P. McCabe

Hmm. Sounds like May is unavailable on Tuesday. Does Friday (4/3) at
5PM PDT work for everyone? Masatake, I'd especially like to talk to
you about some ideas for the spans GUI...

Abe, to respond to your points: We will post the minutes to the
mailing list. Also, obviously the actual implementation of any of
these ideas will happen on JIRA through the usual process. I view
this as kind of similar to the meetings we do occasionally on Hadoop
to coordinate a new feature that the community is working on.

Also, on a semi-related note, I'm going to try to check some
real-world span data into the repo to use when looking at the GUI.

cheers,
Colin

On Thu, Mar 26, 2015 at 3:46 PM, Abraham Elmahrek a...@cloudera.com wrote:
Hey Colin,

I'm hugely +1 on this! I'd prefer Tuesday @ 5PM PST.

There has been some discussion about doing this in the Sqoop community as
well, so I thought I'd relay some of the ideas that popped up there (
http://mail-archives.apache.org/mod_mbox/sqoop-dev/201502.mbox/%3CCAHUddLM98%3DM4N4qNGfpAThQ%2BEpRf0war0FN4WM%3D3T6t7Owt4nQ%40mail.gmail.com%3E
):

1. Meeting minutes are persisted (wiki) and communicated (mailing list)
2. No concrete decisions are made (we should run votes first so everyone
can participate and make sure full context is provided somehow)
3. Proper notice is given to the community and the meeting is globally
available (which it seems we have).

I think HTrace is a different community and can run it however we see fit.
But I hope the above helps at least for reference.

-Abe

On Thu, Mar 26, 2015 at 3:29 PM, Colin P. McCabe cmcc...@apache.org wrote:

Hi all,

There's been a lot of really great work on the HTrace GUI recently.
Masatake's span visualization screen, Abe's work on the details page,
and May's work come to mind.

I was thinking, we should have a phone call to talk about the GUI. I
have some ideas that might be really cool. Abe also came up with some
mockups.

Some time next week would work well for me. Maybe next Tuesday (3/31)
at 5pm, or next Friday (4/3) at 5pm PST? (I realize 5pm PST is late
in California but I'm trying to come up with something that works for
all the time zones.) Does that work for you guys?

I'm thinking we can provide a plain ol' telephone dial-in number and
use Google Hangouts for screen sharing. (We could use Google Hangouts
for voice as well, but in my experience, it's best to use regular
phone for voice to avoid glitches.)

best,
Colin

HTrace GUI meetup

2015-03-26 Thread Colin P. McCabe

Hi all,

There's been a lot of really great work on the HTrace GUI recently.
Masatake's span visualization screen, Abe's work on the details page,
and May's work come to mind.

I was thinking, we should have a phone call to talk about the GUI.  I
have some ideas that might be really cool.  Abe also came up with some
mockups.

Some time next week would work well for me.  Maybe next Tuesday (3/31)
at 5pm, or next Friday (4/3) at 5pm PST?  (I realize 5pm PST is late
in California but I'm trying to come up with something that works for
all the time zones.)  Does that work for you guys?

I'm thinking we can provide a plain ol' telephone dial-in number and
use Google Hangouts for screen sharing.  (We could use Google Hangouts
for voice as well, but in my experience, it's best to use regular
phone for voice to avoid glitches.)

best,
Colin

Re: Getting started with Apache HTrace development

2015-03-05 Thread Colin P. McCabe

Can we set up a wiki? Stuff like this needs to be updated
periodically and it would be nice to have something like the hadoop
wiki. Of course there may be some out of date stuff from time to
time, but it's better than nothing...

On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
This is dynamite and I think it would be very helpful to have it linked to
from the website.
Although the install and config doesn't appear too bulky, there are a
number of steps and this would be non trivial for someone who is not
familiarized with Hadoop xml based runtime configuration.
I'm finishing off a patch for Chukwa right now then I will be building
HTrace into my Nutch 2.x search stack. My aim is to write something similar
for that deployment, as it would also be very helpful to see tracing for Gora
data stores as well.

Awesome.

best,
Colin

On Monday, March 2, 2015, Colin P. McCabe cmcc...@apache.org wrote:

A few people have asked how to get started with HTrace development. It's a
good question and we don't have a great README up about it so I thought I
would write something.

HTrace is all about tracing distributed systems. So the best way to get
started is to plug htrace into your favorite distributed system and see what
cool things happen or what bugs pop up. Since I'm an HDFS developer, that's
the distributed system that I'm most familiar with. So I will do a quick
writeup about how to use HTrace + HDFS. (HBase + HTrace is another very
important use-case that I would like to write about later, but one step at a
time.)

Just a quick note: a lot of this software is relatively new. So there may be
bugs or integration pain points that you encounter.

There has not yet been a stable release of Hadoop that contained Apache
HTrace. There have been releases that contained the pre-Apache version of
HTrace, but that's no fun. If we want to do development, we want to be able
to run the latest version of the code. So we will have to build it ourselves.

Building HTrace is not too bad. First we install the dependencies:

cmccabe@keter:~/ apt-get install java javac google-go leveldb-devel

If you have a different Linux distro this command will vary slightly, of
course. On Macs, brew is a good option.
Next we use Maven to build the source:

cmccabe@keter:~/ git clone
https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
cmccabe@keter:~/ cd incubator-htrace
cmccabe@keter:~/ git checkout master
cmccabe@keter:~/ mvn install -DskipTests -Dmaven.javadoc.skip=true
-Drat.skip

OK. So htrace is built and installed to the local ~/.m2 directory.

We should see it under the .m2:
cmccabe@keter:~/ find ~/.m2 | grep htrace-core
...

/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT

/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated

/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
...

The version you built should be 3.2.0-SNAPSHOT.

Next, we check out Hadoop:

cmccabe@keter:~/ git clone
https://git-wip-us.apache.org/repos/asf/hadoop.git
cmccabe@keter:~/ cd hadoop
cmccabe@keter:~/ git checkout branch-2

So we are basically building a pre-release version of Hadoop 2.7, currently
known as branch-2. We will need to modify Hadoop to use 3.2.0-SNAPSHOT rather
than the stable 3.1.0 release which it would ordinarily use in branch-2. I
applied this diff to hadoop-project/pom.xml:

diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
index 569b292..5b7e466 100644
--- a/hadoop-project/pom.xml
+++ b/hadoop-project/pom.xml
@@ -785,7 +785,7 @@
       <dependency>
         <groupId>org.apache.htrace</groupId>
         <artifactId>htrace-core</artifactId>
-        <version>3.1.0-incubating</version>
+        <version>3.2.0-incubating-SNAPSHOT</version>
       </dependency>
       <dependency>
         <groupId>org.jdom</groupId>

Next, I built Hadoop:

cmccabe@keter:~/ mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

...

./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar

./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
...

This package should also contain an htrace-3.2.0-SNAPSHOT jar.

OK, so how can we start seeing some trace spans? The easiest way is to
configure LocalFileSpanReceiver.

Add this to your hdfs-site.xml:

<property>
  <name>hadoop.htrace.spanreceiver.classes</name>
  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
  <name>hadoop.htrace.sampler</name>
  <value>AlwaysSampler</value>
</property>

When you run the Hadoop daemons, you should see them writing to files named
/tmp

Getting started with Apache HTrace development

2015-03-02 Thread Colin P. McCabe

A few people have asked how to get started with HTrace development. It's a
good question and we don't have a great README up about it so I thought I
would write something.

HTrace is all about tracing distributed systems. So the best way to get
started is to plug htrace into your favorite distributed system and see what
cool things happen or what bugs pop up. Since I'm an HDFS developer, that's
the distributed system that I'm most familiar with. So I will do a quick
writeup about how to use HTrace + HDFS. (HBase + HTrace is another very
important use-case that I would like to write about later, but one step at a
time.)

Just a quick note: a lot of this software is relatively new. So there may be
bugs or integration pain points that you encounter.

Building HTrace is not too bad. First we install the dependencies:

cmccabe@keter:~/ apt-get install java javac google-go leveldb-devel

If you have a different Linux distro this command will vary slightly, of
course. On Macs, brew is a good option.
Next we use Maven to build the source:

OK. So htrace is built and installed to the local ~/.m2 directory.

We should see it under the .m2:
cmccabe@keter:~/ find ~/.m2 | grep htrace-core
...
/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT

/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated

/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
...

The version you built should be 3.2.0-SNAPSHOT.

Next, we check out Hadoop:

cmccabe@keter:~/ git clone
https://git-wip-us.apache.org/repos/asf/hadoop.git
cmccabe@keter:~/ cd hadoop
cmccabe@keter:~/ git checkout branch-2

Next, I built Hadoop:

cmccabe@keter:~/ mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

...
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
...

This package should also contain an htrace-3.2.0-SNAPSHOT jar.

OK, so how can we start seeing some trace spans? The easiest way is to
configure LocalFileSpanReceiver.

Add this to your hdfs-site.xml:

<property>
  <name>hadoop.htrace.spanreceiver.classes</name>
  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
  <name>hadoop.htrace.sampler</name>
  <value>AlwaysSampler</value>
</property>

When you run the Hadoop daemons, you should see them writing to files named
/tmp/${PROCESS_ID} (for each different process). If this doesn't happen, try
cranking up your log4j level to TRACE to see why the SpanReceiver could not be
created.

You should see something like this in the log4j logs:

13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
    at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
    at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
    at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
    at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)

Running htraced is easy. You simply run the binary:

cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced
-Dlog.level=TRACE -Ddata.store.clear

You should see messages like this:

cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced
-Dlog.level=TRACE -Ddata.store.clear
2015-03-02T19:08:33-08:00 D:

Re: HTrace for Nutch 2.x Search Stack

2015-02-27 Thread Colin P. McCabe

Hi Lewis,

Good questions. I would say HTrace differs from TRACE logging (or
other single-node metrics, JMX, audit logs, etc.) in that it pulls
together information from across the cluster. This is something that
is a major pain point when using a distributed system such as HDFS.
Just to diagnose a slow write, you might have to match up logs from a
client log and the logs of 3 different datanodes. The big idea behind
htrace is two things: integrating those logging sources, and using
sampling to instrument performance in production. The main thing
htrace deals with is spans which are lengths of time.
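
To make the two ideas concrete, here is a minimal sketch of a span as a length of time, plus a sampler that traces only a fraction of requests. The class names (SpanSketch, FractionSamplerSketch) are invented for illustration and are not the HTrace classes; the real library has its own Span and Sampler types with more fields.

```java
import java.util.Random;

/** Illustrative model only; this is not the HTrace Span class. */
final class SpanSketch {
    final String description;
    final long startMillis;
    final long stopMillis;

    SpanSketch(String description, long startMillis, long stopMillis) {
        this.description = description;
        this.startMillis = startMillis;
        this.stopMillis = stopMillis;
    }

    /** A span is a length of time: stop minus start. */
    long durationMillis() {
        return stopMillis - startMillis;
    }
}

/** Hypothetical sampler: traces each request with the given probability. */
final class FractionSamplerSketch {
    private final double fraction;
    private final Random random;

    FractionSamplerSketch(double fraction, Random random) {
        this.fraction = fraction;
        this.random = random;
    }

    /** Returns true if this request should be traced. */
    boolean next() {
        // nextDouble() is in [0.0, 1.0), so 1.0 always traces and 0.0 never does.
        return random.nextDouble() < fraction;
    }

    public static void main(String[] args) {
        FractionSamplerSketch sampler = new FractionSamplerSketch(0.01, new Random());
        int traced = 0;
        for (int i = 0; i < 100000; i++) {
            if (sampler.next()) {
                traced++;
            }
        }
        System.out.println("traced " + traced + " of 100000 requests");
    }
}
```

Sampling a small fraction is what keeps the overhead low enough to leave tracing on in production.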

We're working on a web UI that will allow people to search for spans
by time, duration, and name (among other things). It's not quite
finished now (hoping to have something usable in HTrace 3.2.0 or maybe
3.3.0... but abe can comment more on that.)

Here's an early screenshot (probably way out of date now):
https://issues.apache.org/jira/secure/attachment/12689757/Search%20page%20skeleton%20-%200.png

There is also a plan to create a visualization of parent/child
relationships on the web UI, by using the d3 library (which can draw
graphs, and do many other things besides.)

In the meantime, there's an option to produce a graphviz file from a
file containing span JSON. That way you can draw a graph of
parent/child relationships with the dot tool, available on Linux.
Uh... unfortunately it's broken right now... let me file a JIRA for
that :P This is a very new feature, got added earlier this week.
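
For anyone curious what the graphviz output amounts to, here is a rough sketch of turning parent/child span relationships into DOT text. This is not the HTrace tool itself: the class SpanDotSketch and its Node type are made up for illustration, and the spans are built in memory rather than parsed from span JSON.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only; this is not the HTrace graphviz tool. */
final class SpanDotSketch {
    /** Minimal stand-in for a span: an id, a parent id (null for a root), a name. */
    static final class Node {
        final String id;
        final String parentId;
        final String description;

        Node(String id, String parentId, String description) {
            this.id = id;
            this.parentId = parentId;
            this.description = description;
        }
    }

    /**
     * Emits one DOT node per span and one edge per parent/child pair.
     * Assumes ids and descriptions contain no double quotes.
     */
    static String toDot(List<Node> spans) {
        StringBuilder sb = new StringBuilder("digraph spans {\n");
        for (Node n : spans) {
            sb.append("  \"").append(n.id).append("\" [label=\"")
              .append(n.description).append("\"];\n");
        }
        for (Node n : spans) {
            if (n.parentId != null) {
                sb.append("  \"").append(n.parentId).append("\" -> \"")
                  .append(n.id).append("\";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<Node> spans = Arrays.asList(
                new Node("1", null, "Gets"),
                new Node("2", "1", "readBlock"));
        System.out.print(toDot(spans));
    }
}
```

Saving the output to a file such as spans.dot and running "dot -Tpng spans.dot -o spans.png" renders the graph.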

The web UI is a great place to get involved right now... there is a
lot of work going on there and we've been adding new contributors.

Colin

On Thu, Feb 26, 2015 at 1:46 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Nick,

Grand. Thank you

What is visualization looking like right now? Is there currently a
mechanism for visualizing HTrace structures?
Is it worth considering posting something like this as a GSoC project if
one does not currently exist?
Thanks
Lewis

On Thu, Feb 26, 2015 at 1:31 PM, Nick Dimiduk ndimi...@gmail.com wrote:

Hi Lewis,

Re: [REPORT] HTrace March 2015

2015-02-26 Thread Colin P. McCabe

+1.

Thanks, Lewis.
C.

On Thu, Feb 26, 2015 at 10:58 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
 Hi Folks,
 Please see below, this has been added to the wiki

 HTrace

 HTrace is a tracing framework intended for use with distributed systems
 written in java.

 HTrace has been incubating since 2014-11.

 Three most important issues to address in the move towards graduation:

   1. Continue to grow the HTrace community
   2. Continue to develop and release stable HTrace incubating artifacts
   3. Continue to explore the integration of the HTrace framework into
      other Apache products

 Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
 aware of?

 No

 How has the community developed since the last report?

 There has been a bunch of mailing list activity relating directly to
 issue 3 above e.g. better integration of HTrace into HBase/HDFS.
 HTrace is being represented at ApacheCon 2015 NA in April with a
 presentation "Introducing Apache HTrace: An End-to-End Tracing Framework
 for Distributed Systems" - Colin McCabe, Cloudera - http://sched.co/2P8Q

 How has the project developed since the last report?

 The codebase has seen about 30 odd commits since last reporting.
 Jira continues to see activity which is encouraging as HTrace community
 progresses towards next incubating release.

 Date of last release:

   2015-20-01 htrace-3.1.0-incubating

 When were the last committers or PMC members elected?

 Abraham Elmahrek was elected to become an HTrace committer on
 Wed, 11 Feb, 2015.

 Signed-off-by:

   [ ](htrace) Jake Farrell
   [ ](htrace) Todd Lipcon
   [X](htrace) Lewis John Mcgibbney
   [ ](htrace) Andrew Purtell
   [ ](htrace) Billie Rinaldi
   [ ](htrace) Michael Stack

 Shepherd/Mentor notes:


 Ta
 Lewis


 --
 *Lewis*

Re: Trace HBase/HDFS with HTrace

2015-02-25 Thread Colin P. McCabe

Hmm.  Looking at that error, my guess would be that there is an
incorrect usage of TraceScope#detach going on somewhere in hbase...
perhaps a double detach.  But I could be wrong.  We added some code
recently to catch issues like this.
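
Roughly, the kind of check I mean looks like the sketch below. This is a simplified stand-in, not the actual TraceScope code: the class name DetachableScopeSketch is made up, and throwing IllegalStateException is my choice for the illustration.

```java
/** Illustrative only; this is not org.apache.htrace.TraceScope. */
final class DetachableScopeSketch {
    private final String name;
    private boolean detached = false;

    DetachableScopeSketch(String name) {
        this.name = name;
    }

    /** Hands the span off to another thread; legal only once per scope. */
    void detach() {
        if (detached) {
            // Fail loudly instead of silently corrupting the trace.
            throw new IllegalStateException(
                    "Tried to detach " + name + " but it has already been detached.");
        }
        detached = true;
    }

    public static void main(String[] args) {
        DetachableScopeSketch scope = new DetachableScopeSketch("FSHLog.sync");
        scope.detach();
        try {
            scope.detach(); // the second call is the bug being caught
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```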

best,
Colin

On Wed, Feb 25, 2015 at 12:28 AM, Masatake Iwasaki
iwasak...@oss.nttdata.co.jp wrote:
 I tried hbase-1 built against today's htrace-3.2.0-SNAPSHOT (with quick fix
 to TestHTraceHooks).
 I got the error below in regionserver log.
 I will dig this tomorrow.::

   2015-02-25 00:18:29,270 ERROR [RS_OPEN_META-centos7:16201-0] htrace.Tracer: Tried to detach trace span null but it has already been detached.
   2015-02-25 00:18:29,271 ERROR [RS_OPEN_META-centos7:16201-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, starting to roll back the global memstore size.
   java.lang.RuntimeException: Tried to detach trace span null but it has already been detached.
   at org.apache.htrace.Tracer.clientError(Tracer.java:61)
   at org.apache.htrace.TraceScope.detach(TraceScope.java:57)
   at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1559)
   at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:94)
   at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionOpenMarker(HRegion.java:910)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4911)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4874)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4845)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4801)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4752)
   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)



 On 2/24/15 18:27, Colin P. McCabe wrote:

 Thanks for trying this, Masatake.  I've got HDFS working on my cluster
 with tracing and LocalFileSpanReceiver.  Did you try using HBase +
 HDFS with LocalFileSpanReceiver?  Be sure to use a build including
 HTRACE-112 since LFSR was kind of busted prior to that.

 I'm going to do a longer writeup about getting HDFS + HBase working
 with other span receivers just as soon as I finish stomping a few more
 bugs.

 best,
 Colin

 On Tue, Feb 24, 2015 at 12:04 PM, Masatake Iwasaki
 iwasak...@oss.nttdata.co.jp wrote:

 Hi,

 Thanks for trying this. I am sorry for late reply.

 I tried this today
 by hbase-1.0.1-SNAPSHOT built with
 {{-Dhadoop-two.version=2.7.0-SNAPSHOT}}
 in pseudo distributed cluster
 but failed to get end-to-end trace.

 I checked that
 * tracing works for both of hbase and hdfs,
 * hbase runs with 2.7.0-SNAPSHOT jar of hadoop.

 When I did do put with tracing on,
 I saw span named FSHLog.sync with annotations such as
 syncing writer and writer synced.
 The code for tracing in FSHLog worked at least.

 I'm still looking into this.
 If it turned out that tracing spans are not reached to
 actual HDFS writer thread in HBase, I will file a JIRA.

 # We need hadoop-2.6.0 or higher in order to trace HDFS.
 # Building hbase from source with {{-Dhadoop-two.version=2.6.0}}
 # is straight forward way to do this
 # because the binary release of hbase-1.0.0 bundles hadoop-2.5.1 jars.

 Masatake


 On 2/11/15 08:56, Nick Dimiduk wrote:

 Hi Joshua,

 In theory there's nothing special for you to do. Just issue your query
 to
 HBase with tracing enabled. The active span will go through HBase, down
 into HDFS, and back again. You'll need both systems collecting spans
 into
 the same place so that you can report on the complete trace tree.

 I've not recently tested the end-to-end, but I believe it's all there.
 If
 not, it's a bug -- this is an intended use case. Can you give it a try
 and let us know how it goes?

 FYI, 0.99.x are preview releases of HBase and not for production use.
 Just
 so you know :)

 -n

 On Wednesday, February 11, 2015, Chunxu Tang chunxut...@gmail.com
 wrote:

 Hi all,

 Now I’m exploiting HTrace to trace request level data flows in HBase
 and
 HDFS. I have successfully traced HBase and HDFS by using HTrace,
 respectively.

 After that, I combine HBase and HDFS together and I want to just send a
 PUT/GET request to HBase, but to trace the whole data flow in both
 HBase
 and HDFS. In my opinion, when I send a request such as Get to HBase, it
 will at last try to read the blocks on HDFS, so I can

Re: Trace HBase/HDFS with HTrace

2015-02-24 Thread Colin P. McCabe

Thanks for trying this, Masatake.  I've got HDFS working on my cluster
with tracing and LocalFileSpanReceiver.  Did you try using HBase +
HDFS with LocalFileSpanReceiver?  Be sure to use a build including
HTRACE-112 since LFSR was kind of busted prior to that.

I'm going to do a longer writeup about getting HDFS + HBase working
with other span receivers just as soon as I finish stomping a few more
bugs.

best,
Colin

On Tue, Feb 24, 2015 at 12:04 PM, Masatake Iwasaki
iwasak...@oss.nttdata.co.jp wrote:
 Hi,

 Thanks for trying this. I am sorry for late reply.

 I tried this today
 by hbase-1.0.1-SNAPSHOT built with {{-Dhadoop-two.version=2.7.0-SNAPSHOT}}
 in pseudo distributed cluster
 but failed to get end-to-end trace.

 I checked that
 * tracing works for both of hbase and hdfs,
 * hbase runs with 2.7.0-SNAPSHOT jar of hadoop.

 When I did do put with tracing on,
 I saw span named FSHLog.sync with annotations such as
 syncing writer and writer synced.
 The code for tracing in FSHLog worked at least.

 I'm still looking into this.
 If it turned out that tracing spans are not reached to
 actual HDFS writer thread in HBase, I will file a JIRA.

 # We need hadoop-2.6.0 or higher in order to trace HDFS.
 # Building hbase from source with {{-Dhadoop-two.version=2.6.0}}
 # is straight forward way to do this
 # because the binary release of hbase-1.0.0 bundles hadoop-2.5.1 jars.

 Masatake


 On 2/11/15 08:56, Nick Dimiduk wrote:

 Hi Joshua,

 In theory there's nothing special for you to do. Just issue your query to
 HBase with tracing enabled. The active span will go through HBase, down
 into HDFS, and back again. You'll need both systems collecting spans into
 the same place so that you can report on the complete trace tree.

 I've not recently tested the end-to-end, but I believe it's all there. If
 not, it's a bug -- this is an intended use case. Can you give it a try
 and let us know how it goes?

 FYI, 0.99.x are preview releases of HBase and not for production use. Just
 so you know :)

 -n

 On Wednesday, February 11, 2015, Chunxu Tang chunxut...@gmail.com wrote:

 Hi all,

 Now I’m exploiting HTrace to trace request level data flows in HBase and
 HDFS. I have successfully traced HBase and HDFS by using HTrace,
 respectively.

 After that, I combine HBase and HDFS together and I want to just send a
 PUT/GET request to HBase, but to trace the whole data flow in both HBase
 and HDFS. In my opinion, when I send a request such as Get to HBase, it
 will at last try to read the blocks on HDFS, so I can construct a whole
 data flow tracing through HBase and HDFS. While, the fact is that I can
 only get tracing data of HBase, with no data of HDFS.

 Could you give me any suggestions on how to trace the data flow in both
 HBase and HDFS? Does anyone have similar experience? Do I need to modify
 the source code? And maybe which part(s) should I touch? If I need to
 modify the code, I will try to create a patch for that.

 Thank you.

 My Configurations:
 Hadoop version: 2.6.0
 HBase version: 0.99.2
 HTrace version: htrace-master
 OS: Ubuntu 12.04


 Joshua

Re: Trace HBase/HDFS with HTrace

2015-02-12 Thread Colin P. McCabe

On Thu, Feb 12, 2015 at 1:23 PM, Chunxu Tang chunxut...@gmail.com wrote:
 Hi all,

 Thanks for your detailed replies!

 Now I have tested end-to-end tracing in two versions of HBase (0.98.10 and
 0.99.2), combined with Hadoop 2.6.0 and htrace-master (3.0.4), and both of
 them failed. For HBase 0.98.10, it actually has htrace 2.0.4 core, so it's
 normal to get no traces. While, HBase 0.99.2 has htrace 3.0.4 core, but I
 still cannot get traces of HDFS, I can only get traces of HBase.

Hadoop 2.6.0 doesn't have the correct version of HTrace, so this is all
expected.  You aren't going to be able to do anything useful here as
long as you keep using Hadoop 2.6.0.  I would suggest using
Hadoop-2.7.0-SNAPSHOT with an appropriate version of HBase.

Hope this helps.

best,
Colin



 I think the first thing I need to make sure is that I use a correct method
 to implement end-to-end test. I'm not very sure whether it's good to show
 whole source code on the mailing list, so I just put some core code chunks
 written in the client code here:

 public void run() {
   Configuration conf = HBaseConfiguration.create();
   org.apache.hadoop.hbase.trace.SpanReceiverHost.getInstance(conf);
   org.apache.hadoop.tracing.SpanReceiverHost.getInstance(
       new HdfsConfiguration());

   TraceScope ts = Trace.startSpan("Gets", Sampler.ALWAYS);
   HTable table = new HTable(conf, "t1");
   Get get = new Get(Bytes.toBytes("r1"));
   table.get(get);
   ...
 }

 Now I can only get traces of HBase, ending with HfileReaderV2.readBlock()
 function. Is my testing method correct? And because I'm not familiar with
 new version of HTrace and HBase/HDFS with new htrace core, could you give
 me some suggestions to detect where the error may take place?

 Thank you all.

 Joshua

 2015-02-11 22:08 GMT-05:00 Colin P. McCabe cmcc...@apache.org:

 No, I think I'm the one who's missing something. :)

 I will give that a try next time I'm testing out end-to-end tracing.

 thanks guys.
 Colin

 On Wed, Feb 11, 2015 at 4:36 PM, Enis Söztutar enis@gmail.com wrote:
  mvn install just installs it in local cache which you can then use for
  building other projects. So no need to have to define a file based local
  repo. Am I missing something?
 
  Enis
 
  On Wed, Feb 11, 2015 at 12:36 PM, Nick Dimiduk ndimi...@gmail.com
 wrote:
 
  Oh, I see. I was assuming a local build of Hadoop snapshot installed
 into
  the local cache.
 
  On Wednesday, February 11, 2015, Colin P. McCabe cmcc...@apache.org
  wrote:
 
   On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk ndimi...@gmail.com wrote:
I don't recall the hadoop release repo restriction being a problem, but I
haven't tested it lately. See if you can just specify the release version
with -Dhadoop.version or -Dhadoop-two.version.
   
  
   Sorry, it's been a while since I did this... I guess the question is
   whether 2.7.0-SNAPSHOT is available in Maven-land somewhere?  If so,
   then Chunxu should forget all that stuff I said, and just build HBase
   with -Dhadoop.version=2.7.0-SNAPSHOT
  
I would go against branch-1.0 as this will be the imminent 1.0.0 release and
had HTrace 3.1.0-incubating.
  
   Thanks.
  
   Colin
  
  
   
-n
   
On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe cmcc...@apache.org wrote:
   
Thanks for trying stuff out!  Sorry that this is a little difficult at
the moment.

To really do this right, you would want to be using Hadoop with HTrace
3.1.0, and HBase with HTrace 3.1.0.  Unfortunately, there hasn't been
a new release of Hadoop with HTrace 3.1.0.  The only existing releases
of Hadoop use an older version of the HTrace library.  So you will
have to build from source.

If you check out Hadoop's branch-2 branch (currently, this branch
represents what will be in the 2.7 release, when it is cut), and build
that, you will get the latest.  Then you have to build a version of
HBase against the version of Hadoop you have built.
   
By default, HBase's Maven build will build against upstream release
versions of Hadoop only. So just setting
-Dhadoop.version=2.7.0-SNAPSHOT is not enough, since it won't know
where to find the jars.  To get around this problem, you can create
your own local maven repo. Here's how.
   
In hadoop/pom.xml, add these lines to the distributionManagement
  stanza:
   
+<repository>
+  <id>localdump</id>
+  <url>file:///home/cmccabe/localdump/releases</url>
+</repository>
+<snapshotRepository>
+  <id>localdump</id>
+  <url>file:///home/cmccabe/localdump/snapshots</url>
+</snapshotRepository>
   
Comment out the repositories that are already there.
   
Now run mkdir /home/cmccabe/localdump.
   
Then, in your hadoop tree, run mvn deploy -DskipTests.
   
You should get a localdump directory that has files kind of like

Re: Trace HBase/HDFS with HTrace

2015-02-11 Thread Colin P. McCabe

On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk ndimi...@gmail.com wrote:
I don't recall the hadoop release repo restriction being a problem, but I
haven't tested it lately. See if you can just specify the release version
with -Dhadoop.version or -Dhadoop-two.version.

Sorry, it's been a while since I did this... I guess the question is
whether 2.7.0-SNAPSHOT is available in Maven-land somewhere? If so,
then Chunxu should forget all that stuff I said, and just build HBase
with -Dhadoop.version=2.7.0-SNAPSHOT

I would go against branch-1.0 as this will be the imminent 1.0.0 release and
had HTrace 3.1.0-incubating.

Thanks.

Colin

-n

On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe cmcc...@apache.org
wrote:

Thanks for trying stuff out! Sorry that this is a little difficult at
the moment.

To really do this right, you would want to be using Hadoop with HTrace
3.1.0, and HBase with HTrace 3.1.0. Unfortunately, there hasn't been
a new release of Hadoop with HTrace 3.1.0. The only existing releases
of Hadoop use an older version of the HTrace library. So you will
have to build from source.

If you check out Hadoop's branch-2 branch (currently, this branch
represents what will be in the 2.7 release, when it is cut), and build
that, you will get the latest. Then you have to build a version of
HBase against the version of Hadoop you have built.

By default, HBase's Maven build will build against upstream release
versions of Hadoop only. So just setting
-Dhadoop.version=2.7.0-SNAPSHOT is not enough, since it won't know
where to find the jars. To get around this problem, you can create
your own local maven repo. Here's how.

In hadoop/pom.xml, add these lines to the distributionManagement stanza:

+<repository>
+  <id>localdump</id>
+  <url>file:///home/cmccabe/localdump/releases</url>
+</repository>
+<snapshotRepository>
+  <id>localdump</id>
+  <url>file:///home/cmccabe/localdump/snapshots</url>
+</snapshotRepository>

Comment out the repositories that are already there.

Now run mkdir /home/cmccabe/localdump.

Then, in your hadoop tree, run mvn deploy -DskipTests.

You should get a localdump directory that has files kind of like this:

...
/home/cmccabe/localdump/snapshots/org/apache/hadoop
/home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce

/home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/maven-metadata.xml.md5

/home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT

/home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml.md5

/home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/hadoop-mapreduce-2.7.0-20121120.230341-1.pom.sha1

/home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml
...

Now, add the following lines to your HBase pom.xml:

 <repositories>
   <repository>
+    <id>localdump</id>
+    <url>file:///home/cmccabe/localdump</url>
+    <name>Local Dump</name>
+    <snapshots>
+      <enabled>true</enabled>
+    </snapshots>
+    <releases>
+      <enabled>true</enabled>
+    </releases>
+  </repository>
+  <repository>

This will allow you to run something like:
mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests
-DredirectTestOutputToFile=true -Dhadoop.profile=2.0
-Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT

Once we do a new release of Hadoop with HTrace 3.1.0 this will get a lot
easier.

Related: Does anyone know what the best git branch to build from for
HBase would be for this kind of testing? I've been meaning to do some
end to end testing (it's been on my TODO for a while)

best,
Colin

On Wed, Feb 11, 2015 at 7:55 AM, Chunxu Tang chunxut...@gmail.com wrote:
Hi all,

Now I’m exploiting HTrace to trace request level data flows in HBase and
HDFS. I have successfully traced HBase and HDFS by using HTrace,
respectively.

After that, I combine HBase and HDFS together and I want to just send a
PUT/GET request to HBase, but to trace the whole data flow in both HBase
and HDFS. In my opinion, when I send a request such as Get to HBase, it
will at last try to read the blocks on HDFS, so I can construct a whole
data flow tracing through HBase and HDFS. While, the fact is that I can
only get tracing data of HBase, with no data of HDFS.

Could you give me any suggestions on how to trace the data flow in both
HBase and HDFS? Does anyone have similar experience? Do I need to modify
the source code? And maybe which part(s) should I touch? If I need to
modify the code, I will try to create a patch for that.

Thank you.

My Configurations:
Hadoop version: 2.6.0
HBase version: 0.99.2
HTrace version: htrace-master
OS: Ubuntu 12.04

Joshua

Re: HTrace integration for more HDFS client operations

2015-01-18 Thread Colin P. McCabe

On Fri, Jan 16, 2015 at 10:03 AM, Nick Dimiduk ndimi...@gmail.com wrote:

 This reminds me: have we tested the compatibility of this new release with
 previous versions? For instance, if we upgrade HBase to the incubator
 release but not HDFS, will tracing work only as far as that?


So, for this 3.1.0 release, the story is pretty simple.  The previous
releases were in a different namespace and the jars had a different name,
so HBase and HDFS can use different versions if they want to.  There will
be no conflicts.  Of course, if HBase and HDFS don't use the same version,
the spans won't be parented with HBase's spans.  But there are no crashes
or other problems like that.
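
To illustrate what "not parented" means in practice, here is a small sketch that counts how many separate trees a set of collected spans forms. It is only a model: TraceRootsSketch and its Node type are invented names, and span ids are simplified to plain longs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only; ids and fields are simplified stand-ins for real spans. */
final class TraceRootsSketch {
    static final class Node {
        final long id;
        final Long parentId; // null when the span has no parent
        final String process;

        Node(long id, Long parentId, String process) {
            this.id = id;
            this.parentId = parentId;
            this.process = process;
        }
    }

    /** Counts spans that start their own tree: no parent, or a parent we never collected. */
    static int countRoots(List<Node> spans) {
        Set<Long> ids = new HashSet<Long>();
        for (Node n : spans) {
            ids.add(n.id);
        }
        int roots = 0;
        for (Node n : spans) {
            if (n.parentId == null || !ids.contains(n.parentId)) {
                roots++;
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // HDFS span linked to the HBase span: one end-to-end tree.
        List<Node> linked = Arrays.asList(
                new Node(1L, null, "hbase"),
                new Node(2L, 1L, "hdfs"));
        // HDFS span without a parent: two unrelated trees.
        List<Node> unlinked = Arrays.asList(
                new Node(1L, null, "hbase"),
                new Node(2L, null, "hdfs"));
        System.out.println(countRoots(linked) + " root(s) when linked");
        System.out.println(countRoots(unlinked) + " root(s) when not linked");
    }
}
```

With mismatched versions you still get spans from both systems, just as disconnected trees rather than one end-to-end trace.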

The situation for the future is more complex.  Of course, HBase pulls in
jars from Hadoop.  One of those jars is going to be our htrace-core jar.
The HDFS client and HBase's daemons are going to want to use the same
version of htrace.

I know that HBase likes to provide compatibility with as many versions of
Hadoop as it can.  Basically HBase is going to have to look at the oldest
version of Apache HTrace that Hadoop might ask it to use, and verify that
that works.

It might help to look at the stuff we're trying to get rid of in the API:
1. We'd like to get rid of the Span#addKVAnnotation method which takes
byte[], in favor of the one which takes String
2. We'd like to get rid of the public MilliSpan constructor...
MilliSpan#Builder is more flexible and future-proof.  If we want to add new
parameters we don't want a combinatorial explosion of constructors (we
learned this in Hadoop)
3. Do not use Span#getParentId because it assumes that there is a single
parent for each span, an assumption we're trying to get rid of
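
As a rough illustration of points 1 to 3, here is a minimal, self-contained
sketch.  SketchSpan and its Builder are hypothetical stand-ins written for
this example; they are not the HTrace Span or MilliSpan classes and their
method names are assumptions.  The sketch shows a String-based key/value
annotation, a builder in place of public constructors, and a list of
parents instead of a single parent id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types come from HTrace.
public class SpanBuilderSketch {

    static final class SketchSpan {
        private final String description;
        private final long begin;
        private final List<Long> parents; // zero, one, or many parents
        private final Map<String, String> kvAnnotations = new LinkedHashMap<>();

        // Private constructor: callers go through the Builder, so adding a
        // field later never requires another public constructor overload.
        private SketchSpan(Builder b) {
            this.description = b.description;
            this.begin = b.begin;
            this.parents = Collections.unmodifiableList(new ArrayList<>(b.parents));
        }

        String getDescription() { return description; }
        long getBegin() { return begin; }

        // Returns all parents rather than assuming there is exactly one.
        List<Long> getParents() { return parents; }

        // String-based annotation, the form preferred over byte[] in point 1.
        void addKVAnnotation(String key, String value) {
            kvAnnotations.put(key, value);
        }

        Map<String, String> getKVAnnotations() {
            return Collections.unmodifiableMap(kvAnnotations);
        }

        static final class Builder {
            private String description = "";
            private long begin = 0L;
            private final List<Long> parents = new ArrayList<>();

            Builder description(String d) { this.description = d; return this; }
            Builder begin(long b) { this.begin = b; return this; }
            Builder addParent(long id) { this.parents.add(id); return this; }
            SketchSpan build() { return new SketchSpan(this); }
        }
    }

    public static void main(String[] args) {
        SketchSpan span = new SketchSpan.Builder()
                .description("hbase-get")
                .begin(1000L)
                .addParent(11L)
                .addParent(22L)
                .build();
        span.addKVAnnotation("table", "t1");
        System.out.println(span.getDescription()
                + " parents=" + span.getParents()
                + " annotations=" + span.getKVAnnotations());
    }
}
```

Code written against this shape never calls a public span constructor and
never asks for "the" parent id, so it would not break if new span fields or
multiple parents were introduced later.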

#2 and #3 shouldn't be a problem for HBase because there's no reason for
HBase to directly create MilliSpans, or call getParentId.  I bet there
might be some cases where we're calling the byte[] version of
addKVAnnotation, though.

So tl;dr: When we update HBase to use the new Apache jar, let's be careful
NOT to use any of these deprecated APIs.  Then we should be able to remove
those from the next release without creating any compat problems for HBase.

best,
Colin



 On Fri, Jan 16, 2015 at 9:44 AM, Stack st...@duboce.net wrote:

  You the man CPMcC.
  St.Ack
 
  On Fri, Jan 16, 2015 at 12:30 AM, Colin P. McCabe cmcc...@apache.org
  wrote:
 
   Hi all,
  
   I've got some good news that I figured I'd post to the list!  Today I
   added a bunch of htrace integration to HDFS, in
   https://issues.apache.org/jira/browse/HDFS-7189.  This patch adds
   tracing for a whole host of DFS client operations, such as rename and
   delete.
  
   Obviously this will be helpful for HDFS users, and it should also
   increase our ability to follow HBase operations all the way back into
   HDFS via HTrace-- for example when HBase is deleting or moving a WAL,
   etc.
  
   The last big piece of HTrace integration for HDFS is integration into
   the output stream (i.e. the write path).  This should be coming soon,
   so stay tuned.
  
   cheers,
   Colin
