Re: Gsoc 2016 with apache HTrace
Hi Madhawa,

s3 and other alternate FS connector support for HTrace is a really great project, I think. The connector code is getting a lot of real-world use, and it would be interesting to get performance numbers here. It would be especially interesting to see some graphs of latencies for various operations. I'll ask some folks I know who are working on the connector code if they have any suggestions.

best,
Colin

On Wed, Mar 16, 2016 at 10:21 PM, Madhawa Kasun Gunasekara wrote:
> Hi,
>
> I am a final year student in IESL College of Engineering, Sri lanka. I am
> interested with working with apache htrace. I'm interested in "Add HTrace
> distributed tracing for s3 and other alternative Hadoop FS implementations"
> project [1]. Please kindly give me further information on how I could
> proceed.
>
> [1] https://issues.apache.org/jira/browse/COMDEV-191
>
> Thanks
>
> Madhawa
Re: Gsoc 2016 with apache HTrace
Thanks for your interest, Madhawa!

As a next step, I suggest filing a JIRA on the HADOOP bug tracker. This bug tracker is used for things in the "Hadoop common" subproject. This is where the code for the s3 connector and other filesystem connectors lives. Just add a few paragraphs of description of what you'd like to do.

Note that Hadoop actually has three different s3 connectors at the moment: "s3", "s3n", and "s3a". s3a is the most modern connector and probably the one that you should focus on. The other connectors are older and we don't recommend using them. The code is here:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java

Once you feel confident in the scope of your proposal, you can fill out an application as described here: http://community.apache.org/gsoc.html
See the "Application template" section.

There's a timeline here: https://developers.google.com/open-source/gsoc/timeline. I believe the application deadline is March 25.

best,
Colin

On Thu, Mar 17, 2016 at 12:38 PM, Madhawa Kasun Gunasekara <madhaw...@gmail.com> wrote:
> Hi Colin,
>
> Thanks for the response, yes I have seen the value of this project.
> I would like to know the suggestions, before drafting the project proposal.
>
> Please kindly give me further information on how I could proceed.
>
> Thanks,
> Madhawa
>
> Madhawa
>
> On Thu, Mar 17, 2016 at 11:54 PM, Colin P. McCabe <cmcc...@apache.org>
> wrote:
>>
>> Hi Madhawa,
>>
>> s3 and other alternate FS connector support for HTrace is a really
>> great project, I think. The connector code is getting a lot of
>> real-world use, and it would be interesting to get performance numbers
>> here. It would be especially interesting to see some graphs of
>> latencies for various operations. I'll ask some folks I know who are
>> working on the connector code if they have any suggestions.
>>
>> best,
>> Colin
>>
>> On Wed, Mar 16, 2016 at 10:21 PM, Madhawa Kasun Gunasekara
>> <madhaw...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am a final year student in IESL College of Engineering, Sri lanka. I am
>> > interested with working with apache htrace. I'm interested in "Add HTrace
>> > distributed tracing for s3 and other alternative Hadoop FS implementations"
>> > project [1]. Please kindly give me further information on how I could
>> > proceed.
>> >
>> > [1] https://issues.apache.org/jira/browse/COMDEV-191
>> >
>> > Thanks
>> >
>> > Madhawa
Re: Experiences Using Apache HTrace (Incubating) in Distributed Web Search
Awesome! Looking forward to checking out the slides.

best,
Colin

On Thu, Mar 3, 2016 at 5:00 PM, Lewis John Mcgibbney wrote:
> Hi Folks,
> A heads up that I sent in a proposal to Apache Big Data on the above topic.
> Very pleased that it was accepted and I hope to be in Vancouver to share
> experiences.
> The is based on my ongoing work on
> https://issues.apache.org/jira/browse/NUTCH-2005
> I would like to share slides and get feedback closer to the event prior to
> me submitting the slides so I will update this thread nearer the time.
> Thanks for now.
> Lewis
>
> --
> *Lewis*
Re: HTrace 4.1 release candidate 2
Thanks, guys. I will post to the incubator list in a sec.

best,
Colin

On Wed, Feb 24, 2016 at 9:56 AM, Elliott Clark <ecl...@apache.org> wrote:
> +1 on the RC as well.
>
> Checked the hash.
> Created a test app using the api.
> Built from src.
>
>
> On Mon, Feb 22, 2016 at 8:37 PM, Stack <st...@duboce.net> wrote:
>
>> On Mon, Feb 22, 2016 at 3:03 PM, Colin P. McCabe <cmcc...@apache.org>
>> wrote:
>>
>> > There are at least 3 RPC compatibility-breaking changes in
>> > htrace-htraced between 4.0.1 and 4.1.0:
>> >
>> > HTRACE-315 changed the default port for htraced's HTTP interface from
>> > 9095 to 9096.
>> > HTRACE-237 changes the HTTP wire format slightly for htraced.
>> > Previous to this, we just sent a whitespace-separated list of trace
>> > spans. After this, we send an actual JSON object.
>> > HTRACE-308 (Deserialize WriteSpans requests incrementally rather than
>> > all at once) changes the field "Spans" in the HRPC header to
>> > "NumSpans". It also decreases the maximum RPC size,
>> > MAX_HRPC_BODY_LENGTH, from 64 MB to 32 MB.
>> >
>> >
>> Thanks Colin.
>>
>>
>> > To be honest, the main point of 4.0.1 was to stabilize the
>> > htrace-core4 API and make a first release with the GUI. There was a
>> > lot of unfinished business in htraced-- things that we only got to
>> > 4.1. htraced is way more stable in 4.1 since we dealt with things
>> > like GC pressure, the client side, and so forth. And also just fixing
>> > bugs. I think we should accept the compatibility break with 4.0.1.
>> > However, I agree that it would be nice to support the "old client, new
>> > server" case. I filed HTRACE-344 to add a mechanism that makes it
>> > easier to detect the case where the client is too new, and give back a
>> > reasonable response. I don't think we should adopt a formal
>> > compatibility policy-- htraced is not mature enough for that. But we
>> > can strive to maintain compatibility harder than we have in 4.2 (or
>> > whatever the next release ends up being.)
>> >
>> >
>> Ok. Thats good enough I'd say.
>>
>> +1 on the 4.1 RC.
>>
>> I checked hash and signature. Built from src and all unit tests passed.
>> Started up htraced and put in a few spans with the client.
>>
>> St.Ack
>>
>>
>> > best,
>> > Colin
>> >
>> >
>> > On Mon, Feb 22, 2016 at 10:37 AM, Stack <st...@duboce.net> wrote:
>> > > On Mon, Feb 22, 2016 at 9:51 AM, Colin P. McCabe <cmcc...@apache.org> wrote:
>> > >
>> > >> On Sun, Feb 21, 2016 at 8:10 PM, Stack <st...@duboce.net> wrote:
>> > >> > "The rationale for this limitation is that tracing can simply be disabled
>> > >> > for a brief period during the rolling upgrade process."
>> > >> >
>> > >> > The second time an operator has to do this, they'll just throw away tracing
>> > >> > as a PITA.
>> > >>
>> > >> Let's put this in perspective. Apache Spark recently transitioned
>> > >> from Scala 2.10 to 2.11. Those Scala releases aren't binary
>> > >> compatible. They aren't even source-code compatible, which means that
>> > >> people potentially had to rewrite their Spark jobs just to perform an
>> > >> upgrade... let alone have things continue to work during the upgrade.
>> > >> And people didn't throw away Spark; it's more popular than ever.
>> > >>
>> > >>
>> > > By 'perspective', you mean others made a mess so we can too?
>> > >
>> > >
>> > >
>> > >> Now: It makes sense for a storage system (particularly a mature and
>> > >> widely-deployed one) to bend over backwards to stay up during
>> > >> upgrades. That's why HDFS is so strict about this, and HBase as well.
>> > >> But they weren't always that strict; we used to break RPC
>> > >> compatibility with every release in the earlier days. Also, HTrace is
>> > >> not a storage system! It's a tracing system. It can be unavailable
>> > >> for a few hours. It will be OK.
>> > >>
>> > >> What's not OK is for us to have CLASSPATH conflicts within a minor
>> > >> version of htrace-core4 th
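The wire changes listed in this thread (the HRPC header field renamed from "Spans" to "NumSpans", and MAX_HRPC_BODY_LENGTH lowered from 64 MB to 32 MB) hint at the kind of check a server could run to hand a mismatched client a readable error, in the spirit of HTRACE-344. What follows is only an illustrative sketch in Python, not htraced's actual Go code. The dict-shaped header, the function name `check_write_spans_header`, and the error strings are all assumptions made for the example.

```python
# Illustrative sketch only; not taken from htraced. The header is modelled
# as a plain dict, and check_write_spans_header is a hypothetical helper.

# 32 MB body limit, the 4.1 value mentioned in the thread.
MAX_HRPC_BODY_LENGTH = 32 * 1024 * 1024


def check_write_spans_header(header, body_length):
    """Return None if the request looks acceptable, else an error string."""
    if "NumSpans" not in header:
        if "Spans" in header:
            # Per the thread, the field was called "Spans" before HTRACE-308.
            return ("client appears older than 4.1: header uses 'Spans', "
                    "expected 'NumSpans'")
        return "malformed header: missing 'NumSpans'"

    num_spans = header["NumSpans"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_spans, bool) or not isinstance(num_spans, int) or num_spans < 0:
        return "malformed header: 'NumSpans' must be a non-negative integer"

    if body_length > MAX_HRPC_BODY_LENGTH:
        return "body too large: %d bytes exceeds the %d byte limit" % (
            body_length, MAX_HRPC_BODY_LENGTH)
    return None
```

A server written this way can reply with the returned string instead of silently dropping the connection, which is roughly the "reasonable response" the thread asks for.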
Re: PRs submitted over public Github repo
Thanks, Jake. That will be helpful.

Best,
Colin

On Feb 22, 2016 8:27 AM, "Jake Farrell" <jfarr...@apache.org> wrote:
> If the github issue has the jira ticket id in it then we can enable the
> github webhook to comment on the jira issue each time the pr is updated,
> this is part of our typical github integrations setup
>
> -Jake
>
> On Fri, Feb 19, 2016 at 8:25 PM, Colin P. McCabe <cmcc...@apache.org>
> wrote:
>
> > I think it would be more appropriate to shadow the pull request to
> > JIRA, since in order to go in, all contributions need a JIRA + review
> > from committers. Let's continue this discussion on INFRA-11298.
> >
> > best,
> > Colin
> >
> > On Fri, Feb 19, 2016 at 3:27 PM, Lewis John Mcgibbney
> > <lewis.mcgibb...@gmail.com> wrote:
> > > I am of the opinion that we should definitely have the PR's being shadowed
> > > to this list.
> > > It provides us with context about new contributions and more importantly
> > > new community members.
> > > Thanks Andrey for hitting the lists and letting us know about that.
> > > I filed https://issues.apache.org/jira/browse/INFRA-11298 to address this.
> > > Thanks
> > >
> > > On Fri, Feb 19, 2016 at 3:22 PM, <
> > > dev-digest-h...@htrace.incubator.apache.org> wrote:
> > >
> > >>
> > >> -- Forwarded message --
> > >> From: Andrey Redko <drr...@gmail.com>
> > >> To: dev@htrace.incubator.apache.org
> > >> Cc:
> > >> Date: Thu, 18 Feb 2016 07:47:50 -0500
> > >> Subject: PRs submitted over public Github repo
> > >> Hey Devs,
> > >>
> > >> I am wondering if anyone is watching the PRs submitted over public Github
> > >> repo (http://github.com/apache/incubator-htrace/)?
> > >> Would be great to have some feedback on those.
> > >> Thank you.
> > >>
> > >> Best Regards,
> > >> Andriy Redko
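Jake's point is that the GitHub-to-JIRA integration keys off a JIRA ticket id appearing in the pull request. A contributor who wants to check a PR title before submitting could use something like the Python sketch below. The regular expression and the function name `find_jira_keys` are assumptions for illustration; the pattern the ASF webhook actually applies may differ.

```python
import re

# Assumed shape of a JIRA key: an upper-case project name, a hyphen, and a
# number, for example HTRACE-334 or INFRA-11298.
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def find_jira_keys(text):
    """Return the distinct JIRA-style keys in text, in order of appearance."""
    keys = []
    for key in JIRA_KEY.findall(text):
        if key not in keys:
            keys.append(key)
    return keys


if __name__ == "__main__":
    print(find_jira_keys("HTRACE-334: make search limit configurable"))
```

Note that a loose pattern like this also matches strings such as "UTF-8", so an empty result is a more reliable signal (the title has no ticket id) than a non-empty one.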
Re: PRs submitted over public Github repo
I think it would be more appropriate to shadow the pull request to JIRA, since in order to go in, all contributions need a JIRA + review from committers. Let's continue this discussion on INFRA-11298.

best,
Colin

On Fri, Feb 19, 2016 at 3:27 PM, Lewis John Mcgibbney wrote:
> I am of the opinion that we should definitely have the PR's being shadowed
> to this list.
> It provides us with context about new contributions and more importantly
> new community members.
> Thanks Andrey for hitting the lists and letting us know about that.
> I filed https://issues.apache.org/jira/browse/INFRA-11298 to address this.
> Thanks
>
> On Fri, Feb 19, 2016 at 3:22 PM, <
> dev-digest-h...@htrace.incubator.apache.org> wrote:
>
>>
>> -- Forwarded message --
>> From: Andrey Redko
>> To: dev@htrace.incubator.apache.org
>> Cc:
>> Date: Thu, 18 Feb 2016 07:47:50 -0500
>> Subject: PRs submitted over public Github repo
>> Hey Devs,
>>
>> I am wondering if anyone is watching the PRs submitted over public Github
>> repo (http://github.com/apache/incubator-htrace/)?
>> Would be great to have some feedback on those.
>> Thank you.
>>
>> Best Regards,
>> Andriy Redko
Re: HTrace 4.1 release candidate 2
Our compatibility policy (see http://mail-archives.apache.org/mod_mbox/htrace-dev/201509.mbox/%3c55f8badc.4050...@oss.nttdata.co.jp%3E ) only covers the htrace-core4 API right now. So we can guarantee that any projects using htrace-core 4.0.1 can upgrade to htrace-core 4.1.0 without breaking anything. (This is a more painful guarantee than it sounds since it means we can't remove functions, only deprecate them... And so forth.) But it's a very useful guarantee for our downstream projects.

However, we don't support mixing and matching versions of the SpanReceiver client and server components. The admin has to roll out a uniform version of those components-- for example, using htraced 4.0.1 with htrace-htraced.jar 4.1.0 is not supported. The rationale for this limitation is that tracing can simply be disabled for a brief period during the rolling upgrade process. Also, the different SpanReceiver subprojects are at different levels of maturity, and imposing heavy compatibility guarantees would slow down development for no real gain.

best,
Colin

On Fri, Feb 19, 2016 at 4:32 PM, Stack <st...@duboce.net> wrote:
> Can a 4.0.1 client talk to a 4.1.0 htrace? Has it been tested?
> St.Ack
>
> On Tue, Feb 9, 2016 at 7:00 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
>
>> Hi all,
>>
>> I've posted the second release candidate for HTrace 4.1 here:
>>
>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc2/
>>
>> The jars have been staged here:
>>
>> https://repository.apache.org/content/repositories/orgapachehtrace-1022
>>
>> Compared to rc1, this rc includes HTRACE-334 and HTRACE-342.
>>
>> HTrace 4.1 brings a lot of robustness improvements. There were major
>> improvements to htraced and the web UI, as well as new metrics added.
>> There were numerous build fixups, and we added Docker support, to
>> ensure a repeatable build.
>>
>> Check it out. The vote will run for 5 days.
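The rule in this message is that SpanReceiver client and server components must be rolled out at one uniform version (htraced 4.0.1 with htrace-htraced.jar 4.1.0 is the unsupported example given). An operator could check that before re-enabling tracing after an upgrade. The following is a minimal Python sketch under stated assumptions: `find_version_mismatches` is a hypothetical helper, HTrace does not ship such a tool, and how the per-host versions are collected is left out.

```python
# Hypothetical helper for illustration; HTrace does not ship this check.

def find_version_mismatches(htraced_version, receiver_jar_versions):
    """Return the hosts whose htrace-htraced.jar version differs from htraced's.

    receiver_jar_versions maps a host name to the version string of the
    client-side receiver jar deployed on that host.
    """
    return sorted(
        host
        for host, version in receiver_jar_versions.items()
        if version != htraced_version
    )


if __name__ == "__main__":
    deployed = {"datanode1": "4.1.0", "datanode2": "4.0.1"}
    print(find_version_mismatches("4.1.0", deployed))
```

An exact string comparison is deliberate here: the message says mixed versions are simply unsupported, so the sketch does not try to reason about which pairs might happen to work.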
Re: PRs submitted over public Github repo
Hi Andrey,

Please file a JIRA for any issues you find. It's at https://issues.apache.org/jira/browse/htrace

We should probably put this in the README.md so that it shows up on github. We were talking about enabling gerrit a while back, but unfortunately there needs to be some more work on the ASF infrastructure side to make that work.

best,
Colin

On Thu, Feb 18, 2016 at 4:47 AM, Andrey Redko wrote:
> Hey Devs,
>
> I am wondering if anyone is watching the PRs submitted over public Github
> repo (http://github.com/apache/incubator-htrace/)?
> Would be great to have some feedback on those.
> Thank you.
>
> Best Regards,
> Andriy Redko
Re: HTrace 4.1 release candidate 1
I have sunk this RC, and posted RC2. Thanks for the feedback, all!

best,
Colin

On Mon, Feb 8, 2016 at 2:37 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> Thanks, guys. I will spin another RC with HTRACE-334 added. Can I
> get a review of https://issues.apache.org/jira/browse/HTRACE-342 so
> that I can add that as well? It's a very simple docs change.
>
> best,
> Colin
>
> On Sat, Feb 6, 2016 at 4:04 PM, Masatake Iwasaki
> <iwasak...@oss.nttdata.co.jp> wrote:
>> Thanks for putting this up, Colin.
>>
>>> * [HTRACE-334] - htrace-web: Make limit of search and children API
>>> configurable
>>
>> This seemed not to be cherry-picked to branch-4.1.
>> I do not think this is critical but would like it to be in.
>>
>> Except for this, the RC is good.
>>
>> I built Hadoop against 4.1.0-incubating,
>> run HDFS operations with tracing enabled,
>> saw tracing by Web-UI of htraced.
>> It worked fine.
>>
>> Masatake Iwasaki
>>
>>
>> On 2/3/16 09:50, Colin P. McCabe wrote:
>>>
>>> Hi all,
>>>
>>> I've posted the first release candidate for HTrace 4.1 here:
>>>
>>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/
>>>
>>> The jars have been staged here:
>>>
>>> https://repository.apache.org/content/repositories/orgapachehtrace-1021
>>>
>>> HTrace 4.1 brings a lot of robustness improvements. There were major
>>> improvements to htraced and the web UI, as well as new metrics added.
>>> There were numerous build fixups, and we added Docker support, to
>>> ensure a repeatable build.
>>>
>>> Check it out. The vote will run for 5 days.
HTrace 4.1 release candidate 2
Hi all,

I've posted the second release candidate for HTrace 4.1 here:

http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc2/

The jars have been staged here:

https://repository.apache.org/content/repositories/orgapachehtrace-1022

Compared to rc1, this rc includes HTRACE-334 and HTRACE-342.

HTrace 4.1 brings a lot of robustness improvements. There were major improvements to htraced and the web UI, as well as new metrics added. There were numerous build fixups, and we added Docker support, to ensure a repeatable build.

Check it out. The vote will run for 5 days.

cheers,
Colin


Release Notes - HTrace - Version 4.1

** Bug
    * [HTRACE-114] - Fix compilation error of htrace-hbase against hbase-1.0.0
    * [HTRACE-238] - Change maven compiler source level to 1.7 to match targetJdk
    * [HTRACE-243] - Remove duplicate maven-assembly-plugin configuration section in htrace-htraced/pom.xml
    * [HTRACE-245] - NOTICE.txt: change "developed by The Apache Software..." to "developed at The Apache Software..."
    * [HTRACE-246] - HTrace WebApp not properly defined and therefore not packaged into .war
    * [HTRACE-248] - HTraced should gracefully shutdown if stopped
    * [HTRACE-249] - Script and doc on how to publish website
    * [HTRACE-251] - Fix "mvn clean" target
    * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs are too chatty
    * [HTRACE-256] - Change the artifactId for htrace-core in branch 4.0 to be htrace-core4
    * [HTRACE-257] - htrace-htraced: add web symlink rather than generating programmatically
    * [HTRACE-262] - Temporarily suppress doclint for Java 8 to prevent build failure
    * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY config key more consistent with other configs
    * [HTRACE-267] - Move owl logo licensing information from NOTICE to LICENSE
    * [HTRACE-268] - Remove Units and go-codec from LICENSE since they are not contained in the source release
    * [HTRACE-272] - TracerPool must not load multiple inscance of same receiver class when a simple classname is given
    * [HTRACE-279] - Fix issues where the HTracedSpanReceiver was using the wrong JSON serialization for spans and add validation to htraced REST ingest path
    * [HTRACE-280] - htraced: add metrics about total spans added and dropped per address
    * [HTRACE-281] - htraced: add example/htraced-conf.xml
    * [HTRACE-282] - htraced: reap spans which are older than a configurable interval
    * [HTRACE-283] - Heartbeater should wait for goroutine to finish on close
    * [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the shaded version of commons-logging as provided
    * [HTRACE-285] - htraced tool: fix query parsing and add query_test
    * [HTRACE-289] - Fix TraceEnabled, etc. logger methods for conditional logging
    * [HTRACE-294] - htraced: fix some metrics issues
    * [HTRACE-297] - htraced: avoid serializing spans to json unless TRACE logging is enabled
    * [HTRACE-300] - Reaper should be initialized before shards are activated
    * [HTRACE-301] - htraced: fix unit tests that aren't waiting for spans to be written, use semaphore for WrittenSpans
    * [HTRACE-302] - htraced: Add admissions control to HRPC to limit the number of incoming messages
    * [HTRACE-304] - htraced: fix bug with GREATER_THAN queries
    * [HTRACE-307] - htraced: queries sometimes return no results even when many results exist due to confusion in iterator usage
    * [HTRACE-311] - htraced: Fix logging to stdout via -Dlog.path=
    * [HTRACE-316] - htrace-web: span.js issue: span ID string length is 32, not 36
    * [HTRACE-317] - Fix the documentation for adding tracing to an application to reflect HTrace 4.x API changes
    * [HTRACE-328] - htraced continues scanning in some cases even when no more results are possible

** Improvement
    * [HTRACE-342] - centralize building instructions in BUILDING.txt
    * [HTRACE-334] - htrace-web: Make limit of search and children API configurable
    * [HTRACE-129] - htraced: add /server/stats REST endpoint
    * [HTRACE-156] - HTrace GUI: add about view
    * [HTRACE-181] - gui: Split "about" screen
    * [HTRACE-237] - Optimize htraced span receiver
    * [HTRACE-239] - Add htrace/impl/TestZipkinSpanReceiver.java
    * [HTRACE-260] - htrace-zipkin should not set the obsolete duration field in thrift
    * [HTRACE-271] - Add log4j.properties to all submodule tests
    * [HTRACE-276] - Shade classes into org.apache.htrace.shaded rather than org.apache.htrace
    * [HTRACE-286] - htraced: improvements to logging, daemon startup, and configuration
    * [HTRACE-290] - htraced: Fix per-faculty log level settings and add unit tests for conditional logging
    * [HTRACE-291] - rename bin/htrace to bin/htracedTool
    * [HTRACE-292] - "htracedTool version" should display the git hash, and -Dgit.version option should be available for build
    * [HTRACE-295] - htraced: setting span.expiry.ms to 0 should disable span expiry
    *
Re: HTrace 4.1 release candidate 1
Thanks, guys. I will spin another RC with HTRACE-334 added. Can I get a review of https://issues.apache.org/jira/browse/HTRACE-342 so that I can add that as well? It's a very simple docs change.

best,
Colin

On Sat, Feb 6, 2016 at 4:04 PM, Masatake Iwasaki
<iwasak...@oss.nttdata.co.jp> wrote:
> Thanks for putting this up, Colin.
>
>> * [HTRACE-334] - htrace-web: Make limit of search and children API
>> configurable
>
> This seemed not to be cherry-picked to branch-4.1.
> I do not think this is critical but would like it to be in.
>
> Except for this, the RC is good.
>
> I built Hadoop against 4.1.0-incubating,
> run HDFS operations with tracing enabled,
> saw tracing by Web-UI of htraced.
> It worked fine.
>
> Masatake Iwasaki
>
>
> On 2/3/16 09:50, Colin P. McCabe wrote:
>>
>> Hi all,
>>
>> I've posted the first release candidate for HTrace 4.1 here:
>>
>> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/
>>
>> The jars have been staged here:
>>
>> https://repository.apache.org/content/repositories/orgapachehtrace-1021
>>
>> HTrace 4.1 brings a lot of robustness improvements. There were major
>> improvements to htraced and the web UI, as well as new metrics added.
>> There were numerous build fixups, and we added Docker support, to
>> ensure a repeatable build.
>>
>> Check it out. The vote will run for 5 days.
HTrace 4.1 release candidate 1
Hi all,

I've posted the first release candidate for HTrace 4.1 here:

http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/

The jars have been staged here:

https://repository.apache.org/content/repositories/orgapachehtrace-1021

HTrace 4.1 brings a lot of robustness improvements. There were major improvements to htraced and the web UI, as well as new metrics added. There were numerous build fixups, and we added Docker support, to ensure a repeatable build.

Check it out. The vote will run for 5 days.

cheers,
Colin

Release Notes - HTrace - Version 4.1

** Bug
    * [HTRACE-114] - Fix compilation error of htrace-hbase against hbase-1.0.0
    * [HTRACE-238] - Change maven compiler source level to 1.7 to match targetJdk
    * [HTRACE-243] - Remove duplicate maven-assembly-plugin configuration section in htrace-htraced/pom.xml
    * [HTRACE-245] - NOTICE.txt: change "developed by The Apache Software..." to "developed at The Apache Software..."
    * [HTRACE-246] - HTrace WebApp not properly defined and therefore not packaged into .war
    * [HTRACE-248] - HTraced should gracefully shutdown if stopped
    * [HTRACE-249] - Script and doc on how to publish website
    * [HTRACE-251] - Fix "mvn clean" target
    * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs are too chatty
    * [HTRACE-256] - Change the artifactId for htrace-core in branch 4.0 to be htrace-core4
    * [HTRACE-257] - htrace-htraced: add web symlink rather than generating programmatically
    * [HTRACE-262] - Temporarily suppress doclint for Java 8 to prevent build failure
    * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY config key more consistent with other configs
    * [HTRACE-267] - Move owl logo licensing information from NOTICE to LICENSE
    * [HTRACE-268] - Remove Units and go-codec from LICENSE since they are not contained in the source release
    * [HTRACE-272] - TracerPool must not load multiple inscance of same receiver class when a simple classname is given
    * [HTRACE-279] - Fix issues where the HTracedSpanReceiver was using the wrong JSON serialization for spans and add validation to htraced REST ingest path
    * [HTRACE-280] - htraced: add metrics about total spans added and dropped per address
    * [HTRACE-281] - htraced: add example/htraced-conf.xml
    * [HTRACE-282] - htraced: reap spans which are older than a configurable interval
    * [HTRACE-283] - Heartbeater should wait for goroutine to finish on close
    * [HTRACE-284] - htrace-htraced, htrace-flume: do not treat the shaded version of commons-logging as provided
    * [HTRACE-285] - htraced tool: fix query parsing and add query_test
    * [HTRACE-289] - Fix TraceEnabled, etc. logger methods for conditional logging
    * [HTRACE-294] - htraced: fix some metrics issues
    * [HTRACE-297] - htraced: avoid serializing spans to json unless TRACE logging is enabled
    * [HTRACE-300] - Reaper should be initialized before shards are activated
    * [HTRACE-301] - htraced: fix unit tests that aren't waiting for spans to be written, use semaphore for WrittenSpans
    * [HTRACE-302] - htraced: Add admissions control to HRPC to limit the number of incoming messages
    * [HTRACE-304] - htraced: fix bug with GREATER_THAN queries
    * [HTRACE-307] - htraced: queries sometimes return no results even when many results exist due to confusion in iterator usage
    * [HTRACE-311] - htraced: Fix logging to stdout via -Dlog.path=
    * [HTRACE-316] - htrace-web: span.js issue: span ID string length is 32, not 36
    * [HTRACE-317] - Fix the documentation for adding tracing to an application to reflect HTrace 4.x API changes
    * [HTRACE-328] - htraced continues scanning in some cases even when no more results are possible

** Improvement
    * [HTRACE-129] - htraced: add /server/stats REST endpoint
    * [HTRACE-156] - HTrace GUI: add about view
    * [HTRACE-181] - gui: Split "about" screen
    * [HTRACE-237] - Optimize htraced span receiver
    * [HTRACE-239] - Add htrace/impl/TestZipkinSpanReceiver.java
    * [HTRACE-260] - htrace-zipkin should not set the obsolete duration field in thrift
    * [HTRACE-271] - Add log4j.properties to all submodule tests
    * [HTRACE-276] - Shade classes into org.apache.htrace.shaded rather than org.apache.htrace
    * [HTRACE-286] - htraced: improvements to logging, daemon startup, and configuration
    * [HTRACE-290] - htraced: Fix per-faculty log level settings and add unit tests for conditional logging
    * [HTRACE-291] - rename bin/htrace to bin/htracedTool
    * [HTRACE-292] - "htracedTool version" should display the git hash, and -Dgit.version option should be available for build
    * [HTRACE-295] - htraced: setting span.expiry.ms to 0 should disable span expiry
    * [HTRACE-296] - htraced tests: make sure local settings for HTRACED_WEB_DIR and HTRACE_CONF_DIR don't affect unit tests
    * [HTRACE-298] - htraced: improve datastore serialization and metrics
    * [HTRACE-303] - Add
Re: HTrace 4.1 release candidate 1
Thanks for looking at this, Lewis. On Tue, Feb 2, 2016 at 6:25 PM, Lewis John Mcgibbneywrote: > Hi Colin, > > Signatures Good > Aggregated results of running DRAT over the release candidate > > Notes Binaries Archives Standards Apache Generated Unknown > 0 0 0 142 118 0 15 > Unapproved licenses include > > > /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap-theme.css > > /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap-theme.min.css > /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap.css > /usr/local/drat/deploy/data/jobs/rat/1454465689433/input/bootstrap.min.css > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/backbone-1.1.2.js > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/bootstrap.js > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/bootstrap.min.js > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/d3.min.js > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/jquery-2.1.4.js > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/moment-2.10.3.js > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/npm.js > > /usr/local/drat/deploy/data/jobs/rat/1454465688911/input/underscore-1.7.0.js > /usr/local/drat/deploy/data/jobs/rat/1454465688785/input/SpanProtos.java > > /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml > > /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml_02022016_1814 > > I understand that the .css and .js files above are covered in LICENSE at > the bottom however we need to address the following files > > /usr/local/drat/deploy/data/jobs/rat/1454465688785/input/SpanProtos.java > > /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml > > /usr/local/drat/deploy/data/jobs/rat/1454465689194/input/dependency-reduced-pom.xml_02022016_1814 Hmm. I think we talked about SpanProtos.java, dependency-reduced-pom.xml, etc. 
during the previous release and concluded that they are generated files, and hence exempt from the license requirement according to http://incubator.apache.org/guides/releasemanagement.html#notes-license-headers > > NOTICE includes > Copyright 2015 The Apache Software Foundation > This should be > Copyright 2016 The Apache Software Foundation > OK > There seems to be a bit on confusion between instructions for Building the > code. We have the note in README.md and then a separate note within > BUILDING.txt. We should probably resolve this and include them both in > README.md > OK, I created HTRACE-342 to fix this. > Build and tests pass fine. > > Typically the absence of the license header in the above files would be a > -1 from me. I will wait to see how others review the candidate before > VOTE'ing. > Good job putting this together. Thanks best, Colin > > > On Tue, Feb 2, 2016 at 4:51 PM, > wrote: > >> >> Hi all, >> >> I've posted the first release candidate for HTrace 4.1 here: >> >> http://people.apache.org/~cmccabe/htrace/releases/4.1.0/rc1/ >> >> The jars have been staged here: >> >> https://repository.apache.org/content/repositories/orgapachehtrace-1021 >> >> HTrace 4.1 brings a lot of robustness improvements. There were major >> improvements to htraced and the web UI, as well as new metrics added. >> There were numerous build fixups, and we added Docker support, to >> ensure a repeatable build. >> >> Check it out. The vote will run for 5 days. 
>> >> cheers, >> Colin >> >> Release Notes - HTrace - Version 4.1 >> ** Bug >> * [HTRACE-114] - Fix compilation error of htrace-hbase against >> hbase-1.0.0 >> * [HTRACE-238] - Change maven compiler source level to 1.7 to >> match targetJdk >> * [HTRACE-243] - Remove duplicate maven-assembly-plugin >> configuration section in htrace-htraced/pom.xml >> * [HTRACE-245] - NOTICE.txt: change "developed by The Apache >> Software...” to "developed at The Apache Software...” >> * [HTRACE-246] - HTrace WebApp not properly defined and therefore >> not packaged into .war >> * [HTRACE-248] - HTraced should gracefully shutdown if stopped >> * [HTRACE-249] - Script and doc on how to publish website >> * [HTRACE-251] - Fix "mvn clean" target >> * [HTRACE-253] - Tracer loadSamplers and loadSpanReceivers logs >> are too chatty >> * [HTRACE-256] - Change the artifactId for htrace-core in branch >> 4.0 to be htrace-core4 >> * [HTRACE-257] - htrace-htraced: add web symlink rather than >> generating programmatically >> * [HTRACE-262] - Temporarily suppress doclint for Java 8 to >> prevent build failure >> * [HTRACE-266] - Make the CLIENT_REST_MAX_SPANS_AT_A_TIME_KEY >> config key more consistent with other configs >> * [HTRACE-267] - Move owl logo licensing information from NOTICE to >> LICENSE >> * [HTRACE-268] - Remove Units and go-codec from LICENSE since they >> are not contained in
Re: git tag and branch naming
Thanks, Sean. That sounds like a good idea. I guess we can drop the "-release" suffix then. "rel/4.0" and "rel/4.0.1", etc. seem pretty self-explanatory. My main goal was just to make branches look different than tags. I would prefer to keep the "-branch" suffix on branches just to make that clear as well. Sean, would RCs also receive the "rel/" prefix, or not? I'm guessing not, since we don't need to preserve them forever. best, Colin On Tue, Jan 26, 2016 at 9:26 PM, Sean Busbey <bus...@cloudera.com> wrote: > with the new ASF release tag policy, this would make our release tags look > like 'rel/4.0-release' and 'rel/4.0.1-release'. > > the 'rel' prefix makes the distinction between branches and tagged releases > clear to me. what do others think? > > On Tue, Jan 26, 2016 at 10:41 PM, Masatake Iwasaki < > iwasak...@oss.nttdata.co.jp> wrote: > >> Sorry for late reply. >> >> I agree with the proposed naming conversion for branches and tags. >> If there is no objection further, we should close HTRACE-331 and >> prepare for the next release. >> >> Thanks, >> Masatake Iwasaki >> >> >> On 12/15/15 04:53, Colin P. McCabe wrote: >> >>> As part of our release process, we create git tags for each release >>> candidate (RC)... for example, 3.1.0RC9 and 4.0.1RC1. We also often >>> use release branches-- for example, the "4.0" branch. >>> >>> As Sean Busbey pointed out, we should also be creating "release" tags, >>> so that people who want to check out the release can do so without >>> having to figure out which RC was anointed as the release. I also >>> think we should adopt a naming convention for release branches and >>> tags so that people attempting to check out tags don't accidentally >>> check out branches, and vice versa. >>> >>> The branch and tag naming is confusing right now. For example, >>> someone running "git checkout 4.0" might be surprised to learn that >>> this checks out a branch currently containing 4.0.1, not the git tag >>> for the 4.0 release. 
>>> >>> I'm thinking we should adopt the following convention: >>> * release tags should have "release" in the name. So the tag for >>> htrace 4.1 should be "4.1-release" >>> * RC tags continue to be "4.1-RC1" and so forth. >>> * release branches should have "branch" in the name. So the branch for >>> 4.1 should be "branch-4.1". In general, branches should not include >>> "RC[0-9]" or "release" in the names, to avoid confusion with the tags. >>> >>> Let me know what you think. If you guys agree, I will also create >>> 4.0-release and 4.0.1-release tags corresponding to those releases. >>> >>> best, >>> Colin >>> >> >> > > > -- > Sean
Re: Time for a new release?
Ah, sorry. I did some preliminary work (like cutting the branch) but I haven't made the RC yet. I'll try to get it out there soon. best, Colin On Tue, Jan 26, 2016 at 8:47 PM, Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote: > Hi Colin, > > I can see branch-4.1 created in the repo. > Have you already started to create RC? > > Regards, > Masatake Iwasaki > > On 1/15/16 10:18, Masatake Iwasaki wrote: >>> >>> I think it's time to cut the 4.1 release. >> >> >> +1. >> >> I volunteer to do release management work, if you like. >> >> Thanks, >> Masatake Iwasaki >> >> >> On 1/14/16 07:52, Colin P. McCabe wrote: >>> >>> Hi all, >>> >>> Happy new year! >>> >>> I think it's time to cut the 4.1 release. We've fixed up a lot of >>> bugs, and made a lot of progress since 4.0.1. >>> >>> If everyone agrees, I'll post something in the next few days. >>> >>> best, >>> Colin >> >> >
Time for a new release?
Hi all, Happy new year! I think it's time to cut the 4.1 release. We've fixed up a lot of bugs, and made a lot of progress since 4.0.1. If everyone agrees, I'll post something in the next few days. best, Colin
git tag and branch naming
As part of our release process, we create git tags for each release candidate (RC)... for example, 3.1.0RC9 and 4.0.1RC1. We also often use release branches-- for example, the "4.0" branch. As Sean Busbey pointed out, we should also be creating "release" tags, so that people who want to check out the release can do so without having to figure out which RC was anointed as the release. I also think we should adopt a naming convention for release branches and tags so that people attempting to check out tags don't accidentally check out branches, and vice versa. The branch and tag naming is confusing right now. For example, someone running "git checkout 4.0" might be surprised to learn that this checks out a branch currently containing 4.0.1, not the git tag for the 4.0 release. I'm thinking we should adopt the following convention: * release tags should have "release" in the name. So the tag for htrace 4.1 should be "4.1-release" * RC tags continue to be "4.1-RC1" and so forth. * release branches should have "branch" in the name. So the branch for 4.1 should be "branch-4.1". In general, branches should not include "RC[0-9]" or "release" in the names, to avoid confusion with the tags. Let me know what you think. If you guys agree, I will also create 4.0-release and 4.0.1-release tags corresponding to those releases. best, Colin
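To make the proposed convention concrete, here is a minimal Java sketch that classifies ref names by the patterns described above. The class name `RefNameConvention` and its `classify` method are hypothetical helpers for illustration only; they are not part of HTrace or git. The sketch assumes versions look like "4.1" or "4.0.1" and accepts the older dash-less RC form ("4.0.1RC1") alongside "4.1-RC1".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTrace: classifies git ref names
// according to the naming convention proposed in this thread.
final class RefNameConvention {
    enum Kind { RELEASE_TAG, RC_TAG, RELEASE_BRANCH, UNRECOGNIZED }

    // A version such as "4.1" or "4.0.1" (assumed shape of HTrace versions).
    private static final String VERSION = "\\d+\\.\\d+(?:\\.\\d+)?";

    private static final Pattern RELEASE_TAG = Pattern.compile(VERSION + "-release");
    // Accepts both "4.1-RC1" and the older dash-less form "4.0.1RC1".
    private static final Pattern RC_TAG = Pattern.compile(VERSION + "-?RC\\d+");
    private static final Pattern RELEASE_BRANCH = Pattern.compile("branch-" + VERSION);

    static Kind classify(String name) {
        if (RELEASE_TAG.matcher(name).matches()) return Kind.RELEASE_TAG;
        if (RC_TAG.matcher(name).matches()) return Kind.RC_TAG;
        if (RELEASE_BRANCH.matcher(name).matches()) return Kind.RELEASE_BRANCH;
        // Bare names such as "4.0" fit none of the patterns; these are the
        // names the thread describes as confusing.
        return Kind.UNRECOGNIZED;
    }

    public static void main(String[] args) {
        String[] names = {"4.1-release", "4.1-RC1", "4.0.1RC1", "branch-4.1", "4.0"};
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Under this sketch a bare name like "4.0" is flagged as unrecognized, which is the case the convention is meant to eliminate: every ref name says on its face whether it is a tag or a branch.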
htrace jenkins build failures
Does anyone know what's up with the 30+ Jenkins failure emails we got this weekend? It looks like "mvn clean" was failing with an AccessDeniedException... what could have caused that? Perhaps somehow we created a directory without "execute" permission? I remember we had problems with Maven clean being unable to delete those in the Hadoop build. In any case, the permission denied exception is gone now. best, Colin
Re: [DISCUSS] Release for HTrace/Drive Towards Graduation
Thanks, Adrian. I've been talking to some folks who might be interested in setting up gerrit for htrace. Stay tuned. cheers, Colin On Fri, Oct 30, 2015 at 1:25 PM, Adrian Cole wrote: > OK, well do announce once you've decided how to do a CI pipeline that > includes pre-commit testing and context-sensitive review comments. I look > forward to it.
[ANNOUNCE] Apache HTrace 4.0.1 (Incubating) released
The Apache HTrace (Incubating) team is pleased to announce the release of HTrace 4.0.1. HTrace is a tracing framework for use with distributed systems. This dot release fixes some build issues, including the generation of war files for the webapp, the naming of the htrace-core4 artifact, the "mvn clean" target, and the Go build on Mac OS X. The release is available in Maven: https://repo1.maven.org/maven2/org/apache/htrace/ The full change log is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315924=12333550 Your help and feedback are more than welcome. For more information on how to report problems and to get involved, visit the project website at http://incubator.apache.org/projects/htrace.html. cheers, The Apache HTrace (Incubating) Team
Re: Making Website Update Part of Release Process
Thanks, stack! That's awesome, will be great for remembering the steps in making a release. Colin On Tue, Sep 15, 2015 at 9:40 PM, Stack wrote: > On Mon, Sep 14, 2015 at 10:14 PM, Stack wrote: > >> They are up now looking at latest mirror: >> http://mirrors.gigenet.com/apache/incubator/htrace/ >> >> Doc on how to release is coming. >> >> > A start on RM doc can be found here: > http://htrace.incubator.apache.org/building.html > > > >> On how to update the website, currently best doc is in HTRACE-19. I can >> update in morning unless you beat me to it. >> >> > Here is how to update website: > http://htrace.incubator.apache.org/building.html#Publishing_htraceincubatorapacheorg_website > > St.Ack > > > >> Thanks Lewis, >> St.Ack >> >> >> >> >> >> On Mon, Sep 14, 2015 at 9:41 PM, Lewis John Mcgibbney < >> lewis.mcgibb...@gmail.com> wrote: >> >>> Hi Folks, >>> Excellent work on getting recent 4.0.0-incubating release out the door. >>> I wonder if there is documentation available for a release manager? If so >>> does it currently contain advice on how to update the HTrace website with >>> the info for the release? >>> Right now I don't even see the HTrace artifacts available >>> http://www.apache.org/dyn/closer.cgi/incubator/htrace/ >>> I am assuming they are not available for public consumption yet. >>> Is this the case? >>> Thanks >>> Lewis >>> >>> -- >>> *Lewis* >>> >> >>
Re: htrace-core compatibility policy for htrace 4.x
On Tue, Sep 15, 2015 at 5:42 PM, Masatake Iwasakiwrote: > I agree with the policy. > > >> Major releases should change the namespace of htrace-core classes so that >> both a 4.x and a 5.x jar can reside on the same CLASSPATH > > Should we have a namespace convention for this? > We moved classes from org.apache.htrace to org.apache.htrace.core on 4.0 > but need new word other than "core" for 5.0. > Simple way for this would be containing version number in package name > like "org.apache.htrace.5". Hmm, interesting idea. org.apache.htrace5 might be nice too. We can probably wait a bit before deciding on the namespace name... hopefully we won't need 5.x for a while :) Colin > > >> Let's focus on just compatibility rules for htrace-core right now, >> since that's where our integration issues are. The other subprojects >> of htrace generally don't have the same integration issues. > > I thinks it is reasonable. > Any version of receiver with the same major version keeps working > as far as htrace-core keeps compatibility. > > > Thanks, > Masatake Iwasaki > > > > On 9/15/15 09:13, Colin McCabe wrote: >> >> Hi all, >> >> In the recent 4.0 release, we changed the htrace-core API. The API >> that programs use to create traces, annotations, etc. (aka the "Java >> client API") went through some changes. This was necessary to clean up >> some core architectural issues (such as the use of overly short 64 bit >> IDs that will collide in a real-world deployment, or the overuse of >> globals.) >> >> Since we want to make it easy for projects to integrate with HTrace, I >> think we should have some compatibility rules for htrace-core for the >> future. >> >> Specifically, I think that we should include only backwards-compatible >> changes to the htrace-core API in HTrace 4.x So, for example, adding a >> new function is OK. Deleting an existing function or altering it in an >> incompatible way is not. 
It is OK to add a new function to a public >> abstract base class (provided you also add a default implementation in >> the base), but not to add a new function to a public interface, since >> that would break compilation. >> >> We should save incompatible changes for HTrace 5.x. In general, each >> "major release" such as 4.x or 5.x should contain only compatible >> changes to htrace-core. There should be no guarantees between 4.x and >> 5.x, or between any major releases-- this is the time to address >> architectural debt that can't be resolved any other way. Major >> releases should change the namespace of htrace-core classes so that >> both a 4.x and a 5.x jar can reside on the same CLASSPATH, similar to >> how we did with 3.x and 4.x. This is important because it will require >> some time for downstream projects to upgrade from 4.x to 5.x, and in >> the meantime we must avoid CLASSPATH conflict issues. There is no >> requirement that tracing work when the major version of the client and >> the span receivers are different. However, the programs themselves >> should function. >> >> Let's focus on just compatibility rules for htrace-core right now, >> since that's where our integration issues are. The other subprojects >> of htrace generally don't have the same integration issues. For >> example, it is easy for an admin to standardize on a single version of >> htrace-hbase or htrace-htraced across the entire cluster. They simply >> install the jars for the version they want. It is not easy for that >> same admin to standardize htrace-core, since they might have Hadoop >> pulling in 4.1 and HBase pulling in 4.0. The different subprojects are >> also at different levels of maturity. For example, htrace-flume is >> still very immature, whereas htrace-htraced is starting to get more >> mature. So I think the subprojects should come up with their own >> compatibility policies rather than trying to be one-size-fits-all. 
>> >> This policy should only apply to publicly visible symbols in >> htrace-core, not to private or package-private symbols. Test >> functions should also not be covered, since they don't appear in the >> final htrace-core jar. >> >> I think having a compatibility policy for htrace-core will be very >> nice for the users of our core APIs. Let me know what you think. >> >> best, >> Colin > >
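The distinction drawn above between adding a method to an abstract base class and adding one to an interface can be sketched in Java as follows. All names here (`CompatibilitySketch`, `ExampleReceiver`, `DownstreamReceiver`, `flush`) are hypothetical stand-ins, not real htrace-core symbols, and the sketch assumes Java 7 semantics, where interfaces cannot carry default method bodies.

```java
import java.util.ArrayList;
import java.util.List;

final class CompatibilitySketch {
    // Hypothetical stand-in for a public abstract base class in htrace-core.
    abstract static class ExampleReceiver {
        abstract void receive(String span);

        // Imagine this method was added in a later 4.x release. Because it has
        // a default body, subclasses written before it existed keep compiling.
        void flush() {
            // no-op by default
        }
    }

    // A downstream subclass written against the older API; it never mentions flush().
    static final class DownstreamReceiver extends ExampleReceiver {
        final List<String> spans = new ArrayList<>();

        @Override
        void receive(String span) {
            spans.add(span);
        }
    }

    // By contrast, had ExampleReceiver been a Java 7 interface, adding flush()
    // to it would force every implementor to add the method or fail to compile.

    public static void main(String[] args) {
        DownstreamReceiver receiver = new DownstreamReceiver();
        receiver.receive("span-1");
        receiver.flush(); // inherited default implementation
        System.out.println("spans received: " + receiver.spans.size());
    }
}
```

The downstream class needs no changes when the base class grows a method with a default body, which is the kind of backwards-compatible change the policy would allow within 4.x.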
Re: [VOTE] HTrace 4.0 Release Candidate 0
Quick reminder that the vote closes in 4 hours. Check it out. Lewis, check out my NOTICE.txt explanation as well. You can take a look at my public key at http://people.apache.org/~cmccabe/htrace/releases/KEYS. Stack signed when we met up in San Francisco. I also checked the key into https://dist.apache.org/repos/dist/release/incubator/htrace r10450 | cmccabe | 2015-09-08 18:41:03 + (Tue, 08 Sep 2015) | 1 line Add public key for cmcc...@apache.org I'm not sure why it hasn't shown up on http://archive.apache.org/dist/incubator/htrace/KEYS ... is there something I need to do to "publish" it from the svn repo? best, Colin On Fri, Sep 4, 2015 at 4:02 PM, Colin P. McCabe <cmcc...@apache.org> wrote: > I've posted the first release candidate here: > > http://people.apache.org/~cmccabe/htrace/releases/4.0.0/rc0 > > The jars have been staged here: > > https://repository.apache.org/content/repositories/orgapachehtrace-1017 > > There's a lot of great stuff in this release, including a new web UI, > many bug fixes, API improvements and enlargement of span IDs to 128 > bits to avoid conflicts. > > The vote will run for 5 days. 
> > cheers, > Colin > > Release Notes - HTrace - Version 4.0 > > ** Sub-task > * [HTRACE-208] - Remove deprecated addKVAnnotation(byte[], byte[]) > method > * [HTRACE-209] - Make span ID 128 bit to avoid collisions > * [HTRACE-210] - Remove TrueIfTracingSampler > * [HTRACE-211] - Move htrace-core classes to the > org.apache.htrace.core namespace > * [HTRACE-212] - Change version to 4.0 > * [HTRACE-214] - De-globalize Tracer.java > * [HTRACE-215] - Simplify the Sampler type > * [HTRACE-216] - SpanReceivers should not fill in ProcessId > * [HTRACE-217] - Rename ProcessId to TracerId > * [HTRACE-222] - Add SpanReceiverPool > * [HTRACE-228] - Fix subprojects to refer to new > org.apache.htrace.core namespace > * [HTRACE-229] - htrace-webapp needs to be updated to refer to > "tracerid" not "processid" > ** Bug > * [HTRACE-159] - libhtrace.so: use HRPC endpoint of htraced > * [HTRACE-164] - htrace hrpc: use msgpack for serialization > * [HTRACE-166] - Add tabbed view > * [HTRACE-167] - Update go build instructions in BUILDING.txt > * [HTRACE-171] - htraced godeps should use > github.com/ugorji/go/codec rather than github.com/ugorji/go > * [HTRACE-174] - Refactor GUI > * [HTRACE-177] - htrace-zipkin: shade all dependencies > * [HTRACE-182] - htraced: add rpm build via -Prpm > * [HTRACE-189] - gui: fix error handling in a few places > * [HTRACE-190] - htraced: allow querying by process ID > * [HTRACE-191] - gui: add "duration" to span details, filter out > "selected" > * [HTRACE-192] - gui: when expanding parents or children, sort the > spans by begin time > * [HTRACE-193] - gui: avoid doing multiple redraws when > spanResults is updated > * [HTRACE-196] - gui: add scrolling for spans view > * [HTRACE-201] - htrace-web: URL-Encode query JSON > * [HTRACE-202] - htrace-web: fix "converting circular object to > JSON" error when pressing "clear" button > * [HTRACE-218] - Fix issues with finding json-c includes and librt > in the native library > * [HTRACE-219] - Add 
-Dleveldb.prefix and -Djsonc.prefix build options > * [HTRACE-220] - htraced: should be able to set log.path to the > empty string via "-Dlog.path=" on the command line > * [HTRACE-223] - gobuild.sh: fix issue where maven succeeds if go > build fails > * [HTRACE-224] - htrace C client: htrace_conf_get_u64, > htrace_conf_get_double can't handle spaces at the end of strings > * [HTRACE-230] - Make TracerBuilder like all other Builders; an > internal rather than adjacent class > * [HTRACE-233] - htrace-zipkin should explicitly include slf4j-api > to avoid ClassNotFoundException > * [HTRACE-234] - Add workaround to prevent htrace-hbase from > getting in an infinite loop while creating the dependency-reduced pom > ** Improvement > * [HTRACE-29] - add javascript web UI for htraced > * [HTRACE-160] - htraced: support continuing a query from where > the client left it off by sending a previous span > * [HTRACE-162] - htraced hrpc: some logging improvements > * [HTRACE-170] - Optimize use of Random in htrace-core by using > ThreadLocalRandom > * [HTRACE-172] - Move minJdk to 1.7 (JDK 7) > * [HTRACE-175] - Add Trace#addKVAnnotation convenience method > * [HTRACE-176] - Expose ZipkinSpanReceiver c
Re: Dependency Request
Lewis has been looking into creating a Docker image with all our build dependencies. Then we can just use Docker to build and run our unit tests inside that container. I think this will be a better solution than installing things on the build machines since with Docker we don't have to keep bugging INFRA about installing new dependencies. It also makes it easy to move the build from one machine to another, and control the versions of our dependencies. Finally, we can just point new contributors at the Docker image and get them compiling things in seconds rather than hunting around for dependencies on their local system. There's more discussion on HTRACE-241 and HTRACE-157. best, Colin On Wed, Sep 9, 2015 at 10:09 AM, Andrew Bayer wrote: > I think BUILDS is still there? > > On Wed, Sep 9, 2015 at 9:46 AM, Stack wrote: > >> I opened INFRA-10401 >> >> (Andrew, is the special 'BUILDS' project gone now and we should just file >> against INFRA for all build issues going forward?) >> >> Thanks, >> St.Ack >> >> On Tue, Sep 8, 2015 at 7:41 PM, Andrew Bayer >> wrote: >> >> > Please open a JIRA? >> > On Sep 8, 2015 18:39, "lewis john mcgibbney" wrote: >> > >> > > Hi builds@, >> > > The Apache HTrace (incubating) project recently established a build [0] >> > for >> > > our codebase on b.a.o. >> > > In order to build and test we require the leveldb-devel package to be >> > > installed and libleveldb.so to be available on the PATH. >> > > Is it possible for someone to install this one one or more of the build >> > > slaves? >> > > Thanks in advance. >> > > Lewis >> > > >> > > [0] https://builds.apache.org/view/All/job/HTrace-Master
Re: [VOTE] HTrace 4.0 Release Candidate 0
On Mon, Sep 7, 2015 at 8:47 AM, Lewis John Mcgibbneywrote: > Hi Colin, > Nice work in getting the release candidate prepared and available for > review. > Where is the KEYS file? > I tried the one here > Good call. I'll append my key to the KEYS file. My public key is here: http://pgp.mit.edu/pks/lookup?op=get=0xDE78987A9CD4D9D3 > It does not contain your sig so I cannot verify the release sigs. > Moving on, > > Notes Binaries Archives Standards Apache Generated Unknown 0 0 0 116 92 0 15 > Unapproved licenses are as follows > > bootstrap-theme.css > bootstrap-theme.min.css > bootstrap.css > bootstrap.min.css > backbone-1.1.2.js > bootstrap.js > bootstrap.min.js > d3.min.js > jquery-2.1.4.js > moment-2.10.3.js > npm.js > underscore-1.7.0.js > SpanProtos.java > dependency-reduced-pom.xml > > None of the above libraries are declared within NOTICE.txt bootstrap-theme.css, bootstrap-theme.min.css, bootstrap.css, bootstrap.min.css, bootstrap.js, bootstrap.min.js, and bootstrap-3.3.1/js/npm.js are described in LICENSE.txt as: > Bootstrap, an html, css, and javascript framework, is > Copyright (c) 2011-2015 Twitter, Inc and MIT licensed: > https://github.com/twbs/bootstrap/blob/master/LICENSE backbone-1.1.2.js is described in LICENSE.txt as: > backbone, is a javascript library, that is Copyright (c) 2010-2014 > Jeremy Ashkenas, DocumentCloud. It is MIT licensed: > https://github.com/jashkenas/backbone/blob/master/LICENSE d3.min.js is described in LICENSE.txt as: > D3, a javascript library for manipulating data, used by htrace-hbase > is Copyright 2010-2014, Michael Bostock and BSD licensed: > https://github.com/mbostock/d3/blob/master/LICENSE jquery-2.1.4.js is described in LICENSE.txt as: > jquery, a javascript library, is Copyright jQuery Foundation and other > contributors, https://jquery.org/. The software consists of > voluntary contributions made by many individuals. 
For exact > contribution history, see the revision history > available at https://github.com/jquery/jquery > It is MIT licensed: > https://github.com/jquery/jquery/blob/master/LICENSE.txt moment-2.10.3.js is described in LICENSE.txt as: > moment.js is a front end time conversion project. > It is (c) 2011-2014 Tim Wood, Iskren Chernev, Moment.js contributors > and shared under the MIT license: > https://github.com/moment/moment/blob/develop/LICENSE underscore-1.7.0.js is described in LICENSE.txt as: > underscore, a javascript library of functional programming helpers, is > (c) 2009-2014 Jeremy Ashkenas, DocumentCloud and Investigative Reporters > & Editors and an MIT license: > https://github.com/jashkenas/underscore/blob/master/LICENSE According to http://www.apache.org/dev/licensing-howto.html: > Bundling a dependency which is issued under one of the following > licenses is straightforward, assuming that said license applies > uniformly to all files within the dependency: > 1. BSD (without advertising clause) > 2. MIT/X11 > In LICENSE, add a pointer to the dependency's license within the source tree > and a short note summarizing its licensing... > Under normal circumstances, there is no need to modify NOTICE. Since these are all BSD or MIT licensed, I would interpret this to mean there is no need for us to modify NOTICE. Does that make sense? SpanProtos.java and dependency-reduced-pom.xml are generated files. According to http://incubator.apache.org/guides/releasemanagement.html#notes-license-headers : > The issue of licenses on generated documentation is a little controversial. > Copyright may not subsist in a document which is generated by an > transformation > from an original. In which case, the license header may be unnecessary. > License > headers should always be present in the original. Where it is reasonable to > do so, > the templates should also add the license header to the generated documents. 
I looked at how Apache Hadoop is handling this, and they do not have license headers on their protobuf-generated files. So I think this is fine. (From a technical point of view, I think the PB compiler also provides no way to do this, as far as I know.) The situation for dependency-reduced-pom.xml is the same-- it is a file generated by Maven. This is similar to jar files, which are also generated by the build process, and do not contain a license header. Also, I believe all these library and generated files were in the last release we did (although they moved around a bit). > > I navigate to the decompressed RC directory and try mvn clean install... it > enters into a loop! As Masatake commented, this is HTRACE-236. There is a note about it in the README.md. For a workaround, you can use the solution described in HTRACE-234 to get the build to run on your version of Maven (which I assume is 3.3)... Thanks for trying out the release-- I will take a look at updating KEYS tomorrow. So far I haven't seen anything that would need a respin (please let me know if I missed
Re: HTrace on Jira
Hi Lewis, At this point, nothing else is going into 4.0 unless it's a blocker for the release. The master branch version is now at 4.1. Pretty much all open JIRAs should be targeted at 4.1. You can see the JIRAs fixed in 4.0 here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315924=12333022 I'm not sure what you mean when you say that "most issues are not assigned to any release at all." Pretty much every issue has a "fix version" which is the release it was fixed in, and an "affects version", which is the first release it appeared in. Do you see any issues without those fields set? If so, let's fill them in. One thing that frustrates me is that there is no "Target version:" field in our JIRA, like there is on Hadoop's JIRA. I'm not sure what we have to configure to get a field like that. best, Colin On Tue, Sep 8, 2015 at 4:46 PM, Lewis John Mcgibbney wrote: > Hi Folks, > What is happening with versioning on Jira? > Master branch runs off of 4.0.0-incubating-SNAPSHOT, the only version which > exists within Jira is 4.1. > Most issues are not assigned to any release at all. > Can someone sort it out and we could potentially set a roadmap for > releasing HTrace 4.0/4.0.0-incubating? > Thanks folks > lewis > > -- > *Lewis*
Re: HTrace on Jira
Good idea... I created some components:

build: The Maven build system, pom.xml, Makefile, CMake, Docker, etc.
flume: The htrace-flume span receiver
zipkin: The htrace-zipkin span receiver
hbase: The htrace-hbase span receiver
core: The htrace-core subproject which contains the main htrace library used by applications to initiate tracing
htraced: The htraced daemon and span receiver
ui: htrace-web graphical user interface

I think it probably makes sense to combine docs with the website since we generate our website from the docs via Maven. Feel free to add more stuff, of course.

cheers,
Colin

On Tue, Sep 8, 2015 at 6:40 PM, Lewis John Mcgibbney wrote:
> Hi Folks,
> Further to this, is it possible for someone to go through the components
> and create components for each module as well as one for website?
> Thanks
> Lewis
>
> On Tue, Sep 8, 2015 at 4:46 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Folks,
>> What is happening with versioning on Jira?
>> Master branch runs off of 4.0.0-incubating-SNAPSHOT, the only version
>> which exists within Jira is 4.1.
>> Most issues are not assigned to any release at all.
>> Can someone sort it out and we could potentially set a roadmap for
>> releasing HTrace 4.0/4.0.0-incubating?
>> Thanks folks
>> lewis
>>
>> --
>> *Lewis*
>>
>
>
>
> --
> *Lewis*
Re: [VOTE] HTrace 4.0 Release Candidate 0
Thanks for looking at this, Masatake and Lewis! Lewis, is my explanation of NOTICE.txt correct, or did I miss something? Remember to vote... the vote will be open until tomorrow. Here is my +1. ;) cheers, Colin On Tue, Sep 8, 2015 at 6:32 PM, Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote: > Thanks for putting this up, Colin. > > - verified mds and signature > - ran "mvn install" without test failure > - ran "mvn package -Pnative" without test failure in htrace-c > - built src tarball by running "mvn clean install -DskipTests > assembly:single -Pdist" > - launched htraced and sent test tracing spans by HTracedRESTReceiver and > checked the spans by Web-UI > > I'm +1 if the issue about NOTICE.txt Lewis pointed out is not critical for > release. > > Masatake Iwasaki > > > On 9/5/15 08:02, Colin P. McCabe wrote: >> >> I've posted the first release candidate here: >> >> http://people.apache.org/~cmccabe/htrace/releases/4.0.0/rc0 >> >> The jars have been staged here: >> >> https://repository.apache.org/content/repositories/orgapachehtrace-1017 >> >> There's a lot of great stuff in this release, including a new web UI, >> many bug fixes, API improvements and enlargement of span IDs to 128 >> bits to avoid conflicts. >> >> The vote will run for 5 days. 
>> >> cheers, >> Colin >> >> Release Notes - HTrace - Version 4.0 >> >> ** Sub-task >> * [HTRACE-208] - Remove deprecated addKVAnnotation(byte[], byte[]) >> method >> * [HTRACE-209] - Make span ID 128 bit to avoid collisions >> * [HTRACE-210] - Remove TrueIfTracingSampler >> * [HTRACE-211] - Move htrace-core classes to the >> org.apache.htrace.core namespace >> * [HTRACE-212] - Change version to 4.0 >> * [HTRACE-214] - De-globalize Tracer.java >> * [HTRACE-215] - Simplify the Sampler type >> * [HTRACE-216] - SpanReceivers should not fill in ProcessId >> * [HTRACE-217] - Rename ProcessId to TracerId >> * [HTRACE-222] - Add SpanReceiverPool >> * [HTRACE-228] - Fix subprojects to refer to new >> org.apache.htrace.core namespace >> * [HTRACE-229] - htrace-webapp needs to be updated to refer to >> "tracerid" not "processid" >> ** Bug >> * [HTRACE-159] - libhtrace.so: use HRPC endpoint of htraced >> * [HTRACE-164] - htrace hrpc: use msgpack for serialization >> * [HTRACE-166] - Add tabbed view >> * [HTRACE-167] - Update go build instructions in BUILDING.txt >> * [HTRACE-171] - htraced godeps should use >> github.com/ugorji/go/codec rather than github.com/ugorji/go >> * [HTRACE-174] - Refactor GUI >> * [HTRACE-177] - htrace-zipkin: shade all dependencies >> * [HTRACE-182] - htraced: add rpm build via -Prpm >> * [HTRACE-189] - gui: fix error handling in a few places >> * [HTRACE-190] - htraced: allow querying by process ID >> * [HTRACE-191] - gui: add "duration" to span details, filter out >> "selected" >> * [HTRACE-192] - gui: when expanding parents or children, sort the >> spans by begin time >> * [HTRACE-193] - gui: avoid doing multiple redraws when >> spanResults is updated >> * [HTRACE-196] - gui: add scrolling for spans view >> * [HTRACE-201] - htrace-web: URL-Encode query JSON >> * [HTRACE-202] - htrace-web: fix "converting circular object to >> JSON" error when pressing "clear" button >> * [HTRACE-218] - Fix issues with finding json-c includes and librt >> in the 
native library >> * [HTRACE-219] - Add -Dleveldb.prefix and -Djsonc.prefix build >> options >> * [HTRACE-220] - htraced: should be able to set log.path to the >> empty string via "-Dlog.path=" on the command line >> * [HTRACE-223] - gobuild.sh: fix issue where maven succeeds if go >> build fails >> * [HTRACE-224] - htrace C client: htrace_conf_get_u64, >> htrace_conf_get_double can't handle spaces at the end of strings >> * [HTRACE-230] - Make TracerBuilder like all other Builders; an >> internal rather than adjacent class >> * [HTRACE-233] - htrace-zipkin should explicitly include slf4j-api >> to avoid ClassNotFoundException >> * [HTRACE-234] - Add workaround to prevent htrace-hbase from >> getting in an infinite loop while creating the dependency-reduced pom >> ** Improvement >> * [HTRACE-29] - add javascript web UI for htraced >> * [HTRACE-160] - htraced: support continuing a query fr
[VOTE] HTrace 4.0 Release Candidate 0
I've posted the first release candidate here:

http://people.apache.org/~cmccabe/htrace/releases/4.0.0/rc0

The jars have been staged here:

https://repository.apache.org/content/repositories/orgapachehtrace-1017

There's a lot of great stuff in this release, including a new web UI, many bug fixes, API improvements and enlargement of span IDs to 128 bits to avoid conflicts.

The vote will run for 5 days.

cheers,
Colin

Release Notes - HTrace - Version 4.0

** Sub-task
    * [HTRACE-208] - Remove deprecated addKVAnnotation(byte[], byte[]) method
    * [HTRACE-209] - Make span ID 128 bit to avoid collisions
    * [HTRACE-210] - Remove TrueIfTracingSampler
    * [HTRACE-211] - Move htrace-core classes to the org.apache.htrace.core namespace
    * [HTRACE-212] - Change version to 4.0
    * [HTRACE-214] - De-globalize Tracer.java
    * [HTRACE-215] - Simplify the Sampler type
    * [HTRACE-216] - SpanReceivers should not fill in ProcessId
    * [HTRACE-217] - Rename ProcessId to TracerId
    * [HTRACE-222] - Add SpanReceiverPool
    * [HTRACE-228] - Fix subprojects to refer to new org.apache.htrace.core namespace
    * [HTRACE-229] - htrace-webapp needs to be updated to refer to "tracerid" not "processid"

** Bug
    * [HTRACE-159] - libhtrace.so: use HRPC endpoint of htraced
    * [HTRACE-164] - htrace hrpc: use msgpack for serialization
    * [HTRACE-166] - Add tabbed view
    * [HTRACE-167] - Update go build instructions in BUILDING.txt
    * [HTRACE-171] - htraced godeps should use github.com/ugorji/go/codec rather than github.com/ugorji/go
    * [HTRACE-174] - Refactor GUI
    * [HTRACE-177] - htrace-zipkin: shade all dependencies
    * [HTRACE-182] - htraced: add rpm build via -Prpm
    * [HTRACE-189] - gui: fix error handling in a few places
    * [HTRACE-190] - htraced: allow querying by process ID
    * [HTRACE-191] - gui: add "duration" to span details, filter out "selected"
    * [HTRACE-192] - gui: when expanding parents or children, sort the spans by begin time
    * [HTRACE-193] - gui: avoid doing multiple redraws when spanResults is updated
    * [HTRACE-196] - gui: add scrolling for spans view
    * [HTRACE-201] - htrace-web: URL-Encode query JSON
    * [HTRACE-202] - htrace-web: fix "converting circular object to JSON" error when pressing "clear" button
    * [HTRACE-218] - Fix issues with finding json-c includes and librt in the native library
    * [HTRACE-219] - Add -Dleveldb.prefix and -Djsonc.prefix build options
    * [HTRACE-220] - htraced: should be able to set log.path to the empty string via "-Dlog.path=" on the command line
    * [HTRACE-223] - gobuild.sh: fix issue where maven succeeds if go build fails
    * [HTRACE-224] - htrace C client: htrace_conf_get_u64, htrace_conf_get_double can't handle spaces at the end of strings
    * [HTRACE-230] - Make TracerBuilder like all other Builders; an internal rather than adjacent class
    * [HTRACE-233] - htrace-zipkin should explicitly include slf4j-api to avoid ClassNotFoundException
    * [HTRACE-234] - Add workaround to prevent htrace-hbase from getting in an infinite loop while creating the dependency-reduced pom

** Improvement
    * [HTRACE-29] - add javascript web UI for htraced
    * [HTRACE-160] - htraced: support continuing a query from where the client left it off by sending a previous span
    * [HTRACE-162] - htraced hrpc: some logging improvements
    * [HTRACE-170] - Optimize use of Random in htrace-core by using ThreadLocalRandom
    * [HTRACE-172] - Move minJdk to 1.7 (JDK 7)
    * [HTRACE-175] - Add Trace#addKVAnnotation convenience method
    * [HTRACE-176] - Expose ZipkinSpanReceiver configuration keys externally
    * [HTRACE-180] - Move the GUI to a top-level subproject
    * [HTRACE-184] - Expose PROCESS_ID_KEY configuration key
    * [HTRACE-186] - gui: support finding the parents and children of spans, add owl
    * [HTRACE-194] - gui: support multiple selections, zooming to fit a group of spans, deleting a group of spans
    * [HTRACE-197] - htraced build: set RUNPATH if possible
    * [HTRACE-199] - gui: Double clicking on spans should bring up span details
    * [HTRACE-203] - htrace-web: pressing enter should dismiss the modal dialog box
    * [HTRACE-204] - htrace-web: add draggable bar which allows more or less visual space for process name in search view
    * [HTRACE-205] - htrace-web: Width of SearchResultsView should be uppdated along with resizing of browser window
    * [HTRACE-206] - htrace-web: when the canvas has focus, the delete key should clear, z key should zoom
    * [HTRACE-221] - htraced: search /etc/htraced/conf for the htraced configuration by default
    * [HTRACE-227] - Remove dependency to non-public API of hadoop-common from htrace-hbase

** New Feature
    * [HTRACE-143] - htraced search GUI enhancements

** Task
    * [HTRACE-183] - htraced: move src/go directory to go

** Test
    * [HTRACE-213] - Add test for ZipkinSpanReceiver
Preparing for HTrace 4.0
Hi guys, I think it's time for a new release of HTrace. We've resolved 58 JIRAs since our last release! I can RM. cheers, Colin
Re: Build instructions for Htrace
On Wed, Sep 2, 2015 at 9:32 AM, Jake Farrell wrote:
> mvnw looks to be gradle training wheels, not sure that its needed. perhaps
> a dockerfile would help here and also give us a build env we could run
> tests within jenkins/travis with, just a thought
>

+1. Creating a docker build environment for Jenkins has been on our to-do list for a while. It would be a great contribution for someone. It seems like it would also help anyone who wanted to build the project as well.

> On Wed, Sep 2, 2015 at 12:20 PM, Adrian Cole wrote:
>
> > What do you think about moving to maven wrapper (ex ./mvnw install as
> > opposed to maven install). This sorts out the maven version issues by
> > baking them into the wrapper.
> >
> > I can raise a patch if you think this would help.
> >

Thanks for the offer, but I'm not sure this is a good direction for us to go in. If we start putting shell scripts and other wrappers around Maven, things get even more complex. I've been down the autotools / virtualenv road, and it's not pretty. I have a suspicion that we can make later versions of maven work if we fiddle with our pom.xml dependency lists a bit.

best,
Colin
Re: Introduction
Hi Arun, Thanks for trying out HTrace! If you want, just file a JIRA for the javadoc issue and we can fix that up. It sounds like it might be something that only shows up on jdk8. Since you mentioned C and C++, you might also be interested in our C client. It's somewhat alpha right now but we hope more folks will try it out soon. Cheers, Colin On Aug 30, 2015 3:21 AM, Arun Khetarpal akhet...@gmail.com wrote: Super! Sounds like a plan. The first thing i tried to do was actually get the source code and tried compiling. -- I wasn't able to find any section on build tools mentioned anywhere. -- On my box - mac + maven3 + java1.8, the code did not compile because of javadoc comments. *Example error: * [ERROR] apache/htrace/incubator-htrace/htrace-core/src/main/java/org/apache/htrace/core/Tracer.java:264: warning: no @return I was able to fix them by disabling doc lint checks, but we can fix them as well. Any thoughts? Regards, Arun On 30 August 2015 at 00:16, Stack st...@duboce.net wrote: Welcome Arun. Glad of the offer of help; especially from someone with experience tracing already (Brave is cool). If starting out, best thing you could do (IMO) would be to try and use the new 4.0 APIs to tell a trace story in an application that you are familiar with. File issues against our doc where it is unclear and ditto on difficulty using new APIs and getting a trace rig running (We've not pushed the website in a while so look at doc in the repo for the moment -- it was updated recently). Thank you, St.Ack On Sat, Aug 29, 2015 at 9:32 AM, Arun Khetarpal akhet...@gmail.com wrote: Hi Team, My name is Arun. I found this project to be very appealing and would like to contribute to it. I have some experience with Java and with C and C++. I had faced a similar challenge of tracing in my organisation where I ended up integrating it with Brave (https://github.com/openzipkin/brave). I was wondering on how to participate in this project. Any pointers would be highly appreciated. 
Thanks, Arun
Enabling the release-notes field on JIRA?
Sorry if this is a dumb question, but does anyone know how to enable the release-notes field on JIRA? It seems like we don't have it on our issues. If you hit edit on a Hadoop JIRA issue, you see multiple fields-- including a text box for release-notes. It seems to be missing from the HTRACE JIRA. Anyone know how to enable this? I took a glance at the admin console but I don't see anything there. best, Colin
Re: Tracing a chain of iterators -- htrace 2.04
Hi Andrew,

Thanks for posting!

Is the concern just that there would be too many spans if each call to next() created a span? Does sampling help address this concern?

You said, "What we'd really want, is to aggregate the time spent in each call to next for each iterator, and then send the spans at the end." But HTrace already does this, right? Most span receivers will batch up the spans they receive and send them all in one big batch, probably at the end. What am I missing?

cheers,
Colin

On Tue, Aug 18, 2015 at 3:03 PM, Andrew Mains andrew.ma...@kontagent.com wrote:

Hi all,

This is really more of a user question than a dev question, but I'm posting here since I was unable to find a user list for the project; hope that's alright.

I was hoping to get some input on the best way to trace execution through a chain of iterators. Specifically, we have a database-like application which pipes data through multiple iterators, performing some transformation at each step. We'd like insight into how long each step is taking in total for a particular request. That is, for a chain of iterators iter_1... iter_i, we want the total time spent in each iter_i for that request.

The naive implementation would be to start a span in each call to next, but that's far too fine grained, given that we'd be starting a new span for each row. What we'd really want, is to aggregate the time spent in each call to next for each iterator, and then send the spans at the end. This would require implementing a new span subclass, which is a bit tricky to integrate at the moment (since it prevents us from using the static helpers in Trace).

Any thoughts on the best way to approach this issue? Is there something I'm missing, or some way that we can reframe the problem such that it makes sense with what's currently in htrace? Let me know if there's anything that's unclear, or any further info I can provide about our use case.

Thanks for the help!

Andrew
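For what it's worth, the aggregation Andrew describes can be done outside of HTrace with a small wrapper. The following is only a rough sketch: TimedIterator is a made-up helper, not an HTrace class, and it assumes you would copy the totals into a single span or annotation yourself when the request finishes.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper (not part of HTrace): sums the time spent inside one iterator. */
class TimedIterator<T> implements Iterator<T> {
    private final String name;
    private final Iterator<T> inner;
    private long totalNanos = 0;   // accumulated time across hasNext() and next()
    private long calls = 0;        // number of next() calls

    TimedIterator(String name, Iterator<T> inner) {
        this.name = name;
        this.inner = inner;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        try {
            return inner.hasNext();
        } finally {
            totalNanos += System.nanoTime() - start;
        }
    }

    @Override
    public T next() {
        long start = System.nanoTime();
        try {
            return inner.next();
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    @Override
    public void remove() {
        inner.remove();
    }

    String name() { return name; }
    long totalNanos() { return totalNanos; }
    long calls() { return calls; }

    public static void main(String[] args) {
        TimedIterator<Integer> it =
            new TimedIterator<>("iter_1", Arrays.asList(1, 2, 3).iterator());
        int sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        // One aggregate per iterator, reported once at the end of the request.
        System.out.println(it.name() + ": " + it.calls() + " calls, "
            + it.totalNanos() + " ns total, sum=" + sum);
    }
}
```

Note that when these wrappers are chained, the time recorded for an outer iterator includes the time spent in the iterators it pulls from, so you would subtract the inner totals to get per-step time.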
Re: HTRACE-215 Simplify the Sampler type - discussion
Hi Daniel,

The problem with the T in Sampler<T> is that it's application-specific. The code for each application needs to be modified specifically to make use of a different T. Ideally, Samplers should be pluggable, so that you can use any sampler with any HTraced code. For example, I might run a test application with sampling set to always but in production, I would run with a probability sampler with some specific sampling rate. But you can't do that when your sampler depends on being passed some application-specific data. You're stuck with only samplers that can work with that specific T.

Consider a specific example: tracing Hadoop. I'd like to be able to turn on tracing in Hadoop just by changing a config key. But if I'm using a Sampler<T> with a non-trivial T, I can't do that. I have to tell the customer, first apply this patch to your Hadoop code to add the Ts, do a full build, and then put it into live production... The customer won't even follow me to step #1, let alone deploying the patched code in production. It totally wrecks the usefulness of HTrace if you need to rebuild your code to use it.

Another thing to think about is that we'd like to reduce the boilerplate code needed to add HTrace to an application. Ideally the system would create the samplers you need from your HTraceConfiguration, rather than requiring the application to create and manage them manually. Of course, applications should be able to programmatically add and remove Samplers as well, but only if they have a specific need to do that.

I think that tracing different events with different probabilities is a nice feature. There is a way to do that through the new API that I think is cleaner. You would create multiple Tracer objects (Tracer will no longer be a singleton). Each tracer would be configured with ProbabilitySampler, but they would have a different sampling rate set. For the Foo code, you would call fooTracer.newTopLevelSpan(...), for the Bar code, you would call barTracer.newTopLevelSpan(...), and so forth. In the new API, spans are always created from a specific Tracer and use the Samplers associated with that Tracer.

This is similar to having different Log objects in log4j. Perhaps you think the Foo system is not that interesting most of the time, so its log level defaults to WARN. But if you think you're having a problem in the Foo system, you can set its log level to TRACE and then you see all the log messages that the Foo system has. Same thing here, except that instead of Log objects, we have Tracer objects. Instead of log messages, we have trace spans. But we still have a lot of flexibility at runtime as a result of this. And we don't need to recompile to trace.

regards,
Colin

On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee dan...@slice.com wrote:

RE: https://issues.apache.org/jira/browse/HTRACE-215

I was previously making use of this feature. I was using it to trace different types of inputs with different probabilities. It looks like now I'll either have to move all tracing logic completely outside of htrace related classes and only use Always and Never sampler which seems weird? Why even bother with providing ProbabilitySampler when (rand.nextDouble() < X ? AlwaysSampler.INSTANCE : NeverSampler.INSTANCE) is available.

Daniel
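To make the one-tracer-per-subsystem idea concrete, here is a toy model of it. It is a sketch only: DemoTracer is a stand-in written for this example and is NOT the org.apache.htrace.core API (its newTopLevelSpan just returns whether the span would be sampled), and the rates and seed are made-up values.

```java
import java.util.Random;

/** Stand-in for illustration only; NOT the org.apache.htrace.core.Tracer class. */
class DemoTracer {
    private final String name;
    private final double rate;     // probability in [0, 1] that a top-level span is sampled
    private final Random random;
    private long sampled = 0;

    DemoTracer(String name, double rate, long seed) {
        this.name = name;
        this.rate = rate;
        this.random = new Random(seed);
    }

    /** Returns true if this call would have produced a recorded top-level span. */
    boolean newTopLevelSpan(String description) {
        boolean trace = random.nextDouble() < rate;
        if (trace) {
            sampled++;
        }
        return trace;
    }

    String name() { return name; }
    long sampled() { return sampled; }

    public static void main(String[] args) {
        // Each subsystem owns a tracer with its own sampling rate,
        // much like each subsystem owning a logger with its own level.
        DemoTracer fooTracer = new DemoTracer("foo", 0.01, 42L);
        DemoTracer barTracer = new DemoTracer("bar", 0.5, 42L);
        for (int i = 0; i < 10000; i++) {
            fooTracer.newTopLevelSpan("fooOp");
            barTracer.newTopLevelSpan("barOp");
        }
        System.out.println(fooTracer.name() + " sampled " + fooTracer.sampled());
        System.out.println(barTracer.name() + " sampled " + barTracer.sampled());
    }
}
```

In a real deployment the rates would presumably come from configuration rather than constructor arguments, which is what lets you change them without recompiling.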
Re: HTRACE-215 Simplify the Sampler type - discussion
On Mon, Jul 27, 2015 at 2:32 PM, Daniel Lee dan...@slice.com wrote:

Hi Colin,

I'm not sure how Hadoop tracing is setup but I also enable tracing via a config setting. I'm not sure I agree creating multiple new Tracer objects each with their own Probability samplers is an acceptable solution from a usability standpoint. Consider an application that receives messages from clients and wants to trace different message and client types with different probabilities. Now, for every tuple of (message, client) type there has to be a new Tracer and Sampler created so this gets ugly quickly. It also sounds like having multiple tracers could get confusing quickly under this scenario. I'm just going to wrap everything in a custom class that includes the logic I used to have in the Sampler.

Hi Daniel,

Tracer objects are not that big, and you can create a lot of them if you like. Again, it's very similar to log4j Log objects. If you want to have one for each message type, or even two for each message type, it's not more than a kilobyte or two of memory.

Another way to pass information to Sampler objects is by using thread-local data. You can just set the thread-local data before calling Tracer#newSpan... if your custom sampler is called, it will access this thread local data, which can be anything you want.

Maybe at some point we can have a span importance field which could be supplied by the programmer to the Tracer#newSpan function and then used by Samplers. That would allow anyone to write a sampler which took advantage of this, rather than coupling it tightly to one sampler implementation. But I'd rather put that off until after 4.0, since we have so many other things to do.

Out of curiosity, what projects are you using HTrace for? Is it something you can share?

best,
Colin

Thanks,
Daniel

On Mon, Jul 27, 2015 at 12:00 PM, Colin P. McCabe cmcc...@apache.org wrote:

Hi Daniel,

The problem with the T in Sampler<T> is that it's application-specific. The code for each application needs to be modified specifically to make use of a different T. Ideally, Samplers should be pluggable, so that you can use any sampler with any HTraced code. For example, I might run a test application with sampling set to always but in production, I would run with a probability sampler with some specific sampling rate. But you can't do that when your sampler depends on being passed some application-specific data. You're stuck with only samplers that can work with that specific T.

Consider a specific example: tracing Hadoop. I'd like to be able to turn on tracing in Hadoop just by changing a config key. But if I'm using a Sampler<T> with a non-trivial T, I can't do that. I have to tell the customer, first apply this patch to your Hadoop code to add the Ts, do a full build, and then put it into live production... The customer won't even follow me to step #1, let alone deploying the patched code in production. It totally wrecks the usefulness of HTrace if you need to rebuild your code to use it.

Another thing to think about is that we'd like to reduce the boilerplate code needed to add HTrace to an application. Ideally the system would create the samplers you need from your HTraceConfiguration, rather than requiring the application to create and manage them manually. Of course, applications should be able to programmatically add and remove Samplers as well, but only if they have a specific need to do that.

I think that tracing different events with different probabilities is a nice feature. There is a way to do that through the new API that I think is cleaner. You would create multiple Tracer objects (Tracer will no longer be a singleton). Each tracer would be configured with ProbabilitySampler, but they would have a different sampling rate set. For the Foo code, you would call fooTracer.newTopLevelSpan(...), for the Bar code, you would call barTracer.newTopLevelSpan(...), and so forth. In the new API, spans are always created from a specific Tracer and use the Samplers associated with that Tracer.

This is similar to having different Log objects in log4j. Perhaps you think the Foo system is not that interesting most of the time, so its log level defaults to WARN. But if you think you're having a problem in the Foo system, you can set its log level to TRACE and then you see all the log messages that the Foo system has. Same thing here, except that instead of Log objects, we have Tracer objects. Instead of log messages, we have trace spans. But we still have a lot of flexibility at runtime as a result of this. And we don't need to recompile to trace.

regards,
Colin

On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee dan...@slice.com wrote:

RE: https://issues.apache.org/jira/browse/HTRACE-215

I was previously making use of this feature. I was using it to trace different types of inputs with different probabilities. It looks like now I'll either have
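The thread-local suggestion in Colin's reply might look roughly like the following. This is a sketch under assumptions: MessageTypeSampler and MESSAGE_TYPE are names invented here, the sampler does not implement any HTrace interface, and the "payment" policy is just a placeholder.

```java
/** Hypothetical sketch; none of these classes are part of HTrace. */
class ThreadLocalSamplerDemo {
    /** Per-thread hint that application code sets before a span would be created. */
    static final ThreadLocal<String> MESSAGE_TYPE = new ThreadLocal<>();

    /** A sampler-like object that takes no arguments and reads the hint itself. */
    static class MessageTypeSampler {
        boolean next() {
            // Toy policy: only trace messages of type "payment".
            return "payment".equals(MESSAGE_TYPE.get());
        }
    }

    /** Stands in for the place where the application would call Tracer#newSpan. */
    static boolean handle(String messageType, MessageTypeSampler sampler) {
        MESSAGE_TYPE.set(messageType);
        try {
            return sampler.next();
        } finally {
            // Clear the hint so it does not leak into the next request on this thread.
            MESSAGE_TYPE.remove();
        }
    }

    public static void main(String[] args) {
        MessageTypeSampler sampler = new MessageTypeSampler();
        System.out.println("payment traced: " + handle("payment", sampler));
        System.out.println("heartbeat traced: " + handle("heartbeat", sampler));
    }
}
```

The sampler signature stays free of application types, which is the point of the approach, but the coupling has only moved into the thread-local, so it is probably best kept to a single well-documented key.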
Re: Accumulo SpanReceiver and related questions
On Thu, Jun 4, 2015 at 12:04 PM, Young, Philip ptyo...@tycho.ncsc.mil wrote: Hi All, I have only just started using Apache Htrace (so bear with me) in one of my projects and I would like to send all the traces that are created to an Apache Accumulo to store, similar to how the HBaseSpanReceiver currently does it. I have developed an initial version of an AccumuloSpanReceiver and as such, I have a few questions: 1. How do I contribute my new AccumuloSpanReceiver back into the HTrace codebase? Is this by creating a Jira ticket and attaching a patch to it or would you rather a pull request in Github? The Accumulo project is maintaining its own SpanReceiver in the Accumulo code base. Take a look at: https://github.com/apache/accumulo/blob/3a99300a2b897f9f8740a3efa0271c43536ef0cc/core/src/main/java/org/apache/accumulo/core/trace/DistributedTrace.java So if storing spans in Accumulo is a requirement for you, you could check out what they've done and contribute there. You could also check out htraced, which stores spans directly in leveldb. See htrace-htraced. 2. I see that there is currently a couple of different GUI's for the display of traces, eg. one in htrace-hbase and another in htrace-htraced. Is there going to be some consolidation for the GUIs for the visualisation of traces with plugins for connecting to different data stores, eg. HBase, Accumulo, htraced, etc. or is it up to each sub-module to provide their own GUI implementation? Yes, I think some consolidation on GUIs would be nice. One thing we've talked about in the past is having htraced interface with these other data stores. In the interim, do you think that it would be a good approach for me to adapt the HBase GUI to also be able to display traces that are stored in Accumulo? If I was to do this, I think that pulling the webapp out of the htrace-hbase module into something like htrace-webapp would be better along with all the protobuf code into there also? 
I think the best approach would be to interface htraced with Accumulo so that you could re-use all the work we've been doing on the htraced GUI. Similarly with HBase, Cassandra, and other data stores we might like to support. best, Colin If I could get some advice before I get too far into this, that would be fantastic! Cheers Phil Young
Re: HTrace development environment CI
Hi Vladimir,

Welcome. Great to see you're interested, and HTRACE-170 is a nice improvement!

I agree that it is a little tough to kick the tires on the project right now, just because you have to build all the various downstream projects against HTrace. And you have the usual "build project X against project Y" problems. The situation is improving over time as we get more downstream projects (such as Hadoop) now supporting the Apache version of HTrace. You should be able to build Hadoop 2.7 or HBase 1.0.0 against your version of htrace just by doing a mvn install of your htrace, and then modifying the Hadoop pom.xml slightly to build against your new version.

We've been talking about setting up Docker images for a while. Maybe I'm getting this wrong, but Vagrant seems kind of similar... except that it uses VM images instead of Linux containers. Vagrant also seems to be focused on setting up installation scripts, similar to how people use Chef or Puppet, whereas I believe Docker mostly leaves that up to you. Does that make any sense, or did I misunderstand?

I'm curious if these tools could make it easier for people to test out a Hadoop+HTrace setup, or create a development environment. That would be really cool. We also need some kind of environment to run Jenkins jobs in, and all the people who know about such things tell me it ought to be a Docker image or VM, for better control over our environment.

cheers,
Colin

On Wed, May 13, 2015 at 1:25 PM, Vladimir Sitnikov sitnikov.vladi...@gmail.com wrote:

Hi,

I somehow noticed HTrace, and I might start using it in the near future. My main areas of interest are performance and concurrency.

While I lurk around the code I am a bit puzzled on how you create a development environment. I saw this thread: http://mail-archives.apache.org/mod_mbox/htrace-dev/201503.mbox/%3CCA+qbEUOMgw1OZP=achvf5y8ha1wy75kz_isprsvmdfhudve...@mail.gmail.com%3E

It looks like there are quite a few steps to complete in order to launch HTrace+HDFS (or whatever else is in trend).

For Apache Calcite (a framework to translate SQL queries to different storage engines) I created a Vagrant virtual machine that provisions third party tools: https://github.com/vlsi/calcite-test-dataset

Although it looks like a maven project, it allows you to provision a test machine with all the stuff installed. From a user perspective, you run mvn install and it provisions a VM for you.

Do you think it makes sense to implement a similar test VM for HTrace integration testing? While I can help with Vagrant stuff, I am not sure where to start from the HTrace point of view. I have a very brief understanding of connectors/etc.

For Calcite we host the test VM in another repository to avoid main repository bloat.

--
Regards,
Vladimir Sitnikov
Re: [VOTE] HTrace 3.2.0 - Release Candidate 2
Thanks, Abe! It's a good thing we skipped from 0 to 2... saved a lot of time. :) C. On Sat, May 9, 2015 at 11:45 PM, Abraham Elmahrek a...@cloudera.com wrote: Heads up folks. I've started the vote in gene...@incubator.apache.org. On Fri, May 8, 2015 at 11:29 AM, Abraham Elmahrek a...@cloudera.com wrote: Closing the vote: - +1s - 6 - 0s - 0 - -1s - 0 Thanks every one for helping out and verifying this release. -Abe On Thu, May 7, 2015 at 5:57 AM, Jake Farrell jfarr...@apache.org wrote: agree, there are no binary files, so the extra artifacts are not a release blocker, would remove for the next release. +1 for this rc from me -Jake On Thu, May 7, 2015 at 1:33 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: I would say no, don't roll a new RC. If there is a way to ensure that generated files have ALv2.0 headers moving forward and committed to trunk then that would be my advice. Good job with RC. +1 from me On Wednesday, May 6, 2015, Abraham Elmahrek a...@cloudera.com wrote: Lewis, These are third party packages and a generated file. The JS dependencies are listed in LICENSE.txt. I don't see licenses for dependency-reduced-pom.xml in general... but I think it might be generated by the maven shading plugin. It looks like some of these generated files made it into the source tarball. Do you guys think it's worth spinning a new RC for this? 
-Abe

On Wed, May 6, 2015 at 6:05 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:

Hi Folks,

I ran DRAT over the codebase

Notes  Binaries  Archives  Standards  Apache  Generated  Unknown
0      0         0         120        83      0          28

28 unknown licenses flagged up

Upon further investigation these were

Unapproved licenses:

/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backbone-1.1.2.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backbone.marionette-2.4.1.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backbone.paginator-2.0.2.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backgrid-0.3.5.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/backgrid-paginator-0.3.5.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/bootstrap.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/bootstrap.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/d3-3.5.5.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/d3.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/jquery-2.1.3.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/moment-2.9.0.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/npm.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.standalone.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/rome.standalone.min.js
/usr/local/drat/deploy/data/jobs/rat/1430960456935/input/underscore-1.7.0.js
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/backgrid-0.3.5.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/backgrid-paginator-0.3.5.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap-theme.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap-theme.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/bootstrap.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/rome.css
/usr/local/drat/deploy/data/jobs/rat/1430960457357/input/rome.min.css
/usr/local/drat/deploy/data/jobs/rat/1430960456828/input/SpanProtos.java
/usr/local/drat/deploy/data/jobs/rat/1430960457162/input/dependency-reduced-pom.xml
/usr/local/drat/deploy/data/jobs/rat/1430960457162/input/dependency-reduced-pom.xml_05062015_1800

If we can clarify the above then I am +1 to the release. There's nothing more I can add over and above what has been stated by others.

SIGS check
Builds and Tests in native check

Nice release candidate.

Thanks
Lewis

On Thu, Apr 30, 2015 at 9:16 PM, Abraham Elmahrek a...@apache.org wrote:

I've the second release candidate here:

http://people.apache.org/~abe/htrace/releases/3.2.0/rc1/

The jars have been staged here:

* https://repository.apache.org/content/repositories/orgapachehtrace-1016
Moving to JDK7?
Hi all, What do y'all think about moving our minJdk version to JDK7? It is set at JDK6 right now mostly because that was where Hadoop was stuck for a long time. But now Hadoop is on JDK7, and so is HBase. Is there any reason to keep supporting JDK6 in our next release? There are some nice things in JDK7 like ThreadLocalRandom, try-with-resources, etc. and I would hate to ugly up the code for no reason. best, Colin
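For anyone who has not used the two JDK7 features named above, here is a minimal sketch of both. The class and method names (Jdk7Features, firstLine, randomBelow) are made up for illustration and are not HTrace code; the only library calls used are java.util.concurrent.ThreadLocalRandom and the try-with-resources statement, both of which arrived in JDK7.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical demo class, not part of HTrace.
public class Jdk7Features {

    // try-with-resources: the reader is closed automatically when the
    // block exits, whether normally or through an exception. On JDK6 this
    // would need an explicit finally block.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // ThreadLocalRandom: each thread gets its own generator, so there is
    // no contention on a shared java.util.Random instance.
    static int randomBelow(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }

    public static void main(String[] args) {
        System.out.println(firstLine("span-1\nspan-2"));
        System.out.println(randomBelow(100));
    }
}
```

Neither feature changes behavior by itself; the appeal is mostly less boilerplate and, for ThreadLocalRandom, less lock contention in multi-threaded code.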
Re: Moving to JDK7?
Thanks, all. Filed https://issues.apache.org/jira/browse/HTRACE-172 for this.

Colin

On Tue, May 12, 2015 at 12:50 PM, Abraham Elmahrek a...@cloudera.com wrote:

+1 as well. Plenty of performance improvements in JDK7.

On Tue, May 12, 2015 at 12:23 PM, Jake Farrell jfarr...@apache.org wrote:

+1, would be nice to jump to 8 since 7 is now eol, but deprecating 6 support is a good start

-Jake

On Tue, May 12, 2015 at 3:11 PM, Colin P. McCabe cmcc...@apache.org wrote:

Hi all,

What do y'all think about moving our minJdk version to JDK7? It is set at JDK6 right now mostly because that was where Hadoop was stuck for a long time. But now Hadoop is on JDK7, and so is HBase. Is there any reason to keep supporting JDK6 in our next release? There are some nice things in JDK7 like ThreadLocalRandom, try-with-resources, etc. and I would hate to ugly up the code for no reason.

best,
Colin
Re: [DISCUSS] Github integration
When I was first getting started with open source development, the de-facto standard hosting place was SourceForge. It was never perfect, but it provided basic web hosting, source control (in those days, everything was svn or cvs), wiki, bug tracker, etc. SourceForge had a lot of momentum in those days.

Somehow, though, over the course of 10 or 15 years, things just went wrong for the site. It started requiring more and more clicks to download anything, and more and more ads started popping up. Now it's pushing malware installers, at least according to this article on gluster's blog.
http://blog.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/

Other competitors popped up to try to take on the mantle of best open source hosting site. Google Code (now shutting down), BitBucket, now Github. I've used 'em all. I have repos hosted on SourceForge, BitBucket, Github, and Google Code. (Which reminds me, I have to migrate the projects I still have on Google Code soon...)

What I found is that putting stuff up on github doesn't guarantee pull requests, or really any kind of community engagement. I think I've gotten a grand total of 3 pull requests for my github repos over the last 5 years. And the repos that I got pull requests for were ones that I gave talks on at conferences, or talked to co-workers about. HTrace's own experience was similar... we had very few contributors back in the github days.

In my experience, a quiet fade into obscurity is the fate of most github projects. Getting people interested and contributing to projects requires stepping away from the computer and interacting with them on a personal level, not whiz-bang software.

I think the most important thing that github provides projects with is a standard landing page, a standard way to contact the author, and a standard way to send in patches. If you are just perusing someone's private website, it may not be obvious how to contact them or what format to send the patches in. Github alleviates that problem. For HTrace, we already have our own landing page, our own mailing list and bug tracker, and our own conventions about how to send in patches. Github adds a lot less value for us.

The idea that projects get popular because of where they host their code or what software they use to do reviews seems really questionable to me. Some of the most popular projects, like the Linux kernel, have really esoteric review systems (i.e. sending in carefully formatted patches to a mailing list). Hadoop is another example... we have a lot of antiquated practices like CHANGES.txt, but still get a lot of contributors.

JIRA adds a lot of value for us because it lets us search discussions that go back potentially years. On Hadoop, I often find myself referring to old discussions to see why something was done one way or another. I also use it to see what is in one release or another release (it can search by version). It can do searches across multiple projects (In JQL terms, project = hadoop or project = htrace and ...)

JIRA is also a place where people can comment even if they are not developers. If they are users who just want to see something fixed, they can comment on the jira asking what is going on with the issue. Or they can open a JIRA to suggest a feature, link one JIRA to another, etc. Yes, JIRA comments are mirrored to email, but the mirroring is a pretty lossy process. You can't search emails by version, or across projects, or in any structured way.

If we can have github comments mirrored to JIRA as you describe, maybe it's not so bad. But it's not so good either. Looking at that JIRA conversation, I find it a bit harder to follow than a standard one. The software seems to be quoting huge amounts of code context, making it a little bit tougher to follow. A human would have only quoted the lines he was interested in. And I wonder whether, if I made a comment on the JIRA, the person would see it, or whether he'd only be following the github. In other words, is the mirroring two-way?

I don't know. I have to think about this more. If there is really a huge demand by people outside the project to use github, and we have JIRA integration, then maybe it could work. So far all of the demand seems to be from people who are already contributors. This makes me think that something like Crucible would actually be a better fit, since it fits into our existing workflow rather than grafting a 3rd-party website to it.

best,
Colin

On Sun, Apr 26, 2015 at 12:36 PM, Nick Dimiduk ndimi...@gmail.com wrote:

For reference, here's a ticket [0] from Phoenix which makes use of the Github PR integration. As you can see, the PR comments are posted on the JIRA. In this regard, it's actually easier to track patch comments than in RB, by simply looking at the JIRA comments.

[0] https://issues.apache.org/jira/browse/PHOENIX-628

On Fri, Apr 24, 2015 at 11:38 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:

On Tue,
releasing native build artifacts
Hi all,

Based on the responses in the earlier thread, it sounds like it's getting to be about that time again... time for a new HTrace release :)

I've been thinking about what we should do with our native build artifacts. With Java, we only ever have to build one version of everything, since the jar files can run on any architecture and supported OS. But we now have a C client (which is only at an alpha level of stability, but still...) and the htraced server, which is also not Java.

Do we want to do builds of this stuff for the common OS/arch configurations? If we did a RHEL6 / x86_64 build and an Ubuntu 12.04 / x86_64 build, that would cover most of our users, I think. We could also do 32-bit builds, but I don't think anyone is actually using 32-bit any more in big data land (and if they are, they need to stop :)

This also means we would have 3 downloads available in the next release:

* Just jar files + source
* Jar files + source + RHEL6/x86_64 libhtrace.so + htraced
* Jar files + source + Ubuntu 12.04/x86_64 libhtrace.so + htraced

thoughts?

Colin
Re: [DISCUSS] Github integration
Hi guys,

I have an objection. In the past, I've found it frustrating to search through github pull requests. There is no interface (like there is on JIRA) to search using any kind of structured query language, and we don't have the tools to track things by release, contributor, etc.

If we start having some of our patch discussions on github, JIRA will become a lot less useful. We might run into a situation like on Spark where people open multiple pull requests for the same thing, not knowing about each other. Or people have a discussion on JIRA, not aware that a parallel discussion is going on on github.

I think we should take more time to think this through. 1 hour is not enough time to decide to switch away from JIRA :)

best,
Colin

On Tue, Apr 21, 2015 at 10:21 AM, Jake Farrell jfarr...@apache.org wrote:

Once Github picks up the mirror i'll enable the remaining integrations steps so we will start getting notices on our dev@ list and can close our pull requests through commits. Here are some docs I did for Thrift that would be good to adopt or change for how people contribute or commit to HTrace.

http://thrift.apache.org/docs/HowToContribute
http://thrift.apache.org/docs/committers/HowToCommit

If you have any other questions let me know

-Jake

On Tue, Apr 21, 2015 at 12:57 PM, Nick Dimiduk ndimi...@apache.org wrote:

That was fast, thanks Jake :)

What else do we need to do to get the fancy PR integration i've seen in other projects? I see there's a specific task type for that on INFRA Jira. Is there a doc for Apacheer-but-not-githubbers on what the workflow looks like? Or is it just read the Github docs on PR's?

Thanks,
Nick

On Tue, Apr 21, 2015 at 9:54 AM, Jake Farrell jfarr...@apache.org wrote:

Hey Elliott

Great idea, I have setup the git.a.o mirror for us and will enable the Github integrations as soon as Github picks up the repo from git.a.o (usually within 24 hours)

-Jake

On Tue, Apr 21, 2015 at 12:46 PM, Elliott Clark ecl...@apache.org wrote:

That would be great.

On Tue, Apr 21, 2015 at 9:34 AM, Nick Dimiduk ndimi...@apache.org wrote:

Do we have any kind of github integration setup? I can't even find a mirror of HTrace on the apache account. I think we'll make it easier for folks to contribute if they can send PR's. I'd like to open an INFRA ticket to get us setup with this integration. Are there any objections?

Thanks,
Nick
Re: [DISCUSS] Github integration
out, its hard to track everything in split systems

-Jake

On Tue, Apr 21, 2015 at 1:53 PM, Colin P. McCabe cmcc...@apache.org wrote:

Hi guys,

I have an objection. In the past, I've found it frustrating to search through github pull requests. There is no interface (like there is on JIRA) to search using any kind of structured query language, and we don't have the tools to track things by release, contributor, etc.

If we start having some of our patch discussions on github, JIRA will become a lot less useful. We might run into a situation like on Spark where people open multiple pull requests for the same thing, not knowing about each other. Or people have a discussion on JIRA, not aware that a parallel discussion is going on on github.

I think we should take more time to think this through. 1 hour is not enough time to decide to switch away from JIRA :)

best,
Colin

On Tue, Apr 21, 2015 at 10:21 AM, Jake Farrell jfarr...@apache.org wrote:

Once Github picks up the mirror i'll enable the remaining integrations steps so we will start getting notices on our dev@ list and can close our pull requests through commits. Here are some docs I did for Thrift that would be good to adopt or change for how people contribute or commit to HTrace.

http://thrift.apache.org/docs/HowToContribute
http://thrift.apache.org/docs/committers/HowToCommit

If you have any other questions let me know

-Jake

On Tue, Apr 21, 2015 at 12:57 PM, Nick Dimiduk ndimi...@apache.org wrote:

That was fast, thanks Jake :)

What else do we need to do to get the fancy PR integration i've seen in other projects? I see there's a specific task type for that on INFRA Jira. Is there a doc for Apacheer-but-not-githubbers on what the workflow looks like? Or is it just read the Github docs on PR's?
Re: [DISCUSS] Github integration
The argument keeps getting made that we have to be on github to make it easy for outsiders to contribute but I don't see any evidence to back that up. Quite the contrary, during the time HTrace was a github project, the number of contributions and contributors were much smaller than now.

Objectively, the JIRA workflow is not difficult to learn. The number of new and recent contributors that Hadoop has is a testament to that. And many other very successful projects use the same model. I would argue that to the average developer, attaching a text file to a JIRA is easier to understand than creating a branch and a pull request in github. It's certainly easier for a first-timer than the upload process of reviewboard or gerrit.

I think if we are being honest with ourselves, the only valid reason to switch away from patch attachments on JIRA is the convenience of developers. Elliott has said that he doesn't like having to click on attach patch. Some things that haven't been brought up, but which ought to be, are that reviews in JIRA require some cut-n-paste, and that you need to install a Google Chrome extension to see side-by-side diffs.

My opinion is that while these things are kind of annoying, they're really not that bad. Having to explain what the difference is in my latest patch versus the previous one takes much more time and mental effort than clicking on attach patch. There are even scripts out there to automatically attach patches. Copying a few lines to the clipboard to suggest changes during a review isn't bad... in some ways I prefer it to clicking all those expand discussion arrows in other code review tools.

Colin

On Tue, Apr 21, 2015 at 6:18 PM, Nick Dimiduk ndimi...@apache.org wrote:

There's a joke here about N devs in a room and N opinions that are all right (and all wrong)!

All I'm asking for here is to make it easy for outsiders to contribute. Having HTrace show up in the mirror is a big step. The next logical thing is folks will click the fork button. We should be ready to receive the incoming help; the details of that implementation are less important to me.

Whatever our individual opinions, GH is a de facto place for developers these days -- their tools are extremely well socialized. It's a shame to cut ourselves off from users of that community. I happen to share Colin's opinions about the inferiority of GH's interface for historical comments (I personally like gerrit the best of the tools I've used), but that doesn't mean we should shun it. (I also generally loathe JIRA, on par with Elliott's thoughts).

I think the Apache infra allows comments on PRs that are tied to a JIRA to land in the comments on the associated JIRA. Is that right Jake? It doesn't prevent the patch from disappearing from github, but at least the trail of discussion is preserved and the single page scroll down consumption is still possible.

I think we as a project can make it a policy that a patch must be attached to the JIRA, not just living in a PR (we'll want that for pre-commit build bot support anyway, right?) Use the PR as another means of review, not the source of truth on the change itself. Would that be enough for you Colin?

On the topic of Gerrit, there was a discussion about bringing it about for Apache projects. It's been raised and died and raised a number of times. Gerrit for reviews and push gating + github style build hook detection would be a great setup for me as well. Maybe we should investigate that as a separate thread?

-n

On Tue, Apr 21, 2015 at 6:07 PM, Elliott Clark ecl...@apache.org wrote:

For me pull requests show great history for the issue if things don't get bounced around too many different creators. Github really struggles when there are issues that hang around for a long time, either because they don't have patches yet, or because lots of different people are creating candidate patches. However for me email copies of everything that's from github provide all the search-ability that I would need to just use github.

However for me Jira is just so disconnected from the code that it's a total time sink. I want to create code, look at code, and have my code tested. Every time I have to create a patch and attach it it's a total context switch (better than RB but that's not saying much). The integration of jira and jenkins just feels like duct-tape and hope when compared to the hooks provided by github. So for me jira seems bad at creating patches, reviewing patches, and testing patches.

I've used gerrit before and it's awesome. Just a joy to use once things are set up and moving. However I don't think that it will work since it's not supported by infra and it needs to be the source of truth for a git repo.

My preferences, in order, would be

* Gerrit
* Github only
* Github with Jira integration
* Phabricator with jira
* Review board
* Jira only

On Tue, Apr 21, 2015 at 5:19 PM, Colin P. McCabe cmcc...@apache.org
Re: [DISCUSS] Github integration
On Tue, Apr 21, 2015 at 7:08 PM, Nick Dimiduk ndimi...@apache.org wrote:

I would associate the upswing in introductions to increased marketing from joining incubator; orthogonal to moving out of github.

No one has suggested moving away from patches attached to JIRA. As I said, patch on JIRA is what we'll eventually need for pre-commit checking anyway.

I'd like the github mirror to be activated, which Jake has done. I'd also like PR's to show up as a mail to the dev list and, if possible, also land on the associated JIRA as a comment. I maintain that this will make it easier for non-Apache folks who fork-and-PR to get our attention without much fuss on either end.

Does your -1 apply to PRs resulting in a mail on the dev list?

I think the minimum it would need to be usable to me would be some kind of integration with JIRA, so that I could review the patch there. I suppose we could set up some kind of system whereby comments made on github were mirrored to JIRA.

I also don't think we should activate any of this stuff before we have consensus on all the issues involved.

Colin

-n

On Tuesday, April 21, 2015, Colin P. McCabe cmcc...@apache.org wrote:

The argument keeps getting made that we have to be on github to make it easy for outsiders to contribute but I don't see any evidence to back that up. Quite the contrary, during the time HTrace was a github project, the number of contributions and contributors were much smaller than now.

Objectively, the JIRA workflow is not difficult to learn. The number of new and recent contributors that Hadoop has is a testament to that. And many other very successful projects use the same model. I would argue that to the average developer, attaching a text file to a JIRA is easier to understand than creating a branch and a pull request in github. It's certainly easier for a first-timer than the upload process of reviewboard or gerrit.
Re: HTrace GUI meetup
Hmm. Sounds like May is unavailable on Tuesday. Does Friday (4/3) at 5PM PDT work for everyone?

Masatake, I'd especially like to talk to you about some ideas for the spans GUI...

Abe, to respond to your points: We will post the minutes to the mailing list. Also, obviously the actual implementation of any of these ideas will happen on JIRA through the usual process. I view this as kind of similar to the meetings we do occasionally on Hadoop to coordinate a new feature that the community is working on.

Also, on a semi-related note, I'm going to try to check some real-world span data into the repo to use when looking at the GUI.

cheers,
Colin

On Thu, Mar 26, 2015 at 3:46 PM, Abraham Elmahrek a...@cloudera.com wrote:

Hey Colin,

I'm hugely +1 on this! I'd prefer Tuesday @ 5PM PST.

There has been some discussion about doing this in the Sqoop community as well, so I thought I'd relay some of the ideas that popped up there ( http://mail-archives.apache.org/mod_mbox/sqoop-dev/201502.mbox/%3CCAHUddLM98%3DM4N4qNGfpAThQ%2BEpRf0war0FN4WM%3D3T6t7Owt4nQ%40mail.gmail.com%3E ):

1. Meeting minutes are persisted (wiki) and communicated (mailing list)
2. No concrete decisions are made (we should run votes first so every one can participate and make sure full context is provided some how)
3. Proper notice is given to the community and the meeting is globally available (which it seems we have).

I think HTrace is a different community and can run it however we see fit. But I hope the above helps at least for reference.

-Abe

On Thu, Mar 26, 2015 at 3:29 PM, Colin P. McCabe cmcc...@apache.org wrote:

Hi all,

There's been a lot of really great work on the HTrace GUI recently. Masatake's span visualization screen, Abe's work on the details page, and May's work come to mind.

I was thinking, we should have a phone call to talk about the GUI. I have some ideas that might be really cool. Abe also came up with some mockups.

Some time next week would work well for me.
HTrace GUI meetup
Hi all, There's been a lot of really great work on the HTrace GUI recently. Masatake's span visualization screen, Abe's work on the details page, and May's work come to mind. I was thinking, we should have a phone call to talk about the GUI. I have some ideas that might be really cool. Abe also came up with some mockups. Some time next week would work well for me. Maybe next Tuesday (3/31) at 5pm, or next Friday (4/3) at 5pm PST? (I realize 5pm PST is late in California but I'm trying to come up with something that works for all the time zones.) Does that work for you guys? I'm thinking we can provide a plain ol' telephone dial-in number and use Google Hangouts for screen sharing. (We could use Google Hangouts for voice as well, but in my experience, it's best to use regular phone for voice to avoid glitches.) best, Colin
Re: Getting started with Apache HTrace development
Can we set up a wiki? Stuff like this needs to be updated periodically and it would be nice to have something like the hadoop wiki. Of course there may be some out of date stuff from time to time, but it's better than nothing...

On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:

This is dynamite and I think it would be very helpful to have it linked to from the website. Although the install and config doesn't appear too bulky, there are a number of steps and this would be non trivial for someone who is not familiarized with Hadoop xml based runtime configuration.

I'm finishing off a patch for Chukwa right now then I will be building HTrace into my Nutch 2.x search stack. My aim is to write something similar for that deployment as it would also be very helpful to see tracing for Gora data stores as well.

Awesome.

best,
Colin

On Monday, March 2, 2015, Colin P. McCabe cmcc...@apache.org wrote:

A few people have asked how to get started with HTrace development. It's a good question and we don't have a great README up about it so I thought I would write something.

HTrace is all about tracing distributed systems. So the best way to get started is to plug htrace into your favorite distributed system and see what cool things happen or what bugs pop up. Since I'm an HDFS developer, that's the distributed system that I'm most familiar with. So I will do a quick writeup about how to use HTrace + HDFS. (HBase + HTrace is another very important use-case that I would like to write about later, but one step at a time.)

Just a quick note: a lot of this software is relatively new. So there may be bugs or integration pain points that you encounter.

There has not yet been a stable release of Hadoop that contained Apache HTrace. There have been releases that contained the pre-Apache version of HTrace, but that's no fun. If we want to do development, we want to be able to run the latest version of the code. So we will have to build it ourselves.
Getting started with Apache HTrace development
A few people have asked how to get started with HTrace development. It's a good question and we don't have a great README up about it so I thought I would write something. HTrace is all about tracing distributed systems. So the best way to get started is to plug htrace into your favorite distributed system and see what cool things happen or what bugs pop up. Since I'm an HDFS developer, that's the distributed system that I'm most familiar with. So I will do a quick writeup about how to use HTrace + HDFS. (HBase + HTrace is another very important use-case that I would like to write about later, but one step at a time.) Just a quick note: a lot of this software is relatively new. So there may be bugs or integration pain points that you encounter. There has not yet been a stable release of Hadoop that contained Apache HTrace. There have been releases that contained the pre-Apache version of HTrace, but that's no fun. If we want to do development, we want to be able to run the latest version of the code. So we will have to build it ourselves. Building HTrace is not too bad. First we install the dependencies: cmccabe@keter:~/ apt-get install java javac google-go leveldb-devel If you have a different Linux distro this command will vary slightly, of course. On Macs, brew is a good option. Next we use Maven to build the source: cmccabe@keter:~/ git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git cmccabe@keter:~/ cd incubator-htrace cmccabe@keter:~/ git checkout master cmccabe@keter:~/ mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip OK. So htrace is built and installed to the local ~/.m2 directory. We should see it under the .m2: cmccabe@keter:~/ find ~/.m2 | grep htrace-core ... 
/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
/home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
...

The version you built should be 3.2.0-SNAPSHOT.

Next, we check out Hadoop:

cmccabe@keter:~/ git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
cmccabe@keter:~/ cd hadoop
cmccabe@keter:~/ git checkout branch-2

So we are basically building a pre-release version of Hadoop 2.7, currently known as branch-2. We will need to modify Hadoop to use 3.2.0-SNAPSHOT rather than the stable 3.1.0 release which it would ordinarily use in branch-2. I applied this diff to hadoop-project/pom.xml:

diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
index 569b292..5b7e466 100644
--- a/hadoop-project/pom.xml
+++ b/hadoop-project/pom.xml
@@ -785,7 +785,7 @@
       <dependency>
         <groupId>org.apache.htrace</groupId>
         <artifactId>htrace-core</artifactId>
-        <version>3.1.0-incubating</version>
+        <version>3.2.0-incubating-SNAPSHOT</version>
       </dependency>
       <dependency>
         <groupId>org.jdom</groupId>

Next, I built Hadoop:

cmccabe@keter:~/ mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

...
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
...

This package should also contain an htrace-3.2.0-SNAPSHOT jar.

OK, so how can we start seeing some trace spans? The easiest way is to configure LocalFileSpanReceiver.
Add this to your hdfs-site.xml:

<property>
  <name>hadoop.htrace.spanreceiver.classes</name>
  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
  <name>hadoop.htrace.sampler</name>
  <value>AlwaysSampler</value>
</property>

When you run the Hadoop daemons, you should see them writing to files named /tmp/${PROCESS_ID} (for each different process). If this doesn't happen, try cranking up your log4j level to TRACE to see why the SpanReceiver could not be created.

You should see something like this in the log4j logs:

13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
    at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
    at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
    at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
    at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)

Running htraced is easy. You simply run the binary:

cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear

You should see messages like this:

cmccabe@keter:~/src/htrace ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
2015-03-02T19:08:33-08:00 D:
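
[Editor's sketch] The message suggests raising the log4j level to find out why a SpanReceiver could not be created. One possible cause, and this is an assumption rather than something the log above shows, is that the class named in hadoop.htrace.spanreceiver.classes is not on the daemon's classpath. The helper below is hypothetical (ReceiverClassCheck is not part of HTrace or Hadoop) and uses only the JDK to test whether a class name can be loaded:

```java
// Hypothetical helper, not part of HTrace or Hadoop.
final class ReceiverClassCheck {
    /** Returns null if the class can be loaded, otherwise a short reason. */
    static String whyNotLoadable(String className) {
        try {
            // initialize=false: only look the class up, do not run its static initializers.
            Class.forName(className, false, ReceiverClassCheck.class.getClassLoader());
            return null;
        } catch (ClassNotFoundException e) {
            return "not on classpath: " + className;
        } catch (LinkageError e) {
            return "found but failed to link: " + e;
        }
    }

    public static void main(String[] args) {
        // Default to the receiver class named in the hdfs-site.xml snippet above.
        String name = args.length > 0 ? args[0]
                : "org.apache.htrace.impl.LocalFileSpanReceiver";
        String reason = whyNotLoadable(name);
        System.out.println(reason == null ? "OK: " + name : reason);
    }
}
```

Run it with the same classpath the daemon uses; a "not on classpath" answer would point at the build or packaging rather than at the configuration.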
Re: HTrace for Nutch 2.x Search Stack
Hi Lewis,

Good questions. I would say HTrace differs from TRACE logging (or other single-node metrics, JMX, audit logs, etc.) in that it pulls together information from across the cluster. This is something that is a major pain point when using a distributed system such as HDFS. Just to diagnose a slow write, you might have to match up a client log and the logs of 3 different datanodes.

The big idea behind htrace is two things: integrating those logging sources, and using sampling to instrument performance in production. The main thing htrace deals with is spans, which are lengths of time.

We're working on a web UI that will allow people to search for spans by time, duration, and name (among other things). It's not quite finished now (hoping to have something usable in HTrace 3.2.0 or maybe 3.3.0... but Abe can comment more on that.) Here's an early screenshot (probably way out of date now):
https://issues.apache.org/jira/secure/attachment/12689757/Search%20page%20skeleton%20-%200.png

There is also a plan to create a visualization of parent/child relationships on the web UI, by using the d3 library (which can draw graphs, and do many other things besides.)

In the meantime, there's an option to produce a graphviz file from a file containing span JSON. That way you can draw a graph of parent/child relationships with the dot tool, available on Linux. Uh... unfortunately it's broken right now... let me file a JIRA for that :P This is a very new feature that got added earlier this week.

The web UI is a great place to get involved right now... there is a lot of work going on there and we've been adding new contributors.

Colin

On Thu, Feb 26, 2015 at 1:46 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> Hi Nick,
>
> Grand. Thank you
> What is visualization looking like right now? Is there currently a mechanism for visualizing HTrace structures?
> Is it worth considering posting something like this as a GSoC project if one does not currently exist?
> Thanks
> Lewis
>
> On Thu, Feb 26, 2015 at 1:31 PM, Nick Dimiduk ndimi...@gmail.com wrote:
>> Hi Lewis,
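
[Editor's sketch] The parent/child graph Colin describes can be illustrated without the real tool. The code below is not HTrace's graphviz option and does not parse span JSON; SpanInfo and toDot are hypothetical names, and the spans are built by hand. It only shows the idea of emitting one dot node per span and one edge per parent/child link:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the HTrace graphviz feature.
final class SpanDot {
    /** Minimal stand-in for a span: an id, a description, and zero or more parent ids. */
    static final class SpanInfo {
        final String id;
        final String description;
        final List<String> parents;

        SpanInfo(String id, String description, String... parents) {
            this.id = id;
            this.description = description;
            this.parents = Arrays.asList(parents);
        }
    }

    /** Renders one node per span and one "parent -> child" edge per parent link. */
    static String toDot(List<SpanInfo> spans) {
        StringBuilder sb = new StringBuilder("digraph spans {\n");
        for (SpanInfo s : spans) {
            sb.append("  \"").append(s.id).append("\" [label=\"")
              .append(s.description.replace("\"", "\\\"")).append("\"];\n");
            for (String p : s.parents) {
                sb.append("  \"").append(p).append("\" -> \"").append(s.id).append("\";\n");
            }
        }
        return sb.append("}\n").toString();
    }
}
```

Writing the returned string to a file and running the dot tool on it would draw the tree. Keeping parents as a list rather than a single field leaves room for spans with more than one parent.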
Re: [REPORT] HTrace March 2015
+1. Thanks, Lewis.

C.

On Thu, Feb 26, 2015 at 10:58 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> Hi Folks,
> Please see below, this has been added to the wiki
>
> HTrace
>
> HTrace is a tracing framework intended for use with distributed systems written in java.
>
> HTrace has been incubating since 2014-11.
>
> Three most important issues to address in the move towards graduation:
>
>   1. Continue to grow the HTrace community
>   2. Continue to develop and release stable HTrace incubating artifacts
>   3. Continue to explore the integration of the HTrace framework into other Apache products
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
>
> No
>
> How has the community developed since the last report?
>
> There has been a bunch of mailing list activity relating directly to issue 3 above e.g. better integration of HTrace into HBase/HDFS. HTrace is being represented at ApacheCon 2015 NA in April with a presentation: Introducing Apache HTrace: An End-to-End Tracing Framework for Distributed Systems - Colin McCabe, Cloudera - http://sched.co/2P8Q
>
> How has the project developed since the last report?
>
> The codebase has seen about 30 odd commits since last reporting. Jira continues to see activity which is encouraging as HTrace community progresses towards next incubating release.
>
> Date of last release:
>
>   2015-20-01 htrace-3.1.0-incubating
>
> When were the last committers or PMC members elected?
>
> Abraham Elmahrek was elected to become an HTrace committer on Wed, 11 Feb, 2015.
>
> Signed-off-by:
>
>   [ ](htrace) Jake Farrell
>   [ ](htrace) Todd Lipcon
>   [X](htrace) Lewis John Mcgibbney
>   [ ](htrace) Andrew Purtell
>   [ ](htrace) Billie Rinaldi
>   [ ](htrace) Michael Stack
>
> Shepherd/Mentor notes:
>
> Ta
> Lewis
>
> --
> *Lewis*
Re: Trace HBase/HDFS with HTrace
Hmm. Looking at that error, my guess would be that there is an incorrect usage of TraceScope#detach going on somewhere in hbase... perhaps a double detach. But I could be wrong. We added some code recently to catch issues like this.

best,
Colin

On Wed, Feb 25, 2015 at 12:28 AM, Masatake Iwasaki iwasak...@oss.nttdata.co.jp wrote:
> I tried hbase-1 built against today's htrace-3.2.0-SNAPSHOT (with quick fix to TestHTraceHooks). I got the error below in regionserver log. I will dig into this tomorrow:
>
> 2015-02-25 00:18:29,270 ERROR [RS_OPEN_META-centos7:16201-0] htrace.Tracer: Tried to detach trace span null but it has already been detached.
> 2015-02-25 00:18:29,271 ERROR [RS_OPEN_META-centos7:16201-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, starting to roll back the global memstore size.
> java.lang.RuntimeException: Tried to detach trace span null but it has already been detached.
>     at org.apache.htrace.Tracer.clientError(Tracer.java:61)
>     at org.apache.htrace.TraceScope.detach(TraceScope.java:57)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1559)
>     at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:94)
>     at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionOpenMarker(HRegion.java:910)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4911)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4874)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4845)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4801)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4752)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> On 2/24/15 18:27, Colin P. McCabe wrote:
>> Thanks for trying this, Masatake. I've got HDFS working on my cluster with tracing and LocalFileSpanReceiver. Did you try using HBase + HDFS with LocalFileSpanReceiver? Be sure to use a build including HTRACE-112 since LFSR was kind of busted prior to that. I'm going to do a longer writeup about getting HDFS + HBase working with other span receivers just as soon as I finish stomping a few more bugs.
>>
>> best,
>> Colin
>>
>> On Tue, Feb 24, 2015 at 12:04 PM, Masatake Iwasaki iwasak...@oss.nttdata.co.jp wrote:
>>> Hi,
>>>
>>> Thanks for trying this. I am sorry for the late reply.
>>>
>>> I tried this today with hbase-1.0.1-SNAPSHOT built with {{-Dhadoop-two.version=2.7.0-SNAPSHOT}} in a pseudo distributed cluster but failed to get an end-to-end trace. I checked that
>>>
>>> * tracing works for both hbase and hdfs,
>>> * hbase runs with the 2.7.0-SNAPSHOT jar of hadoop.
>>>
>>> When I did a put with tracing on, I saw a span named FSHLog.sync with annotations such as "syncing writer" and "writer synced". The code for tracing in FSHLog worked at least.
>>>
>>> I'm still looking into this. If it turns out that tracing spans do not reach the actual HDFS writer thread in HBase, I will file a JIRA.
>>>
>>> # We need hadoop-2.6.0 or higher in order to trace HDFS.
>>> # Building hbase from source with {{-Dhadoop-two.version=2.6.0}}
>>> # is a straightforward way to do this
>>> # because the binary release of hbase-1.0.0 bundles hadoop-2.5.1 jars.
>>>
>>> Masatake
>>>
>>> On 2/11/15 08:56, Nick Dimiduk wrote:
>>>> Hi Joshua,
>>>>
>>>> In theory there's nothing special for you to do. Just issue your query to HBase with tracing enabled. The active span will go through HBase, down into HDFS, and back again. You'll need both systems collecting spans into the same place so that you can report on the complete trace tree.
>>>>
>>>> I've not recently tested the end-to-end, but I believe it's all there. If not, it's a bug -- this is an intended use case. Can you give it a try and let us know how it goes?
>>>>
>>>> FYI, 0.99.x are preview releases of HBase and not for production use. Just so you know :)
>>>>
>>>> -n
>>>>
>>>> On Wednesday, February 11, 2015, Chunxu Tang chunxut...@gmail.com wrote:
>>>>> Hi all,
>>>>>
>>>>> Now I’m exploiting HTrace to trace request level data flows in HBase and HDFS. I have successfully traced HBase and HDFS by using HTrace, respectively.
>>>>>
>>>>> After that, I combine HBase and HDFS together and I want to just send a PUT/GET request to HBase, but to trace the whole data flow in both HBase and HDFS. In my opinion, when I send a request such as Get to HBase, it will at last try to read the blocks on HDFS, so I can
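
[Editor's sketch] Colin's guess at the top of this message is a double detach. The class below is a simplified stand-in written for illustration, not the real org.apache.htrace.TraceScope; the class name and its fields are hypothetical, and only the wording of the error is borrowed from the log above. It shows the kind of guard that turns a second detach into a loud failure:

```java
// Hypothetical stand-in for illustration; not the HTrace TraceScope implementation.
final class DetachOnceScope {
    private final String spanName;
    private boolean detached = false;

    DetachOnceScope(String spanName) {
        this.spanName = spanName;
    }

    /** Hands the span off to the caller. A second call is treated as a programming error. */
    String detach() {
        if (detached) {
            throw new IllegalStateException("Tried to detach trace span " + spanName
                    + " but it has already been detached.");
        }
        detached = true;
        return spanName;
    }
}
```

With a guard like this, the second of two detach calls on the same scope throws, and the stack trace of that second call (here, the FSHLog.sync frame) is the place to start looking.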
Re: Trace HBase/HDFS with HTrace
Thanks for trying this, Masatake. I've got HDFS working on my cluster with tracing and LocalFileSpanReceiver. Did you try using HBase + HDFS with LocalFileSpanReceiver? Be sure to use a build including HTRACE-112 since LFSR was kind of busted prior to that. I'm going to do a longer writeup about getting HDFS + HBase working with other span receivers just as soon as I finish stomping a few more bugs.

best,
Colin

On Tue, Feb 24, 2015 at 12:04 PM, Masatake Iwasaki iwasak...@oss.nttdata.co.jp wrote:
> Hi,
>
> Thanks for trying this. I am sorry for the late reply.
>
> I tried this today with hbase-1.0.1-SNAPSHOT built with {{-Dhadoop-two.version=2.7.0-SNAPSHOT}} in a pseudo distributed cluster but failed to get an end-to-end trace. I checked that
>
> * tracing works for both hbase and hdfs,
> * hbase runs with the 2.7.0-SNAPSHOT jar of hadoop.
>
> When I did a put with tracing on, I saw a span named FSHLog.sync with annotations such as "syncing writer" and "writer synced". The code for tracing in FSHLog worked at least.
>
> I'm still looking into this. If it turns out that tracing spans do not reach the actual HDFS writer thread in HBase, I will file a JIRA.
>
> # We need hadoop-2.6.0 or higher in order to trace HDFS.
> # Building hbase from source with {{-Dhadoop-two.version=2.6.0}}
> # is a straightforward way to do this
> # because the binary release of hbase-1.0.0 bundles hadoop-2.5.1 jars.
>
> Masatake
>
> On 2/11/15 08:56, Nick Dimiduk wrote:
>> Hi Joshua,
>>
>> In theory there's nothing special for you to do. Just issue your query to HBase with tracing enabled. The active span will go through HBase, down into HDFS, and back again. You'll need both systems collecting spans into the same place so that you can report on the complete trace tree.
>>
>> I've not recently tested the end-to-end, but I believe it's all there. If not, it's a bug -- this is an intended use case. Can you give it a try and let us know how it goes?
>>
>> FYI, 0.99.x are preview releases of HBase and not for production use. Just so you know :)
>>
>> -n
>>
>> On Wednesday, February 11, 2015, Chunxu Tang chunxut...@gmail.com wrote:
>>> Hi all,
>>>
>>> Now I’m exploiting HTrace to trace request level data flows in HBase and HDFS. I have successfully traced HBase and HDFS by using HTrace, respectively.
>>>
>>> After that, I combine HBase and HDFS together and I want to just send a PUT/GET request to HBase, but to trace the whole data flow in both HBase and HDFS. In my opinion, when I send a request such as Get to HBase, it will at last try to read the blocks on HDFS, so I can construct a whole data flow tracing through HBase and HDFS. However, the fact is that I can only get tracing data of HBase, with no data of HDFS.
>>>
>>> Could you give me any suggestions on how to trace the data flow in both HBase and HDFS? Does anyone have similar experience? Do I need to modify the source code? And maybe which part(s) should I touch? If I need to modify the code, I will try to create a patch for that.
>>>
>>> Thank you.
>>>
>>> My Configurations:
>>> Hadoop version: 2.6.0
>>> HBase version: 0.99.2
>>> HTrace version: htrace-master
>>> OS: Ubuntu 12.04
>>>
>>> Joshua
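
[Editor's sketch] Nick's point in the quoted thread is that both systems must collect spans into the same place before a complete trace tree can be reported. Once the spans are together, one rough way to check whether a trace really crossed from HBase into HDFS is to group them by trace and list which processes contributed. The code below is a sketch under assumptions: SpanRecord and its two fields are hypothetical, not the HTrace span format, and the process names in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; field names are assumptions, not the HTrace span schema.
final class TraceCoverage {
    /** Flattened span record: which trace it belongs to and which process emitted it. */
    static final class SpanRecord {
        final String traceId;
        final String process;

        SpanRecord(String traceId, String process) {
            this.traceId = traceId;
            this.process = process;
        }
    }

    /** Maps each trace id to the sorted set of processes that contributed spans to it. */
    static Map<String, Set<String>> processesByTrace(List<SpanRecord> spans) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (SpanRecord s : spans) {
            Set<String> processes = out.get(s.traceId);
            if (processes == null) {
                processes = new TreeSet<>();
                out.put(s.traceId, processes);
            }
            processes.add(s.process);
        }
        return out;
    }
}
```

A trace whose set contains only an HBase process (say "RegionServer") and no HDFS process (say "DataNode") matches the symptom reported in this thread: the HBase side is traced, but the span context never reached HDFS.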
Re: Trace HBase/HDFS with HTrace
On Thu, Feb 12, 2015 at 1:23 PM, Chunxu Tang chunxut...@gmail.com wrote:
> Hi all,
>
> Thanks for your detailed replies! Now I have tested end-to-end tracing in two versions of HBase (0.98.10 and 0.99.2), combined with Hadoop 2.6.0 and htrace-master (3.0.4), and both of them failed. For HBase 0.98.10, it actually has htrace 2.0.4 core, so it's normal to get no traces. HBase 0.99.2 does have htrace 3.0.4 core, but I still cannot get traces of HDFS; I can only get traces of HBase.

Hadoop 2.6.0 doesn't have the correct version of HTrace, so this is all expected. You aren't going to be able to do anything useful here as long as you keep using Hadoop 2.6.0. I would suggest using Hadoop-2.7.0-SNAPSHOT with an appropriate version of HBase. Hope this helps.

best,
Colin

> I think the first thing I need to make sure is that I use a correct method to implement the end-to-end test. I'm not very sure whether it's good to show whole source code on the mailing list, so I just put some core code chunks written in the client code here:
>
> public void run(){
>   Configuration conf = HBaseConfiguration.create();
>   org.apache.hadoop.hbase.trace.SpanReceiverHost.getInstance(conf);
>   org.apache.hadoop.tracing.SpanReceiverHost.getInstance(new HdfsConfiguration());
>   TraceScope ts = Trace.startSpan("Gets", Sampler.ALWAYS);
>   HTable table = new HTable(conf, "t1");
>   Get get = new Get(Bytes.toBytes("r1"));
>   table.get(get);
>   ...
> }
>
> Now I can only get traces of HBase, ending with the HFileReaderV2.readBlock() function. Is my testing method correct? And because I'm not familiar with the new version of HTrace and HBase/HDFS with the new htrace core, could you give me some suggestions to detect where the error may take place?
>
> Thank you all.
>
> Joshua
>
> 2015-02-11 22:08 GMT-05:00 Colin P. McCabe cmcc...@apache.org:
>> No, I think I'm the one who's missing something. :)
>>
>> I will give that a try next time I'm testing out end-to-end tracing.
>>
>> thanks guys.
>> Colin
>>
>> On Wed, Feb 11, 2015 at 4:36 PM, Enis Söztutar enis@gmail.com wrote:
>>> mvn install just installs it in local cache which you can then use for building other projects. So no need to have to define a file based local repo. Am I missing something?
>>>
>>> Enis
>>>
>>> On Wed, Feb 11, 2015 at 12:36 PM, Nick Dimiduk ndimi...@gmail.com wrote:
>>>> Oh, I see. I was assuming a local build of Hadoop snapshot installed into the local cache.
>>>>
>>>> On Wednesday, February 11, 2015, Colin P. McCabe cmcc...@apache.org wrote:
>>>>> On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk ndimi...@gmail.com wrote:
>>>>>> I don't recall the hadoop release repo restriction being a problem, but I haven't tested it lately. See if you can just specify the release version with -Dhadoop.version or -Dhadoop-two.version.
>>>>>
>>>>> Sorry, it's been a while since I did this... I guess the question is whether 2.7.0-SNAPSHOT is available in Maven-land somewhere? If so, then Chunxu should forget all that stuff I said, and just build HBase with -Dhadoop.version=2.7.0-SNAPSHOT
>>>>>
>>>>>> I would go against branch-1.0 as this will be the imminent 1.0.0 release and has HTrace 3.1.0-incubating.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Colin
>>>>>
>>>>>> -n
>>>>>>
>>>>>> On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe cmcc...@apache.org wrote:
>>>>>>> Thanks for trying stuff out! Sorry that this is a little difficult at the moment.
>>>>>>>
>>>>>>> To really do this right, you would want to be using Hadoop with HTrace 3.1.0, and HBase with HTrace 3.1.0. Unfortunately, there hasn't been a new release of Hadoop with HTrace 3.1.0. The only existing releases of Hadoop use an older version of the HTrace library. So you will have to build from source.
>>>>>>>
>>>>>>> If you check out Hadoop's branch-2 branch (currently, this branch represents what will be in the 2.7 release, when it is cut), and build that, you will get the latest. Then you have to build a version of HBase against the version of Hadoop you have built.
>>>>>>>
>>>>>>> By default, HBase's Maven build will build against upstream release versions of Hadoop only. So just setting -Dhadoop.version=2.7.0-SNAPSHOT is not enough, since it won't know where to find the jars. To get around this problem, you can create your own local maven repo. Here's how.
>>>>>>>
>>>>>>> In hadoop/pom.xml, add these lines to the distributionManagement stanza:
>>>>>>>
>>>>>>> +<repository>
>>>>>>> +  <id>localdump</id>
>>>>>>> +  <url>file:///home/cmccabe/localdump/releases</url>
>>>>>>> +</repository>
>>>>>>> +<snapshotRepository>
>>>>>>> +  <id>localdump</id>
>>>>>>> +  <url>file:///home/cmccabe/localdump/snapshots</url>
>>>>>>> +</snapshotRepository>
>>>>>>>
>>>>>>> Comment out the repositories that are already there.
>>>>>>>
>>>>>>> Now run mkdir /home/cmccabe/localdump.
>>>>>>>
>>>>>>> Then, in your hadoop tree, run mvn deploy -DskipTests.
>>>>>>>
>>>>>>> You should get a localdump directory that has files kind of like
Re: Trace HBase/HDFS with HTrace
On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk ndimi...@gmail.com wrote:
> I don't recall the hadoop release repo restriction being a problem, but I haven't tested it lately. See if you can just specify the release version with -Dhadoop.version or -Dhadoop-two.version.

Sorry, it's been a while since I did this... I guess the question is whether 2.7.0-SNAPSHOT is available in Maven-land somewhere? If so, then Chunxu should forget all that stuff I said, and just build HBase with -Dhadoop.version=2.7.0-SNAPSHOT

> I would go against branch-1.0 as this will be the imminent 1.0.0 release and has HTrace 3.1.0-incubating.

Thanks.

Colin

> -n
>
> On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe cmcc...@apache.org wrote:
>> Thanks for trying stuff out! Sorry that this is a little difficult at the moment.
>>
>> To really do this right, you would want to be using Hadoop with HTrace 3.1.0, and HBase with HTrace 3.1.0. Unfortunately, there hasn't been a new release of Hadoop with HTrace 3.1.0. The only existing releases of Hadoop use an older version of the HTrace library. So you will have to build from source.
>>
>> If you check out Hadoop's branch-2 branch (currently, this branch represents what will be in the 2.7 release, when it is cut), and build that, you will get the latest. Then you have to build a version of HBase against the version of Hadoop you have built.
>>
>> By default, HBase's Maven build will build against upstream release versions of Hadoop only. So just setting -Dhadoop.version=2.7.0-SNAPSHOT is not enough, since it won't know where to find the jars. To get around this problem, you can create your own local maven repo. Here's how.
>>
>> In hadoop/pom.xml, add these lines to the distributionManagement stanza:
>>
>> +<repository>
>> +  <id>localdump</id>
>> +  <url>file:///home/cmccabe/localdump/releases</url>
>> +</repository>
>> +<snapshotRepository>
>> +  <id>localdump</id>
>> +  <url>file:///home/cmccabe/localdump/snapshots</url>
>> +</snapshotRepository>
>>
>> Comment out the repositories that are already there.
>>
>> Now run mkdir /home/cmccabe/localdump.
>>
>> Then, in your hadoop tree, run mvn deploy -DskipTests.
>>
>> You should get a localdump directory that has files kind of like this:
>>
>> ...
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/maven-metadata.xml.md5
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml.md5
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/hadoop-mapreduce-2.7.0-20121120.230341-1.pom.sha1
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml
>> ...
>>
>> Now, add the following lines to your HBase pom.xml:
>>
>>    <repositories>
>>      <repository>
>> +      <id>localdump</id>
>> +      <url>file:///home/cmccabe/localdump</url>
>> +      <name>Local Dump</name>
>> +      <snapshots>
>> +        <enabled>true</enabled>
>> +      </snapshots>
>> +      <releases>
>> +        <enabled>true</enabled>
>> +      </releases>
>> +    </repository>
>> +    <repository>
>>
>> This will allow you to run something like:
>>
>> mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests -DredirectTestOutputToFile=true -Dhadoop.profile=2.0 -Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT
>>
>> Once we do a new release of Hadoop with HTrace 3.1.0 this will get a lot easier.
>>
>> Related: Does anyone know what the best git branch to build from for HBase would be for this kind of testing? I've been meaning to do some end to end testing (it's been on my TODO for a while)
>>
>> best,
>> Colin
>>
>> On Wed, Feb 11, 2015 at 7:55 AM, Chunxu Tang chunxut...@gmail.com wrote:
>>> Hi all,
>>>
>>> Now I’m exploiting HTrace to trace request level data flows in HBase and HDFS. I have successfully traced HBase and HDFS by using HTrace, respectively.
>>>
>>> After that, I combine HBase and HDFS together and I want to just send a PUT/GET request to HBase, but to trace the whole data flow in both HBase and HDFS. In my opinion, when I send a request such as Get to HBase, it will at last try to read the blocks on HDFS, so I can construct a whole data flow tracing through HBase and HDFS. However, the fact is that I can only get tracing data of HBase, with no data of HDFS.
>>>
>>> Could you give me any suggestions on how to trace the data flow in both HBase and HDFS? Does anyone have similar experience? Do I need to modify the source code? And maybe which part(s) should I touch? If I need to modify the code, I will try to create a patch for that.
>>>
>>> Thank you.
>>>
>>> My Configurations:
>>> Hadoop version: 2.6.0
>>> HBase version: 0.99.2
>>> HTrace version: htrace-master
>>> OS: Ubuntu 12.04
>>>
>>> Joshua
Re: HTrace integration for more HDFS client operations
On Fri, Jan 16, 2015 at 10:03 AM, Nick Dimiduk ndimi...@gmail.com wrote:
> This reminds me: have we tested the compatibility of this new release with previous versions? For instance, if we upgrade HBase to the incubator release but not HDFS, will tracing work only as far as that?

So, for this 3.1.0 release, the story is pretty simple. The previous releases were in a different namespace and the jars had a different name, so HBase and HDFS can use different versions if they want to. There will be no conflicts. Of course, if HBase and HDFS don't use the same version, the HDFS spans won't be parented to HBase's spans. But there are no crashes or other problems like that.

The situation for the future is more complex. Of course, HBase pulls in jars from Hadoop. One of those jars is going to be our htrace-core jar. The HDFS client and HBase's daemons are going to want to use the same version of htrace. I know that HBase likes to provide compatibility with as many versions of Hadoop as it can. Basically HBase is going to have to look at the oldest version of Apache HTrace that Hadoop might ask it to use, and verify that that works.

It might help to look at the stuff we're trying to get rid of in the API:

1. We'd like to get rid of the Span#addKVAnnotation method which takes byte[], in favor of the one which takes String

2. We'd like to get rid of the public MilliSpan constructor... MilliSpan#Builder is more flexible and future-proof. If we want to add new parameters we don't want a combinatorial explosion of constructors (we learned this in Hadoop)

3. Do not use Span#getParentId because it assumes that there is a single parent for each span, an assumption we're trying to get rid of

#2 and #3 shouldn't be a problem for HBase because there's no reason for HBase to directly create MilliSpans, or call getParentId. I bet there might be some cases where we're calling the byte[] version of addKVAnnotation, though.
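
[Editor's sketch] Points 2 and 3 above can be illustrated with a toy class. This is not MilliSpan or its real Builder; SketchSpan and every method name on it are hypothetical. It only shows why a builder avoids a growing family of constructors, and how storing a list of parents removes the single-parent assumption:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the HTrace MilliSpan API.
final class SketchSpan {
    final String description;
    final long beginMs;
    final long endMs;
    final List<Long> parents; // zero, one, or many parents

    // Private: callers go through the Builder, so adding a field later breaks nobody.
    private SketchSpan(Builder b) {
        this.description = b.description;
        this.beginMs = b.beginMs;
        this.endMs = b.endMs;
        this.parents = Collections.unmodifiableList(new ArrayList<>(b.parents));
    }

    static final class Builder {
        private String description = "";
        private long beginMs;
        private long endMs;
        private final List<Long> parents = new ArrayList<>();

        Builder description(String d) { this.description = d; return this; }
        Builder begin(long ms) { this.beginMs = ms; return this; }
        Builder end(long ms) { this.endMs = ms; return this; }
        Builder addParent(long id) { this.parents.add(id); return this; }
        SketchSpan build() { return new SketchSpan(this); }
    }
}
```

A caller would write something like new SketchSpan.Builder().description("rename").begin(10).end(25).addParent(1L).build(). A new optional field becomes one more builder method instead of another constructor overload.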
So tl;dr: When we update HBase to use the new Apache jar, let's be careful NOT to use any of these deprecated APIs. Then we should be able to remove those from the next release without creating any compat problems for HBase.

best,
Colin

> On Fri, Jan 16, 2015 at 9:44 AM, Stack st...@duboce.net wrote:
>> You the man CPMcC.
>> St.Ack
>>
>> On Fri, Jan 16, 2015 at 12:30 AM, Colin P. McCabe cmcc...@apache.org wrote:
>>> Hi all,
>>>
>>> I've got some good news that I figured I'd post to the list! Today I added a bunch of htrace integration to HDFS, in https://issues.apache.org/jira/browse/HDFS-7189. This patch adds tracing for a whole host of DFS client operations, such as rename and delete.
>>>
>>> Obviously this will be helpful for HDFS users, and it should also increase our ability to follow HBase operations all the way back into HDFS via HTrace-- for example when HBase is deleting or moving a WAL, etc.
>>>
>>> The last big piece of HTrace integration for HDFS is integration into the output stream (i.e. the write path). This should be coming soon, so stay tuned.
>>>
>>> cheers,
>>> Colin