Re: [DISCUSS] Subproject proposal

2018-03-15 Thread Andrew Purtell
I think it would make a lot of sense if merged into Hadoop Common. HBase and 
Phoenix at least would have a trivial migration, and already depend on Hadoop 
Common for many other things. This would prolong the life of HTrace API usage 
in those projects, perhaps indefinitely. 


> On Mar 15, 2018, at 12:52 PM, Colin McCabe  wrote:
> 
> I would potentially be interested in continuing to be involved with HTrace
> as a subproject.
> 
> The vision behind HTrace was always to have a single trace system that 
> unified all of Hadoop.  So you could see what Accumulo was doing and how that 
> affected HDFS, or what Phoenix was doing that affected HBase and HDFS, etc. 
> etc.  This has sort of been built several times internally by companies 
> running services based on Hadoopy projects, but never really made its way 
> into open source in a meaningful way.  I thought we had a good shot at that, 
> but maybe we needed to start earlier and have more resources.  We especially 
> lacked full-time developers and people to evangelize the client.
> 
> I think it makes the most sense for HTrace to be a subproject of either 
> Apache Hadoop or Apache Skywalking.  Skywalking in particular seems 
> interesting since its goals are very similar to HTrace's -- to be a one-stop 
> shop including tracing clients, visualization, and storage.  Perhaps HTraced 
> could be useful to them for improving that "first 15 minute experience".  
> It's easy to start up and doesn't require managing a separate storage or 
> query system.
> 
> I'm not so sure about HTrace being a subproject of Accumulo.  It seems like 
> Accumulo is really focused on being a storage system, not so much on being a 
> platform.  It would be weird for HBase or HDFS to depend on something that 
> was a subproject of Accumulo, for example.
> 
> best,
> Colin
> 
> 
>> On Wed, Mar 14, 2018, at 17:35, Michael Wall wrote:
>> I am interested.  I am not thinking about it as subproject under Accumulo
>> though, just to be clear.  Just looked at Skywalking for the first time,
>> seems intriguing.
>> 
>>> On Wed, Mar 14, 2018 at 7:32 PM Mike Drob  wrote:
>>> 
>>> On Wed, Mar 14, 2018, 2:26 PM Billie Rinaldi 
>>> wrote:
>>> 
>>>> In the active thread "[VOTE] Retire HTrace from Incubation" Christopher
>>>> Tubbs brought up the idea to make HTrace a subproject of an existing TLP.
>>>> This would mitigate the issues of the community being inactive and the
>>>> core instrumentation library not requiring ongoing development.
>>> 
>>> Does moving to a subproject of another TLP necessitate changing Java
>>> package names prior to release? That would put a damper on user adoption
>>> again.
>>> 
>>>> It's a choice we could make now (assuming we were able to find a TLP
>>>> willing to adopt HTrace as a subproject),
>>> 
>>> The Skywalking podling expressed some interest in the vote thread.
>>> 
>>>> or we could allow HTrace to retire and then revisit the subproject idea
>>>> at a future time if someone becomes interested in patching and releasing
>>>> a new version of HTrace.
>>>> 
>>>> So far, the people who have expressed interest in being involved with
>>>> HTrace as a possible subproject are Christopher, Masatake, and myself. Is
>>>> anyone else in the community interested in this idea?


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Andrew Purtell
> That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.

This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
uses it. Likewise, HBase RPC. These are not general purpose RPC stacks by
any stretch. There are some of those around. Some have tracing built in.
They take some of the oxygen out of the room. I think that is a fair point
when thinking about the viability of a podling that sees little activity as
it is.

I didn't come here to suggest HTrace go away, though. I came to raise a few
points on why adoption and use of HTrace has very likely suffered from
usability problems. These problems are still not completely resolved. Stack
describes HTrace integration with HBase as broken. My experience has been that
I have to patch POMs, and patch HDFS, HBase, and Phoenix code, to get
anything that works at all. I also sought to tie some of those problems to
ecosystem issues because I know it is hard. For what it's worth, thanks.



On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe <cmcc...@apache.org> wrote:

> On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > project to ZipKin? In particular grpc-opentracing (
> > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > fulfill in open source the tracing architecture described in the Dapper
> > paper.
>
> OpenTracing is essentially an API which sits on top of another tracing
> system.
>
> So you can instrument your code with the OpenTracing library, and then
> have that send the trace spans to OpenZipkin.
>
> Here are some thoughts about this topic from a Zipkin developer:
> https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> Probably Adrian Cole can chime in here as well.
>
> In general the OpenTracing folks have been friendly and respectful.  (If
> any of them are reading this, I apologize for not following some of the
> discussions on gitter more thoroughly-- my time is just split so many
> ways right now!)
>
> >
> > If one takes a step back and looks at all of the hand rolled RPC stacks in
> > the Hadoop ecosystem it's a mess. It is a heavier lift but getting everyone
> > migrated to a single RPC stack - gRPC - would provide the unified tracing
> > layer envisioned by HTrace. The tracing integration is then done exactly in
> > one place. In contrast HTrace requires all of the components to sprinkle
> > spans throughout the application code.
> >
>
> That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> system is actually very straightforward-- you just add two fields to the
> base RPC request definition, and patch the RPC system to use them.
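[Editor's note: the two-fields approach described above can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions: the field names, class names, and child-span bookkeeping are invented for the example, and are not Hadoop RPC's or HTrace's actual wire format.]

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only: these field names and the child-span bookkeeping
// are invented for the example, not Hadoop RPC's or HTrace's actual format.
class RpcRequestHeader {
    final String method;
    // The two extra fields described above, carrying trace context.
    final long traceId;        // identifies the whole trace
    final long parentSpanId;   // identifies the caller's current span

    RpcRequestHeader(String method, long traceId, long parentSpanId) {
        this.method = method;
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
    }
}

class Server {
    // On receipt, the server opens a child span linked to the caller's span,
    // so the trace crosses the process boundary.
    static String handle(RpcRequestHeader req) {
        long childSpanId = ThreadLocalRandom.current().nextLong();
        return "span(trace=" + req.traceId
                + ", parent=" + req.parentSpanId
                + ", id=" + childSpanId + "): " + req.method;
    }
}

class RpcTraceSketch {
    public static void main(String[] args) {
        // Client side: stamp the current trace context onto the request.
        RpcRequestHeader req = new RpcRequestHeader("getBlock", 42L, 7L);
        System.out.println(Server.handle(req));
    }
}
```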
>
> Just instrumenting RPC is not sufficient.  You need programmers to add
> explicit span annotations to your code so that you can have useful
> information beyond what a program like wireshark would find.  Things
> like what disk is a request hitting, what HBase PUT is an HDFS write
> associated with, and so forth.
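[Editor's note: the kind of explicit, application-level annotation described above might look roughly like this. The Span class here is a tiny inline stand-in, not the HTrace API, and the annotation keys are illustrative.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal inline stand-in for a trace span; not the HTrace API.
class Span {
    final String name;
    final Map<String, String> annotations = new LinkedHashMap<>();

    Span(String name) { this.name = name; }

    void annotate(String key, String value) { annotations.put(key, value); }
}

class AnnotationSketch {
    public static void main(String[] args) {
        // RPC-level instrumentation alone records only that a write happened;
        // explicit annotations tie it to application-level context.
        Span span = new Span("hdfs.write");
        span.annotate("disk", "/dev/sdb");           // which disk the request hit
        span.annotate("hbase.put", "row=user:1234"); // which HBase PUT caused it
        System.out.println(span.name + " " + span.annotations);
    }
}
```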
>
> Also, this is getting off topic, but there is a new RPC system every
> year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> REST/JSON, etc.  They all have advantages and disadvantages.  For
> example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
> and performance problems with the protobuf-java library.  I wish GRPC
> luck, but I think it's good for people to experiment with different
> libraries.  It doesn't make sense to try to force everyone to use one
> thing, even if we could.
>
> > The Hadoop ecosystem is always partially at odds with itself, if for no
> > other reason than there is no shared vision among the projects. There are
> > no coordinated releases. There isn't even agreement on which version of
> > shared dependencies to use (hence the recurring pain in various places with
> > downstream version changes of protobuf, guava, jackson, etc. etc).
> > Therefore HTrace is severely constrained on what API changes can be made.
> > Unfortunately the different major versions of HTrace do not interoperate at
> > all. And are not even source compatible. While this is not unreasonable at
> > all for a project in incubation, when combined with the inability of the
> > Hadoop ecosystem to coordinate releases as a cross-cutting dependency ships
> > a new version, this has reduced the utility of HTrace to effectively nil
> > for the average user. I am sorry to say that. Only a commercial Hadoop
> > vendor or power user can be expected to patch and build a stack that
> > actually works.

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Andrew Purtell
> The different major versions of HTrace are indeed source code compatible.

Maybe the issue was going from 2 to 3. At the time it was a real problem,
change or removal of a span id constant, and another time something to do
with setting parent-child span relationships, IIRC. If this is better
between 3 and 4 then the point no longer applies.


On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe <cmcc...@apache.org> wrote:

> On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > project to ZipKin? In particular grpc-opentracing (
> > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > fulfill in open source the tracing architecture described in the Dapper
> > paper.
>
> OpenTracing is essentially an API which sits on top of another tracing
> system.
>
> So you can instrument your code with the OpenTracing library, and then
> have that send the trace spans to OpenZipkin.
>
> Here are some thoughts about this topic from a Zipkin developer:
> https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> Probably Adrian Cole can chime in here as well.
>
> In general the OpenTracing folks have been friendly and respectful.  (If
> any of them are reading this, I apologize for not following some of the
> discussions on gitter more thoroughly-- my time is just split so many
> ways right now!)
>
> >
> > If one takes a step back and looks at all of the hand rolled RPC stacks in
> > the Hadoop ecosystem it's a mess. It is a heavier lift but getting everyone
> > migrated to a single RPC stack - gRPC - would provide the unified tracing
> > layer envisioned by HTrace. The tracing integration is then done exactly in
> > one place. In contrast HTrace requires all of the components to sprinkle
> > spans throughout the application code.
> >
>
> That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> system is actually very straightforward-- you just add two fields to the
> base RPC request definition, and patch the RPC system to use them.
>
> Just instrumenting RPC is not sufficient.  You need programmers to add
> explicit span annotations to your code so that you can have useful
> information beyond what a program like wireshark would find.  Things
> like what disk is a request hitting, what HBase PUT is an HDFS write
> associated with, and so forth.
>
> Also, this is getting off topic, but there is a new RPC system every
> year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> REST/JSON, etc.  They all have advantages and disadvantages.  For
> example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
> and performance problems with the protobuf-java library.  I wish GRPC
> luck, but I think it's good for people to experiment with different
> libraries.  It doesn't make sense to try to force everyone to use one
> thing, even if we could.
>
> > The Hadoop ecosystem is always partially at odds with itself, if for no
> > other reason than there is no shared vision among the projects. There are
> > no coordinated releases. There isn't even agreement on which version of
> > shared dependencies to use (hence the recurring pain in various places with
> > downstream version changes of protobuf, guava, jackson, etc. etc).
> > Therefore HTrace is severely constrained on what API changes can be made.
> > Unfortunately the different major versions of HTrace do not interoperate at
> > all. And are not even source compatible. While this is not unreasonable at
> > all for a project in incubation, when combined with the inability of the
> > Hadoop ecosystem to coordinate releases as a cross-cutting dependency ships
> > a new version, this has reduced the utility of HTrace to effectively nil
> > for the average user. I am sorry to say that. Only a commercial Hadoop
> > vendor or power user can be expected to patch and build a stack that
> > actually works.
>
> One correction: The different major versions of HTrace are indeed source
> code compatible.  You can build an application that can use both HTrace
> 3 and HTrace 4.  This was absolutely essential for us because of the
> version skew issues you mention.
>
> > On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
> >
> > > Hi Mike,
> > > I think this is a fair question. We've probably all been associated
> with
> > > projects which just don't really make it.

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Andrew Purtell
What about OpenTracing (http://opentracing.io/)? Is this the successor
project to ZipKin? In particular grpc-opentracing (
https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
fulfill in open source the tracing architecture described in the Dapper
paper.

If one takes a step back and looks at all of the hand rolled RPC stacks in
the Hadoop ecosystem it's a mess. It is a heavier lift but getting everyone
migrated to a single RPC stack - gRPC - would provide the unified tracing
layer envisioned by HTrace. The tracing integration is then done exactly in
one place. In contrast HTrace requires all of the components to sprinkle
spans throughout the application code.

The Hadoop ecosystem is always partially at odds with itself, if for no
other reason than there is no shared vision among the projects. There are
no coordinated releases. There isn't even agreement on which version of
shared dependencies to use (hence the recurring pain in various places with
downstream version changes of protobuf, guava, jackson, etc. etc).
Therefore HTrace is severely constrained on what API changes can be made.
Unfortunately the different major versions of HTrace do not interoperate at
all. And are not even source compatible. While this is not unreasonable at all
for a project in incubation, when combined with the inability of the Hadoop
ecosystem to coordinate releases as a cross-cutting dependency ships a new
version, this has reduced the utility of HTrace to effectively nil for the
average user. I am sorry to say that. Only a commercial Hadoop vendor or
power user can be expected to patch and build a stack that actually works.

On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney  wrote:

> Hi Mike,
> I think this is a fair question. We've probably all been associated with
> projects which just don't really make it. It would appear that HTrace is
> one of them. This is not to say that there is nothing going on with the
> tracing effort generally (as there is) but it looks like HTrace as a
> project may be headed to the Attic.
> I suppose the response to this thread will determine what happens...
> Lewis
>
>
>
> On Wed, Aug 16, 2017 at 10:01 AM, <dev-digest-h...@htrace.incubator.apache.org> wrote:
>
> >
> > From: Mike Drob 
> > To: dev@htrace.incubator.apache.org
> > Cc:
> > Bcc:
> > Date: Wed, 16 Aug 2017 12:00:49 -0500
> > Subject: [DISCUSS] Attic podling Apache HTrace?
> > Hi folks,
> >
> > Want to bring up a potentially uncomfortable topic for some. Is it time to
> > retire/attic the project?
> >
> > We've seen a minimal amount of activity in the past year. The last release
> > had two bug fixes, and had been pending for several months before somebody
> > reminded me to push the artifacts to subversion from the staging directory.
> >
> > I'd love to see a renewed set of activity here, but I don't think there is
> > a ton of interest going on.
> >
> > HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> > which is a good sign, but I haven't heard much from them recently. I
> > definitely do not think we are at the point where a lack of releases and
> > activity is a sign of super advanced maturity and stability.
> >
> > Your thoughts?
> >
> > Mike
> >
> >
>
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: HTrace and Go

2015-02-19 Thread Andrew Purtell
Might be useful followup to document how to build just the Java client? I
also found it necessary to install golang to build HTrace.

On Thu, Feb 19, 2015 at 1:56 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 Hi Michael,

 This is indeed the appropriate list. I believe there is some work in
 progress to see Accumulo using HTrace. I'm sure Billie can comment further
 on this effort. Re: golang, we are using this to implement a span-receiver
 daemon, the htraced module. Go is not required for using our Java client
 library. I believe we support both go 1.3 and 1.4.

 Welcome!
 -n

 On Thu, Feb 19, 2015 at 12:26 PM, Michael Wall mjw...@gmail.com wrote:

  Hi,
 
  I am interested in learning more about HTrace and helping.  Hopefully this
  is the correct list; I didn't see a users alias.
 
  Specifically, I use Accumulo every day and am investigating how HTrace can
  supplement or replace Accumulo's built-in tracing.
 
  I just checked out the code and tried to build it.  To my surprise, I need
  to install Golang.  What version should I be using?  Is there somewhere I
  can read about why this was chosen and which modules/functionality Go will
  be used for?
 
  Thanks
 
  Mike Wall
 




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Website: mvn generated or apache cms or....

2014-12-15 Thread Andrew Purtell
Looks great Stack.

On Mon, Dec 15, 2014 at 6:28 PM, Stack st...@duboce.net wrote:

 I posted an amalgam of maven site, bootstrap, and markdown: i.e. maven site
 generates a bootstrap branded site and doc is written in markdown.  It
 looks like this: http://people.apache.org/~stack/htrace_website/  See issue
 https://issues.apache.org/jira/browse/HTRACE-19 for the patch and more
 detail.  You all ok w/ this being our first cut at a website?

 St.Ack

 On Thu, Dec 11, 2014 at 2:03 PM, Stack st...@duboce.net wrote:
 
  On Thu, Dec 11, 2014 at 11:34 AM, Jake Farrell jfarr...@apache.org
  wrote:
 
  cms has its advantages and drawbacks. I've used middleman, jekyll, maven,
  and the cms to generate different ASF sites, and so far the one that I
  prefer is middleman, given the ability to use markdown and do live local
  dev vs having to push to buildbot (which, if down, means no site updates
  can occur). An example of middleman in use that I did is for mesos or
  thrift (thrift also has a cms version since we tested them all out). We
  can switch to whichever variant people are most comfortable with; we just
  need to make sure that it's documented so any committer can update the
  site easily.
 
 
 
  Thanks Jake.
 
  I looked at middleman.  It is for 'hand-made' sites.  Looks nice.  Just
  putting our README.md in place of the index.html in the default site I got
  this: http://people.apache.org/~stack/htrace_mm_site/  so markdown works
  (just need to read how to get the styling in there).
 
  On the other hand, just doing mvn site got me this far:
  http://people.apache.org/~stack/htrace_mvn_site/ which is kinda
  attractive.
 
  I don't have much time for website making. Maybe someone else does though.
  Otherwise I'd be inclined toward the one that takes the least amount of
  work.
 
  Thanks,
  St.Ack
 
 
 
 
  -Jake
 
  On Thu, Dec 11, 2014 at 2:26 PM, Lewis John Mcgibbney 
  lewis.mcgibb...@gmail.com wrote:
 
   CMS ROCKS
  
   On Thu, Dec 11, 2014 at 8:43 AM, Stack st...@duboce.net wrote:
  
 How should we do the website?  Jake set up the svnpubsub for us so we'd
 generate the static site and then publish it; we have no 'website'
 currently.
   
 I could do a little fixup and then generate the site with 'mvn site'.
 That'd be easy (if ugly). Going forward, to change the website you'd edit,
 stage, and then svn commit to deploy (a minor burden).
   
 We have a bit of markdown carried over from the github deploy.  If our site
 used 'apache cms' [1][2], we could just put up our little bit of *.md.
 Devs could just login and edit the site; there would be no build, publish
 step.
   
Jake, is it even possible to move to 'apache cms' post setup?
   
Any other opinions out there on how to proceed?
Thanks,
St.Ack
   
1. http://incubator.apache.org/guides/sites.html
2. http://www.apache.org/dev/cms.html
   
  
  
  
   --
   *Lewis*
  
 
 
 



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: What version should be the first apache htrace be? 4.0.0 or 1.0.0?

2014-12-06 Thread Andrew Purtell
What we did for Phoenix is make an initial ASF release that was the granted 
code with a package name search-and-replace and minor version increment. This 
let us focus on all the Apache packaging and release concerns like NOTICE file 
wording, RAT compliance, etc. and provided an opportunity for existing users to 
migrate to an ASF artifact at low risk - just package renames. Then we made a 
major version increment and put in some significant new features for that next 
release. 



 On Dec 5, 2014, at 9:36 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 I think going backwards to 1.0 would be confusing for any existing users.
 Maybe make the -incubating releases pick up 3.1.x with the intention of the
 first graduated release being 4.0.0. Could be seen as artificially
 inflating the version numbers, but I don't think that matters too much. I
 assume (prefer) we'll follow the guidelines of semantic versioning.
 
 -n
 
 On Friday, December 5, 2014, Colin McCabe cmcc...@alumni.cmu.edu wrote:
 
 I looked at
 http://incubator.apache.org/guides/releasemanagement.html#best-practice-versioning
 and it doesn't say whether we need to start at 1.  Hmm.
 
 I think either way could work.  There is stuff from org.htrace up on
 Maven central, but since we're moving to org.apache.htrace, we won't
 conflict if we choose to go back to 1.0.0.  I don't really have any
 preference between 1.0.0 or 4.0.0.
 
 best,
 Colin
 
 On Fri, Dec 5, 2014 at 7:47 PM, Stack st...@duboce.net wrote:
 org.htrace was at 3.0.4
 
 The next release could be 4.0.0.
 
 Or we could roll back and make it 1.0.0?
 
 Any opinions out there?
 
 Thanks,
 St.Ack