Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Adrian Cole
Just speaking on the OpenTracing-versus-alternatives part: what Colin mentioned is
correct. It is a library API defined for tracing, not an implementation
of a tracer or a backend.

That said, there are certain backends that are preferred, notably Lightstep
and Jaeger (by Uber). This is because folks from those projects did most of
the defining, even if others do participate. This affects the view of what
tracing is inside OT. Notably, both take the view that logging is tracing
(e.g., it is OK, and sometimes encouraged, to push system logs into a span).
These opinions are sometimes promoted through presentations, etc., which might
make OT a better or worse fit as an HTrace replacement. For example, most in
Zipkin are not keen on escalating it into a logging system, as it was not
designed for that; similarly to here, we couldn't afford to accept more
responsibility like that.

HTrace is almost never mentioned in OpenTracing discussions except when I
bring it up. That by itself has been troubling to me: if the project were
meant to be neutral, HTrace should have been mentioned regularly and should
have influenced the design. Anyway..

The "actual Dapper" team, whose project is called Census, has spun up and is
moving fast. Census has no backend yet, but most of its libraries can, or
soon will, report to Zipkin.
https://github.com/census-instrumentation

Most important to all of this, imho, is that the jury is out on whether
instrumentation libraries are indeed shared. For example, even though
Amazon, Microsoft, Dynatrace, AppDynamics, New Relic, Facebook, etc. all know
about OpenTracing, it isn't what they are using as a core API. In some cases
it is because they have an event layer instead; in others it is that they
prefer a data-type approach as opposed to a dictated library interface. Some
in OpenTracing have struggled to influence the project around points they
felt were important, notably propagation, and wrote their own bespoke layers
or wrappers to handle it properly. Some of this is fixable in OT, but imho
the change dynamics, culture, and leadership have not changed since
inception.

Many Zipkin users use OpenTracing libraries, probably due to the high level
of staffing and marketing behind the effort. For example, Red Hat staff write
a lot of things faster than volunteers can. That said, many Zipkin users
prefer existing, especially well-attended, libraries from the project or its
ecosystem. Looking at GitHub, native instrumentation adoption is far higher
than OT's. In many cases, users still roll their own. This is not the same as
the lack of a complete choice: developers can, do, and will continue to write
their own code if given a spec on how to do it. This is also true in
OpenTracing, except there you need to know both the abstraction and the
backend to write custom instrumentation.

OpenTracing is in the CNCF now, as is its preferred system, Jaeger. As far as
I know it wouldn't also be in the ASF, but I don't know if that matters.
Census is likely to end up in the CNCF because of Google (but I have no
insight, just a guess). Zipkin is on hold with respect to a foundation; we
didn't have enough oomph to get into one last year, so the jury is still out.

Personally, I think Census has a lot of things right, e.g., the separation of
concerns between logging, metrics, tracing, and propagation. That said, I
think all of these projects could learn from HTrace, or collaborate,
regardless of this outcome.

On 18 Aug 2017 05:57, "Colin McCabe"  wrote:

On Thu, Aug 17, 2017, at 14:40, Andrew Purtell wrote:
> > That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.
>
> This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
> uses it. Likewise, HBase RPC. These are not general purpose RPC stacks by
> any stretch. There are some of those around. Some have tracing built in.
> They take some of the oxygen out of the room. I think that is a fair
> point when thinking about the viability of a podling that sees little
> activity as it is.

Yeah-- maybe we should integrate HTrace into HBase RPC as well.

I don't think RPC-specific trace systems have been strong competitors.
 Since the RPC landscape is so fragmented, those systems tend to not get
used by many people.  Our strongest open source competitors, OpenTracing
and OpenZipkin, support multiple RPC systems.  (Zipkin originally was
specific to Finagle, but that is no longer true.)

> I didn't come here to suggest HTrace go away, though. I came to raise a
> few points on why adoption and use of HTrace has very likely suffered from
> usability problems. These problems are still not completely resolved.
> Stack describes HTrace integration with HBase as broken. My experience has been
> I have to patch POMs, and patch HDFS, HBase, and Phoenix code, to get
> anything that works at all. I also sought to tie some of those problems
> to ecosystem issues because I know it is hard. For what it's worth, thanks.

I think you make some very good points about the difficulty of doing
cross-project coordination.  One thing that really held back HTrace 4.0
was that it was originally scheduled 

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Colin McCabe
On Thu, Aug 17, 2017, at 14:40, Andrew Purtell wrote:
> > That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.
> 
> This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
> uses it. Likewise, HBase RPC. These are not general purpose RPC stacks by
> any stretch. There are some of those around. Some have tracing built in.
> They take some of the oxygen out of the room. I think that is a fair
> point when thinking about the viability of a podling that sees little activity
> as it is.

Yeah-- maybe we should integrate HTrace into HBase RPC as well.

I don't think RPC-specific trace systems have been strong competitors.
 Since the RPC landscape is so fragmented, those systems tend to not get
used by many people.  Our strongest open source competitors, OpenTracing
and OpenZipkin, support multiple RPC systems.  (Zipkin originally was
specific to Finagle, but that is no longer true.)

> I didn't come here to suggest HTrace go away, though. I came to raise a
> few points on why adoption and use of HTrace has very likely suffered from
> usability problems. These problems are still not completely resolved.
> Stack describes HTrace integration with HBase as broken. My experience has 
> been
> I have to patch POMs, and patch HDFS, HBase, and Phoenix code, to get
> anything that works at all. I also sought to tie some of those problems
> to ecosystem issues because I know it is hard. For what it's worth, thanks.

I think you make some very good points about the difficulty of doing
cross-project coordination.  One thing that really held back HTrace 4.0
was that it was originally scheduled to be part of Hadoop 2.8-- and the
Hadoop 2.8 release was delayed for a really, really long time, to the
point when it almost became a punchline.  So people had to use vendor
releases to get HTrace 4, because those were the only releases with new
Hadoop code.

Colin


> 
> 
> 
> On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe  wrote:
> 
> > On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > > project to ZipKin? In particular grpc-opentracing (
> > > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > > fulfill in open source the tracing architecture described in the Dapper
> > > paper.
> >
> > OpenTracing is essentially an API which sits on top of another tracing
> > system.
> >
> > So you can instrument your code with the OpenTracing library, and then
> > have that send the trace spans to OpenZipkin.
> >
> > Here are some thoughts about this topic from a Zipkin developer:
> > https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> > .  Probably Adrian Cole can chime in here as well.
> >
> > In general the OpenTracing folks have been friendly and respectful.  (If
> > any of them are reading this, I apologize for not following some of the
> > discussions on gitter more thoroughly-- my time is just split so many
> > ways right now!)
> >
> > >
> > > If one takes a step back and looks at all of the hand rolled RPC stacks
> > > in
> > > the Hadoop ecosystem it's a mess. It is a heavier lift but getting
> > > everyone
> > > migrated to a single RPC stack - gRPC - would provide the unified tracing
> > > layer envisioned by HTrace. The tracing integration is then done exactly
> > > in
> > > one place. In contrast HTrace requires all of the components to sprinkle
> > > spans throughout the application code.
> > >
> >
> > That's not the issue.  We already have HTrace integration with Hadoop
> > RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> > system is actually very straightforward-- you just add two fields to the
> > base RPC request definition, and patch the RPC system to use them.
> >
> > Just instrumenting RPC is not sufficient.  You need programmers to add
> > explicit span annotations to your code so that you can have useful
> > information beyond what a program like wireshark would find.  Things
> > like what disk is a request hitting, what HBase PUT is an HDFS write
> > associated with, and so forth.
> >
> > Also, this is getting off topic, but there is a new RPC system every
> > year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> > REST/JSON, etc.  They all have advantages and disadvantages.  For
> > example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
> > and performance problems with the protobuf-java library.  I wish gRPC
> > luck, but I think it's good for people to experiment with different
> > libraries.  It doesn't make sense to try to force everyone to use one
> > thing, even if we could.
> >
> > > The Hadoop ecosystem is always partially at odds with itself, if for no
> > > other reason than there is no shared vision among the projects. There are
> > > no coordinated releases. There isn't 

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Andrew Purtell
> That's not the issue.  We already have HTrace integration with Hadoop
RPC, such that a Hadoop RPC creates a span.

This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
uses it. Likewise, HBase RPC. These are not general purpose RPC stacks by
any stretch. There are some of those around. Some have tracing built in.
They take some of the oxygen out of the room. I think that is a fair point
when thinking about the viability of a podling that sees little activity as
it is.

I didn't come here to suggest HTrace go away, though. I came to raise a few
points on why adoption and use of HTrace has very likely suffered from
usability problems. These problems are still not completely resolved. Stack
describes HTrace integration with HBase as broken. My experience has been I
have to patch POMs, and patch HDFS, HBase, and Phoenix code, to get
anything that works at all. I also sought to tie some of those problems to
ecosystem issues because I know it is hard. For what it's worth, thanks.



On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe  wrote:

> On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > project to ZipKin? In particular grpc-opentracing (
> > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > fulfill in open source the tracing architecture described in the Dapper
> > paper.
>
> OpenTracing is essentially an API which sits on top of another tracing
> system.
>
> So you can instrument your code with the OpenTracing library, and then
> have that send the trace spans to OpenZipkin.
>
> Here are some thoughts about this topic from a Zipkin developer:
> https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> .  Probably Adrian Cole can chime in here as well.
>
> In general the OpenTracing folks have been friendly and respectful.  (If
> any of them are reading this, I apologize for not following some of the
> discussions on gitter more thoroughly-- my time is just split so many
> ways right now!)
>
> >
> > If one takes a step back and looks at all of the hand rolled RPC stacks
> > in
> > the Hadoop ecosystem it's a mess. It is a heavier lift but getting
> > everyone
> > migrated to a single RPC stack - gRPC - would provide the unified tracing
> > layer envisioned by HTrace. The tracing integration is then done exactly
> > in
> > one place. In contrast HTrace requires all of the components to sprinkle
> > spans throughout the application code.
> >
>
> That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> system is actually very straightforward-- you just add two fields to the
> base RPC request definition, and patch the RPC system to use them.
>
> Just instrumenting RPC is not sufficient.  You need programmers to add
> explicit span annotations to your code so that you can have useful
> information beyond what a program like wireshark would find.  Things
> like what disk is a request hitting, what HBase PUT is an HDFS write
> associated with, and so forth.
>
> Also, this is getting off topic, but there is a new RPC system every
> year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> REST/JSON, etc.  They all have advantages and disadvantages.  For
> example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
> and performance problems with the protobuf-java library.  I wish gRPC
> luck, but I think it's good for people to experiment with different
> libraries.  It doesn't make sense to try to force everyone to use one
> thing, even if we could.
>
> > The Hadoop ecosystem is always partially at odds with itself, if for no
> > other reason than there is no shared vision among the projects. There are
> > no coordinated releases. There isn't even agreement on which version of
> > shared dependencies to use (hence the recurring pain in various places
> > with
> > downstream version changes of protobuf, guava, jackson, etc. etc).
> > Therefore HTrace is severely constrained on what API changes can be made.
> > Unfortunately the different major versions of HTrace do not interoperate
> > at
> > all. And are not even source compatible. While this is not unreasonable at all
> > for a project in incubation, when combined with the inability of the
> > Hadoop
> > ecosystem to coordinate releases as a cross-cutting dependency ships a
> > new
> > version, this has reduced the utility of HTrace to effectively nil for
> > the
> > average user. I am sorry to say that. Only a commercial Hadoop vendor or
> > power user can be expected to patch and build a stack that actually
> > works.
>
> One correction: The different major versions of HTrace are indeed source
> code compatible.  You can build an application that can use both HTrace
> 3 and HTrace 4.  This was absolutely essential for us because of the
> 

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Andrew Purtell
> The different major versions of HTrace are indeed source code compatible.

Maybe the issue was going from 2 to 3. At the time it was a real problem: a
change or removal of a span ID constant, and another time something to do
with setting parent-child span relationships, IIRC. If this is better
between 3 and 4, then the point no longer applies.


On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe  wrote:

> On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > project to ZipKin? In particular grpc-opentracing (
> > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > fulfill in open source the tracing architecture described in the Dapper
> > paper.
>
> OpenTracing is essentially an API which sits on top of another tracing
> system.
>
> So you can instrument your code with the OpenTracing library, and then
> have that send the trace spans to OpenZipkin.
>
> Here are some thoughts about this topic from a Zipkin developer:
> https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> .  Probably Adrian Cole can chime in here as well.
>
> In general the OpenTracing folks have been friendly and respectful.  (If
> any of them are reading this, I apologize for not following some of the
> discussions on gitter more thoroughly-- my time is just split so many
> ways right now!)
>
> >
> > If one takes a step back and looks at all of the hand rolled RPC stacks
> > in
> > the Hadoop ecosystem it's a mess. It is a heavier lift but getting
> > everyone
> > migrated to a single RPC stack - gRPC - would provide the unified tracing
> > layer envisioned by HTrace. The tracing integration is then done exactly
> > in
> > one place. In contrast HTrace requires all of the components to sprinkle
> > spans throughout the application code.
> >
>
> That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> system is actually very straightforward-- you just add two fields to the
> base RPC request definition, and patch the RPC system to use them.
>
> Just instrumenting RPC is not sufficient.  You need programmers to add
> explicit span annotations to your code so that you can have useful
> information beyond what a program like wireshark would find.  Things
> like what disk is a request hitting, what HBase PUT is an HDFS write
> associated with, and so forth.
>
> Also, this is getting off topic, but there is a new RPC system every
> year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> REST/JSON, etc.  They all have advantages and disadvantages.  For
> example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
> and performance problems with the protobuf-java library.  I wish gRPC
> luck, but I think it's good for people to experiment with different
> libraries.  It doesn't make sense to try to force everyone to use one
> thing, even if we could.
>
> > The Hadoop ecosystem is always partially at odds with itself, if for no
> > other reason than there is no shared vision among the projects. There are
> > no coordinated releases. There isn't even agreement on which version of
> > shared dependencies to use (hence the recurring pain in various places
> > with
> > downstream version changes of protobuf, guava, jackson, etc. etc).
> > Therefore HTrace is severely constrained on what API changes can be made.
> > Unfortunately the different major versions of HTrace do not interoperate
> > at
> > all. And are not even source compatible. While this is not unreasonable at all
> > for a project in incubation, when combined with the inability of the
> > Hadoop
> > ecosystem to coordinate releases as a cross-cutting dependency ships a
> > new
> > version, this has reduced the utility of HTrace to effectively nil for
> > the
> > average user. I am sorry to say that. Only a commercial Hadoop vendor or
> > power user can be expected to patch and build a stack that actually
> > works.
>
> One correction: The different major versions of HTrace are indeed source
> code compatible.  You can build an application that can use both HTrace
> 3 and HTrace 4.  This was absolutely essential for us because of the
> version skew issues you mention.
>
> > On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
> >
> > > Hi Mike,
> > > I think this is a fair question. We've probably all been associated with
> > > projects which just don't really make it. It would appear that HTrace is
> > > one of them. This is not to say that there is nothing going on with the
> > > tracing effort generally (as there is) but it looks like HTrace as a
> > > project may be headed to the Attic.
> > > I suppose the response to this thread will determine what happens...
>
> Thanks, Lewis.
>
> I think maybe we should try to identify the top tracing priorities 

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Colin McCabe
On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> What about OpenTracing (http://opentracing.io/)? Is this the successor
> project to ZipKin? In particular grpc-opentracing (
> https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> fulfill in open source the tracing architecture described in the Dapper
> paper.

OpenTracing is essentially an API which sits on top of another tracing
system.

So you can instrument your code with the OpenTracing library, and then
have that send the trace spans to OpenZipkin.
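To make that layering concrete, here is a minimal sketch of the split (illustrative names only, not the real OpenTracing API): application code talks to an abstract tracer/span interface, and the backend that actually receives finished spans is a pluggable reporter.

```python
import time
from abc import ABC, abstractmethod

class Reporter(ABC):
    """Pluggable backend boundary: a Zipkin collector, htraced, etc."""
    @abstractmethod
    def report(self, span: dict) -> None: ...

class InMemoryReporter(Reporter):
    """Stands in for a real backend in this sketch."""
    def __init__(self):
        self.spans = []
    def report(self, span):
        self.spans.append(span)

class Tracer:
    def __init__(self, reporter: Reporter):
        self._reporter = reporter
    def start_span(self, operation: str) -> "Span":
        return Span(self, operation)

class Span:
    def __init__(self, tracer, operation):
        self._tracer = tracer
        self.operation = operation
        self.tags = {}
        self.start = time.time()
    def set_tag(self, key, value):
        self.tags[key] = value
        return self
    def finish(self):
        # Only on finish does the span cross into the backend.
        self._tracer._reporter.report({
            "operation": self.operation,
            "tags": self.tags,
            "duration": time.time() - self.start,
        })

# Instrumented code never names the backend; swapping reporters
# switches tracing systems without touching application code.
reporter = InMemoryReporter()
tracer = Tracer(reporter)
tracer.start_span("hdfs.read").set_tag("path", "/tmp/f").finish()
```

Swapping InMemoryReporter for a real Zipkin reporter would change the tracing system without changing the instrumented code, which is the whole point of the abstraction.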

Here are some thoughts about this topic from a Zipkin developer:
https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
.  Probably Adrian Cole can chime in here as well.

In general the OpenTracing folks have been friendly and respectful.  (If
any of them are reading this, I apologize for not following some of the
discussions on gitter more thoroughly-- my time is just split so many
ways right now!)

> 
> If one takes a step back and looks at all of the hand rolled RPC stacks
> in
> the Hadoop ecosystem it's a mess. It is a heavier lift but getting
> everyone
> migrated to a single RPC stack - gRPC - would provide the unified tracing
> layer envisioned by HTrace. The tracing integration is then done exactly
> in
> one place. In contrast HTrace requires all of the components to sprinkle
> spans throughout the application code.
> 

That's not the issue.  We already have HTrace integration with Hadoop
RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
system is actually very straightforward-- you just add two fields to the
base RPC request definition, and patch the RPC system to use them.
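As a hedged illustration of the "two fields" idea (the field, method, and operation names below are invented for the sketch, not Hadoop's actual RPC definitions):

```python
import dataclasses
import random

@dataclasses.dataclass
class RpcHeader:
    """Base RPC request header, extended with exactly two tracing fields."""
    method: str
    trace_id: int = 0        # 0 means "this request is not being traced"
    parent_span_id: int = 0

def client_call(method: str, current_span_id: int, trace_id: int) -> RpcHeader:
    # The client stamps the two tracing fields onto the outgoing request.
    return RpcHeader(method=method, trace_id=trace_id,
                     parent_span_id=current_span_id)

def server_handle(header: RpcHeader) -> dict:
    # The server opens a child span linked to the caller's span, if traced.
    if header.trace_id == 0:
        return {}
    return {
        "trace_id": header.trace_id,
        "parent": header.parent_span_id,
        "span_id": random.getrandbits(64),
        "operation": header.method,
    }

req = client_call("getBlockLocations", current_span_id=42, trace_id=7)
child = server_handle(req)
```

Because the two fields ride on the base request type, every RPC in the system is covered at once; the rest of the patch is just plumbing them through.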

Just instrumenting RPC is not sufficient.  You need programmers to add
explicit span annotations to your code so that you can have useful
information beyond what a program like wireshark would find.  Things
like what disk is a request hitting, what HBase PUT is an HDFS write
associated with, and so forth.
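A rough sketch of what such an explicit annotation could look like (the operation name, tags, and helper below are invented for illustration, not real HDFS APIs):

```python
import contextlib
import time

finished = []  # stands in for the span reporter in this sketch

@contextlib.contextmanager
def span(operation, **tags):
    """Wrap a block of application code in a span with domain tags."""
    start = time.time()
    try:
        yield tags
    finally:
        finished.append({"operation": operation, "tags": tags,
                         "duration": time.time() - start})

def write_block(data: bytes, disk: str):
    # The tags below carry exactly the information a wire-level capture
    # could never see: which disk, how many bytes.
    with span("datanode.writeBlock", disk=disk, nbytes=len(data)):
        pass  # ... actual disk I/O would go here ...

write_block(b"abcd", disk="/dev/sdb")
```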

Also, this is getting off topic, but there is a new RPC system every
year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
REST/JSON, etc.  They all have advantages and disadvantages.  For
example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
and performance problems with the protobuf-java library.  I wish gRPC
luck, but I think it's good for people to experiment with different
libraries.  It doesn't make sense to try to force everyone to use one
thing, even if we could.

> The Hadoop ecosystem is always partially at odds with itself, if for no
> other reason than there is no shared vision among the projects. There are
> no coordinated releases. There isn't even agreement on which version of
> shared dependencies to use (hence the recurring pain in various places
> with
> downstream version changes of protobuf, guava, jackson, etc. etc).
> Therefore HTrace is severely constrained on what API changes can be made.
> Unfortunately the different major versions of HTrace do not interoperate
> at
> all. And are not even source compatible. While this is not unreasonable at all
> for a project in incubation, when combined with the inability of the
> Hadoop
> ecosystem to coordinate releases as a cross-cutting dependency ships a
> new
> version, this has reduced the utility of HTrace to effectively nil for
> the
> average user. I am sorry to say that. Only a commercial Hadoop vendor or
> power user can be expected to patch and build a stack that actually
> works.

One correction: The different major versions of HTrace are indeed source
code compatible.  You can build an application that can use both HTrace
3 and HTrace 4.  This was absolutely essential for us because of the
version skew issues you mention.

> On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney  
> wrote:
> 
> > Hi Mike,
> > I think this is a fair question. We've probably all been associated with
> > projects which just don't really make it. It would appear that HTrace is
> > one of them. This is not to say that there is nothing going on with the
> > tracing effort generally (as there is) but it looks like HTrace as a
> > project may be headed to the Attic.
> > I suppose the response to this thread will determine what happens...

Thanks, Lewis.

I think maybe we should try to identify the top tracing priorities for
HBase and HDFS and see how HTrace / OpenTracing / OpenZipkin could fit
into those.  Just start from a nice crisp set of requirements, like
Stack suggested, and think about how we could make those a reality.  If
we can advance the state of tracing in Hadoop, that will be a good thing
for our users, even if HTrace goes to the Attic.  I've been mostly
working on Apache Kafka these days but I could drop by to brainstorm.

best,
Colin


> > Lewis
> >
> >
> > On Wed, Aug 16, 2017 at 10:01 AM, <
> > dev-digest-h...@htrace.incubator.apache.org> wrote:
> >
> > >
> > > From: Mike Drob 

Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Andrew Purtell
What about OpenTracing (http://opentracing.io/)? Is this the successor
project to ZipKin? In particular grpc-opentracing (
https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
fulfill in open source the tracing architecture described in the Dapper
paper.

If one takes a step back and looks at all of the hand rolled RPC stacks in
the Hadoop ecosystem it's a mess. It is a heavier lift but getting everyone
migrated to a single RPC stack - gRPC - would provide the unified tracing
layer envisioned by HTrace. The tracing integration is then done exactly in
one place. In contrast HTrace requires all of the components to sprinkle
spans throughout the application code.

The Hadoop ecosystem is always partially at odds with itself, if for no
other reason than there is no shared vision among the projects. There are
no coordinated releases. There isn't even agreement on which version of
shared dependencies to use (hence the recurring pain in various places with
downstream version changes of protobuf, guava, jackson, etc. etc).
Therefore HTrace is severely constrained on what API changes can be made.
Unfortunately, the different major versions of HTrace do not interoperate at
all, and are not even source compatible. While this is not unreasonable for a
project in incubation, when combined with the inability of the Hadoop
ecosystem to coordinate releases when a cross-cutting dependency ships a new
version, it has reduced the utility of HTrace to effectively nil for the
average user. I am sorry to say that. Only a commercial Hadoop vendor or
power user can be expected to patch and build a stack that actually works.

On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney  wrote:

> Hi Mike,
> I think this is a fair question. We've probably all been associated with
> projects which just don't really make it. It would appear that HTrace is
> one of them. This is not to say that there is nothing going on with the
> tracing effort generally (as there is) but it looks like HTrace as a
> project may be headed to the Attic.
> I suppose the response to this thread will determine what happens...
> Lewis
>
>
> On Wed, Aug 16, 2017 at 10:01 AM, <
> dev-digest-h...@htrace.incubator.apache.org> wrote:
>
> >
> > From: Mike Drob 
> > To: dev@htrace.incubator.apache.org
> > Cc:
> > Bcc:
> > Date: Wed, 16 Aug 2017 12:00:49 -0500
> > Subject: [DISCUSS] Attic podling Apache HTrace?
> > Hi folks,
> >
> > Want to bring up a potentially uncomfortable topic for some. Is it time to
> > retire/attic the project?
> >
> > We've seen a minimal amount of activity in the past year. The last release
> > had two bug fixes, and had been pending for several months before somebody
> > reminded me to push the artifacts to subversion from the staging directory.
> >
> > I'd love to see a renewed set of activity here, but I don't think there is
> > a ton of interest going on.
> >
> > HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> > which is a good sign, but I haven't heard much from them recently. I
> > definitely do not think we are at the point where a lack of releases and
> > activity is a sign of super advanced maturity and stability.
> >
> > Your thoughts?
> >
> > Mike
> >
> >
>
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread lewis john mcgibbney
Hi Mike,
I think this is a fair question. We've probably all been associated with
projects which just don't really make it. It would appear that HTrace is
one of them. This is not to say that there is nothing going on with the
tracing effort generally (as there is) but it looks like HTrace as a
project may be headed to the Attic.
I suppose the response to this thread will determine what happens...
Lewis

On Wed, Aug 16, 2017 at 10:01 AM, <
dev-digest-h...@htrace.incubator.apache.org> wrote:

>
> From: Mike Drob 
> To: dev@htrace.incubator.apache.org
> Cc:
> Bcc:
> Date: Wed, 16 Aug 2017 12:00:49 -0500
> Subject: [DISCUSS] Attic podling Apache HTrace?
> Hi folks,
>
> Want to bring up a potentially uncomfortable topic for some. Is it time to
> retire/attic the project?
>
> We've seen a minimal amount of activity in the past year. The last release
> had two bug fixes, and had been pending for several months before somebody
> reminded me to push the artifacts to subversion from the staging directory.
>
> I'd love to see a renewed set of activity here, but I don't think there is
> a ton of interest going on.
>
> HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> which is a good sign, but I haven't heard much from them recently. I
> definitely do not think we are at the point where a lack of releases and
> activity is a sign of super advanced maturity and stability.
>
> Your thoughts?
>
> Mike
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Colin McCabe
Thanks for bringing this up, Mike.

The original vision for HTrace was to unify a bunch of disparate
Hadoop components under a single tracing layer.  This would allow us to
debug slowness or odd behavior in a much better way.  We started from
that vision and deduced the need to build a frontend API (htrace-core),
backend data store (htrace-hbase, htrace-htraced, etc.), and web UI
(htrace-web).

I still think that vision is valid, but achieving it was a lot harder
than we expected, for a couple of reasons.

First of all, I think building all those components needed someone (or
maybe several someones) to work on it full time.  We tried to do it part
time with a few HDFS and HBase committers.  Ultimately this didn't scale
as much as we needed it to.

Secondly, we were hoping for a lot of buy-in from Hadoop vendors and big
tech companies that used Hadoop.  Unfortunately, we didn't really get
that.  The Hadoop vendors were preoccupied with other things.  Big tech
companies seem to have mostly developed their own internal systems using bits
and pieces of open source.  I think this is another area where we just
needed more budget.  In retrospect, having meetups and reaching out to
potential users is something we needed to do.  There are some other
projects that have been a lot better at this than we have.

I think we should try to refocus on some core use-cases.  Basically,
decide what we want to achieve and find the shortest path to that.  If
that involves using other projects, then that's fine-- as long as they
are open source projects compatible with the ideals of the ASF.

Off the top of my head, I can think of a few core use-cases:

* Why is my HDFS request slow?   Figure out if there are disk issues or
network issues.

* Why is my HBase request slow?  Follow HBase requests into the HDFS
layer.

* Who is making the most requests to HDFS?

* What average speed is Hadoop getting from its S3 requests?  How often
do we hit our local caches, versus going over the network?

best,
Colin


On Thu, Aug 17, 2017, at 10:04, Stack wrote:
> On Wed, Aug 16, 2017 at 10:00 AM, Mike Drob  wrote:
> 
> > Hi folks,
> >
> > Want to bring up a potentially uncomfortable topic for some. Is it time to
> > retire/attic the project?
> >
> > We've seen a minimal amount of activity in the past year. The last release
> > had two bug fixes, and had been pending for several months before somebody
> > reminded me to push the artifacts to subversion from the staging directory.
> >
> > I'd love to see a renewed set of activity here, but I don't think there is
> > a ton of interest going on.
> >
> > HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> > which is a good sign, but I haven't heard much from them recently. I
> > definitely do not think we are at the point where a lack of releases and
> > activity is a sign of super advanced maturity and stability.
> >
> > Your thoughts?
> 
> 
> Thanks Mike for starting this thread.
> 
> Activity over the last year is here [1].
> 
> Is there any testimony other than evangelizing presentations on how
> htrace
> has provided a benefit?
> 
> HTrace needs a bit of work. In order of import:
> 
> 1. A complete viewer (punt and use zipkin instead?)
> 2. Hooked up systems that tell wholesome trace stories: hdfs is
> incomplete,
hbase is broken, accumulo/unknown, phoenix/custom-htrace... who else?
> 3. Work needs to be done so an operator can easily enable/disable trace
> and
> easily obtain views without impinging upon general perf
> 
> It could do w/ an API cleanup (v5.0.0?) and study of the fact that it is
> painstaking manual work adding it into a system (and that it is
> subsequently easily damaged by code movement). It needs a particular type
> of barker to drive it cross-project since the cross-project realm is when
> it starts to come into its own (and each project in its turn will resist
since the benefit is not immediate), etc.
> 
> None of the above is under active dev.
> 
> St.Ack
> 
> 1. https://github.com/apache/incubator-htrace/graphs/commit-activity
> 
> 
> 
> > Mike
> >


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Stack
On Wed, Aug 16, 2017 at 10:00 AM, Mike Drob  wrote:

> Hi folks,
>
> Want to bring up a potentially uncomfortable topic for some. Is it time to
> retire/attic the project?
>
> We've seen a minimal amount of activity in the past year. The last release
> had two bug fixes, and had been pending for several months before somebody
> reminded me to push the artifacts to subversion from the staging directory.
>
> I'd love to see a renewed set of activity here, but I don't think there is
> a ton of interest going on.
>
> HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> which is a good sign, but I haven't heard much from them recently. I
> definitely do not think we are at the point where a lack of releases and
> activity is a sign of super advanced maturity and stability.
>
> Your thoughts?


Thanks Mike for starting this thread.

Activity over the last year is here [1].

Is there any testimony other than evangelizing presentations on how htrace
has provided a benefit?

HTrace needs a bit of work. In order of import:

1. A complete viewer (punt and use zipkin instead?)
2. Hooked up systems that tell wholesome trace stories: hdfs is incomplete,
hbase is broken, accumulo/unknown, phoenix/custom-htrace... who else?
3. Work needs to be done so an operator can easily enable/disable trace and
easily obtain views without impinging upon general perf

It could do w/ an API cleanup (v5.0.0?) and study of the fact that it is
painstaking manual work adding it into a system (and that it is
subsequently easily damaged by code movement). It needs a particular type
of barker to drive it cross-project since the cross-project realm is when
it starts to come into its own (and each project in its turn will resist
since the benefit is not immediate), etc.
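
To make the fragility point concrete, a toy sketch (illustrative only, not HTrace's actual API): because scopes are hand-placed inline, a routine refactor can hoist the work out of the traced region while the span itself keeps getting emitted, so nothing fails loudly.

```python
# Toy illustration (no real HTrace API): manual scopes wrap code inline,
# so routine refactoring can quietly move work out of the traced region.
spans = []

class scope:
    """Minimal stand-in for a trace scope; records the span name on exit."""
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        spans.append(self.name)

def fetch_block():
    pass  # the actual work being traced

def handle_request_before():
    with scope("fetch-block"):
        fetch_block()          # work happens inside the scope: traced

def handle_request_after():
    fetch_block()              # a refactor hoisted the call out of the scope
    with scope("fetch-block"):
        pass                   # span is still emitted, but covers nothing

handle_request_before()
handle_request_after()
# Both versions emit a "fetch-block" span, so the damage is silent:
# the second span no longer covers the work it claims to.
```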

None of the above is under active dev.

St.Ack

1. https://github.com/apache/incubator-htrace/graphs/commit-activity



> Mike
>


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Adrian Cole
> What are the likely alternatives for downstream projects that want 
> distributed tracing?
For general-purpose or RPC tracing, yes, there are alternatives, but I
think HTrace is still positioned well for data services specifically.

> Do we think the field still has a big gap that HTrace can solve?
When I was at Twitter (a couple of years ago now), I know the data team
preferred htrace even though we had zipkin. Most of the tracing projects out
there do not focus on data services, or only recently do. While HTrace
may not be great at filling gaps in traditional RPC (as others do this
well enough), it probably does still have compelling advantages in
data services. I think the main holdback is getting the word out
and/or showing examples where the model and UI really shines in
HTrace's sweet spot (data services).

my 2p


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Sean Busbey
What are the likely alternatives for downstream projects that want distributed 
tracing?

Do we think the field still has a big gap that HTrace can solve?

On 2017-08-16 12:00, Mike Drob  wrote: 
> Hi folks,
> 
> Want to bring up a potentially uncomfortable topic for some. Is it time to
> retire/attic the project?
> 
> We've seen a minimal amount of activity in the past year. The last release
> had two bug fixes, and had been pending for several months before somebody
> reminded me to push the artifacts to subversion from the staging directory.
> 
> I'd love to see a renewed set of activity here, but I don't think there is
> a ton of interest going on.
> 
> HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> which is a good sign, but I haven't heard much from them recently. I
> definitely do not think we are at the point where a lack of releases and
> activity is a sign of super advanced maturity and stability.
> 
> Your thoughts?
> 
> Mike
> 


Re: [DISCUSS] Attic podling Apache HTrace?

2017-08-17 Thread Masatake Iwasaki

Hi Mike,

Thanks for putting this issue up.

> Want to bring up a potentially uncomfortable topic for some. Is it time to
> retire/attic the project?

I would like to keep the project alive.
While we have been silent for months,
many of the committers are still working on projects
that use HTrace (such as Hadoop and HBase), and
we are capable of making a new release if major issues are found.

> HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> which is a good sign, but I haven't heard much from them recently.

I will look into HBASE-14451 again and try to move it forward.
Since one of the intents of the big change in HTrace 4 was
enabling better end-to-end tracing (e.g., from HBase to HDFS),
bumping HTrace in HBase up to version 4 should reveal the next tasks.

Regards,
Masatake Iwasaki

On 8/17/17 02:00, Mike Drob wrote:

Hi folks,

Want to bring up a potentially uncomfortable topic for some. Is it time to
retire/attic the project?

We've seen a minimal amount of activity in the past year. The last release
had two bug fixes, and had been pending for several months before somebody
reminded me to push the artifacts to subversion from the staging directory.

I'd love to see a renewed set of activity here, but I don't think there is
a ton of interest going on.

HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
which is a good sign, but I haven't heard much from them recently. I
definitely do not think we are at the point where a lack of releases and
activity is a sign of super advanced maturity and stability.

Your thoughts?

Mike