[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-29 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561193#comment-16561193
 ] 

Ben Sigelman commented on HADOOP-15566:
---

Re the actual technical issue (there's a PS below about the more FUD-oriented 
points): rather than expecting maintainers of every ASF storage system *and* 
the maintainers of every distributed tracing system to (a) decide on the 
nuances of a data model, then (b) write bindings from "Storage System X's" 
tracing hooks to "Tracing System Y's" client library (for all combinations of 
X's and Y's), we can instrument the ASF storage systems with a single API that 
has been specifically designed to be portable.
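
To make the "single portable API" idea concrete, here is a minimal sketch. It is written in Python only for brevity, and every name in it (`InMemoryTracer`, `CountingTracer`, `read_block`, `start_span`) is a hypothetical stand-in invented for illustration, not the real OpenTracing API or any Hadoop code. The point is just that the storage code path is instrumented once and never needs to know which tracing system sits behind the interface.

```python
# Hypothetical sketch: none of these names come from OpenTracing or Hadoop.
from contextlib import contextmanager


class InMemoryTracer:
    """Stand-in for "Tracing System A": remembers finished operation names."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, operation_name):
        try:
            yield
        finally:
            # Record the span when the instrumented block exits.
            self.finished.append(operation_name)


class CountingTracer:
    """Stand-in for "Tracing System B": only counts spans per operation."""

    def __init__(self):
        self.counts = {}

    @contextmanager
    def start_span(self, operation_name):
        self.counts[operation_name] = self.counts.get(operation_name, 0) + 1
        yield


def read_block(tracer, block_id):
    """Storage-system code path: instrumented once, unaware of the backend."""
    with tracer.start_span("read_block"):
        return b"block-%d" % block_id
```

With X storage systems and Y tracing systems, this shape needs roughly X instrumentations plus Y tracer implementations, rather than X times Y hand-written bindings.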

To address [~stack]'s question about performance, the noop implementation of 
an OpenTracing tracer amounts to an empty function call, and it avoids the 
costs and/or lock contention of generating random numbers, context objects, 
and so forth.
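
As a rough illustration of that performance point, the sketch below contrasts a tracer that does real per-span work with a noop one. It is Python for brevity, and all of the classes (`RecordingTracer`, `NoopTracer`, and so on) are hypothetical stand-ins, not the actual OpenTracing noop implementation; the real libraries may differ in detail.

```python
# Hypothetical sketch: illustrative classes, not the real OpenTracing noop tracer.
import random
import threading


class RecordingSpan:
    def __init__(self, operation_name, span_id):
        self.operation_name = operation_name
        self.span_id = span_id

    def finish(self):
        pass


class RecordingTracer:
    """A "real" tracer: takes a lock and draws a random ID for every span."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rng = random.Random()
        self.ids_generated = 0

    def start_span(self, operation_name):
        with self._lock:  # possible contention point under load
            self.ids_generated += 1
            span_id = self._rng.getrandbits(64)
        return RecordingSpan(operation_name, span_id)  # new object per span


class NoopSpan:
    def finish(self):
        pass


_NOOP_SPAN = NoopSpan()  # one shared instance, allocated once


class NoopTracer:
    """Noop tracer: no lock, no random numbers, no per-span allocation."""

    def start_span(self, operation_name):
        return _NOOP_SPAN
```

The instrumented code calls `start_span` and `finish` the same way in both cases; only the tracer behind it decides how much work that costs.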

Another point that [~stack] made:
{quote}For me, the hard part is not which tracing lib to use – if a tracing lib 
discussion, lets do it out on dev?
{quote}
I 90% agree with this. In any case, and certainly as a response to 
[~michaelsembwever], I would be glad to see a side-by-side comparison of 
OpenTracing vs "something custom" to understand the amount of *additional* work 
required to actually get end-to-end tracing to work. That said, doing the 
tracing lib analysis "on dev" 
should also take the application developer experience into account... whatever 
we decide to do must require a minimum of configuration work (or educational 
work) for application developers, and that means that we should think hard 
about being agnostic about the tracing system "above" the storage systems under 
consideration here – ideally we are able to plug into any of them without 
forcing the application developer / operator to write new code or go on a 
yak-shaving mission.

 

As a concrete next step, I would be curious to see the code / branch that 
[~jojochuang] used to generate the OT+Jaeger screenshots above. I would also 
like to create a dev branch of HDFS or Cassandra that adds "native" OpenTracing 
instrumentation to a distributed code path that the HDFS devs think would be 
instructive/representative... I just think we're going to be hard-pressed to 
make an informed decision without pairings of trace visualizations (ideally in 
many tracing systems to illustrate portability) *and* the respective 
instrumentation code to illustrate non-bloat / maintainability. Would that be 
useful? [~stack], you were suggesting we try this on dev – any pointers for a 
non-HDFS / non-HBase expert on where to focus such an exercise?


{color:#707070}PS: {color}[~michaelsembwever]{color:#707070}, that was a lot of 
FUD to pack into one message ("bloat its API with vendor concerns", "hostile to 
the ASF", "hostile ... to those tracing solutions those vendors see as 
competition", etc). These concerns were also presented without any evidence – 
unsurprisingly, as I doubt that evidence exists. OpenTracing's two most common 
"pairings" are Zipkin and Jaeger, neither of which is a commercial solution. 
Contrary to what you suggest, the API is intentionally – if not primarily – 
designed to focus on _describing system behavior_ rather than the concerns of 
any downstream tracing system (OSS or commercial). All OpenTracing meetings are 
recorded and the notes are public if people here would like to judge for 
themselves about the openness and intent of the actual decision process (as 
opposed to the one you described/imagined). For those who want a primer on what 
we're up to, I would recommend reading either [this doc that I wrote when we 
were just getting 
started|https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736],
 or [this more recent doc explaining how OT fits into the larger 
ecosystem|https://medium.com/opentracing/the-difference-between-tracing-tracing-and-tracing-84b49b2d54ea]
 that's developed in the interim.{color}

 

 

> Remove HTrace support
> -
>
> Key: HADOOP-15566
> URL: https://issues.apache.org/jira/browse/HADOOP-15566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 3.1.0
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: security
> Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, 
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making 
> further releases. The Hadoop project currently has various hooks with HTrace. 
> It seems in some cases (eg HDFS-13702) these hooks have had measurable 
> performance overhead. Given these two factors, I think we should consider 
> removing the HTrace integration. If there is someone willing to do the work, 
> replacing it with OpenTracing might be a better choice since there is an 
> active community.

[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-23 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552907#comment-16552907
 ] 

Ben Sigelman commented on HADOOP-15566:
---

[~elek] the projects have similar goals but take different approaches. 
OpenTracing's surface area is intentionally "as narrow as possible" which means 
that it brings in almost no dependencies (OpenCensus is more of a 
fully-featured "agent" model, which necessarily gives it a larger footprint). 
OpenTracing also makes no assumptions about the serialization formats (or 
header names, etc) between peered processes in the distributed 
system/application, or the serialization format of the tracing system itself. 
This means that OpenTracing instrumentation can be used/reused for a wider 
variety of things: straightforward distributed trace 
collectors/indexers/viewers like Zipkin, Jaeger, etc, but also distributed 
debuggers, security applications, and so forth.

Also, building a general-purpose adapter to convert OpenTracing instrumentation 
into OpenCensus API calls would be straightforward (due to the relative 
"thickness" and number of implementation assumptions made in each project). 
Going the other way would be challenging or impossible, depending on reliance 
on OpenCensus wire formats.

> Remove HTrace support
> -
>
> Key: HADOOP-15566
> URL: https://issues.apache.org/jira/browse/HADOOP-15566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 3.1.0
>Reporter: Todd Lipcon
>Priority: Major
> Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, 
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making 
> further releases. The Hadoop project currently has various hooks with HTrace. 
> It seems in some cases (eg HDFS-13702) these hooks have had measurable 
> performance overhead. Given these two factors, I think we should consider 
> removing the HTrace integration. If there is someone willing to do the work, 
> replacing it with OpenTracing might be a better choice since there is an 
> active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-07 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535828#comment-16535828
 ] 

Ben Sigelman commented on HADOOP-15566:
---

[~ste...@apache.org] I agree that a new field (for the tracing context) makes 
the most sense from a compatibility standpoint.

[~jojochuang]: are there specific blockers or questions you have about the OT 
port? If so, let me know and I'll do my best to address/answer them. I'm also 
happy to help with the mechanics of the change (or find someone with more 
cycles in the OT community to do the same). This can be a really positive thing 
for HDFS, HBase, etc, as the traces within the datastore/filesystem can be 
connected to the traces in the application above, regardless of the particular 
tracing system in use. (I know that at Google, it was valuable for both the 
Bigtable core team and Bigtable users to see traces that wend their way 
from app code into Bigtable and back, esp. for slow / poorly-constructed 
queries.)

 







[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-02 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530630#comment-16530630
 ] 

Ben Sigelman commented on HADOOP-15566:
---

The screenshots are really nice to see!

I'm not sure how you all like to work, but I am happy to help discuss how to 
make all of this work from an OpenTracing best-practices standpoint (and/or try 
to find people to help with the instrumentation or porting effort).

 

There was a question about Tracer impls:
{quote}I can see people might want an implementation that is more neutral, For 
example, Jaeger comes from Uber, and people might not want to use it (hey, any 
Lyft developers here? :))
{quote}
Typically the idiom is to let the user pass in a `Tracer` impl dynamically, but 
fall back on the `GlobalTracer` mechanism if no user-specified `Tracer` was 
provided. There's also a contributed (and wholly optional) OpenTracing utility 
to do `Tracer` injection dynamically (i.e., with zero code modification): 
[https://github.com/opentracing-contrib/java-tracerresolver]
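
The "explicit `Tracer`, else global fallback" idiom can be sketched roughly as follows. This is Python for brevity and is not the real `GlobalTracer` API: `register_global_tracer`, `global_tracer`, and `StorageClient` are hypothetical names made up for this example.

```python
# Hypothetical sketch of the fallback idiom; not the real GlobalTracer API.
class NoopTracer:
    """Default tracer used when nothing has been registered."""


_global_tracer = NoopTracer()


def register_global_tracer(tracer):
    """Process-wide registration, typically done once at startup."""
    global _global_tracer
    _global_tracer = tracer


def global_tracer():
    return _global_tracer


class StorageClient:
    """Library code: accepts an optional tracer, never requires one."""

    def __init__(self, tracer=None):
        self._explicit_tracer = tracer

    @property
    def tracer(self):
        # Prefer the tracer the user passed in. Otherwise look up the global
        # one lazily, so a tracer registered after construction is still used.
        if self._explicit_tracer is not None:
            return self._explicit_tracer
        return global_tracer()
```

Resolving the fallback lazily (at use time rather than in the constructor) is a design choice of this sketch; it avoids ordering problems when the application registers its tracer after the client object has been created.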

 

Also, re wire protocols: OpenTracing is intentionally agnostic about wire 
protocols and delegates serialization ("injection") and deserialization 
("extraction") to the `Tracer` implementation. If there are 
questions about best practices around this, please @-mention me and I'll do my 
best to help.
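
For a feel of what injection and extraction look like from the instrumentation side, here is a small sketch. It is Python for brevity; the `HexHeaderTracer` class, its header names, and the `send_request` / `handle_request` helpers are all assumptions invented for the example and do not correspond to any real tracer's wire format.

```python
# Hypothetical sketch: the header names and classes below are made up.
class SpanContext:
    def __init__(self, trace_id, span_id):
        self.trace_id = trace_id
        self.span_id = span_id


class HexHeaderTracer:
    """One possible tracer: it alone decides the wire format (hex headers)."""

    TRACE_HEADER = "x-example-trace-id"
    SPAN_HEADER = "x-example-span-id"

    def inject(self, span_context, carrier):
        # Serialize the context into the carrier (here, a dict of headers).
        carrier[self.TRACE_HEADER] = format(span_context.trace_id, "016x")
        carrier[self.SPAN_HEADER] = format(span_context.span_id, "016x")

    def extract(self, carrier):
        # Deserialize the context, or return None for untraced requests.
        if self.TRACE_HEADER not in carrier or self.SPAN_HEADER not in carrier:
            return None
        return SpanContext(
            int(carrier[self.TRACE_HEADER], 16),
            int(carrier[self.SPAN_HEADER], 16),
        )


def send_request(tracer, span_context, payload):
    """Client side: never touches header names or encodings directly."""
    headers = {}
    tracer.inject(span_context, headers)
    return {"headers": headers, "payload": payload}


def handle_request(tracer, request):
    """Server side: recovers the caller's context via the same tracer."""
    return tracer.extract(request["headers"])
```

Swapping in a tracer with a different format changes only the `inject` / `extract` bodies; `send_request` and `handle_request` stay as they are.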

 

(Thanks again, all)







[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-01 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529150#comment-16529150
 ] 

Ben Sigelman commented on HADOOP-15566:
---

Hi all – someone sent this my way. I am one of the OpenTracing co-creators and 
would be delighted to collaborate on adding (minimal-dependency, lightweight) 
OpenTracing instrumentation to ASF projects. In serendipitous news, we have 
recently added some resources to help with the actual instrumentation work for 
well-used and well-loved projects like those in the ASF.

 

(PS: I am generally oversubscribed and can be bad at things like JIRA, but am 
100% happy to help here... if I'm being flaky about Jira responses, please 
reach out to me at [b...@gmail.com|mailto:b...@gmail.com] where I maintain a 
better SLA ;))



