[jira] [Commented] (HADOOP-15566) Remove HTrace support

2019-04-02 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807619#comment-16807619
 ] 

Elek, Marton commented on HADOOP-15566:
---

Thanks for the questions, [~bogdandrutu].

q. What implementation will be shipped with the official HBase binary?

I don't know; IMHO it depends on HBase. With OT we can use multiple 
implementations, so HBase can provide any implementation.

q. How can somebody use a different implementation?

It should be configurable. AFAIK the only vendor-specific part is the 
initialization code. It's easy to create an interface to initialize different 
implementations (e.g. a class name which should be called to initialize the 
implementation).
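
A minimal sketch of the class-name idea, using only JDK reflection. The 
TracerInitializer interface, the NoopTracerInitializer default and the 
configuration key are all hypothetical names invented for illustration; none of 
them exist in Hadoop or HBase:

```java
import java.util.Properties;

// Hypothetical interface: each vendor ships one class that sets up its tracer.
interface TracerInitializer {
  // Returns a short description of what was initialized (illustration only).
  String initialize(Properties conf);
}

// Hypothetical default that is used when nothing is configured.
class NoopTracerInitializer implements TracerInitializer {
  @Override
  public String initialize(Properties conf) {
    return "noop";
  }
}

final class TracerLoader {
  // Assumed configuration key; this is not an existing Hadoop property.
  static final String KEY = "hadoop.tracing.initializer.class";

  private TracerLoader() {}

  // Looks up the configured class name and instantiates it reflectively.
  static TracerInitializer load(Properties conf) {
    String className = conf.getProperty(KEY, NoopTracerInitializer.class.getName());
    try {
      return Class.forName(className)
          .asSubclass(TracerInitializer.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Cannot create tracer initializer " + className, e);
    }
  }
}
```

With something like this, only the configured class would know about the 
vendor library; the rest of the code would call TracerLoader.load(conf) once at 
startup.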

q. How do you ensure that a different implementation (that is not tested with 
your entire test suite) may not corrupt user data? I think it is very important 
that all the tests are running with the implementation that user uses in 
production.

I don't think that we need to test all the implementations. We should prove 
that the OT API is used correctly and use one implementation as an example. And 
we should clearly document what is tested and what is not. Untested 
implementations provided by other vendors can be used, but should be tested 
beforehand by the users.

q. use one implementation (pick something that you like the most) and export 
these data to a configurable endpoint

Interesting. Can you please give me more details? What is the configurable 
endpoint? How would the tracing information be stored?

> Remove HTrace support
> -
>
> Key: HADOOP-15566
> URL: https://issues.apache.org/jira/browse/HADOOP-15566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 3.1.0
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: security
> Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, 
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making 
> further releases. The Hadoop project currently has various hooks with HTrace. 
> It seems in some cases (eg HDFS-13702) these hooks have had measurable 
> performance overhead. Given these two factors, I think we should consider 
> removing the HTrace integration. If there is someone willing to do the work, 
> replacing it with OpenTracing might be a better choice since there is an 
> active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15566) Remove HTrace support

2019-01-31 Thread Bogdan Drutu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757657#comment-16757657
 ] 

Bogdan Drutu commented on HADOOP-15566:
---

[~elek] - I have some questions about OT integration:

1) What implementation will be shipped with the official HBase binary?

2) How can somebody use a different implementation?

3) How do you ensure that a different implementation (that is not tested with 
your entire test suite) may not corrupt user data? I think it is very important 
that all the tests are running with the implementation that user uses in 
production.

FYI: I know I am biased, but I think that a different approach is better here: 
use one implementation (pick something that you like the most) and export these 
data to a configurable endpoint. Then every vendor can consume that format.

 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2019-01-26 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752975#comment-16752975
 ] 

Elek, Marton commented on HADOOP-15566:
---

Thanks for the idea, [~cmccabe]. It's interesting.

Correct me if I am wrong, but as I see it HTrace is not designed to be 
extensible. For example, Span is an interface but Tracer always creates the 
MilliSpan implementation. To use HTrace as a lightweight layer and support 
multiple tracing implementations (such as OpenTracing or OpenCensus) we would 
need to refactor the HTrace code. I have two problems with this approach:

 1) The newly refactored HTrace won't be compatible with the old HTrace, so it 
would be hard to support the old HTrace.

 2) It would be equivalent to resurrecting HTrace, which has voted to retire. 
(The same thing could be done without importing the HTrace code into Hadoop, by 
refactoring it on the HTrace side.)

But creating a new layer is a valid concern (even if Cassandra also followed 
this approach, as @mck wrote). For me it's hard to compare the complexity of 
maintaining our own lightweight abstraction layer with that of maintaining 
HTrace (even if the first one seems to be easier).

I think the real alternative here is just to use OpenTracing (despite the 
concerns about governance raised by [~michaelsembwever]) and follow the 
approach prototyped by [~jojochuang], [~fabbri], and [~rizaon].

Or (as a first step) it could be added to the existing HTrace code, 
side-by-side, to evaluate it.

 

 

 

 

 

 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2019-01-24 Thread Colin P. McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751408#comment-16751408
 ] 

Colin P. McCabe commented on HADOOP-15566:
--

HTrace *is* "a lightweight Hadoop API for the tracing where multiple 
implementation can be plugged in." :) The "H" originally stood for "Hadoop."  
So you could just move the HTrace API classes into hadoop-common, and then have 
people continue using Zipkin or something as the backend.  And / or write an 
opentracing backend to interface with those systems.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-12-10 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714435#comment-16714435
 ] 

Elek, Marton commented on HADOOP-15566:
---

Thanks [~cmccabe], I agree with your points about the importance of 
compatibility and of keeping the HTrace support.

My proposal is:

1.) Create a lightweight Hadoop API for tracing where multiple 
implementations can be plugged in.

2.) Provide a default implementation which uses the existing HTrace code.

Implementation details:

a) Add a new optional bytes field to the RpcHeader. Different tracing 
libraries could require different sizes of serialized context:
{code:java}
diff --git a/hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto b/hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
index aa146162896..e42f64eb631 100644
--- a/hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
+++ b/hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
@@ -61,9 +61,9 @@ enum RpcKindProto {
   * what span caused the new span we will create when this message is received.
   */
 message RPCTraceInfoProto {
     optional int64 traceId = 1; // parentIdHigh
     optional int64 parentId = 2; // parentIdLow
 
+    optional bytes tracingContext = 3; // generic tracing information
 }
{code}
This is a backward-compatible change: the new field is optional, so peers 
built against the old RpcHeader.proto simply ignore it.

b) In the rpc Server.java a (htrace) TraceScope is initialized based on the rpc 
header and propagated as part of the RpcCall:
{code:java}
  RpcCall call = new RpcCall(this, header.getCallId(),
  header.getRetryCount(), rpcRequest,
  ProtoUtil.convert(header.getRpcKind()),
  header.getClientId().toByteArray(), traceScope, callerContext);
{code}
I propose to replace this traceScope with a hadoop specific TraceScope marker 
interface. The default implementation could be a simple class which contains 
the htrace implementation.

c. We can create a simple Tracing singleton (similar to the 
DefaultMetricsSystem):

Example call:
{code:java}
  try (TracingSpan context =
      HadoopTracing.INSTANCE.newContext(call.tracingSpan, "RpcServerCall")) {
    if (remoteUser != null) {
      remoteUser.doAs(call);
    } else {
      call.run();
    }
  }
{code}
d. HadoopTracing could be something like this:
{code:java}
package org.apache.hadoop.tracing;

public enum HadoopTracing {
  INSTANCE;

  // Pluggable backend; how it gets set (e.g. from configuration) is left open.
  private TracingProvider provider;

  public TracingSpan importContext(byte[] data) {
    return provider.importContext(data);
  }

  public byte[] exportContext() {
    return provider.exportContext();
  }

  public TracingSpan newContext(String name) {
    return provider.newContext(name);
  }

  public TracingSpan newContext(TracingSpan parentSpan, String name) {
    // Assumed delegation; the first draft of this sketch returned null here.
    return provider.newContext(parentSpan, name);
  }
}
{code}
e. We can add multiple TracingProvider implementations (and provide one for 
HTrace for compatibility reasons).
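
To make the provider idea a bit more concrete, here is a rough sketch of what 
a TracingProvider and a trivial implementation could look like. Everything in 
it is an assumption made for illustration: the method signatures (in particular 
exportContext taking the span as an argument), the SimpleTracingProvider class 
and the 16-byte wire format are hypothetical and are not taken from HTrace, 
OpenTracing or any existing Hadoop code:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical span handle; AutoCloseable so it works with try-with-resources.
interface TracingSpan extends AutoCloseable {
  long traceId();

  long spanId();

  @Override
  void close(); // narrowed so that callers need no checked-exception handling
}

// Hypothetical provider SPI, loosely mirroring the HadoopTracing methods.
interface TracingProvider {
  TracingSpan importContext(byte[] data);

  byte[] exportContext(TracingSpan span);

  TracingSpan newContext(TracingSpan parent, String name);
}

// Toy in-memory provider: the context is two longs (trace id, span id).
final class SimpleTracingProvider implements TracingProvider {
  private final AtomicLong nextId = new AtomicLong(1);

  private static final class SimpleSpan implements TracingSpan {
    private final long traceId;
    private final long spanId;

    SimpleSpan(long traceId, long spanId) {
      this.traceId = traceId;
      this.spanId = spanId;
    }

    @Override
    public long traceId() {
      return traceId;
    }

    @Override
    public long spanId() {
      return spanId;
    }

    @Override
    public void close() {
      // A real provider would report the finished span to its backend here.
    }
  }

  @Override
  public TracingSpan importContext(byte[] data) {
    // Decodes the bytes that would travel in the generic RPC header field.
    ByteBuffer buf = ByteBuffer.wrap(data);
    long traceId = buf.getLong();
    long spanId = buf.getLong();
    return new SimpleSpan(traceId, spanId);
  }

  @Override
  public byte[] exportContext(TracingSpan span) {
    return ByteBuffer.allocate(2 * Long.BYTES)
        .putLong(span.traceId())
        .putLong(span.spanId())
        .array();
  }

  @Override
  public TracingSpan newContext(TracingSpan parent, String name) {
    // The name is ignored in this toy provider; a real one would record it.
    long id = nextId.getAndIncrement();
    long traceId = (parent == null) ? id : parent.traceId();
    return new SimpleSpan(traceId, id);
  }
}
```

The point of the opaque byte[] is that an HTrace provider could put its two 
longs there while another library could serialize whatever context it needs, 
without changing the RPC header again.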

+1. Personally I prefer to use some utility which adds trace support to 
specific methods which are annotated. It could simplify the usage of tracing, 
but it requires a Java proxy. This is an independent question, though.
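
The annotation idea could look roughly like the sketch below, which uses the 
JDK's java.lang.reflect.Proxy. The @Traced annotation, the BlockService 
interface and the TracingProxy helper are hypothetical names made up for this 
example, and a plain list of strings stands in for real span reporting:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical marker annotation for methods that should be traced.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Traced {}

// Example service interface, invented for illustration.
interface BlockService {
  @Traced
  String readBlock(long blockId);

  String ping();
}

final class TracingProxy {
  private TracingProxy() {}

  // Wraps target so that calls to @Traced interface methods are logged as
  // "start"/"end" events in spanLog; other methods pass straight through.
  @SuppressWarnings("unchecked")
  static <T> T wrap(Class<T> iface, T target, List<String> spanLog) {
    InvocationHandler handler = (proxy, method, args) -> {
      boolean traced = method.isAnnotationPresent(Traced.class);
      if (traced) {
        spanLog.add("start " + method.getName());
      }
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        throw e.getCause(); // rethrow the exception thrown by the real method
      } finally {
        if (traced) {
          spanLog.add("end " + method.getName());
        }
      }
    };
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }
}
```

A caller would wrap an existing implementation once, for example 
TracingProxy.wrap(BlockService.class, realService, log), and use the returned 
object in its place. Note that a JDK proxy only works for interface methods, 
which is one reason this is a separate question from the provider API.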




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-11-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705267#comment-16705267
 ] 

Sean Busbey commented on HADOOP-15566:
--

bq. Sean Busby did a lot of work on shading the Hadoop CP --targeting HBase, 
but it's not been rounded off with all the hadoop-tools modules yet, including 
the cloud storage connectors. Someone needs to volunteer to embrace shading

I don't want to get this jira sidetracked, but could you point me at more 
details on the gap here? I was under the assumption that hadoop-tools stuff was 
project internal and thus didn't need shading.

In the downstream facing shading we expressly don't shade HTrace because doing 
so breaks some of its functionality (tracing from application through libraries 
within the same JVM).




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-11-30 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704820#comment-16704820
 ] 

Steve Loughran commented on HADOOP-15566:
-

bq. It is definitely sad that it didn't make it out of the incubator. There is 
clearly a need for this kind of work in Hadoop and in other projects

yes it is sad, yes there is a need.

Sean Busby did a lot of work on shading the Hadoop CP --targeting HBase, but 
it's not been rounded off with all the hadoop-tools modules yet, including the 
cloud storage connectors. Someone needs to volunteer to embrace shading




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-11-29 Thread Colin P. McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704118#comment-16704118
 ] 

Colin P. McCabe commented on HADOOP-15566:
--

Hi folks,

I just saw this JIRA while searching for something else.  I was one of the guys 
who worked on HTrace, both on the Hadoop integration side and on the HTrace 
project itself.  It is definitely sad that it didn't make it out of the 
incubator.  There is clearly a need for this kind of work in Hadoop and in 
other projects.

I don't have a strong opinion about which other tracing API should be used in 
Hadoop.  I would caution everyone that Hadoop's compatibility shackles are 
heavy -- very heavy indeed.  Just to give an example, a typical Hadoop 
installation might have HDFS, HBase, and Phoenix installed.  These projects all 
have separate developers, PMCs, and release cycles, but expect to be able to 
share the same CLASSPATH happily.  Projects often push back very hard on trying 
to update library dependencies, especially in "minor" releases.  To add to 
that, people often stay on older stable versions of Hadoop for years.

In theory, Hadoop vendors offer a snapshot of the full Hadoop stack, carefully 
configured so that things work together.  In practice, libraries are not always 
harmonized as well as we would like.  Some users want to mix and match versions 
of things, or not even use a vendor distribution at all.  This makes setting up 
end-to-end tracing pretty difficult.

There were some efforts to add better CLASSPATH isolation to Hadoop.  I haven't 
kept up with those, so I don't know how much this situation has improved.

I do think that the idea of keeping HTrace around as a shim API might make 
sense for Hadoop.  This would mean that adding support for a new version of the 
OpenTracing or Zipkin library would only require updating that shim code in 
hadoop-common, rather than trying to coordinate changes across a dozen Hadoop 
projects.

Also, HTrace already has code to export spans to Zipkin, if that helps.  I 
think it would be relatively straightforward to write the same thing for 
opentracing as well.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-09-21 Thread Carlos Alberto Cortez (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623548#comment-16623548
 ] 

Carlos Alberto Cortez commented on HADOOP-15566:


Hi all,

I went ahead and did a Proof-Of-Concept migration from HTrace to OpenTracing 
(using Zipkin as the backend). You can inspect (and play with the code) here: 
[https://github.com/apache/hadoop/compare/trunk...carlosalberto:ot_initial_integration]

Some notes:

1. By default it creates a Tracer instance based on Zipkin running on localhost 
(for simplicity).
2. It uses the notion of a GlobalTracer to create, register, and use the 
Tracer from a single place.
3. As Wei-Chiu mentioned, it needed some small extra work to pass around the 
parent-child relationship (which is done through `SpanId` in HTrace, and 
`SpanContext` in OpenTracing).
4. It adds a new SpanContext field to the classes using protobuf to pass trace 
info.

As mentioned, this is a POC, but I hope it can shed some light on this (and I'm 
happy to answer questions or contribute this as an actual migration ;) )




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-08-15 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581746#comment-16581746
 ] 

Andrew Purtell commented on HADOOP-15566:
-

What about a HTrace facade for Brave (Zipkin)? 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-08-15 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581273#comment-16581273
 ] 

stack commented on HADOOP-15566:


[~elek] Thanks. Or we could just strip htrace. This would remove any friction 
caused by its injection. This would address the issue title and bulk of the 
description. 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-08-15 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580831#comment-16580831
 ] 

Elek, Marton commented on HADOOP-15566:
---

It seems that there is no consensus yet. On the other hand, AFAIK HTrace is 
used in only a few places in the Hadoop source tree.

Can we create a very lightweight Hadoop-specific tracing builder and use it in 
the Hadoop code, and add a generic field to the RPC? Is it possible to support 
multiple tracing implementations? (The existing HTrace could be the default 
implementation and we can provide OT/OC implementations.) 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-31 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563253#comment-16563253
 ] 

Elek, Marton commented on HADOOP-15566:
---

Just for reference, here are the links to the mailing list discussions that 
were started:

https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201807.mbox/%3CCADiq6%3DxdAuPT5q8PNdXBnSODzniKw2zBGo-z9PwCA2_mrDc7wg%40mail.gmail.com%3E

https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201807.mbox/%3cCADcMMgEkJ=OqhJ83-aPFQZ+TZ+5BH=7w6-tsahd9hlpuc3e...@mail.gmail.com%3e

(Thanks to [~stack] and [~jojochuang])
 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-30 Thread Ted Young (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563157#comment-16563157
 ] 

Ted Young commented on HADOOP-15566:


Hi there, I work on the OpenTracing project w/ Ben, thought I would weigh in!

I feel like there is somewhat of an apples-to-oranges comparison going on here. To 
clarify what we are trying to do with OpenTracing:
 * the instrumentation API should be an abstract interface, and should not 
expose implementation details. That's the whole point, it's not about 
additional features.
 * The fact that some clients ship with nifty features, such as z-pages, is 
actually an argument FOR an abstract interface, not against it. You can easily 
put a client with z-pages (or whatever new feature comes next) behind an 
abstract interface. Arguing that abstraction should be abandoned because a 
particular implementation has a useful feature doesn't make any sense. This is no 
different than LightStep or any other vendor arguing that you should bake in 
their tracing client because it has a special feature. It's a form of 
implementation lock-in, which is easily avoided. The whole reason we've been 
working on an abstracted interface for the past several years is to decouple 
these choices. So it's not either/or. Use a good client behind an abstraction, 
that's all.
 * Likewise with a wire protocol. I also support the w3c protocol under 
development. But it is most definitely still under development. The v00 
prototype version is still being mutated, and we haven't even had a meeting yet 
to compare notes about initial implementations. What would be the point in 
adding any instrumentation code which baked in something in this state? It's 
better to use this - or any other wire protocol that the users of Hadoop may 
want to use - behind an interface which allows them to swap it out without 
rewriting code. This includes swapping in future versions of the w3c headers.

 

Again, just to reiterate: arguments about how particular clients may expose 
data usefully - or otherwise have special additional features - and arguments 
about the benefits of one wire protocol vs another, are actually arguments FOR 
an abstract instrumentation API. You really want these choices decoupled. 
Better implementation details may exist tomorrow, and the versioning/packaging 
of a tracing subsystem should be orthogonal to the versioning of Hadoop itself.

Hope that adds some clarity! FWIW, I wrote a longer-form version of my 
thinking here a couple months ago, if you want more detail: 
[https://opensource.com/article/18/5/distributed-tracing]




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562619#comment-16562619
 ] 

stack commented on HADOOP-15566:


bq. I just think we're going to be hard-pressed to make an informed decision 
without pairings of trace visualizations (ideally in many tracing systems to 
illustrate portability) and the respective instrumentation code to illustrate 
non-bloat / maintainability. Would that be useful? [~stack] you were suggesting 
we try this on dev – any pointers to a non-HDFS / non-HBase expert for a place 
to focus on for such an exercise?

Yeah. I just started a DISCUSS thread that points here up on dev-common. 
Hopefully, we'll attract doers/volunteers.

What are you thinking, [~bensigelman]? Are you (or your company) running a 
comparison of libs -- OT/OC/Hacked HTrace -- for a neutral party/volunteer to evaluate?

bq. I wonder if it would be worth evaluating writing a 
htrace-api->opentracing-java or htrace-api->census or htrace-api->zipkin...

I just did a refresher and unfortunately it'd be a bit of awkward work to do, 
[~michaelsembwever]. HTrace core entities -- probably the font of friction 
(we'd have to check; we could for sure do some fixup around the case where no 
trace is enabled) -- are classes rather than interfaces, and they do work passing 
Spans even when no trace is enabled. The other awkward fact is that there are two 
htrace APIs afloat in Hadoop currently: an htrace3 in older Hadoops and an 
htrace4 in newer ones (in different packages).

Getting traces into zipkin though should be easy enough. htrace dumps to 
spanreceiver implementations, and these are easy to write and plug in.
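
To illustrate the receiver idea, here is a rough sketch of a pluggable span sink. The names (`FinishedSpan`, `SpanSink`, `JsonBufferingSink`) are hypothetical; this is not HTrace's actual receiver API nor a Zipkin reporter. The JSON field names are loosely modelled on Zipkin's span format and would need checking against the Zipkin documentation before real use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a finished span handed to a receiver. */
final class FinishedSpan {
    final String traceId;
    final String spanId;
    final String name;
    final long startMicros;
    final long durationMicros;

    FinishedSpan(String traceId, String spanId, String name, long startMicros, long durationMicros) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.name = name;
        this.startMicros = startMicros;
        this.durationMicros = durationMicros;
    }
}

/** Hypothetical plug-in point: the tracer calls this for every finished span. */
interface SpanSink {
    void receive(FinishedSpan span);
}

/**
 * Renders each span as a JSON object and buffers it. A real plug-in would send
 * batches to a collector; string escaping is omitted to keep the sketch short.
 */
final class JsonBufferingSink implements SpanSink {
    private final List<String> buffer = new ArrayList<>();

    @Override
    public synchronized void receive(FinishedSpan span) {
        buffer.add(String.format(Locale.ROOT,
            "{\"traceId\":\"%s\",\"id\":\"%s\",\"name\":\"%s\",\"timestamp\":%d,\"duration\":%d}",
            span.traceId, span.spanId, span.name, span.startMicros, span.durationMicros));
    }

    /** Returns the buffered spans as one JSON array and clears the buffer. */
    synchronized String drain() {
        String json = "[" + String.join(",", buffer) + "]";
        buffer.clear();
        return json;
    }
}
```

Swapping the backend then means supplying a different `SpanSink`, which is the property that makes the receiver approach attractive for re-emitting existing traces.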

[~bogdandrutu] Thanks boss for the OC input. The local-view (z-pages) makes 
sense. Nice instrumentation example over in the hbase client for talking to 
(cloud) bigtable too (smile) -- 
https://github.com/GoogleCloudPlatform/cloud-bigtable-client.










[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-30 Thread BOGDAN DRUTU (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562025#comment-16562025
 ] 

BOGDAN DRUTU commented on HADOOP-15566:
---

Hello all,

First, sorry for jumping into this issue, but I will try to be short (edited 
after I finished the comment: I was wrong) and as project-independent as 
possible (for the record, I am one of the main contributors to OpenCensus; 
also, in my previous life I debugged a lot of BigTable issues using the same 
technology as OpenCensus).

Some comments about other comments in this issue:

[~bensigelman] - FYI: OpenCensus does not enforce any wire format. The format 
is configurable and we are adding support for the w3c standard.

[~elek] - About OT vs OC: in my personal opinion the difference is the 
philosophy behind these projects. OT was designed with the mindset of being an 
open-source API for vendors to implement, and because of this certain tradeoffs 
were made to help some vendors (as [~michaelsembwever] mentioned). OC was 
designed to be a fully implemented library that supports multiple different 
backends (Zipkin, Jaeger, Stackdriver, AppInsight, etc.) as well as in-process 
debugging capabilities. For example, one of the key features that I used a lot 
when I debugged BigTable issues is what OpenCensus calls z-pages (in-process 
handlers to track active requests, in-memory latency-based sampled spans, 
stats, etc.). You can take a look here: 
[https://opencensus.io/core-concepts/z-pages/#1].

Based on my small experience there are 3 components that are critical in the 
instrumentation of a service:
 # Wire propagation (I saw a previous discussion about this). 
[https://github.com/w3c/distributed-tracing] - it is a w3c standard proposed by 
a couple of APM vendors and cloud providers. Even though the format is mostly 
focused on HTTP requests, HBase can define its own format if needed, the only 
requirement being the ability to propagate all fields defined in the format 
(trace-id, span-id, trace-options and tracestate). This part is critical when 
HBase is used as a service (e.g. something like Google Bigtable, which works 
with the HBase client): having standard fields that are propagated allows 
service owners to correlate incoming requests from a customer with the internal 
trace. A similar issue may occur when only HDFS is used as a service.
 # APIs to start/end a span, record tracing events, etc. There are multiple 
open source APIs, including OpenCensus, OpenTracing, Zipkin, etc.
 # In-process propagation. This can be implemented in two ways: explicitly, by 
propagating the current "Span" between function calls, runnables, callables, 
etc., or implicitly, usually using a thread-local mechanism. Regarding a 
previous comment from [~stack] about keeping this working, my personal 
experience is that you can achieve this using the "implicit" mechanism 
described above by having a clean context API (for an example of a context API 
that works well I can recommend 
[https://grpc.io/grpc-java/javadoc/io/grpc/Context.html]) and ensuring that all 
async calls are wrapped accordingly (e.g. wrapping all Executors). The 
"explicit" mechanism may be very hard to maintain and, based on my experience, 
annoying for developers. This part is very important when instrumenting the 
HBase client (which I think should be instrumented in order to debug more 
complex issues) because the client is used as a library, and a standard way to 
propagate the current Span is needed in order to continue the same trace 
between the client application and the bigtable client.
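
The "implicit" in-process propagation described in item 3 can be sketched with only the JDK. `CurrentSpan` and `SpanPropagatingExecutor` are hypothetical names used for illustration; a production context API (such as the gRPC `Context` linked above) carries more than a single string and handles more cases.

```java
import java.util.concurrent.Executor;

/** Hypothetical holder for the "current span" of the running thread. */
final class CurrentSpan {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static String get() {
        return CURRENT.get();
    }

    /** Installs spanId as current and returns the previous value so callers can restore it. */
    static String set(String spanId) {
        String previous = CURRENT.get();
        if (spanId == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(spanId);
        }
        return previous;
    }
}

/** Wraps an Executor so that each task runs with the span that was current at submission time. */
final class SpanPropagatingExecutor implements Executor {
    private final Executor delegate;

    SpanPropagatingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        final String captured = CurrentSpan.get();        // read on the submitting thread
        delegate.execute(() -> {
            String previous = CurrentSpan.set(captured);  // install on the worker thread
            try {
                task.run();
            } finally {
                CurrentSpan.set(previous);                // do not leak into the next pooled task
            }
        });
    }
}
```

The point of wrapping at the `Executor` level is that individual call sites never pass a span around by hand; forgetting to wrap one executor, however, silently disconnects the trace, which is the brittleness raised elsewhere in this thread.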

When OpenCensus was designed I thought it very important that the library 
ensures all 3 of these components are covered. Some may say that 1) is not 
important when deployed internally, but with the new cloud providers this case 
becomes more common; others may say that 3) is not important, but when 
instrumenting client libraries (like the HBase client) it becomes very 
important in my opinion. FYI, there are other libraries that solve these issues 
as well, like Zipkin, etc., but I am not here to suggest one particular 
library, just to explain the concepts, the issues and what is important to 
think about.

 

In my personal opinion OpenTracing does not deal very well with 1 and 3 
(probably on purpose), but I am not an expert in OpenTracing or one of its 
owners/authors, so I cannot comment on what is good or what is bad in 
their design choices.

 

These are my thoughts about what you should consider when you pick one library 
over another. Regarding OpenCensus, we are happy to help if you have any 
questions about our design choices, or about stats/metrics support in 
OpenCensus and why we think that these are very important as well.

 

PS: Hope the comment makes sense, it became larger than expected but I tried to 
give an overview of the whole instrumentation issue.


[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-29 Thread Adrian Cole (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561363#comment-16561363
 ] 

Adrian Cole commented on HADOOP-15566:
--

TL;DR; I would advise evaluating all the options, perhaps by resurrecting a 
small part of htrace in order to give a more seamless migration and support 
path. This allows *sites* to participate in the decision making *before 
committing to an approach.*

 

 

Depending on the choices permitted, this might imply api or model changes to 
make it work. Doing this decoupled from hadoop moves the thrash to where it 
belongs.

 

 

Ironically, while at twitter the data services team preferred htrace to zipkin, 
even though zipkin was there. It would be nice to also have a focus on brown 
field, like a solution that works with today and tomorrow. *Many won't upgrade 
hadoop for many years* to 3.1. Sites should be preferred and, absent input 
from them, we should try to act on their behalf. To say it again: thrash behind 
the api before considering thrashing an api.

 

Resurrecting the "api" part would also allow a less conjectured guide to moving 
forward, one that has to first tackle concerns technically, such as parents. 
It is easy to say how something might work and another thing entirely to have 
it work, and have it work efficiently, and have it work in ways that are safe. 
Doing this buys more time to make informed decisions, and gives people who have 
never worked on data systems a chance to get that experience first. Even in 
services tracing, we've noticed a lot of things left to end users to sort out. 
It seems data services should have even more rigor.

 

For example, HTrace code includes a lot of guards that prevent excess network 
communication. These things are inconsistent across OT as threading concerns 
are an implementation detail; there is neither a spec nor a TCK on reporting, 
except some guidance to be good. One could conjecture that Census would be good 
for hadoop if it is good for google internally with bigtable. However, even 
that shouldn't be left to conjecture. Many ecosystems have a fair amount of 
full-time staff, and possibly could use those staff towards vetting the 
concerns already addressed by the htrace libraries.

 

Anyway, I hope this response is not ruffling feathers. I've tried hard to 
avoid that. While I am less qualified than some to participate in this 
discussion, you can look at the source history and otherwise. I have personally 
fixed code here and elsewhere to make interop work. I also collaborated with a 
site owner to open up the transports. I primarily take care of the openzipkin 
volunteer community even if I am paid a salary. I don't make any more or less 
money if hadoop chooses one thing vs another.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-29 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561322#comment-16561322
 ] 

mck commented on HADOOP-15566:
--

bq. Thats this stuff: 
https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/tracing
 ?

That's correct. It's a bit different, I'm presuming, for the hadoop ecosphere as 
its tracing api is htrace. 
So I'm speaking off-the-cuff, but I wonder if it would be worth evaluating 
writing a htrace-api->opentracing-java or htrace-api->census or 
htrace-api->zipkin layer (many backends now accept zipkin traces, in fact more 
than opentracing last time I checked, so zipkin might well be considered the 
de facto standard atm). Any of these could form a template to help others 
to write htrace->xyz plugins. While htrace may be disappearing, maintaining 
just its api in this form for plugins may not be a big deal, and provides 
end-to-end tracing in many *existing* hadoop ecosystems.

bq. Could try re-emitting existing (h)traces to zipkin – it used to work – or 
whatever sink. 

Yup, that's what I was trying to explain above. But as a plugin. Folk will 
appreciate that they only need to instrument one api rather than a whole 
ecosystem again. And I wouldn't be comfortable betting on one abstraction layer 
over another, not right now.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-29 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561193#comment-16561193
 ] 

Ben Sigelman commented on HADOOP-15566:
---

Re the actual technical issue (there's a PS below about the more FUD-oriented 
points): rather than expecting maintainers of every ASF storage system *and* 
the maintainers of every distributed tracing system to (a) decide on the 
nuances of a data model, then (b) write bindings from "Storage System X's" 
tracing hooks to "Tracing System Y's" client library (for all combinations of 
X's and Y's), we can instrument the ASF storage systems with a single API that 
has been specifically designed to be portable.

To address [~stack]'s question about performance, the noop implementation of 
OpenTracing tracers amounts to an empty function call but avoids the costs 
and/or lock contention of generating random numbers, context objects, and so 
forth.
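
A minimal sketch of why a noop tracer can be this cheap. The interfaces `MiniSpan` and `MiniTracer` and the class `BlockReader` are hypothetical and deliberately smaller than the real OpenTracing API; they only illustrate the pattern of returning a shared do-nothing span.

```java
/** Hypothetical minimal tracing API; not the OpenTracing interfaces themselves. */
interface MiniSpan {
    MiniSpan tag(String key, String value);
    void finish();
}

interface MiniTracer {
    MiniSpan start(String operation);
}

/** Tracer used when tracing is off: no ids, no clock reads, no per-call allocation in the tracer. */
final class NoopTracer implements MiniTracer {
    private static final MiniSpan NOOP_SPAN = new MiniSpan() {
        @Override
        public MiniSpan tag(String key, String value) {
            return this;
        }

        @Override
        public void finish() {
            // intentionally empty
        }
    };

    @Override
    public MiniSpan start(String operation) {
        return NOOP_SPAN; // shared singleton: nothing is generated or locked
    }
}

/** Instrumented code looks the same whether tracing is on or off. */
final class BlockReader {
    private final MiniTracer tracer;

    BlockReader(MiniTracer tracer) {
        this.tracer = tracer;
    }

    int read(byte[] buf) {
        MiniSpan span = tracer.start("BlockReader.read").tag("component", "hdfs");
        try {
            return buf.length; // stand-in for real I/O
        } finally {
            span.finish();
        }
    }
}
```

Note that arguments to `tag` are still evaluated by the caller even when the tracer is a noop, so instrumentation that builds expensive tag values would still pay for them.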

Another point that [~stack] made:
{quote}For me, the hard part is not which tracing lib to use – if a tracing lib 
discussion, lets do it out on dev?
{quote}
I 90% agree with this. Certainly as a response to [~michaelsembwever], in any 
case, I would be glad to see a side-by-side using OpenTracing vs "something 
custom" to understand the amount of *additional* work required to actually get 
end-to-end tracing to work. That said, doing the tracing lib analysis "on dev" 
should also take the application developer experience into account... whatever 
we decide to do must require a minimum of configuration work (or educational 
work) for application developers, and that means that we should think hard 
about being agnostic about the tracing system "above" the storage systems under 
consideration here – ideally we are able to plug into any of them without 
forcing the application developer / operator to write new code or go on a 
yak-shaving mission.

 

As a concrete next step, I would be curious to see the code / branch that 
[~jojochuang] used to generate the OT+Jaeger screenshots above. I would also 
like to create a dev branch of HDFS or Cassandra that adds "native" OpenTracing 
instrumentation to a distributed code path that the HDFS devs think would be 
instructive/representative... I just think we're going to be hard-pressed to 
make an informed decision without pairings of trace visualizations (ideally in 
many tracing systems to illustrate portability) *and* the respective 
instrumentation code to illustrate non-bloat / maintainability. Would that be 
useful? [~stack] you were suggesting we try this on dev – any pointers to a 
non-HDFS / non-HBase expert for a place to focus on for such an exercise?

 

 

 

PS: [~michaelsembwever], that was a lot of 
FUD to pack into one message ("bloat its API with vendor concerns", "hostile to 
the ASF", "hostile ... to those tracing solutions those vendors see as 
competition", etc). These concerns were also presented without any evidence – 
unsurprisingly, as I doubt that evidence exists. OpenTracing's two most common 
"pairings" are Zipkin and Jaeger, neither of which are commercial solutions. To 
the contrary of what you suggest, the API is intentionally – if not primarily – 
designed to focus on _describing system behavior_ rather than the concerns of 
any downstream tracing system (OSS or commercial). All OpenTracing meetings are 
recorded and the notes are public if people here would like to judge for 
themselves about the openness and intent of the actual decision process (as 
opposed to the one you described/imagined). For those who want a primer on what 
we're up to, I would recommend reading either [this doc that I wrote when we 
were just getting 
started|https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736], 
or [this more recent doc explaining how OT fits into the larger 
ecosystem|https://medium.com/opentracing/the-difference-between-tracing-tracing-and-tracing-84b49b2d54ea] 
that's developed in the interim.

 

 


[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561007#comment-16561007
 ] 

stack commented on HADOOP-15566:


Thanks for the input [~michaelsembwever].

bq.  as the effort is more in adding the instrumentation code in the first 
place, and not so much writing the abstraction layer.

Agree

bq. With Cassandra ...of maintaining the existing tracing code as the 
abstraction layer, and allowing plugins to it.

That's this stuff: 
https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/tracing
 ?

Could try re-emitting existing (h)traces to zipkin -- it used to work -- or 
whatever sink. Would also need to fix it so trace inserts are friction-free 
when disabled (currently they drag).






[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-28 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560933#comment-16560933
 ] 

mck commented on HADOOP-15566:
--

I've become pretty ho-hum about OpenTracing, and I write that as one of the 
original authors of OpenTracing-Java.

It's not the de facto abstraction layer many presume it to be. Having 
participated in the tracing community the past 5 years, being there as Zipkin 
became one community from many github forks into OpenZipkin, and now mentoring 
SkyWalking through the incubator process and into the ASF, I was at first a big 
fan of OT and promoted it at conferences. In the beginning it did hold a lot of 
potential to become that de facto standard. As time went by we've seen it 
become controlled by commercial interests, bloat its API with vendor concerns, 
and be at times hostile to the ASF and to those tracing solutions those vendors 
see as competition. Part of this is how the commercial world works I accept, 
but I have used it in conference presentations as a counter-example to why the 
Apache Way is so important when what we want is project stability. 

With Cassandra we took the approach of maintaining the existing tracing code as 
the abstraction layer, and allowing plugins to it. This proved the easiest 
approach as the effort is more in adding the instrumentation code in the first 
place, and not so much writing the abstraction layer. A Cassandra to Zipkin 
plugin was added, along with a Cassandra to OpenTracing plugin, but the latter 
was dropped as it became obvious that writing a Cassandra plugin to whatever 
tracing solution you wanted was not really so much work.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554695#comment-16554695
 ] 

Steve Loughran commented on HADOOP-15566:
-

Stack is of course correct: we want this stuff used end-to-end. We do this 
today with logging across our JARs; we need something beyond logging to track 
down performance/blame across everything.

Dictating "you must use reporting tool X" for your analysis limits 
which people will want to use the tracing, and so how broadly it gets used. I 
don't want to have to worry about what they do with that data.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-23 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553366#comment-16553366
 ] 

Elek, Marton commented on HADOOP-15566:
---

[~bensigelman] Thank you very much for your answer. It was very informative, 
with reasonable arguments, especially the last paragraph:

{quote}
Also, building a general-purpose adapter to convert OpenTracing instrumentation 
into OpenCensus API calls would be straightforward (due to the relative 
"thickness" and numbers of implementation assumptions made in each project). 
Going the other way would be challenging or impossible, depending on reliance 
on OpenCensus wire formats.
{quote}




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553144#comment-16553144
 ] 

stack commented on HADOOP-15566:


For me, the hard part is not which tracing lib to use -- if a tracing lib 
discussion, lets do it out on dev? We should also invite others to the 
discussion -- but rather discussion around resourcing:

 * Ensuring traces tell a good narrative across the different code paths and 
over processes, and that trace paths remain intact across code churn; they are 
brittle and easily broken/disconnected as dev goes on.
 * Instrumenting/coverage -- inserting trace points is time-consuming work whose 
value is only realized down the road by an operator/dev trying to figure out a 
slowdown (so https://github.com/opentracing-contrib/java-tracerresolver 
looks interesting).
 * Tooling to enable tracing and visualize it needs to be easy to deploy and use, 
else all will go to rot (some orgs trace every transaction, with a simple switch 
for dumping to a visualizer that is up and always available).
 * Ensuring traces are friction-free, else they'll be removed or not taken on in 
the first place.
 * Evangelizing and pushing trace across hadoop components; the more components 
instrumented, the more we all will benefit.

Thanks.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-23 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552907#comment-16552907
 ] 

Ben Sigelman commented on HADOOP-15566:
---

[~elek] the projects have similar goals but take different approaches. 
OpenTracing's surface area is intentionally "as narrow as possible" which means 
that it brings in almost no dependencies (OpenCensus is more of a 
fully-featured "agent" model, which necessarily gives it a larger footprint). 
OpenTracing also makes no assumptions about the serialization formats (or 
header names, etc) between peered processes in the distributed 
system/application, or the serialization format of the tracing system itself. 
This means that OpenTracing instrumentation can be used/reused for a wider 
variety of things: straightforward distributed trace 
collectors/indexers/viewers like Zipkin, Jaeger, etc, but also distributed 
debuggers, security applications, and so forth.

Also, building a general-purpose adapter to convert OpenTracing instrumentation 
into OpenCensus API calls would be straightforward (due to the relative 
"thickness" and numbers of implementation assumptions made in each project). 
Going the other way would be challenging or impossible, depending on reliance 
on OpenCensus wire formats.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-23 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552808#comment-16552808
 ] 

Elek, Marton commented on HADOOP-15566:
---

As far as I know, the problem could be solved with either OpenTracing or 
OpenCensus. Is there any reason to prefer OpenTracing?

What would be the advantages and disadvantages of using OC versus OT?




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-07 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535828#comment-16535828
 ] 

Ben Sigelman commented on HADOOP-15566:
---

[~ste...@apache.org] I agree that a new field (for the tracing context) makes 
the most sense from a compatibility standpoint.

[~jojochuang]: are there specific blockers or questions you have about the OT 
port? If so, let me know and I'll do my best to address/answer them. I'm also 
happy to help with the mechanics of the change (or find someone with more 
cycles in the OT community to do the same). This can be a really positive thing 
for HDFS, HBase, etc., as the traces within the datastore/filesystem can be 
connected to the traces in the application above, regardless of the particular 
tracing system in use. (I know that at Google, it was valuable for both the 
Bigtable core team and Bigtable users to see traces that wend their way 
from app code into Bigtable and back, especially for slow or poorly constructed 
queries.)

 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534634#comment-16534634
 ] 

Steve Loughran commented on HADOOP-15566:
-

Nice screenshots

* The HDFS team needs to be involved in all discussions about tracing and wire 
protocols; I see Anu is watching; [~jnp] should keep an eye on it too.
* And HBase, e.g. @stack.

I don't see the existing field being reusable unless there's no risk that an 
HTrace client talking to an OpenTracing server, or an OpenTracing client talking 
to an HTrace-enabled server, is going to do bad things. Even if we don't know of 
anyone using HTrace in production, wire compatibility is part of our [compatibility 
definition|http://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_Protocols].
 Making it a new optional field and ignoring the HTrace one is the safe route.
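
A minimal sketch of the "new optional field" route, under stated assumptions: the RPC header is modelled as a plain dict rather than a real protobuf message, and the field names `htrace_trace_id` and `span_context` are hypothetical. The point is only that the server reads the new field and never interprets the legacy one, so an old HTrace-enabled client is simply handled as untraced.

```python
from typing import Optional

# Hypothetical field names; real Hadoop RPC headers are protobuf messages.
LEGACY_FIELD = "htrace_trace_id"   # written by HTrace-enabled clients
NEW_FIELD = "span_context"         # new optional field for the replacement tracer


def read_trace_context(header: dict) -> Optional[bytes]:
    """Return the new-style tracing context, or None if the caller sent none.

    The legacy HTrace field is deliberately never interpreted, so an old
    client cannot be mistaken for a new one.
    """
    value = header.get(NEW_FIELD)
    if value is None:
        return None  # old client, or tracing disabled: handle the call untraced
    return bytes(value)


old_client_header = {"call_id": 7, LEGACY_FIELD: 0x1234}
new_client_header = {"call_id": 8, NEW_FIELD: b"trace=abc;span=01"}
```

With this shape, `read_trace_context(old_client_header)` yields no context and the call proceeds without tracing, while the new client's context is passed through untouched.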




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-02 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530630#comment-16530630
 ] 

Ben Sigelman commented on HADOOP-15566:
---

The screenshots are really nice to see!

I'm not sure how you all like to work, but I am happy to help discuss how to 
make all of this work from an OpenTracing best-practices standpoint (and/or try 
to find people to help with the instrumentation or porting effort).

 

There was a question about `Tracer` impls:
{quote}I can see people might want an implementation that is more neutral. For 
example, Jaeger comes from Uber, and people might not want to use it (hey, any 
Lyft developers here? :))
{quote}
Typically the idiom is to let the user pass in a `Tracer` impl dynamically, but 
fall back on the `GlobalTracer` mechanism if no user-specified `Tracer` was 
provided. There's also a contributed (and wholly optional) OpenTracing utility 
to do `Tracer` injection dynamically (i.e., with zero code modification): 
[https://github.com/opentracing-contrib/java-tracerresolver]
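
A rough sketch of that fallback idiom, written in Python for brevity. Every class here (`Tracer`, `NoopTracer`, `GlobalTracerRegistry`, `FileSystemClient`) is a hypothetical stand-in made up for illustration, not the real OpenTracing API: an explicitly supplied tracer wins, otherwise the process-wide one is used.

```python
from typing import Optional


class Tracer:
    """Hypothetical minimal tracer (not the real OpenTracing interface)."""

    def __init__(self, name: str):
        self.name = name


class NoopTracer(Tracer):
    """Default used when nobody has registered anything."""

    def __init__(self):
        super().__init__("noop")


class GlobalTracerRegistry:
    """Stand-in for a process-wide tracer holder."""

    _tracer: Tracer = NoopTracer()

    @classmethod
    def register(cls, tracer: Tracer) -> None:
        cls._tracer = tracer

    @classmethod
    def get(cls) -> Tracer:
        return cls._tracer


class FileSystemClient:
    """Hypothetical instrumented component."""

    def __init__(self, tracer: Optional[Tracer] = None):
        self._explicit = tracer

    @property
    def tracer(self) -> Tracer:
        # Resolve lazily so a tracer registered after construction is still seen.
        if self._explicit is not None:
            return self._explicit
        return GlobalTracerRegistry.get()
```

Resolving the global tracer lazily is a design choice in this sketch: it avoids ordering problems when the application registers its tracer after library objects have been created.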

 

Also, re wire protocols: OpenTracing is designed to be intentionally agnostic 
about wire protocols and abstracts serialization ("injection") and 
deserialization ("extraction") into the `Tracer` implementation. If there are 
questions about best practices around this, please @-mention me and I'll do my 
best to help.
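
To make the injection/extraction split concrete, here is a small sketch under stated assumptions. `SpanContext`, `HeaderTracer`, the header key names, and the `client_call`/`server_handle` helpers are all hypothetical and exist only for illustration. The instrumented code calls `inject` and `extract` and never sees how the context is serialized; that choice stays inside the tracer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


class HeaderTracer:
    """Hypothetical tracer; the key names below are its private choice."""

    TRACE_KEY = "x-demo-trace-id"
    SPAN_KEY = "x-demo-span-id"

    def inject(self, ctx: SpanContext, carrier: dict) -> None:
        # Serialization ("injection") into an opaque carrier.
        carrier[self.TRACE_KEY] = ctx.trace_id
        carrier[self.SPAN_KEY] = ctx.span_id

    def extract(self, carrier: dict) -> Optional[SpanContext]:
        # Deserialization ("extraction"); None means the caller sent no context.
        if self.TRACE_KEY not in carrier or self.SPAN_KEY not in carrier:
            return None
        return SpanContext(carrier[self.TRACE_KEY], carrier[self.SPAN_KEY])


def client_call(tracer, ctx: SpanContext, payload: bytes) -> dict:
    """Instrumented client: attaches the context without knowing its format."""
    headers: dict = {}
    tracer.inject(ctx, headers)
    return {"headers": headers, "payload": payload}


def server_handle(tracer, request: dict) -> Optional[SpanContext]:
    """Instrumented server: recovers the caller's context, if any."""
    return tracer.extract(request["headers"])
```

Swapping in a tracer with a different header layout would not require touching `client_call` or `server_handle`.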

 

(Thanks again, all)




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-02 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530265#comment-16530265
 ] 

Aaron Fabbri commented on HADOOP-15566:
---

Nice work [~jojochuang].  I had fun hacking on this for a day.  Attaching a 
screenshot from the S3A tracing I added, uploading a file to S3.

 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-02 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530226#comment-16530226
 ] 

Steve Loughran commented on HADOOP-15566:
-

bq. we'll need to update client -> namenode RPC messages, as well as client -> 
datanode RPC, KMS Rest API. So wire compatibility needs to be considered. (Some 
messages already carry an HTrace trace id. Would it make sense to replace the 
HTrace trace id field with an OpenTracing trace id field?)

Unless the new id goes in a protobuf optional field, this breaks wire 
compatibility and is an incompatible protocol change. If the existing field is 
reused, the servers need to handle the case of an older client with HTrace 
enabled making an RPC call to a server that uses OpenTracing.
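
The reason an optional field is the compatible choice can be illustrated with a toy encoding. This is a sketch only: the tag-length-value format below is a simplified stand-in for protobuf's real wire format, and the tag numbers are hypothetical. A reader that skips tags it does not know keeps working when a newer peer adds a field.

```python
import struct

CALL_ID, HTRACE_ID, SPAN_CONTEXT = 1, 2, 3   # hypothetical tag numbers


def encode(fields: dict) -> bytes:
    """Encode {tag: payload} as 1-byte tag, 2-byte length, then payload."""
    out = b""
    for tag, payload in fields.items():
        out += struct.pack(">BH", tag, len(payload)) + payload
    return out


def decode(data: bytes, known_tags: set) -> dict:
    """Decode, silently skipping tags this reader does not know about."""
    fields, pos = {}, 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3  # size of the tag + length prefix
        if tag in known_tags:
            fields[tag] = data[pos:pos + length]
        pos += length
    return fields


# A new client adds its context under a new tag and leaves the old tag unused.
new_client_msg = encode({CALL_ID: b"\x00\x07", SPAN_CONTEXT: b"trace=abc"})

# An old server only knows the first two tags; a new server knows all three.
old_server_view = decode(new_client_msg, known_tags={CALL_ID, HTRACE_ID})
new_server_view = decode(new_client_msg, known_tags={CALL_ID, HTRACE_ID, SPAN_CONTEXT})
```

Had the new context been written under `HTRACE_ID` instead, the old server would have tried to interpret it as an HTrace id, which is the reuse hazard described above.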




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-02 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530149#comment-16530149
 ] 

Wei-Chiu Chuang commented on HADOOP-15566:
--

Attached a screenshot of an HDFS client data write pipeline trace.  !Screen Shot 
2018-06-29 at 11.59.16 AM.png! 




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-02 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530144#comment-16530144
 ] 

Wei-Chiu Chuang commented on HADOOP-15566:
--

Hi Ben!
With help from [~tlipcon], I worked with [~fabbri] and [~rizaon] and spent 
a day or two porting HTrace to OpenTracing. It turned out to be quite a fun 
exercise.

Most of the porting is mechanical: changing HTrace spans to OpenTracing spans. 
It took me a while to figure out how to pass the trace id in OpenTracing, but 
it is doable. I was even able to add some tracing code that was missing before.

Some observations:
# Porting the code in Hadoop seems straightforward.
# I am not aware of anyone using HTrace in production, so I don't expect too 
much resistance to replacing it. (Shout out if this is not the case.)
# Embracing OpenTracing, which is becoming the de facto tracing standard, 
makes it possible to trace end-to-end, from non-Hadoop applications into Hadoop.

Some possible hurdles:
# To pass the trace id around, we'll need to update client -> namenode RPC 
messages, as well as client -> datanode RPC and the KMS REST API, so wire 
compatibility needs to be considered. (Some messages already carry an HTrace 
trace id. Would it make sense to replace the HTrace trace id field with an 
OpenTracing trace id field? Or should the OpenTracing trace id be appended? 
Hopefully there's not much overhead.)
# OpenTracing is just a set of APIs. We used Jaeger as the implementation. I 
can see people might want an implementation that is more neutral. For example, 
Jaeger comes from Uber, and people might not want to use it (hey, any Lyft 
developers here? :))
# Community adoption: I am aware HBase uses HTrace, so if we switch to 
OpenTracing, we'll need some coordination to convince the HBase community to 
switch too (I'd be happy to contribute). I am also hoping to convince other 
communities to adopt OpenTracing. It's not too interesting if OpenTracing is 
adopted in Hadoop but not in Hive, Spark, or Kafka.

Thoughts?




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-01 Thread Ben Sigelman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529150#comment-16529150
 ] 

Ben Sigelman commented on HADOOP-15566:
---

Hi all – someone sent this my way. I am one of the OpenTracing co-creators and 
would be delighted to collaborate on adding (minimal-dependency, lightweight) 
OpenTracing instrumentation to ASF projects. In serendipitous news, we have 
recently added some resources to help with the actual instrumentation work for 
well-used and well-loved projects like those in the ASF.

 

(PS: I am generally oversubscribed and can be bad at things like JIRA, but am 
100% happy to help here... if I'm being flaky about Jira responses, please 
reach out to me at [b...@gmail.com|mailto:b...@gmail.com] where I maintain a 
better SLA ;))




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-06-28 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526619#comment-16526619
 ] 

Aaron Fabbri commented on HADOOP-15566:
---

Agreed.  HTrace is great but suffered from everyone being too busy to give it 
the love it needed to develop.

We're going to spend some time seeing if we can plug in an OpenTracing 
implementation today.  Will report back with any interesting findings.




[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-06-27 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525547#comment-16525547
 ] 

Steve Loughran commented on HADOOP-15566:
-

It's a shame to hear about HTrace.

We have an outstanding JIRA to add it to S3A (HADOOP-12949), and HADOOP-15407 
includes it in the ABFS connector, so I'd like to have an alternative. All we 
really want in the Hadoop code is the instrumentation to publish information 
and to propagate context information all the way down from applications. And we 
want those applications to wire up to it too, obviously. 
