Re: [DISCUSS] Attic podling Apache HTrace?
> Thanks Adrian for the editorial on the landscape. Helps, especially coming
> from yourself.

We aim to please.

> Given current state of the project, a retrofit to come up on OT is not the
> solution to the topic-at-hand (and besides I have a colored opinion on
> taking on the API of another after spending a bunch of time recently
> undoing our mistake letting third-party Interfaces and Classes show through
> in hbase).

Sensible for any API highly disconnected from the ecosystem, especially one
without practice yet.

> I appreciate the higher-level point made by Andrew, that it is hard to
> thread a cross-cutting library across the Hadoop landscape whether because
> releases happen on the geologic time scale or that there is little by way
> of coordination.

I think this is indeed leading a path towards focus, e.g. the H in HTrace :)

> Can we do a focused 'win' like Colin suggests? E.g. hook up hbase and hdfs
> end-to-end with connection to a viewer (zipkin? Or text dumps in a
> webpage?). A while back I had a go at the hbase side but it was burning up
> the hours just getting it hooked up w/ tests to scream if any spans were
> broken in a refactor. I had to put it aside.

Incidentally, I wouldn't necessarily say Zipkin is ready out of the box,
because HTrace's UI and query are more advanced (in some ways due to the
data storage options we have available). So, something like this could be a
move of focus which would require investment on the other side to make the
needed features available, or a discussion of how to upgrade into them
(e.g. if using HBase storage, certain queries would work). It is fair to
say Zipkin has a great devops pipeline; we are good at fixing things. At
the same time, we are imperfect in implementation and inexperienced in the
Hadoop ecosystem. Having some way to join together could be really
beneficial, at the cost of up-front effort (due to model, UI and storage
differences).
I would be happy to direct time, though I would need some help because of
my irrelevance in the data services space (something this might correct!).

> Like the rest of you, my time is a little occupied elsewhere these times so
> I can't revive the project, not at the moment at least.

Ack.
Re: [DISCUSS] Attic podling Apache HTrace?
One thing that really held back HTrace 4.0 was that it was originally
scheduled to be part of Hadoop 2.8, and the Hadoop 2.8 release was delayed
for a really, really long time, to the point where it almost became a
punchline. So people had to use vendor releases to get HTrace 4, because
those were the only releases with new Hadoop code.

Colin

On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe <cmcc...@apache.org> wrote:

> > On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > > project to ZipKin? In particular grpc-opentracing (
> > > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > > fulfill in open source the tracing architecture described in the Dapper
> > > paper.
> >
> > OpenTracing is essentially an API which sits on top of another tracing
> > system.
> >
> > So you can instrument your code with the OpenTracing library, and then
> > have that send the trace spans to OpenZipkin.
> >
> > Here are some thoughts here about this topic from a Zipkin developer:
> > https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> > Probably Adrian Cole can chime in here as well.
> >
> > In general the OpenTracing folks have been friendly and respectful. (If
> > any of them are reading this, I apologize for not following some of the
> > discussions on gitter more thoroughly -- my time is just split so many
> > ways right now!)
> >
> > > If one takes a step back and looks at all of the hand rolled RPC stacks
> > > in the Hadoop ecosystem it's a mess. It is a heavier lift but getting
> > > everyone migrated to a single RPC stack - gRPC - would provide the
> > > unified tracing layer envisioned by HTrace. The tracing integration is
> > > then done exactly in one place. In contrast HTrace requires all of the
> > > components to sprinkle spans throughout the application code.
> >
> > That's not the issue. We already have HTrace integration with Hadoop
> > RPC, such that a Hadoop RPC creates a span. Integration with any RPC
> > system is actually very straightforward -- you just add two fields to the
> > base RPC request definition, and patch the RPC system to use them.
> >
> > Just instrumenting RPC is not sufficient. You need programmers to add
> > explicit span annotations to your code so that you can have useful
> > information beyond what a program like wireshark would find. Things
> > like what disk is a request hitting, what HBase PUT is an HDFS write
> > associated with, and so forth.
> >
> > Also, this is getting off topic, but there is a new RPC system every
> > year or two. Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> > REST/JSON, etc. They all have advantages and disadvantages. For
> > example, GRPC depends on protobuf -- and Hadoop has a lot of deployment
> > and performance problems with the protobuf-java library. I wish gRPC
> > luck, but I think it's good for people to experiment with different
> > libraries. It doesn't make sense to try to force everyone to use one
> > thing, even if we could.
> >
> > > The Hadoop ecosystem is always partially at odds with itself, if for no
> > > other reason than there is no shared vision among the projects. There
> > > are no coordinated releases. There isn't even agreement on which
> > > version of shared dependencies to use (hence the recurring pain in
> > > various places with downstream version changes of protobuf, guava,
> > > jackson, etc. etc). Therefore HTrace is severely constrained on what
> > > API changes can be made. Unfortunately the different major versions of
> > > HTrace do not interoperate at all. And are not even source compatible.
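As an aside, the "two fields on the base RPC request definition" approach
described above can be sketched in protobuf roughly as follows. This is an
illustrative sketch only: the message names, field names, and field numbers
are assumptions for the example, not Hadoop's actual RpcHeader.proto.

```protobuf
// Hypothetical sketch: a trace context piggybacked on an RPC request header.
syntax = "proto2";

message TraceInfo {
  optional int64 trace_id  = 1;  // id shared by every span in the trace
  optional int64 parent_id = 2;  // caller's span id; the server parents its span here
}

message RpcRequestHeader {
  // ... existing RPC fields ...
  optional TraceInfo trace_info = 100;  // absent when the caller is not tracing
}
```

With something like this in place, the RPC layer only has to copy the
caller's current span ids into the header on send and open a child span on
receive; richer annotations still have to come from the application code,
as the message above points out.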
> > > While this is not unreasonable at all for a project in incubation,
> > > when combined with the inability of the Hadoop ecosystem to coordinate
> > > releases as a cross-cutting dependency ships a new version, this has
> > > reduced the utility of HTrace to effectively nil for the average user.
> > > I am sorry to say that. Only a commercial Hadoop vendor or power user
> > > can be expected to patch and build a stack that actually works.
> >
> > One correction: The different major versions of HTrace are indeed source
> > code compatible. You can build an application that ca
Re: [DISCUSS] Attic podling Apache HTrace?
> What are the likely alternatives for downstream projects that want
> distributed tracing?

Yes, for general purpose or RPC, but I think HTrace is still positioned
well for data services specifically.

> Do we think the field still has a big gap that HTrace can solve?

When I was at Twitter (a couple of years ago now), I know the data team
preferred HTrace even though we had Zipkin. Most of the tracing projects
out there do not focus on data services, or only recently do. While HTrace
may not be great at filling gaps in traditional RPC (as others do this well
enough), it probably does still have compelling advantages in data
services. I think the main holdback is getting the word out and/or showing
examples where the model and UI really shine in HTrace's sweet spot (data
services).

My 2p
Re: [DISCUSS] OpenTracing API implementation
> If nobody steps up to implement it before me or working on it yet, I'll try
> to get my hands into it. Unfortunately I'm not sure when I'd have spare
> time for it.

There will be some work needed in HTrace, as the OT interface is wider in
most places, but less precise in others. It should be easier than in
Zipkin, since HTrace has a single-span-per-tracer model. That said, there
are some key concerns which may or may not play out well, particularly
HTrace's lack of a trace ID. For example, you'll have choices to make on
how to handle nested data structures that are sent via OT's "log" API.
Also, whether and how to map "special" properties defined in OpenTracing
such as "span.kind", and how to deal with the required propagation APIs
(for example, how to encode and decode a trace context in binary and text
form). Sadly, there's no compatibility kit or interop tests of any kind,
so figuring out whether things work will largely be up to testers.

Anyway, in case it helps, here are a couple of bridge projects:
https://github.com/openzipkin/brave-opentracing
https://github.com/DealerDotCom/sleuth-opentracing

> I had two things in mind starting this discussion:
> - check that nobody is working on it yet (to avoid wasting time if someone
> is);

Pretty confident you are it.

> - check that such work would be useful for someone else.

This is an important step! There's certainly work to do, and best to not go
at it without at least a user to help QA. I'd ping their gitter and/or use
Twitter to recruit others interested, if I were you.

> I haven't checked with IPMC and legal but opentracing seems to be under
> Apache License v2.0 and has no external deps, so it should be ok as an
> external dependency.

Agreed, though bear in mind OT Java at least is <1.0, so you'd need to plan
for how to manage version updates (since HTrace is more coarse-grained,
at >1.0). Good luck on your journey!
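To make the nested-log-data concern concrete: HTrace KV annotations hold
flat string pairs, so a bridge would need some convention for nested
OpenTracing log payloads. Here is a minimal sketch under the assumption of
dotted-key flattening; `LogFlattener` is a hypothetical helper, not part of
either API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of HTrace or OpenTracing): flattens a
// nested OpenTracing-style log payload into flat key/value string pairs,
// suitable for flat KV annotations.
public class LogFlattener {
  public static Map<String, String> flatten(String prefix, Map<String, ?> fields) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, ?> e : fields.entrySet()) {
      String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
      if (e.getValue() instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, ?> nested = (Map<String, ?>) e.getValue();
        out.putAll(flatten(key, nested)); // recurse, joining keys with '.'
      } else {
        out.put(key, String.valueOf(e.getValue()));
      }
    }
    return out;
  }
}
```

The dotted-key convention is just one choice; serializing nested values as
JSON strings would be another, with different query trade-offs on the
backend.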
Re: How to deal with htrace conversion of values to base64
OK, well, update your issue here, then?
https://github.com/openzipkin/zipkin/issues/1455

There's a test, ScribeSpanConsumerTest, which covers some base64-related
concerns. It is unfortunate that base64 is a requirement, as indeed there
are edge cases.
https://github.com/openzipkin/zipkin/blob/master/zipkin-collector/scribe/src/test/java/zipkin/collector/scribe/ScribeSpanConsumerTest.java#L152

-A

On Fri, Dec 30, 2016 at 10:37 PM, Raam Rosh-Hai <r...@findhotel.net> wrote:

> Hey Adrian,
>
> After poking around, and changing org.apache.htrace.impl.ScribeTransport,
> which didn't seem to solve this issue, I am starting to think it's on the
> zipkin side, seeing that most of the data is displayed correctly other
> than the KV annotations. Furthermore, the annotations themselves are being
> sent to zipkin in a binary format and only the thrift object is base64
> encoded. I am trying to find the place that encodes the annotations to
> base64, but no luck up until now in htrace.
>
> On 23 December 2016 at 10:21, Adrian Cole <adrian.f.c...@gmail.com> wrote:
>
>> My guess is that this has to do with URL encoding. Can you patch
>> org.apache.htrace.impl.ScribeTransport to use
>> encodeBase64URLSafeString instead of encodeBase64String?
>>
>> That might answer it.
>>
>> On Fri, Dec 23, 2016 at 5:04 PM, Raam Rosh-Hai <r...@findhotel.net> wrote:
>> > Hi St.Ack,
>> >
>> > Thank you for your reply. I am using "org.apache.htrace" %
>> > "htrace-core4" % "4.1.0-incubating",
>> > "org.apache.htrace" % "htrace-zipkin" % "4.1.0-incubating" in scala.
>> > I was looking at an older version of htrace (the non-incubating master
>> > on github) and now I see you are no longer doing that.
>> >
>> > What I am getting in the zipkin UI is a malformed base64 string, where
>> > the `/` were converted to `_`. After debugging the zipkin receiver it
>> > seems like the spans are sent correctly. Maybe you have an idea what
>> > can go wrong?
>> >
>> > On 22 December 2016 at 18:47, Stack <st...@duboce.net> wrote:
>> >
>> >> Show us where in the code this is happening, Raam Rosh-Hai, and tell
>> >> us what version of htrace you are using. Thanks.
>> >> St.Ack
>> >>
>> >> On Thu, Dec 22, 2016 at 3:41 AM, Raam Rosh-Hai <r...@findhotel.net>
>> >> wrote:
>> >>
>> >> > I am saving a simple string value and it seems like the htrace
>> >> > zipkin connector is converting all values to base64. I then get the
>> >> > base64 values in the zipkin frontend. Any suggestions?
>> >> >
>> >> > Thanks,
>> >> > Raam
Re: How to deal with htrace conversion of values to base64
My guess is that this has to do with URL encoding. Can you patch
org.apache.htrace.impl.ScribeTransport to use
encodeBase64URLSafeString instead of encodeBase64String?

That might answer it.

On Fri, Dec 23, 2016 at 5:04 PM, Raam Rosh-Hai wrote:

> Hi St.Ack,
>
> Thank you for your reply. I am using "org.apache.htrace" %
> "htrace-core4" % "4.1.0-incubating",
> "org.apache.htrace" % "htrace-zipkin" % "4.1.0-incubating" in scala.
> I was looking at an older version of htrace (the non-incubating master on
> github) and now I see you are no longer doing that.
>
> What I am getting in the zipkin UI is a malformed base64 string, where
> the `/` were converted to `_`. After debugging the zipkin receiver it
> seems like the spans are sent correctly. Maybe you have an idea what can
> go wrong?
>
> On 22 December 2016 at 18:47, Stack wrote:
>
>> Show us where in the code this is happening, Raam Rosh-Hai, and tell us
>> what version of htrace you are using. Thanks.
>> St.Ack
>>
>> On Thu, Dec 22, 2016 at 3:41 AM, Raam Rosh-Hai wrote:
>>
>> > I am saving a simple string value and it seems like the htrace
>> > zipkin connector is converting all values to base64. I then get the
>> > base64 values in the zipkin frontend. Any suggestions?
>> >
>> > Thanks,
>> > Raam
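For reference, the `/`-to-`_` symptom described in this thread is exactly
the difference between the standard and URL-safe base64 alphabets. A
minimal illustration using the JDK's own encoders (Commons Codec's
encodeBase64String and encodeBase64URLSafeString differ in the same way):

```java
import java.util.Base64;

public class Base64AlphabetDemo {
  public static void main(String[] args) {
    // Bytes chosen so that every output character differs between alphabets
    byte[] data = { (byte) 0xfb, (byte) 0xff, (byte) 0xbf };

    // Standard alphabet uses '+' and '/'
    System.out.println(Base64.getEncoder().encodeToString(data));    // +/+/
    // URL-safe alphabet substitutes '-' and '_'
    System.out.println(Base64.getUrlEncoder().encodeToString(data)); // -_-_
  }
}
```

A decoder expecting only the standard alphabet will reject or mis-decode
`-` and `_`, which is consistent with seeing "malformed" base64 in the
Zipkin UI when the sender and receiver disagree on the alphabet.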
Re: [DISCUSS] Release for HTrace/Drive Towards Graduation
OK, well, do announce once you've decided how to do a CI pipeline that
includes pre-commit testing and context-sensitive review comments. I look
forward to it.
Re: Build instructions for Htrace
> +1. Creating a docker build environment for Jenkins has been on our to-do
> list for a while. It would be a great contribution for someone. It seems
> like it would also help anyone who wanted to build the project as well.

I haven't used it yet, but DotCi seems to be available for Jenkins folks:
http://groupon.github.io/DotCi/

I suspect a good first step is getting docker-compose working in general?
That would be independent of what sort of CI slave would invoke it.
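In case it helps as a starting point, a minimal docker-compose build
environment might look something like the sketch below. The image tag and
the build command are assumptions for illustration, not a published HTrace
configuration.

```yaml
# Hypothetical docker-compose.yml for a reproducible build environment.
version: "2"
services:
  build:
    image: maven:3-jdk-8        # assumed toolchain; adjust to the project's JDK
    volumes:
      - .:/workspace            # mount the htrace checkout
      - ~/.m2:/root/.m2         # reuse the local maven cache across runs
    working_dir: /workspace
    command: mvn clean install
```

Running `docker-compose run build` locally would then exercise the same
environment a CI slave could invoke, which matches the point above about
keeping the compose setup independent of the CI system.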