[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603146#comment-15603146 ]

Mike Drob commented on SOLR-9641:
-

Which one is the "default"? I see {{./solr/example/exampledocs/solr.xml}} and {{./solr/server/solr/solr.xml}}

> Emit distributed tracing information from Solr
> --
>
> Key: SOLR-9641
> URL: https://issues.apache.org/jira/browse/SOLR-9641
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Mike Drob
> Fix For: master (7.0)
>
> Attachments: SOLR-9641.patch
>
>
> While Solr already offers a few tools for exposing timing, this information
> can be difficult to aggregate and analyze. By integrating distributed tracing
> into Solr operations, we can gain new performance and behaviour insights.
> One such solution can be accomplished via Apache HTrace (incubating).
> (More rationale to follow.)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603141#comment-15603141 ]

Mike Drob commented on SOLR-9641:
-

bq. in CoreContainer there is one zkSys.getZkController().getNodeName() and one getZkController().getNodeName() call, they could be combined into one call with result kept in local variable or both could use or not use zkSys for clarity.

Done.

bq. In SearchHandler, how about also having trace scopes for the handleResponses and finishStage steps? Or if the intention is to only trace component methods which typically make requests to other shards maybe not trace the prepare step?

Hmm... yes, this could make sense. I didn't want to put too much in for the distributed request portion because that also gets traced on the remote peers. But you're right that something should be looked at here. Adding it around only handleResponses and finishStage seems insufficient? There are a lot of other things going on in the distributed branch there. Will come back to this later...

bq. In CoreAdminHandler for the callInfo.call(); there is the traceDescription + " async" scope i.e. differentiation between sync and async. Just wondering if something similar might be useful for SearchHandler's without-debug and with-debug prepare and process scopes?

You mean labelling the debug scope with a debug description? Yeah, that's doable. My async description was largely a hack, I think, and will probably go away in favor of something more generic.

bq. In the tests, curious why only [0] is being added in the getReceivers methods?

Because there was only one receiver configured per jetty. I'll change this to grab them all.

bq. In the tests, might the Random random() method be passed down to SpanId

Good idea. I'll make a utility method in Solr for now, but also filed HTRACE-391.
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15598675#comment-15598675 ]

Christine Poerschke commented on SOLR-9641:
---

bq. This is really cool Mike Drob! ...

+1 to that, I am also looking forward to having tracing support in Solr. Here are my comments from looking at the patch:

* minor: in CoreContainer there is one {{zkSys.getZkController().getNodeName()}} and one {{getZkController().getNodeName()}} call, they could be combined into one call with result kept in local variable or both could use or not use {{zkSys}} for clarity.
* In SearchHandler, how about also having trace scopes for the {{handleResponses}} and {{finishStage}} steps? Or if the intention is to only trace component methods which typically make requests to other shards maybe not trace the {{prepare}} step?
* In CoreAdminHandler for the {{callInfo.call();}} there is the {{traceDescription + " async"}} scope i.e. differentiation between sync and async. Just wondering if something similar might be useful for SearchHandler's without-debug and with-debug prepare and process scopes?
* In the tests, curious why only \[0\] is being added in the getReceivers methods?
* In the tests, might the {{Random random()}} method be passed down to SpanId i.e. for the tests
{code}
- ... SpanId.fromRandom() ...
+ ... SpanId.fromRandom(random()) ...
{code}
and for [SpanId.java|https://github.com/apache/incubator-htrace/blob/master/htrace-core4/src/main/java/org/apache/htrace/core/SpanId.java] something along the lines of
{code}
+ import java.util.Random;
+
+ private static long nonZeroRand64(Random random) {
+   while (true) {
+     long r = random.nextLong();
+     if (r != 0) {
+       return r;
+     }
+   }
+ }
+
+ public static SpanId fromRandom(Random random) {
+   return new SpanId(nonZeroRand64(random), nonZeroRand64(random));
+ }
{code}
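The point of passing the test's {{Random}} down is reproducibility: with a fixed seed, the generated ids are the same on every run. A minimal, self-contained sketch of that idea follows. {{SeededSpanId}} is a hypothetical stand-in written for illustration only; it is not the HTrace {{SpanId}} class, and only the {{nonZeroRand64}} and {{fromRandom}} shapes are taken from the suggestion above.

```java
import java.util.Random;

/** Hypothetical stand-in for a span id made of two 64-bit halves (not the HTrace class). */
final class SeededSpanId {
  private final long high;
  private final long low;

  private SeededSpanId(long high, long low) {
    this.high = high;
    this.low = low;
  }

  /** Draws from the supplied Random until a non-zero value comes up. */
  private static long nonZeroRand64(Random random) {
    while (true) {
      long r = random.nextLong();
      if (r != 0) {
        return r;
      }
    }
  }

  /** Builds an id from a caller-supplied Random, for example a seeded test Random. */
  static SeededSpanId fromRandom(Random random) {
    return new SeededSpanId(nonZeroRand64(random), nonZeroRand64(random));
  }

  long getHigh() { return high; }

  long getLow() { return low; }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SeededSpanId)) {
      return false;
    }
    SeededSpanId other = (SeededSpanId) o;
    return high == other.high && low == other.low;
  }

  @Override
  public int hashCode() {
    return 31 * Long.hashCode(high) + Long.hashCode(low);
  }
}
```

Two calls with {{new Random(42L)}} produce equal ids, which is what makes a failing test replayable from its seed.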
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592405#comment-15592405 ]

David Smiley commented on SOLR-9641:

Yes. Perhaps the default solr.xml might have a commented trace section -- brief.
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592319#comment-15592319 ]

Mike Drob commented on SOLR-9641:
-

Documenting what goes in the {{trace}} section in solr.xml would also be ref-guide, yes?
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592025#comment-15592025 ]

David Smiley commented on SOLR-9641:

Docs:
* javadocs: probably on the Tracer field you added to CoreContainer. TracerUtils.java should refer to that so people know where it's placed in Solr.
* user docs: we'll probably want to add this to the ref guide... at least something very brief that can demonstrate the simplest useful way to see it in action, and then we refer users to other possibilities (e.g. Zipkin). There ought to be a reference to this feature in the vicinity of where debugQuery/debug=timing is, so people know of this more sophisticated option.
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592003#comment-15592003 ]

David Smiley commented on SOLR-9641:

See HttpSolrCall.call around line 469 (writeResponse)
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591981#comment-15591981 ]

Mike Drob commented on SOLR-9641:
-

[~tomasflobbe] - we were talking last week about adding a trace around the response writer, but I'm struggling to find where that logic is. Can you give me a pointer?
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591976#comment-15591976 ]

Mike Drob commented on SOLR-9641:
-

Thanks for taking a look, [~dsmiley]!

bq. Can you recommend a tool that can be used with Solr after this patch is applied to visualize or otherwise make use of it to help us analyze Solr performance?

The built-in HTrace viewer is reasonable for some purposes, but probably not ideal for all of them. There is also a Zipkin bridge, so you could use that as your visualizer. Both are configured by setting the {{span.receiver.classes}} configuration to the appropriate value. My docs are pretty sparse at the moment; where would you suggest placing them? We can have a short description and then refer to the full HTrace docs for completeness.

{quote}
* SolrCore.newScope: guard log.debug with log.isDebugEnabled to avoid toString
* HttpShardHandler: maybe instead of always wrapping task with traceTask we instead conditionally replace task with a tracing one? This way we conveniently avoid the wrapping if there is no tracing.
* CommonParams.java:TRACE_ID: a one-liner comment referencing "HTrace" would be useful.
{quote}

Done. I'm not going to upload a new patch yet, since the changes are relatively minimal and I don't want to clutter the issue.

{quote}
* loadTraceConfig: could you use NamedList.asMap(1) or perhaps not because "String" type?
{quote}

I tried this and it worked, but something about it feels incredibly fragile. I'll leave it in for now, however.

{quote}
* TracerUtils: I like this. Question: should newScope(SolrQueryRequest request, String description) also look in the request params to see if there is a parent, and if so conditionally call tracer.newScope with that parent?
{quote}

Hmm, maybe. I know that it is possible to have multiple parents per span, but I think the APIs around it are a little clunky. Will need to think on this more.

Actually, no. I don't think we need to pull the parent from the request params here, since we already do that in {{SolrCore.newScope}}, which should be handling most things. The method in {{TracerUtils}} is more of a convenience thing to get at the core container so we can get the tracer.
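The HttpShardHandler suggestion quoted above (conditionally replace the task instead of always wrapping it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: {{ConditionalTraceWrap}}, {{maybeTrace}}, the {{tracingActive}} flag and the counter are all hypothetical names, and {{Supplier}} is used only to keep the sketch free of checked exceptions; the real task type may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Sketch of conditionally replacing a task with a traced one. All names are hypothetical. */
final class ConditionalTraceWrap {
  /** Counts traced invocations; a stand-in for opening and closing a trace scope. */
  static final AtomicInteger tracedCalls = new AtomicInteger();

  /**
   * Returns the task itself when tracing is off, so no wrapper object is created.
   * When tracing is on, returns a wrapper that records the call and then delegates.
   */
  static <T> Supplier<T> maybeTrace(boolean tracingActive, Supplier<T> task) {
    if (!tracingActive) {
      return task; // untouched: no extra allocation or indirection
    }
    return () -> {
      tracedCalls.incrementAndGet(); // stand-in for the tracing work
      return task.get();
    };
  }
}
```

With tracing off, the caller gets back the very same task instance, which is the "avoid the wrapping if there is no tracing" behaviour the review asked for.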
[jira] [Commented] (SOLR-9641) Emit distributed tracing information from Solr
[ https://issues.apache.org/jira/browse/SOLR-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588732#comment-15588732 ]

David Smiley commented on SOLR-9641:

This is really cool [~mdrob]! I learned about tracing at Apache Big Data this year and I became hopeful that one day Solr would get tracing abilities. Can you recommend a tool that can be used with Solr after this patch is applied to visualize or otherwise make use of it to help us analyze Solr performance?

I looked at the patch; the approach is overall quite nice I think. Some comments:

* SolrCore.newScope: guard log.debug with log.isDebugEnabled to avoid toString
* loadTraceConfig: could you use NamedList.asMap(1) or perhaps not because "String" type?
* TracerUtils: I like this. Question: should newScope(SolrQueryRequest request, String description) also look in the request params to see if there is a parent, and if so conditionally call tracer.newScope with that parent?
* HttpShardHandler: maybe instead of always wrapping task with traceTask we instead conditionally replace task with a tracing one? This way we conveniently avoid the wrapping if there is no tracing.
* CommonParams.java:TRACE_ID: a one-liner comment referencing "HTrace" would be useful.
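The first review comment in this message (guard the debug call so that {{toString}} is not evaluated when debug logging is off) follows a common pattern. Below is a minimal sketch of it. It uses {{java.util.logging}} only so that it runs without extra dependencies; the review refers to {{log.debug}} / {{log.isDebugEnabled}}, for which {{Level.FINE}} and {{isLoggable}} stand in here. {{GuardedDebugLog}}, {{ExpensiveState}} and {{logScope}} are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of guarding a debug log call. All names are hypothetical. */
final class GuardedDebugLog {
  /** Object whose toString() is assumed to be costly; the counter records each call. */
  static final class ExpensiveState {
    int toStringCalls = 0;

    @Override
    public String toString() {
      toStringCalls++;
      return "ExpensiveState(...)";
    }
  }

  /** Builds the message, and therefore calls toString(), only when FINE is enabled. */
  static void logScope(Logger log, ExpensiveState state) {
    if (log.isLoggable(Level.FINE)) {
      log.fine("new scope for " + state);
    }
  }
}
```

Without the guard, the string concatenation would run on every call, whether or not the message is ever written.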