Hello Riza Suminto, Pranav Lodha, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23294 to look at the new patch set (#6). Change subject: IMPALA-13237: [Patch 7] - Lock ClientRequestState during Opentelemetry Traces ...................................................................... IMPALA-13237: [Patch 7] - Lock ClientRequestState during Opentelemetry Traces Updates the SpanManager class so it takes the ClientRequestState lock when reading from that object. Updates startup flag otel_trace_span_processor to be hidden. Manual testing revealed that setting this flag to "simple" (which uses SimpleSpanProcessor when forwarding OpenTelemetry traces) causes the SpanManager object to block until the destination OpenTelemetry collector receives the request and responds. Thus, network slowness or an overloaded OpenTelemetry collector will block the entire query processing flow since SpanManager will hold the ClientRequestState lock throughout the duration of the communication with the OpenTelemetry collector. Since the SimpleSpanProcessor is useful in testing, this flag was changed to hidden to avoid incorrect usage in production. When generating span attribute values on OpenTelemetry traces for queries, data is read from ClientRequestState without holding its lock. The documentation in client-request-state.h specifically states reading most fields requires holding its lock. An examination of the opentelemetry-cpp SDK code revealed the ClientRequestState lock must be held until the StartSpan() and EndSpan() functions complete. The reason is span attribute keys and values are deep copied from the source nostd::string_view objects during these functions. Testing accomplished by running the test_otel_trace.py custom cluster tests as regression tests. Additionally, manual testing with intentionally delayed network communication to an OpenTelemetry collector demonstrated that the StartSpan() and EndSpan() functions do not block waiting on the OpenTelemetry collector if the batch span processor is used. However, these functions do block if the simple span processor is used. Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609 --- M be/src/observe/otel-flags-trace.cc M be/src/observe/otel.cc M be/src/observe/otel.h M be/src/observe/span-manager.cc M be/src/observe/span-manager.h M be/src/service/client-request-state.cc M be/src/service/impala-server.cc 7 files changed, 124 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/23294/6 -- To view, visit http://gerrit.cloudera.org:8080/23294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609 Gerrit-Change-Number: 23294 Gerrit-PatchSet: 6 Gerrit-Owner: Jason Fehr <jf...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com> Gerrit-Reviewer: Pranav Lodha <pranav.lo...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>