Hello Riza Suminto, Pranav Lodha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/23294

to look at the new patch set (#6).

Change subject: IMPALA-13237: [Patch 7] - Lock ClientRequestState during 
Opentelemetry Traces
......................................................................

IMPALA-13237: [Patch 7] - Lock ClientRequestState during Opentelemetry Traces

Updates the SpanManager class so it takes the ClientRequestState lock
when reading from that object.

Updates startup flag otel_trace_span_processor to be hidden. Manual
testing revealed that setting this flag to "simple" (which uses
SimpleSpanProcessor when forwarding OpenTelemetry traces) causes the
SpanManager object to block until the destination OpenTelemetry
collector receives the request and responds. Thus, network slowness
or an overloaded OpenTelemetry collector will block the entire query
processing flow since SpanManager will hold the ClientRequestState
lock throughout the duration of the communication with the
OpenTelemetry collector. Since the SimpleSpanProcessor is useful in
testing, this flag was changed to hidden to avoid incorrect usage in
production.

When generating span attribute values on OpenTelemetry traces for
queries, data is read from ClientRequestState without holding its
lock. The documentation in client-request-state.h specifically states
reading most fields requires holding its lock.

An examination of the opentelemetry-cpp SDK code revealed the
ClientRequestState lock must be held until the StartSpan() and
EndSpan() functions complete. The reason is span attribute keys and
values are deep copied from the source nostd::string_view objects
during these functions.

Testing accomplished by running the test_otel_trace.py custom cluster
tests as regression tests. Additionally, manual testing with
intentionally delayed network communication to an OpenTelemetry
collector demonstrated that the StartSpan() and EndSpan() functions
do not block waiting on the OpenTelemetry collector if the batch span
processor is used. However, these functions do block if the simple
span processor is used.

Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609
---
M be/src/observe/otel-flags-trace.cc
M be/src/observe/otel.cc
M be/src/observe/otel.h
M be/src/observe/span-manager.cc
M be/src/observe/span-manager.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
7 files changed, 124 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/23294/6
--
To view, visit http://gerrit.cloudera.org:8080/23294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609
Gerrit-Change-Number: 23294
Gerrit-PatchSet: 6
Gerrit-Owner: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Pranav Lodha <pranav.lo...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>

Reply via email to