Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/7577
Change subject: IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0'
......................................................................

IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0'

In Coordinator::UpdateBackendExecStatus(), we check if the backend has
already completed with BackendState::IsDone() and return without
applying the update if so to avoid updating num_remaining_backends_
twice for the same completed backend.

The problem is that the value of BackendState::IsDone() is updated by
the call to BackendState::ApplyExecStatusReport() that comes after it,
but these operations are not performed atomically, so if there are two
simultaneous calls to UpdateBackendExecStatus(), they can both call
IsDone(), both get 'false', and then proceed to erroneously both update
num_remaining_backends_, hitting a DCHECK.

The solution is to perform both the call to IsDone() and the update to
it atomically by holding the BackendState::lock_.

Testing:
- Ran test_finst_cancel_when_query_complete 10,000 times without
  hitting the DCHECK (previously, it would hit about once per 300 runs).

Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
3 files changed, 16 insertions(+), 11 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/7577/1
--
To view, visit http://gerrit.cloudera.org:8080/7577
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
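
For readers unfamiliar with this kind of check-then-act race, the following is a minimal standalone sketch of the pattern the commit message describes. It is not the actual Impala code: BackendStateSketch, ApplyReportIfNotDone() and RunConcurrentFinalReports() are hypothetical names chosen for illustration, and only the general idea (do the "is done?" check and the state update under one lock) is taken from the change description.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-backend state object. This is an
// illustration only, not Impala's BackendState class.
class BackendStateSketch {
 public:
  // Checks the done flag and applies the report while holding lock_, so the
  // check and the update form one atomic step. Returns true only for the
  // single call that moves the backend from "not done" to "done".
  bool ApplyReportIfNotDone(bool report_is_final) {
    std::lock_guard<std::mutex> l(lock_);
    if (done_) return false;            // a previous report already completed it
    if (report_is_final) done_ = true;  // updated under the same lock as the check
    return done_;
  }

 private:
  std::mutex lock_;
  bool done_ = false;
};

// Simulates several simultaneous final status reports for one backend and
// returns the resulting remaining-backend count (starting from 1).
int RunConcurrentFinalReports(int num_threads) {
  BackendStateSketch backend;
  std::atomic<int> num_remaining_backends{1};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&backend, &num_remaining_backends] {
      // Only the caller that actually completed the backend decrements.
      if (backend.ApplyReportIfNotDone(/*report_is_final=*/true)) {
        num_remaining_backends.fetch_sub(1);
      }
    });
  }
  for (std::thread& t : threads) t.join();
  return num_remaining_backends.load();
}
```

In this sketch, calling RunConcurrentFinalReports(8) leaves the counter at 0 rather than going negative, because exactly one of the eight callers observes the transition to done. If the check and the update were instead two separate calls with no common lock, two callers could both see "not done" and both decrement, which is the situation the DCHECK caught.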
