Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0'
......................................................................

Patch Set 1:

> Does this trigger only when there are two concurrent calls to
> UpdateBackendExecStatus() from the same backend? If so, do we
> understand why that happens so often?

My understanding is this: a fragment instance sends a report every 'n' seconds. On a congested network, two of these reports for the same fragment instance from a backend can arrive at the coordinator and start being processed at around the same time, which leads to this issue.

Ideally, a second report cannot be sent until the first one is ACKed by the coordinator, since ReportProfileThread() holds a lock until the report is ACKed. However, there is one case where a second report is sent before the first one is responded to: FragmentInstanceState::Finalize(). So ReportProfileThread() sends one report for the last finstance, then Finalize() sends a second report for the same finstance before the first one is responded to.

--
To view, visit http://gerrit.cloudera.org:8080/7577

Gerrit-MessageType: comment
Gerrit-Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: No
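
To make the scenario in the comment above concrete, here is a minimal sketch of how two final reports for the same backend can drive a remaining-backends counter below zero, and how making the update idempotent avoids it. The class and method names (ReportTracker, ApplyFinalReportNaive, ApplyFinalReportOnce) are hypothetical and are not taken from the Impala source; keying by a single backend id is a simplification, and this is not necessarily how the patch under review fixes the race.

```cpp
#include <mutex>
#include <set>

// Hypothetical, simplified stand-in for the coordinator's bookkeeping.
// None of these names come from the Impala code base.
class ReportTracker {
 public:
  explicit ReportTracker(int num_backends) : num_remaining_(num_backends) {}

  // Naive handling: every final report decrements the counter, so a
  // duplicate final report from the same backend is counted twice.
  void ApplyFinalReportNaive(int backend_id) {
    std::lock_guard<std::mutex> l(lock_);
    (void)backend_id;  // The naive path ignores who sent the report.
    --num_remaining_;
  }

  // Idempotent handling: only the first final report from a backend
  // decrements the counter. Returns true if this call did the decrement.
  bool ApplyFinalReportOnce(int backend_id) {
    std::lock_guard<std::mutex> l(lock_);
    // insert() reports via .second whether the id was newly added.
    if (!done_backends_.insert(backend_id).second) return false;
    --num_remaining_;
    return true;
  }

  int num_remaining() const {
    std::lock_guard<std::mutex> l(lock_);
    return num_remaining_;
  }

 private:
  mutable std::mutex lock_;
  std::set<int> done_backends_;
  int num_remaining_;
};
```

With one backend, calling ApplyFinalReportNaive(0) twice leaves the counter at -1, which is the kind of state a check such as 'num_remaining_backends_ > 0' before the decrement would catch on the second call. Calling ApplyFinalReportOnce(0) twice leaves it at 0, because the second call is recognised as a duplicate.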
