[Impala-ASF-CR] IMPALA-5749: coordinator race hits DCHECK 'num remaining backends > 0'

Sailesh Mukil (Code Review) Thu, 03 Aug 2017 13:15:37 -0700

Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5749: coordinator race hits DCHECK 
'num_remaining_backends_ > 0'
......................................................................



Patch Set 1:

> Does this trigger only when there are two concurrent calls to
 > UpdateBackendExecStatus() from the same backend? If so, do we
 > understand why that happens so often?

My understanding is this:
A fragment instance sends reports every 'n' seconds. Due to a congested 
network, two of these reports for the same fragment instance from a backend can 
arrive at the coordinator and start being processed at around the same time, 
hence leading to this issue.

Ideally a second report cannot be sent until the first one is ACKed by the 
coordinator, since ReportProfileThread() holds a lock until the report is 
ACKed. There is one case, however, where a second report is sent before the 
first one is responded to: the report sent from 
FragmentInstanceState::Finalize().

So ReportProfileThread() sends one report for the last finstance, then 
Finalize() sends a second report for the same finstance before the first one 
is responded to.
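
To make the scenario concrete, here is a toy sketch of what the coordinator 
side sees when both reports arrive. It is not the patch under review and not 
Impala's real code: ToyCoordinator, the done_ vector, the is_final flag and 
RemainingAfterDuplicateFinalReports() are all hypothetical names invented for 
illustration. It assumes one possible way to tolerate the duplicate, namely 
remembering per backend whether a final report has already been counted, so 
that only the first of the two decrements the counter.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the coordinator's per-backend bookkeeping.
class ToyCoordinator {
 public:
  explicit ToyCoordinator(int num_backends)
      : done_(num_backends, false), num_remaining_backends_(num_backends) {}

  // Called once per incoming report. 'is_final' (an assumption of this
  // sketch) marks a report saying the backend has finished.
  // Returns true only for the call that actually marked the backend done.
  bool UpdateBackendExecStatus(int backend_idx, bool is_final) {
    std::lock_guard<std::mutex> l(lock_);
    // A non-final report, or a second final report for a backend that is
    // already done, must not touch the counter.
    if (!is_final || done_[backend_idx]) return false;
    done_[backend_idx] = true;
    // Mirrors the DCHECK from the bug title; with the done_ guard above it
    // cannot fire for duplicate reports.
    assert(num_remaining_backends_ > 0);
    --num_remaining_backends_;
    return true;
  }

  int num_remaining_backends() {
    std::lock_guard<std::mutex> l(lock_);
    return num_remaining_backends_;
  }

 private:
  std::mutex lock_;
  std::vector<bool> done_;       // guarded by lock_
  int num_remaining_backends_;   // guarded by lock_
};

// Simulates the two senders described above: the periodic report and the
// report from Finalize(), both final and both for the same backend.
int RemainingAfterDuplicateFinalReports() {
  ToyCoordinator coord(1);
  std::thread periodic([&] { coord.UpdateBackendExecStatus(0, true); });
  std::thread finalize([&] { coord.UpdateBackendExecStatus(0, true); });
  periodic.join();
  finalize.join();
  return coord.num_remaining_backends();
}
```

Without the done_ check, the second call would decrement a counter that is 
already 0, which is the condition the DCHECK guards against. Whether the real 
fix belongs on the coordinator side like this, or on the sender side by not 
letting Finalize() send while a report is in flight, is a separate question.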

-- 
To view, visit http://gerrit.cloudera.org:8080/7577
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: No
