Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9718 )
Change subject: IMPALA-6338: Fix flaky test_profile_fragment_instances
......................................................................

Patch Set 1:

(5 comments)

Thanks for the replies. I left some questions around potential simplifications, but the current logic looks correct to me.

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.h@119
PS1, Line 119: boost::mutex* lock() { return &lock_; }
> Is there a reasonable way to make that work with the way that we have a lis

I don't think it will help with the iteration part, but it will prevent mistakes where we forget to release the lock. During the locking phase we can just populate a vector of these scoped lock handles.

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@237
PS1, Line 237: return num_remaining_instances_ == 0 || (!status_.ok() && !status_.IsCancelled());
> Sure, that would be equivalent. I don't have a strong feeling either way ab

That makes sense, thanks for the explanation.

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@249
PS1, Line 249: for (const TFragmentInstanceExecStatus& instance_exec_status:
> I didn't want to add any more locking to the profile update path, which may

My main concern was making the locking protocol less complicated, but I wasn't aware that the lock I'm looking for is shared across backends. Looking at the calls to ComputeQuerySummary(), two of them already follow a call to WaitForBackendCompletion(). Can we call WaitForBackendCompletion() in Coordinator::Cancel(), too, before calling ComputeQuerySummary()?
http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@251
PS1, Line 251: instance_status
> Not quite, since (at least for successfully completed queries where
> fragments are cancelled because returned_all_results_ becomes true)
> BackendState::status_ is only updated as a result of these status
> reports, so the first time an instance reports CANCELLED, status_
> won't already be CANCELLED.

Can the coordinator cancel the backend_states in this case before sending out the cancellations?

> What you would need is access to Coordinator::query_status_, which
> we could pass into ApplyExecStatusReport(), except that accessing
> it requires taking Coordinator::lock_, which of course we don't
> want to hold for the duration of ApplyExecStatusReport().
>
> So, I think that what we could do is: in UpdateBackendExecStatus,
> take Coordinator::lock_ and make a copy of Coordinator::query_status_.
> Release lock_ and pass the copy into ApplyExecStatusReport(), along
> with a copy of the value of returned_all_results_. We can then say
> something like:
> DCHECK(!instance_status.IsCancelled() || (!query_status_.ok() ||
>     returned_all_results_))

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@314
PS1, Line 314: return IsDone();
> Right, that's the intended behavior.

Thx for the explanation.

--
To view, visit http://gerrit.cloudera.org:8080/9718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f8295e4ec580b476f7668c77aec1348fb2e868c
Gerrit-Change-Number: 9718
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Wed, 21 Mar 2018 18:07:56 +0000
Gerrit-HasComments: Yes
