Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16215 )
Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators ...................................................................... Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.h@38 PS4, Line 38: /// query execution. Could you update this comment to include: - A little about the new cancellation stuff - A little about the locking protocol here (i.e. that you have to hold the shard lock in order to increment the refcnt for a QueryState safely) http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.h@97 PS4, Line 97: boost::scoped_ptr We prefer to use std::unique_ptr in new code http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.cc@61 PS4, Line 61: retried on the next statestore heartbeat I don't think this is true (I see that you copied this comment from impala-server.cc and I think its also incorrect there, though it may have been correct in 2013 when it was written), though you might check me by changing the queue size to 1 and seeing if your test passes. I'm pretty sure the callbacks will only get called on statestore updates that actually contain changes to the membership, so if a node is removed we'll only run CancelQueriesForFailedCoordinators once. That should be okay though - with such a large queue size its unlikely we'll hit the limit (and if you do, your cluster probably has bigger problems), and the queries will get cancelled eventually anyways when ReportExecStatus() hits the timeout. http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.cc@238 PS4, Line 238: if (cancellation_thread_pool_->GetQueueSize() + to_cancel.size() Rather than ignoring all of the cancellations if there's too many, can we fill the queue up and ignore the rest? -- To view, visit http://gerrit.cloudera.org:8080/16215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd Gerrit-Change-Number: 16215 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Fri, 24 Jul 2020 16:32:34 +0000 Gerrit-HasComments: Yes
