Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16215 )

Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.h@38
PS4, Line 38: /// query execution.
Could you update this comment to include:
- A little about the new cancellation stuff
- A little about the locking protocol here (i.e. that you have to hold the 
shard lock in order to increment the refcnt for a QueryState safely)


http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.h@97
PS4, Line 97: boost::scoped_ptr
We prefer to use std::unique_ptr in new code


http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.cc@61
PS4, Line 61: retried on the next statestore heartbeat
I don't think this is true (I see that you copied this comment from 
impala-server.cc and I think its also incorrect there, though it may have been 
correct in 2013 when it was written), though you might check me by changing the 
queue size to 1 and seeing if your test passes.

I'm pretty sure the callbacks will only get called on statestore updates that 
actually contain changes to the membership, so if a node is removed we'll only 
run CancelQueriesForFailedCoordinators once.

That should be okay though - with such a large queue size its unlikely we'll 
hit the limit (and if you do, your cluster probably has bigger problems), and 
the queries will get cancelled eventually anyways when ReportExecStatus() hits 
the timeout.


http://gerrit.cloudera.org:8080/#/c/16215/4/be/src/runtime/query-exec-mgr.cc@238
PS4, Line 238:   if (cancellation_thread_pool_->GetQueueSize() + 
to_cancel.size()
Rather than ignoring all of the cancellations if there's too many, can we fill 
the queue up and ignore the rest?



--
To view, visit http://gerrit.cloudera.org:8080/16215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Gerrit-Change-Number: 16215
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Fri, 24 Jul 2020 16:32:34 +0000
Gerrit-HasComments: Yes

Reply via email to