Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16215 )
Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators ...................................................................... Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc@554 PS3, Line 554: server->CancelQueriesOnFailedBackends(current_backend_set); > I think this might be unnecessary on executor-only nodes? Yeah for this and your other comment below, probably the cleanest thing is two just have two distinct callbacks, one that is registered if is_executor and calls CancelQueriesForFailedCoordinators, and one for is_coordinator that call CancelQueriesOnFailedBackends. If both is_executor and is_coordinator is true you'll just register both callbacks, but ClusterMembershipMgr can handle that. http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc@222 PS3, Line 222: // TODO: create cancellation task queue and working thread to run cancellation tasks : // on a separate thread. If the queue is full, ignore the cancellations since we'll : // be able to process them on the next heartbeat instead. : : for (auto& qs : to_cancel) { : VLOG(1) << "CancelQueriesForFailedCoordinators(): cancel query " << qs->query_id(); : qs->Cancel(); : qs->is_coord_active_.Store(false); : ReleaseQueryState(qs); : } > if you move this into impala-server.cc you can use the ImpalaServer::Cancel I had suggested not to put this in ImpalaServer because its primarily the client interface (eg. beeswax and hs2) and is therefore really a coordinator component. It's true that executor-only impalads will still have an ImpalaServer instance, but we might someday want to change that as we clean up the separation between executors and coordinators. On the other hand, ImpalaServer is probably so intertwined with everything else, and the benefit would be small enough, that we'll never actually do that. It also doesn't seem like that much code duplication to put it here. I don't really feel strongly about it, though. http://gerrit.cloudera.org:8080/#/c/16215/3/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/16215/3/tests/custom_cluster/test_restart_services.py@241 PS3, Line 241: def test_kill_coordinator(self): This is an okay place for this test (obviously we have a lot of different test files and the distinction between them isn't always clear), but maybe instead put it test_process_failures.py instead? -- To view, visit http://gerrit.cloudera.org:8080/16215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd Gerrit-Change-Number: 16215 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Tue, 21 Jul 2020 19:25:00 +0000 Gerrit-HasComments: Yes