Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10744 )
Change subject: IMPALA-1760: Implement shutdown command
......................................................................


Patch Set 11:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10744/11//COMMIT_MSG
Commit Message:

PS11:
RE: "I have a question that should we call CheckNotShuttingDown() in ImpalaInternalService::ExecQueryFInstances? If somehow the quiesce period is too short that coordinators still schedule fragment instances to the shutting down node, the queries can be failed fast."

I kind of like the suggestion, but I'd be concerned about adding another code path to test. For the moment it seems simpler to have just one code path for failing queries that run past the quiesce period.


http://gerrit.cloudera.org:8080/#/c/10744/11//COMMIT_MSG@26
PS11, Line 26: e.g. statestore down
> Could you add tests for this? Could you explain more about how the shutdown
I added a basic test that kills the statestore. This property is pretty trivial to verify by looking at the code: the shutdown code path doesn't communicate with the statestore or refer to the cluster membership or anything like that.


http://gerrit.cloudera.org:8080/#/c/10744/11//COMMIT_MSG@29
PS11, Line 29: * If shutting down, a banner is shown on the root debug page.
> does this get exposed programatically somehow as well? I would think that c
You can get it in JSON form from the root debug page, i.e. http://host:port/?json=true, although that might not be the "right" interface. I thought about adding more functions to query status, etc., but decided not to do that in this patch because it felt a little speculative and the patch is already large.


http://gerrit.cloudera.org:8080/#/c/10744/11//COMMIT_MSG@32
PS11, Line 32: 1. (if a coordinator) clients are prevented from submitting
             :    queries to this coordinator via some out-of-band mechanism,
             :    e.g. load balancer
> should shutting-down coordinators reject new queries or new sessions after
The current patch starts rejecting new queries and sessions immediately once a coordinator starts shutting down. Clarified here.


http://gerrit.cloudera.org:8080/#/c/10744/11/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/10744/11/be/src/service/client-request-state.cc@628
PS11, Line 628:   for (int i = 0; i < 3; ++i) {
> What about sleep several seconds before the next retry like this?
Seems like a good idea to consider, but I don't want to make the change in this patch (we want to keep this consistent with the backend exec RPC). I filed IMPALA-7283 to track this.

To that end, though, I removed the code duplication with the other place that does retries, to make such changes easier in the future.


http://gerrit.cloudera.org:8080/#/c/10744/11/be/src/util/default-path-handlers.cc
File be/src/util/default-path-handlers.cc:

http://gerrit.cloudera.org:8080/#/c/10744/11/be/src/util/default-path-handlers.cc@228
PS11, Line 228:   bool is_quiescing = impala_server->IsShuttingDown();
> I think for now we should just stick to is_shutting_down since this stateme
I think that the Impala daemon can be quiescing even after the period has elapsed - it's only successfully quiesced once nothing is running on it. So that really means that the quiesce period is the minimum quiesce period. Updated some comments accordingly.
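
To make the quiescing/quiesced distinction concrete, here is a minimal Python sketch of the condition as I described it above. It is illustrative only and is not the code in the patch: `ShutdownStatus`, `is_quiesced` and `can_exit` are hypothetical names, the four fields simply mirror the values matched by the status regex in test_restart_services.py, and the rule that the daemon exits unconditionally once the deadline expires is an assumption about the deadline semantics rather than something spelled out in this comment.

```python
from dataclasses import dataclass


@dataclass
class ShutdownStatus:
    """Hypothetical snapshot of a daemon's shutdown progress."""
    quiesce_left_ms: int     # time remaining in the (minimum) quiesce period
    deadline_left_ms: int    # time remaining until the shutdown deadline
    fragment_instances: int  # fragment instances still executing
    queries_registered: int  # queries still registered on this daemon


def is_quiesced(status: ShutdownStatus) -> bool:
    """Quiesced only when the minimum period has elapsed AND nothing is running."""
    idle = status.fragment_instances == 0 and status.queries_registered == 0
    return status.quiesce_left_ms <= 0 and idle


def can_exit(status: ShutdownStatus) -> bool:
    """Assumption: exit once quiesced, or unconditionally at the deadline."""
    return is_quiesced(status) or status.deadline_left_ms <= 0
```

Under this sketch, a daemon whose quiesce period has elapsed but which still has one fragment instance running is still quiescing: `is_quiesced(ShutdownStatus(0, 5000, 1, 0))` is false until the instance finishes.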


http://gerrit.cloudera.org:8080/#/c/10744/11/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

http://gerrit.cloudera.org:8080/#/c/10744/11/common/thrift/StatestoreService.thrift@79
PS11, Line 79: it
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/10744/11/fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
File fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java:

http://gerrit.cloudera.org:8080/#/c/10744/11/fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java@95
PS11, Line 95:    * Supports optionally specifying the backend and the deadline: either shutdown(),
             :    * shutdown('host:port'), shutdown(deadline), shutdown('host:port', deadline).
> Cool! It'd be better to mention these in the commit message:
Updated the commit message.


http://gerrit.cloudera.org:8080/#/c/10744/11/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/10744/11/tests/custom_cluster/test_restart_services.py@94
PS11, Line 94:         r'quiesce period left: ([0-9ms]*), deadline left: ([0-9ms]*), ' +
             :         r'fragment instances: ([0-9]*), queries registered: ([0-9]*)'
> I'd be great if these can be shown in the web page dynamically.
This appears in a banner on the front page of the web UI - is that what you were asking for?


http://gerrit.cloudera.org:8080/#/c/10744/11/tests/custom_cluster/test_restart_services.py@212
PS11, Line 212:     # Test that we can reduce the deadline after setting it to a high value.
> maybe test that we cannot increase the deadline too
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Gerrit-Change-Number: 10744
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Thu, 12 Jul 2018 00:20:29 +0000
Gerrit-HasComments: Yes
