Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10744 )
Change subject: IMPALA-1760: Implement shutdown command
......................................................................


Patch Set 11:

(4 comments)

This is a great patch that can tremendously improve Impala's SLA!

One question: should we call CheckNotShuttingDown() in ImpalaInternalService::ExecQueryFInstances? If the quiesce period is too short and coordinators still schedule fragment instances on the node that is shutting down, those queries could then fail fast.

http://gerrit.cloudera.org:8080/#/c/10744/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10744/11//COMMIT_MSG@26
PS11, Line 26: e.g. statestore down
Could you add tests for this? Could you also explain in more detail how the shutdown workflow behaves if statestored is down or stuck?


http://gerrit.cloudera.org:8080/#/c/10744/11/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/10744/11/be/src/service/client-request-state.cc@628
PS11, Line 628: for (int i = 0; i < 3; ++i) {
What about sleeping for a few seconds before the next retry, like this?

    for (int i = 0; i < 3; ++i, sleep(3))

This usually improves the success rate when there are network issues or the target server is temporarily stuck.


http://gerrit.cloudera.org:8080/#/c/10744/11/fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
File fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java:

http://gerrit.cloudera.org:8080/#/c/10744/11/fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java@95
PS11, Line 95:  * Supports optionally specifying the backend and the deadline: either shutdown(),
             :  * shutdown('host:port'), shutdown(deadline), shutdown('host:port', deadline).
Cool!
It'd be better to mention these in the commit message:
(1) the deadline is configurable in the shutdown statement;
(2) the deadline is a period, not a timestamp.


http://gerrit.cloudera.org:8080/#/c/10744/11/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/10744/11/tests/custom_cluster/test_restart_services.py@94
PS11, Line 94:       r'quiesce period left: ([0-9ms]*), deadline left: ([0-9ms]*), ' +
             :       r'fragment instances: ([0-9]*), queries registered: ([0-9]*)'
It'd be great if these could be shown dynamically on the web page.



--
To view, visit http://gerrit.cloudera.org:8080/10744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Gerrit-Change-Number: 10744
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Sat, 07 Jul 2018 07:30:17 +0000
Gerrit-HasComments: Yes
