Impala Public Jenkins has submitted this change and it was merged. (http://gerrit.cloudera.org:8080/16949)
Change subject: IMPALA-9224: Blacklist nodes with faulty disk for spilling
......................................................................

IMPALA-9224: Blacklist nodes with faulty disk for spilling

This patch extends the blacklist functionality by adding an executor node to the blacklist if a query fails because of a disk failure during spill-to-disk. It also classifies disk error codes and defines a blacklistable error set for non-transient disk errors. The coordinator blacklists an executor only if that executor hit a blacklistable error during spill-to-disk.

Adds a new debug action to simulate a disk write error during spill-to-disk. To use it, specify it in the query options as:

  'debug_action': 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'

where <hostname> and <port> identify the impalad that executes the fragment instances, and <port> is the BE krpc port (default 27000).

Adds new test cases for blacklist and query retry to cover the code changes.

Testing:
- Passed new test cases.
- Passed exhaustive tests.
- Manually simulated disk failures in scratch directories on nodes of a cluster and verified that the nodes were blacklisted as expected.
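As an illustration of the debug-action syntax above, a test could build the query option with a small helper like the one below. This is a minimal sketch: the helper name `tmp_file_write_debug_action` is hypothetical and not part of the Impala test framework, and the "FAIL" action in the usage example is an assumption about which actions the debug-action mechanism accepts.

```python
# Hypothetical helper (not part of Impala's test framework): builds the query
# options that inject a debug action into scratch-file writes on one impalad.
DEFAULT_KRPC_PORT = 27000  # default BE krpc port, per the commit message


def tmp_file_write_debug_action(hostname, action, port=DEFAULT_KRPC_PORT):
    """Return a query-options dict targeting the impalad at hostname:port.

    `action` is passed through unchanged; which actions are valid is decided
    by the backend, not by this helper.
    """
    if not hostname:
        raise ValueError("hostname must be non-empty")
    if not action:
        raise ValueError("action must be non-empty")
    value = "IMPALA_TMP_FILE_WRITE:%s:%d:%s" % (hostname, port, action)
    return {"debug_action": value}
```

For example, `tmp_file_write_debug_action("127.0.0.1", "FAIL", 27001)` returns `{"debug_action": "IMPALA_TMP_FILE_WRITE:127.0.0.1:27001:FAIL"}`, which could then be passed as the query options of a spilling query so that only the impalad listening on krpc port 27001 sees the simulated write error.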
Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Reviewed-on: http://gerrit.cloudera.org:8080/16949
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/io/error-converter.cc
M be/src/runtime/io/error-converter.h
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M common/protobuf/control_service.proto
M common/thrift/generate_error_codes.py
M tests/custom_cluster/test_blacklist.py
M tests/custom_cluster/test_query_retries.py
14 files changed, 510 insertions(+), 11 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16949

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Gerrit-Change-Number: 16949
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-Reviewer: Wenzhe Zhou
