Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5750: Catch exceptions from boost thread creation ......................................................................
IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Reviewed-on: http://gerrit.cloudera.org:8080/7730 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 455 insertions(+), 215 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]>
