Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation ......................................................................
Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/7730/3//COMMIT_MSG Commit Message: Line 28: variables or direct declarations in classes. > For testing, it might be worth adding a stress startup option to make threa I have done hand testing on a variety of codepaths. I'm looking into automating the testing. It is useful to make a distinction between the thread creates that are necessary for startup and/or instance life and those that are only needed for query success. I think tagging the query locations can lead to a repeatable test: either the queries return the right answer or they return this error. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 361: COUNTER_ADD(num_scanner_threads_started_counter_, -1); > It seems ok to omit fixing up num_scanner_threads_started_counter_. I don't I'm ok with either way. My thought is that in cases where the query doesn't abort we may want to know how many threads actually ran. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 154: ++num_active_scanners_; > This is protected by lock_ so we could just wait until the thread is create Done http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/rpc/thrift-thread.cc File be/src/rpc/thrift-thread.cc: Line 37: if (!status.ok()) throw atc::SystemResourceException("Thread::Create() failed"); > This could use some explanation, since it's an unusual idiom in the code. I Added a comment for this, but it is a bit unclear to me how thrift handles the exception. It seems like it gets sent back to the client, but I'm not 100% sure. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: Line 469: HS2_RETURN_ERROR(return_val, status.GetDetail(), SQLSTATE_GENERAL_ERROR); > It looks like we have identical error-handling code in three places. May be Changed to gotos and shared code. PS3, Line 476: (void) > discard_result Done http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 61: /// error spawning the threads. > How do things get cleaned up on an error? Does Shutdown() handle that case? When we get an error, only successfully started threads are in the threads_ ThreadGroup. ~ThreadPool calls Shutdown() and Join(), so we know that the threads will be cleaned up eventually. The owner of the ThreadPool might also call Shutdown() and Join(), and this is fine, because both of those functions are idempotent. The only thing the owner can do wrong is to Join() without calling Shutdown(), which makes no sense. It might make sense to have the error case automatically call Shutdown() + Join(). http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread.cc File be/src/util/thread.cc: Line 301: return Status(e.what()); > Might be worth adding a status code to common/thrift/generate_error_codes.p Added this. I don't think the category or name would be useful to the customer, but if they would be useful to us, we can add it to the error message. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread.h File be/src/util/thread.h: PS3, Line 69: std::unique_ptr<Thread>& > we'd normally use pointers for output arguments - https://google.github.io/ Done -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
