Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5750: Catch exceptions from boost thread creation
......................................................................

Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

PS7, Line 371: COUNTER_ADD(num_scanner_threads_started_counter_, 1);
The problem with moving this here is that when the node is under heavy stress, there may be a window where the scanner thread has started running but this thread does not get scheduled for a while, which makes the counter slightly misleading. Not that it's a huge problem, but I just wanted to point it out.

http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

PS7, Line 171: ++num_active_scanners_;
This can cause quite a few races. It can race with L220, L244 and L245.

http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS7, Line 335: // Fragment instance successfully started
             : // update fis_map_
             : fis_map_.emplace(fis->instance_id(), fis);
             : // update fragment_map_
             : vector<FragmentInstanceState*>& fis_list = fragment_map_[instance_ctx.fragment_idx];
             : fis_list.push_back(fis);
Is it safe to update the map with the fragment instance state after the fragment instance has already been started? I went through some scenarios, and they all checked out fine since critical RPCs like Cancel() are protected by 'instances_prepared_promise_', but I'm not sure whether I'm missing some other failure case.

http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/util/thread.cc
File be/src/util/thread.cc:

PS7, Line 303: rand()
Not to be pedantic, but calling rand() without seeding the PRNG first will generate the same series of numbers on a particular node across different runs. That makes the failure injections fairly deterministic if we're running the same queries over these test runs.
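
To illustrate the concern, here is a minimal standalone sketch (not Impala code; `FirstRandValues` is a hypothetical helper introduced only for this example). The C standard specifies that rand() called before any srand() behaves as if srand(1) had been called, so reseeding with 1 reproduces what an unseeded process would see.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of Impala): reseeds the C PRNG with `seed`
// and returns the first `n` values it produces.
std::vector<int> FirstRandValues(unsigned seed, std::size_t n) {
  std::srand(seed);
  std::vector<int> values;
  values.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    values.push_back(std::rand());
  }
  return values;
}
```

Calling FirstRandValues(1, 5) twice returns the same five values both times, which is why an unseeded rand() would make the injected failures repeat from run to run.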

Something closer to actual pseudo-randomness would require something like the following:
https://github.com/apache/incubator-impala/blob/master/be/src/rpc/authentication.cc#L493

But getting a new random device every time could be expensive. Do you know of a better but cheap way to do this?


-- 
To view, visit http://gerrit.cloudera.org:8080/7730
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
