Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5750: Catch exceptions from boost thread creation
......................................................................


Patch Set 6:

(9 comments)

Still going through the comments, but I thought I'd put up some quick replies.

http://gerrit.cloudera.org:8080/#/c/7730/6//COMMIT_MSG
Commit Message:

Line 28: variables or direct declarations in classes.
> any reason Thread and ThreadPool don't follow the same pattern? i.e. both u
ThreadPool uses the constructor+init pattern because we subclass it for the 
CallableThreadPool. Subclassing gets awkward with the static create pattern. If 
someone forgets to run Init(), they find out when they offer a piece of work to 
the ThreadPool.

For Thread, I think it would be feasible to use the constructor+init pattern. 
The difference is that if someone forgets an Init(), they might be expecting a 
thread and it just isn't there.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

PS6, Line 363: pool->ReleaseThreadToken(false, true);
> Do you need to drop the lock before calling this? Based on the comment from
Yes, that is what in_callback is for. ReleaseThreadToken can call these thread 
availability callbacks, so we could end up back in this code. The 
in_callback=true prevents that.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS6, Line 59:   RETURN_IF_ERROR(Thread::Create("query-exec-mgr",
            :       Substitute("start-query-finstances-$0", PrintId(query_id)),
            :           &QueryExecMgr::StartQueryHelper, this, qs, &t, true));
> Shouldn't we call ReleaseQueryState(qs); if the thread failed to start here
Yes, that looks like a bug. I will fix in the next upload.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS6, Line 199:     // If this fragment instance is not prepared, it must be 
reporting an error.
             :     DCHECK(fis->IsPrepared() || !status.ok());
> Is this necessarily true?
The race might exist. I'm looking at the code to check.

The old DCHECK is stricter than the new one. Right now, we require that fis is 
nullptr or fis->IsPrepared(). I'm carving out an exception for failure during 
thread spawn when the fis won't have a chance to be prepared.


PS6, Line 199:     // If this fragment instance is not prepared, it must be 
reporting an error.
             :     DCHECK(fis->IsPrepared() || !status.ok());
> I don't really understand this DCHECK.  oh, because we pass 'fis' in the ne
In testing, I noticed that fis==nullptr doesn't actually convey the status. If 
I use fis==nullptr, the coordinator never gets notified of the status and the 
query hangs. When I went through ReportExecStatusAux, I noticed that apart from 
some DCHECKs, we don't use the status argument unless fis is non-null. 
TReportExecStatusParams doesn't really have a place for status apart from in 
the TFragmentInstanceExecStatus's.


PS6, Line 328: prepare_status
> why is this called "prepare_status"?  the old 'prepare_status' was the stat
The idea is to share the codepath below, which is waiting for the fragment 
instance states to be prepared. To incorporate the thread create status into 
the instances_prepared_promise_, the status needs to be available outside the 
loop.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

PS6, Line 598: Thread::Create("query-exec-state", "wait-thread",
             :       &ClientRequestState::Wait, this, &wait_thread_);
> Why not inject the thread failures here as well? I think it's an important 
Good point. I will add it to the list of locations that are eligible for fault 
injection. I will be doing more test runs as I go, so this will now be tested.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.cc
File be/src/util/thread.cc:

Line 304:     if ((nanos % 100) == 1) {
> on some platforms, the monotonic clock may not actually have nanosecond res
Good point, will make that change.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.h
File be/src/util/thread.h:

PS6, Line 158: thread
> nit: 'thread'
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7730
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to