[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.cc File be/src/service/child-query.cc: PS5, Line 171: lock_guard lock(lock_); > you can avoid this if status.ok(), I think. Not sure if it's worth it. I can combine this block with the if() below, works out shorter. Line 185: if (thread == NULL) return Status::OK(); > DCHECK(is_cancelled_)? I think that's the only time that thread == nullptr. It's also false if we never ran any child queries. Added DCHECK(!is_running_). http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.h File be/src/service/child-query.h: PS5, Line 148: is > must be Done PS5, Line 178: Exec > Is there a more descriptive name than Exec()? Every class seems to have an Done Line 194: bool is_running_; > Comment here that, once started, the queries must be allowed to complete or Done http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: PS5, Line 226: Acquires > Acquirers Done PS5, Line 227: acquired > required I think I actually intended acquired, but "required" also works. -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. Patch Set 6: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Hello Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4163 to look at the new patch set (#6). Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. IMPALA-4037,IMPALA-4038: fix locking during query cancellation * Refactor the child query handling out of QueryExecState and clarify locking rules. * Avoid holding QueryExecState::lock_ while calling Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs or acquire ImpalaServer::query_exec_state_map_lock_. * Fix a potential race between QueryExecState::Exec() and QueryExecState::Cancel() where the cancelling thread did an unlocked read of the 'coordinator_' field and may not have cancelled the coordinator. Testing: Ran exhaustive build, ran local stress test for a bit. Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 --- M be/src/runtime/coordinator.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h M tests/query_test/test_cancellation.py 7 files changed, 233 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/6 -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Henry Robinson has posted comments on this change. Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. Patch Set 5: Code-Review+1 (7 comments) http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.cc File be/src/service/child-query.cc: PS5, Line 171: lock_guard lock(lock_); you can avoid this if status.ok(), I think. Not sure if it's worth it. Line 185: if (thread == NULL) return Status::OK(); DCHECK(is_cancelled_)? I think that's the only time that thread == nullptr. http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.h File be/src/service/child-query.h: PS5, Line 148: is must be PS5, Line 178: Exec Is there a more descriptive name than Exec()? Every class seems to have an Exec() method... maybe ExecSequentially()? ExecChildQueries()? Line 194: bool is_running_; Comment here that, once started, the queries must be allowed to complete or fail (when the class is destroyed, we dcheck(!is_running); and say what methods force that as a post-condition (WaitForAll(), Cancel()). You have that in the class comment, but worth a mention here as well I think. http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: PS5, Line 226: Acquires Acquirers PS5, Line 227: acquired required -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Henry Robinson has posted comments on this change. Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.h File be/src/service/child-query.h: PS4, Line 193: has not been Join()ed > Are you sure? The comment on Thread::Join() says "Blocks until this thread Sorry, I read the comment as 'true up until Join() is called', not 'true up until Join() completes'. I think it's clear enough as it is. -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.cc File be/src/service/child-query.cc: PS4, Line 158: std > remove std:: Done Line 181: is_running_ = false; > Move into lock scope? And maybe move to Exec(), since it's a postcondition Moved into Exec(). I think that works. Line 195: Thread* thread; > = GetChildQueriesThread() ? Or are you worried about taking and yielding th Reworked this function. PS4, Line 196: vector child_queries_copy; > after is_running_ is set, child_queries_ should never change (that should b I agree it should be safe, I was being conservative. I added documentation to the header about when fields are immutable and then avoided acquiring 'lock_' in the cases when its unnecessary. I think it worked out simpler on the net. Line 211: for (ChildQuery* child_query: child_queries_copy) child_query->Cancel(); > space before : Rewrote this bit anyway. http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.h File be/src/service/child-query.h: PS4, Line 153: ChildQueryExecutor(); > If you pass the child queries in at construction you can ensure the invaria I considered that, but it makes the locking in QES way hairier if ChildQueryExecutor is created midway through Exec(), since we then need to acquire an external lock to avoid a race in checking whether child_query_executor_ is non-NULL. Line 159: void ExecAsync(std::vector* child_queries); > why not std::vector&& to make it clear that the argument must b Works for me. PS4, Line 175: is blocking > blocks until all queries are completed Done. PS4, Line 187: mutex > SpinLock might be better. Done PS4, Line 193: has not been Join()ed > Join() doesn't imply the thread has stopped (i.e. this seems necessary but Are you sure? The comment on Thread::Join() says "Blocks until this thread finishes execution." http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.cc File be/src/service/query-exec-state.cc: PS4, Line 440: setup > nit: set up Done http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: PS4, Line 130: yet > Can this be called after the query has finished? If so, remove 'yet'. Good point. Done. PS4, Line 227: various ImpalaServer operations > bit vague - do you mean that service-wide operations may be blocked on gett Reworded and mentioned IMPALA-3882 (since that's what causes it to really get out of control). Line 230: ExecEnv* exec_env_; > blank line before this one. Done Line 338: Status ExecQueryOrDmlRequest(const TQueryExecRequest& query_exec_request); > Add a comment that coord() is only valid after this method (if this path is Done -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. IMPALA-4037,IMPALA-4038: fix locking during query cancellation * Refactor the child query handling out of QueryExecState and clarify locking rules. * Avoid holding QueryExecState::lock_ while calling Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs or acquire ImpalaServer::query_exec_state_map_lock_. * Fix a potential race between QueryExecState::Exec() and QueryExecState::Cancel() where the cancelling thread did an unlocked read of the 'coordinator_' field and may not have cancelled the coordinator. Testing: Ran exhaustive build, ran local stress test for a bit. Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 --- M be/src/runtime/coordinator.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h M tests/query_test/test_cancellation.py 7 files changed, 228 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/5 -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Henry Robinson has posted comments on this change. Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. Patch Set 4: (15 comments) Refactoring looks sensible. Does the stress test run compute stats to exercise the CQ path? http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.cc File be/src/service/child-query.cc: PS4, Line 158: std remove std:: Line 181: is_running_ = false; Move into lock scope? And maybe move to Exec(), since it's a postcondition of that method. Line 195: Thread* thread; = GetChildQueriesThread() ? Or are you worried about taking and yielding the lock several times? PS4, Line 196: vector child_queries_copy; after is_running_ is set, child_queries_ should never change (that should be an invariant of only calling ExecAsync once). Is it overkill to have this protective locking and copying since several clients could safely read the vector at once? i.e. if you guarantee to read only after checking is_running_ == true on line 200, the vector is essentially const. Line 211: for (ChildQuery* child_query: child_queries_copy) child_query->Cancel(); space before : http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.h File be/src/service/child-query.h: PS4, Line 153: ChildQueryExecutor(); If you pass the child queries in at construction you can ensure the invariant of only ever executing one set of queries. I don't feel strongly in either direction though. Line 159: void ExecAsync(std::vector* child_queries); why not std::vector&& to make it clear that the argument must be movable from? PS4, Line 175: is blocking blocks until all queries are completed PS4, Line 187: mutex SpinLock might be better. PS4, Line 193: has not been Join()ed Join() doesn't imply the thread has stopped (i.e. this seems necessary but not sufficient). http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.cc File be/src/service/query-exec-state.cc: PS4, Line 440: setup nit: set up http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: PS4, Line 130: yet Can this be called after the query has finished? If so, remove 'yet'. PS4, Line 227: various ImpalaServer operations bit vague - do you mean that service-wide operations may be blocked on getting this lock, and that therefore other queries / clients may be affected? If so, good to be explicit about that. Line 230: ExecEnv* exec_env_; blank line before this one. Line 338: Status ExecQueryOrDmlRequest(const TQueryExecRequest& query_exec_request); Add a comment that coord() is only valid after this method (if this path is used), and it may not be valid if it returns an error. -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. IMPALA-4037,IMPALA-4038: fix locking during query cancellation * Refactor the child query handling out of QueryExecState and clarify locking rules. * Avoid holding QueryExecState::lock_ while calling Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs or acquire ImpalaServer::query_exec_state_map_lock_. * Fix a potential race between QueryExecState::Exec() and QueryExecState::Cancel() where the cancelling thread did an unlocked read of the 'coordinator_' field and may not have cancelled the coordinator. Testing: Ran exhaustive build, ran local stress test for a bit. Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 --- M be/src/runtime/coordinator.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h M tests/query_test/test_cancellation.py 7 files changed, 233 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/4 -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation .. IMPALA-4037,IMPALA-4038: fix locking during query cancellation * Refactor the child query handling out of QueryExecState and clarify locking rules. * Avoid holding QueryExecState::lock_ while calling Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs or acquire ImpalaServer::query_exec_state_map_lock_. * Fix a potential race between QueryExecState::Exec() and QueryExecState::Cancel() where the cancelling thread did an unlocked read of the 'coordinator_' field and may not have cancelled the coordinator. Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 --- M be/src/runtime/coordinator.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h M tests/query_test/test_cancellation.py 7 files changed, 233 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/3 -- To view, visit http://gerrit.cloudera.org:8080/4163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
