[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.cc
File be/src/service/child-query.cc:

PS5, Line 171: lock_guard lock(lock_);
> you can avoid this if status.ok(), I think. Not sure if it's worth it.
I can combine this block with the if() below, works out shorter.


Line 185:   if (thread == NULL) return Status::OK();
> DCHECK(is_cancelled_)? I think that's the only time that thread == nullptr.
It's also false if we never ran any child queries. Added 
DCHECK(!is_running_).


http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.h
File be/src/service/child-query.h:

PS5, Line 148: is
> must be
Done


PS5, Line 178: Exec
> Is there a more descriptive name than Exec()? Every class seems to have an 
Done


Line 194:   bool is_running_;
> Comment here that, once started, the queries must be allowed to complete or
Done


http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/query-exec-state.h
File be/src/service/query-exec-state.h:

PS5, Line 226: Acquires
> Acquirers
Done


PS5, Line 227: acquired
> required
I think I actually intended acquired, but "required" also works.


-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..


Patch Set 6: Code-Review+1

Carry +1

-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-02 Thread Tim Armstrong (Code Review)
Hello Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4163

to look at the new patch set (#6).

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..

IMPALA-4037,IMPALA-4038: fix locking during query cancellation

* Refactor the child query handling out of QueryExecState and clarify
  locking rules.
* Avoid holding QueryExecState::lock_ while calling
  Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs
  or acquire ImpalaServer::query_exec_state_map_lock_.
* Fix a potential race between QueryExecState::Exec() and
  QueryExecState::Cancel() where the cancelling thread did an unlocked
  read of the 'coordinator_' field and may not have cancelled the
  coordinator.

Testing:
Ran exhaustive build, ran local stress test for a bit.

Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
---
M be/src/runtime/coordinator.h
M be/src/service/child-query.cc
M be/src/service/child-query.h
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/service/query-exec-state.h
M tests/query_test/test_cancellation.py
7 files changed, 233 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-02 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..


Patch Set 5: Code-Review+1

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.cc
File be/src/service/child-query.cc:

PS5, Line 171: lock_guard lock(lock_);
you can avoid this if status.ok(), I think. Not sure if it's worth it.


Line 185:   if (thread == NULL) return Status::OK();
DCHECK(is_cancelled_)? I think that's the only time that thread == nullptr.


http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/child-query.h
File be/src/service/child-query.h:

PS5, Line 148: is
must be


PS5, Line 178: Exec
Is there a more descriptive name than Exec()? Every class seems to have an 
Exec() method... maybe ExecSequentially()? ExecChildQueries()?


Line 194:   bool is_running_;
Comment here that, once started, the queries must be allowed to complete or 
fail (when the class is destroyed, we dcheck(!is_running); and say what methods 
force that as a post-condition (WaitForAll(), Cancel()). You have that in the 
class comment, but worth a mention here as well I think.


http://gerrit.cloudera.org:8080/#/c/4163/5/be/src/service/query-exec-state.h
File be/src/service/query-exec-state.h:

PS5, Line 226: Acquires
Acquirers


PS5, Line 227: acquired
required


-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-01 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.h
File be/src/service/child-query.h:

PS4, Line 193: has not been Join()ed
> Are you sure? The comment on Thread::Join() says "Blocks until this thread 
Sorry, I read the comment as 'true up until Join() is called', not 'true up 
until Join() completes'. I think it's clear enough as it is.


-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.cc
File be/src/service/child-query.cc:

PS4, Line 158: std
> remove std::
Done


Line 181:   is_running_ = false;
> Move into lock scope? And maybe move to Exec(), since it's a postcondition 
Moved into Exec(). I think that works.


Line 195:   Thread* thread;
> = GetChildQueriesThread() ? Or are you worried about taking and yielding th
Reworked this function.


PS4, Line 196: vector child_queries_copy;
> after is_running_ is set, child_queries_ should never change (that should b
I agree it should be safe, I was being conservative.

I added documentation to the header about when fields are immutable and then 
avoided acquiring 'lock_' in the cases when its unnecessary. I think it worked 
out simpler on the net.


Line 211:   for (ChildQuery* child_query: child_queries_copy) 
child_query->Cancel();
> space before :
Rewrote this bit anyway.


http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.h
File be/src/service/child-query.h:

PS4, Line 153: ChildQueryExecutor();
> If you pass the child queries in at construction you can ensure the invaria
I considered that, but it makes the locking in QES way hairier if 
ChildQueryExecutor is created midway through Exec(), since we then need to 
acquire an external lock to avoid a race in checking whether 
child_query_executor_ is non-NULL.


Line 159:   void ExecAsync(std::vector* child_queries);
> why not std::vector&& to make it clear that the argument must b
Works for me.


PS4, Line 175: is blocking
> blocks until all queries are completed
Done.


PS4, Line 187: mutex
> SpinLock might be better.
Done


PS4, Line 193: has not been Join()ed
> Join() doesn't imply the thread has stopped (i.e. this seems necessary but 
Are you sure? The comment on Thread::Join() says "Blocks until this thread 
finishes execution."


http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

PS4, Line 440: setup
> nit: set up
Done


http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.h
File be/src/service/query-exec-state.h:

PS4, Line 130: yet
> Can this be called after the query has finished? If so, remove 'yet'.
Good point. Done.


PS4, Line 227: various ImpalaServer operations
> bit vague - do you mean that service-wide operations may be blocked on gett
Reworded and mentioned IMPALA-3882 (since that's what causes it to really get 
out of control).


Line 230:   ExecEnv* exec_env_;
> blank line before this one.
Done


Line 338:   Status ExecQueryOrDmlRequest(const TQueryExecRequest& 
query_exec_request);
> Add a comment that coord() is only valid after this method (if this path is
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-09-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..

IMPALA-4037,IMPALA-4038: fix locking during query cancellation

* Refactor the child query handling out of QueryExecState and clarify
  locking rules.
* Avoid holding QueryExecState::lock_ while calling
  Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs
  or acquire ImpalaServer::query_exec_state_map_lock_.
* Fix a potential race between QueryExecState::Exec() and
  QueryExecState::Cancel() where the cancelling thread did an unlocked
  read of the 'coordinator_' field and may not have cancelled the
  coordinator.

Testing:
Ran exhaustive build, ran local stress test for a bit.

Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
---
M be/src/runtime/coordinator.h
M be/src/service/child-query.cc
M be/src/service/child-query.h
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/service/query-exec-state.h
M tests/query_test/test_cancellation.py
7 files changed, 228 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-08-31 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..


Patch Set 4:

(15 comments)

Refactoring looks sensible. Does the stress test run compute stats to exercise 
the CQ path?

http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.cc
File be/src/service/child-query.cc:

PS4, Line 158: std
remove std::


Line 181:   is_running_ = false;
Move into lock scope? And maybe move to Exec(), since it's a postcondition of 
that method.


Line 195:   Thread* thread;
= GetChildQueriesThread() ? Or are you worried about taking and yielding the 
lock several times?


PS4, Line 196: vector child_queries_copy;
after is_running_ is set, child_queries_ should never change (that should be an 
invariant of only calling ExecAsync once). Is it overkill to have this 
protective locking and copying since several clients could safely read the 
vector at once?

i.e. if you guarantee to read only after checking is_running_ == true on line 
200, the vector is essentially const.


Line 211:   for (ChildQuery* child_query: child_queries_copy) 
child_query->Cancel();
space before :


http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/child-query.h
File be/src/service/child-query.h:

PS4, Line 153: ChildQueryExecutor();
If you pass the child queries in at construction you can ensure the invariant 
of only ever executing one set of queries. I don't feel strongly in either 
direction though.


Line 159:   void ExecAsync(std::vector* child_queries);
why not std::vector&& to make it clear that the argument must be 
movable from?


PS4, Line 175: is blocking
blocks until all queries are completed


PS4, Line 187: mutex
SpinLock might be better.


PS4, Line 193: has not been Join()ed
Join() doesn't imply the thread has stopped (i.e. this seems necessary but not 
sufficient).


http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

PS4, Line 440: setup
nit: set up


http://gerrit.cloudera.org:8080/#/c/4163/4/be/src/service/query-exec-state.h
File be/src/service/query-exec-state.h:

PS4, Line 130: yet
Can this be called after the query has finished? If so, remove 'yet'.


PS4, Line 227: various ImpalaServer operations
bit vague - do you mean that service-wide operations may be blocked on getting 
this lock, and that therefore other queries / clients may be affected? If so, 
good to be explicit about that.


Line 230:   ExecEnv* exec_env_;
blank line before this one.


Line 338:   Status ExecQueryOrDmlRequest(const TQueryExecRequest& 
query_exec_request);
Add a comment that coord() is only valid after this method (if this path is 
used), and it may not be valid if it returns an error.


-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-08-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..

IMPALA-4037,IMPALA-4038: fix locking during query cancellation

* Refactor the child query handling out of QueryExecState and clarify
  locking rules.
* Avoid holding QueryExecState::lock_ while calling
  Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs
  or acquire ImpalaServer::query_exec_state_map_lock_.
* Fix a potential race between QueryExecState::Exec() and
  QueryExecState::Cancel() where the cancelling thread did an unlocked
  read of the 'coordinator_' field and may not have cancelled the
  coordinator.

Testing:
Ran exhaustive build, ran local stress test for a bit.

Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
---
M be/src/runtime/coordinator.h
M be/src/service/child-query.cc
M be/src/service/child-query.h
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/service/query-exec-state.h
M tests/query_test/test_cancellation.py
7 files changed, 233 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4037,IMPALA-4038: fix locking during query cancellation

2016-08-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-4037,IMPALA-4038: fix locking during query cancellation
..

IMPALA-4037,IMPALA-4038: fix locking during query cancellation

* Refactor the child query handling out of QueryExecState and clarify
  locking rules.
* Avoid holding QueryExecState::lock_ while calling
  Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs
  or acquire ImpalaServer::query_exec_state_map_lock_.
* Fix a potential race between QueryExecState::Exec() and
  QueryExecState::Cancel() where the cancelling thread did an unlocked
  read of the 'coordinator_' field and may not have cancelled the
  coordinator.

Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
---
M be/src/runtime/coordinator.h
M be/src/service/child-query.cc
M be/src/service/child-query.h
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/service/query-exec-state.h
M tests/query_test/test_cancellation.py
7 files changed, 233 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4163/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe3024803e03595ee69c47759b58e8443d7bd167
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong