[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7155/2/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 253:   return request_state->query_state();
You need to take request_state->lock() before the return here.


Line 283:   stringstream error_log_ss;
Same here. You need to take request_state->lock() before retrieving the query 
status and details.


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#2).

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..

IMPALA-5427: Fix order of status handling in ClientRequestState

There was a race in ClientRequestState::UpdateQueryStatus() leading to
the rare situation that a query would abort with an error, but the error
message would be empty.

The fix is to update the query_status_ before setting it to preserve the
error message.

To test this I ran test_corrupt_files in a loop for 2 days. Without this
fix, it would usually fail within a few hours.

I changed the test to allow running it in parallel like so:

@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):

Then I ran it in a loop like so:

i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test
tests/query_test/test_scanners.py::TestParquet::test_corrupt_files
--exploration_strategy=exhaustive -n8; done

Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
---
M be/src/service/impala-beeswax-server.cc
1 file changed, 3 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..


Patch Set 1:

> > (1 comment)
 > >
 > > This is really subtle. In order to be correct, you'd need at
 > least
 > > a compiler barrier to ensure the writes stay in the intended
 > order.
 > >
 > > However, the HS2 version of the RPCs is already taking the
 > request
 > > state lock: GetOperationStatus(). I think we should just do the
 > > same inside of get_state() and get_log().  Then we can assert,
 > like
 > > the HS2 code, that an error state implies a !ok() status.  Is
 > there
 > > a reason we don't want to take the lock?
 > 
 > You're right, if the compiler reorders the instructions, this won't
 > be helpful. My prior reasoning was that get_log() and get_state()
 > could be called multiple times potentially slowing down the rest of
 > the operations happening under that query if we hold the
 > CRS::lock_. But seeing as how HS2 takes it, it shouldn't be an
 > issue.
 > 
 > Also, I didn't notice this earlier, but we take a global lock
 > inside get_log() and get_state() anyway (client_request_state_map_lock_),
 > so taking a query wide lock shouldn't make these functions all that
 > much slower.

Just be careful to not take the CRS::lock_ while still holding the map lock. 
Follow the same pattern elsewhere by using GetClientRequestState() and then 
take the lock_ after owning a shared ptr to the CLR itself.

-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..


Patch Set 1: -Code-Review

> (1 comment)
 > 
 > This is really subtle. In order to be correct, you'd need at least
 > a compiler barrier to ensure the writes stay in the intended order.
 > 
 > However, the HS2 version of the RPCs is already taking the request
 > state lock: GetOperationStatus(). I think we should just do the
 > same inside of get_state() and get_log().  Then we can assert, like
 > the HS2 code, that an error state implies a !ok() status.  Is there
 > a reason we don't want to take the lock?

You're right, if the compiler reorders the instructions, this won't be helpful. 
My prior reasoning was that get_log() and get_state() could be called multiple 
times potentially slowing down the rest of the operations happening under that 
query if we hold the CRS::lock_. But seeing as how HS2 takes it, it shouldn't 
be an issue. 

Also, I didn't notice this earlier, but we take a global lock inside get_log() 
and get_state() anyway (client_request_state_map_lock_), so taking a query wide 
lock shouldn't make these functions all that much slower.

-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..


Patch Set 1:

(1 comment)

This is really subtle. In order to be correct, you'd need at least a compiler 
barrier to ensure the writes stay in the intended order.

However, the HS2 version of the RPCs is already taking the request state lock: 
GetOperationStatus(). I think we should just do the same inside of get_state() 
and get_log().  Then we can assert, like the HS2 code, that an error state 
implies a !ok() status.  Is there a reason we don't want to take the lock?

http://gerrit.cloudera.org:8080/#/c/7155/1//COMMIT_MSG
Commit Message:

PS1, Line 9: ClientRequestState::UpdateQueryStatus()
the race isn't really in that function. It's between that function and the pair 
of beeswax get_state()/get_log() RPCs.  Please clarify.


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState

2017-06-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7155

Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState
..

IMPALA-5427: Fix order of status handling in ClientRequestState

There was a race in ClientRequestState::UpdateQueryStatus() leading to
the rare situation that a query would abort with an error, but the error
message would be empty.

The fix is to update the query_status_ before setting it to preserve the
error message.

To test this I ran test_corrupt_files in a loop for 2 days. Without this
fix, it would usually fail within a few hours.

I changed the test to allow running it in parallel like so:

@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):

Then I ran it in a loop like so:

i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test
tests/query_test/test_scanners.py::TestParquet::test_corrupt_files
--exploration_strategy=exhaustive -n8; done

Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
---
M be/src/service/client-request-state.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker