[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7155/2/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 253: return request_state->query_state(); You need to take request_state->lock() before the return here. Line 283: stringstream error_log_ss; Same here. You need to take request_state->lock() before retrieving the query status and details. -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Lars Volker has uploaded a new patch set (#2). Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. IMPALA-5427: Fix order of status handling in ClientRequestState There was a race in ClientRequestState::UpdateQueryStatus() leading to the rare situation that a query would abort with an error, but the error message would be empty. The fix is to update the query_status_ before setting it to preserve the error message. To test this I ran test_corrupt_files in a loop for 2 days. Without this fix, it would usually fail within a few hours. I changed the test to allow running it in parallel like so: @pytest.mark.parametrize('multiplier', xrange(32)) def test_corrupt_files(self, vector, multiplier): Then I ran it in a loop like so: i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test tests/query_test/test_scanners.py::TestParquet::test_corrupt_files --exploration_strategy=exhaustive -n8; done Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 --- M be/src/service/impala-beeswax-server.cc 1 file changed, 3 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/2 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Dan Hecht has posted comments on this change. Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. Patch Set 1: > > (1 comment) > > > > This is really subtle. In order to be correct, you'd need at > least > > a compiler barrier to ensure the writes stay in the intended > order. > > > > However, the HS2 version of the RPCs is already taking the > request > > state lock: GetOperationStatus(). I think we should just do the > > same inside of get_state() and get_log(). Then we can assert, > like > > the HS2 code, that an error state implies a !ok() status. Is > there > > a reason we don't want to take the lock? > > You're right, if the compiler reorders the instructions, this won't > be helpful. My prior reasoning was that get_log() and get_state() > could be called multiple times potentially slowing down the rest of > the operations happening under that query if we hold the > CRS::lock_. But seeing as how HS2 takes it, it shouldn't be an > issue. > > Also, I didn't notice this earlier, but we take a global lock > inside get_log() and get_state() anyway (client_request_state_map_lock_), > so taking a query wide lock shouldn't make these functions all that > much slower. Just be careful to not take the CRS::lock_ while still holding the map lock. Follow the same pattern elsewhere by using GetClientRequestState() and then take the lock_ after owning a shared ptr to the CLR itself. -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. Patch Set 1: -Code-Review > (1 comment) > > This is really subtle. In order to be correct, you'd need at least > a compiler barrier to ensure the writes stay in the intended order. > > However, the HS2 version of the RPCs is already taking the request > state lock: GetOperationStatus(). I think we should just do the > same inside of get_state() and get_log(). Then we can assert, like > the HS2 code, that an error state implies a !ok() status. Is there > a reason we don't want to take the lock? You're right, if the compiler reorders the instructions, this won't be helpful. My prior reasoning was that get_log() and get_state() could be called multiple times potentially slowing down the rest of the operations happening under that query if we hold the CRS::lock_. But seeing as how HS2 takes it, it shouldn't be an issue. Also, I didn't notice this earlier, but we take a global lock inside get_log() and get_state() anyway (client_request_state_map_lock_), so taking a query wide lock shouldn't make these functions all that much slower. -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Dan Hecht has posted comments on this change. Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. Patch Set 1: (1 comment) This is really subtle. In order to be correct, you'd need at least a compiler barrier to ensure the writes stay in the intended order. However, the HS2 version of the RPCs is already taking the request state lock: GetOperationStatus(). I think we should just do the same inside of get_state() and get_log(). Then we can assert, like the HS2 code, that an error state implies a !ok() status. Is there a reason we don't want to take the lock? http://gerrit.cloudera.org:8080/#/c/7155/1//COMMIT_MSG Commit Message: PS1, Line 9: ClientRequestState::UpdateQueryStatus() the race isn't really in that function. It's between that function and the pair of beeswax get_state()/get_log() RPCs. Please clarify. -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5427: Fix order of status handling in ClientRequestState
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/7155 Change subject: IMPALA-5427: Fix order of status handling in ClientRequestState .. IMPALA-5427: Fix order of status handling in ClientRequestState There was a race in ClientRequestState::UpdateQueryStatus() leading to the rare situation that a query would abort with an error, but the error message would be empty. The fix is to update the query_status_ before setting it to preserve the error message. To test this I ran test_corrupt_files in a loop for 2 days. Without this fix, it would usually fail within a few hours. I changed the test to allow running it in parallel like so: @pytest.mark.parametrize('multiplier', xrange(32)) def test_corrupt_files(self, vector, multiplier): Then I ran it in a loop like so: i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test tests/query_test/test_scanners.py::TestParquet::test_corrupt_files --exploration_strategy=exhaustive -n8; done Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 --- M be/src/service/client-request-state.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/1 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker