[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function Recent patch for IMPALA-5746 registers a callback function for the updating of cluster membership. The callback function cancels the queries scheduled by the failed coordinators. This callback function was called during Expr-test and caused crash. This patch checks if the process running for tests and only registers the callback function if it's not running for BE/FE tests. Testing: - The issue could be reproduced by running expr-test for 10-20 iterations. Verified the fixing by running expr-test over 1000 iterations without crash. - Passed TestProcessFailures::test_kill_coordinator. - Passed core tests. Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Reviewed-on: http://gerrit.cloudera.org:8080/16299 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M be/src/runtime/exec-env.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Thomas Tauber-Marshall: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 07 Aug 2020 23:50:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6830/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 07 Aug 2020 18:45:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6248/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 07 Aug 2020 18:31:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 07 Aug 2020 18:31:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function Recent patch for IMPALA-5746 registers a callback function for the updating of cluster membership. The callback function cancels the queries scheduled by the failed coordinators. This callback function was called during Expr-test and caused crash. This patch checks if the process running for tests and only registers the callback function if it's not running for BE/FE tests. Testing: - The issue could be reproduced by running expr-test for 10-20 iterations. Verified the fixing by running expr-test over 1000 iterations without crash. - Passed TestProcessFailures::test_kill_coordinator. - Passed core tests. Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 --- M be/src/runtime/exec-env.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16299/4 -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 3: (1 comment) TestInfo part is required to stop expr-test from crashing. I will check in this part first to stop the builds from breaking, and fix the other part in a separate patch. http://gerrit.cloudera.org:8080/#/c/16299/3/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/16299/3/be/src/runtime/query-state.cc@863 PS3, Line 863: discard_result(initialized_.Get(1, &timed_out)); > So using this timeout here is less than ideal (for example because how do w agree -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 07 Aug 2020 18:17:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 3: (1 comment) If I'm understanding correctly, just the TestInfo part is required to stop expr-test from crashing, so if you want to submit that as a patch by itself and then take my suggestion for fixing the other part and submit it separately, that might be nice so that we can stop the builds from breaking as much. Depends a bit on how quickly you'll be able to get the rest of the stuff updated, tested, and resubmitted. http://gerrit.cloudera.org:8080/#/c/16299/3/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/16299/3/be/src/runtime/query-state.cc@863 PS3, Line 863: discard_result(initialized_.Get(1, &timed_out)); So using this timeout here is less than ideal (for example because how do we choose a reasonable value for how long to wait?), and I'm concerned this may still not be correct, eg. it seems that you're assuming that if this times out then Init() won't get called at some point after that, which I don't think is always true. I think that there's a better way to do this that doesn't require the timeout: something like have a 'bool is_initialized_' and a 'std::mutex init_lock_'. Have Init() take 'init_lock_', check if is_cancelled_ is true and if it is return and don't init, otherwise continue holding init_lock_ until Init() is done and 'is_initialized_' is set to true. Then Cancel() will do the equivalent: also take 'init_lock_', check if is_initialized is true, if not just sset is_cancelled_ true and return without doing the cancel stuff. -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 07 Aug 2020 17:24:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6815/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 06 Aug 2020 20:52:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16299 ) Change subject: IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function .. IMPALA-10039: Fixed Expr-test crash caused by thread unsafe function Recent patch for IMPALA-5746 registers a callback function for the updating of cluster membership. The callback function cancels the queries scheduled by the failed coordinators. This callback function was called during Expr-test and caused crash since QueryState::Cancel() was called before thread unsafe function QueryState::Init() was completed. This patch make QueryState::Cancel() to wait until QueryState::Init() is completed, checks if the process running for tests and only registers the callback function if it's not running for BE/FE tests. Testing: - The issue could be reproduced by running expr-test for 10-20 iterations. Verified the fixing by running expr-test over 1000 iterations without crash. - Passed TestProcessFailures::test_kill_coordinator. - Passed core tests. Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 --- M be/src/runtime/exec-env.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h 3 files changed, 23 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16299/3 -- To view, visit http://gerrit.cloudera.org:8080/16299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I85245bf4bffb469913d53741847e67773b7d4627 Gerrit-Change-Number: 16299 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall