[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3 builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful to dump the server stacktraces so we can understand why some RPCs are slow/stuck.

This patch extracts the logic of dumping stacktraces in script-timeout-check.sh to a separate script, dump-stacktraces.sh. The script also dumps jstacks of HMS and NameNode. Dumping all these stacktraces is time-consuming so we do them in parallel, which also helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use dump-stacktraces.sh to dump the stacktraces before exit. Previously, some tests depend on pytest.mark.timeout for detecting timeouts. It's hard to add a customized callback for dumping server stacktraces. So this patch refactors test_concurrent_ddls.py to only use timeout of multiprocessing.

Tests:
- Tested the scripts locally.
- Verified the error handling of timeout logics in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Reviewed-on: http://gerrit.cloudera.org:8080/16800
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 105 insertions(+), 38 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
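
The commit message above says the test now relies only on the multiprocessing timeout so that a stack-dump step can run before the test exits. A minimal sketch of that pattern follows. It is not the code from the patch: `run_all_or_dump`, `task`, and `dump_stacktraces` are hypothetical names, and the use of `multiprocessing.pool.ThreadPool` is an assumption made so the example stays self-contained.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


def run_all_or_dump(task, inputs, timeout_s, dump_stacktraces):
    """Run task(item) for every item concurrently under one shared deadline.

    If any result is not ready by the deadline, call dump_stacktraces()
    (a stand-in for invoking a stack-dump script) and re-raise the timeout.
    """
    # Leaving the with-block calls pool.terminate(), also on the error path.
    with ThreadPool(processes=max(1, len(inputs))) as pool:
        pending = [pool.apply_async(task, (item,)) for item in inputs]
        deadline = time.monotonic() + timeout_s
        results = []
        for handle in pending:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                # AsyncResult.get raises multiprocessing.TimeoutError when late.
                results.append(handle.get(timeout=remaining))
            except multiprocessing.TimeoutError:
                dump_stacktraces()  # hypothetical callback: collect diagnostics first
                raise
        return results
```

Because the timeout is caught in the test's own code, the callback runs at a well-defined point, which is what the commit message says was hard to arrange with pytest.mark.timeout.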
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 4: Verified+1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 03 Dec 2020 08:05:22 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 4: Code-Review+2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 03 Dec 2020 02:30:42 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6725/ DRY_RUN=false

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 03 Dec 2020 02:30:43 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 3: Code-Review+2

(1 comment)

Thank Joe! Carry on the +2.

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG@15
PS2, Line 15: dump-stacktraces.sh.
> Nit: dump-stacktraces.sh
Done

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 03 Dec 2020 02:24:29 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16800

to look at the new patch set (#3).

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3 builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful to dump the server stacktraces so we can understand why some RPCs are slow/stuck.

This patch extracts the logic of dumping stacktraces in script-timeout-check.sh to a separate script, dump-stacktraces.sh. The script also dumps jstacks of HMS and NameNode. Dumping all these stacktraces is time-consuming so we do them in parallel, which also helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use dump-stacktraces.sh to dump the stacktraces before exit. Previously, some tests depend on pytest.mark.timeout for detecting timeouts. It's hard to add a customized callback for dumping server stacktraces. So this patch refactors test_concurrent_ddls.py to only use timeout of multiprocessing.

Tests:
- Tested the scripts locally.
- Verified the error handling of timeout logics in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 105 insertions(+), 38 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16800/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 2: Code-Review+2

(1 comment)

This makes sense to me. Thanks for debugging this!

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG@15
PS2, Line 15: script-timeout-check.sh
Nit: dump-stacktraces.sh

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 03 Dec 2020 01:12:57 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7759/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 01 Dec 2020 12:02:26 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16800

to look at the new patch set (#2).

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3 builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful to dump the server stacktraces so we can understand why some RPCs are slow/stuck.

This patch extracts the logic of dumping stacktraces in script-timeout-check.sh to a separate script, script-timeout-check.sh. The script also dumps jstacks of HMS and NameNode. Dumping all these stacktraces is time-consuming so we do them in parallel, which also helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use dump-stacktraces.sh to dump the stacktraces before exit. Previously, some tests depend on pytest.mark.timeout for detecting timeouts. It's hard to add a customized callback for dumping server stacktraces. So this patch refactors test_concurrent_ddls.py to only use timeout of multiprocessing.

Tests:
- Tested the scripts locally.
- Verified the error handling of timeout logics in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 105 insertions(+), 38 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16800/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7756/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 01 Dec 2020 09:17:39 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16800/1/bin/dump-stacktraces.sh
File bin/dump-stacktraces.sh:

http://gerrit.cloudera.org:8080/#/c/16800/1/bin/dump-stacktraces.sh@53
PS1, Line 53: collect_gdb_backtraces catalogd $CATALOGD_PID && collect_jstacks catalogd $CATALOGD_PID &
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/16800/1/tests/util/shell_util.py
File tests/util/shell_util.py:

http://gerrit.cloudera.org:8080/#/c/16800/1/tests/util/shell_util.py@32
PS1, Line 32: def dump_server_stacktraces():
flake8: E302 expected 2 blank lines, found 1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 01 Dec 2020 08:56:57 +
Gerrit-HasComments: Yes
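
The dump-stacktraces.sh line flagged in the message above chains two collectors for catalogd with `&&` and backgrounds the pair with `&`, which matches the commit message's point about dumping in parallel. A rough Python analogue of that shape is sketched below, only to illustrate the control flow. The real script is shell and its contents are not shown in this thread, so `dump_one_server`, `dump_all_servers`, and the command lists are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dump_one_server(commands):
    """Run one server's dump commands in order, stopping at the first failure.

    This mirrors `cmd1 && cmd2`: the second command runs only if the first
    exits with status 0.
    """
    for cmd in commands:
        if subprocess.call(cmd) != 0:
            return False
    return True


def dump_all_servers(commands_by_server):
    """Start every server's chain at the same time and wait for all of them.

    This mirrors backgrounding each chain with `&` and then waiting, so the
    snapshots of the different servers are taken at roughly the same moment.
    """
    workers = max(1, len(commands_by_server))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(dump_one_server, cmds)
                   for name, cmds in commands_by_server.items()}
        return {name: future.result() for name, future in futures.items()}
```

In such a sketch each value would be a list of argument vectors (for example a gdb invocation followed by a jstack invocation for the same PID); the returned dict reports which chains completed.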
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
..

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3 builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful to dump the server stacktraces so we can understand why some RPCs are slow/stuck.

This patch extracts the logic of dumping stacktraces in script-timeout-check.sh to a separate script, script-timeout-check.sh. The script also dumps jstacks of HMS and NameNode. Dumping all these stacktraces is time-consuming so we do them in parallel, which also helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use dump-stacktraces.sh to dump the stacktraces before exit. Previously, some tests depend on pytest.mark.timeout for detecting timeouts. It's hard to add a customized callback for dumping server stacktraces. So this patch refactors test_concurrent_ddls.py to only use timeout of multiprocessing.

Tests:
- Tested the scripts locally.
- (WIP) Run jenkins jobs.

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 103 insertions(+), 38 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16800/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang