Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14794 )
Change subject: IMPALA-9196: Dump jstack and collect logs when tests timeout
......................................................................

Patch Set 2:

> > Patch Set 2:
> >
> > > I'm uncertain about how the privileges work. There are ptrace
> > > limitations in Ubuntu that restrict ptrace by the same user to a
> > > parent process, which I think is why the gdb part of this script
> > > works. I'm not sure what permissions jstack would need, and if this
> > > would work.
> > >
> > > If you haven't already, a test that you could run for the
> > > permissions is to run the end to end tests and set
> > > TIMEOUT_FOR_RUN_ALL_TESTS_MINS to some modest value (15 mins) and
> > > verify you get the logs you want and jstack works.
> > >
> > > Once we verify that the permissions are ok in the normal way we run
> > > this, the code looks good to me.
> >
> > Circling back to this review. My guess is that this doesn't work in
> > its current form on Ubuntu, but it might work on other platforms.
> >
> > It looks like it is harmless if these debug commands fail (because
> > the script doesn't have "set -euo pipefail"). I think any step
> > forward in this debugging information is ok to merge as long as it
> > improves some platform without regressing anything. We should add
> > comments about dump statements that don't work on some platforms,
> > but that shouldn't stop us from adding statements that do work on
> > CentOS 7 or some other platform. Obviously, it would be nice for
> > these things to work on Ubuntu.
>
> I'm still testing this script in internal Jenkins jobs. It looks
> weird to me that the script fails with "lsof: command not found".
> But when installing lsof explicitly, it says it's already
> installed:
>
> ++ sudo yum install -y lsof
> Loaded plugins: fastestmirror
> Loading mirror speeds from cached hostfile
> Package lsof-4.87-4.el7.x86_64 already installed and latest version
> Nothing to do
> ++ which lsof
> which: no lsof in
> (/usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin)
>
> I think it's a problem with PATH. Will check it later. Internal
> job link:
> https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/6139

Just as a reminder, it is important not to post links that are not publicly accessible. Reviews need to be conducted in a way that everyone can participate. This also protects any companies that may participate in Apache projects.

About the lsof: Unfortunately, this is an area where Linux distributions differ. We've dealt with that in a couple of ways:

1. Try to find a subset that works. If we are copying some log files, copying too many log files is pretty harmless as long as we get the ones we want. I think if you found a command that listed the 10 most recently modified log files in that directory using basic utilities like find, it would work on all Linux distributions.
2. In other scripts like bin/bootstrap_system.sh, we have commands that are conditional on the Linux version.

--
To view, visit http://gerrit.cloudera.org:8080/14794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8a5b140024c236209c7e44149660189890b9d06
Gerrit-Change-Number: 14794
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 04 Dec 2019 01:14:03 +0000
Gerrit-HasComments: No
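
For the first option in the comment (listing the 10 most recently modified log files with basic utilities instead of lsof), a rough sketch might look like the following. This is only an illustration, not code from the patch: the function name recent_logs is hypothetical, and it assumes GNU find (for -printf) and file names without embedded newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the change under review.
# Prints the N most recently modified regular files directly under a
# directory, newest first. Assumes GNU find and no newlines in file names.
recent_logs() {
  local log_dir="$1"
  local count="${2:-10}"
  # %T@ is the modification time in seconds since the epoch, %p the path.
  # Sort numerically on the timestamp, newest first, then drop the timestamp.
  find "$log_dir" -maxdepth 1 -type f -printf '%T@\t%p\n' \
    | sort -rn \
    | head -n "$count" \
    | cut -f2-
}
```

A caller could then copy the result with something like `recent_logs "$SOME_LOG_DIR" 10`, where SOME_LOG_DIR is a placeholder for whichever log directory the script already knows about. Since this relies only on find, sort, head and cut, it avoids depending on lsof being on PATH.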
