Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................

IMPALA-9864: Produce a minidump when TestValidateMetrics fails

After running end-to-end tests, run-tests.py runs verifiers to check
that a set of metrics are zero. When this fails, it can indicate a
hung query fragment or other resource leak (see IMPALA-9842 for example).
To track this down, it is useful to have a minidump, so this adds a step
to have every Impalad/Catalogd generate a minidump (by sending SIGUSR1)
when we hit the timeout.

Also, the current error message dumps a bunch of unformatted JSON
from our Web UI. This is hard to read and painful to cut/paste.
This now dumps that JSON to files in a diagnostic directory under
the logs directory. The JSON is formatted in a readable way. These
files would be preserved along with the rest of the logs directory
for automated runs.

The new error message looks like this:
E   AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/memz.json
E   Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/metrics.json
E   Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/queries.json
E   Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/sessions.json
E   Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/threadz.json
E   Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/rpcz.json
E   Dumping minidumps for impalads/catalogds...
E   Dumped minidump for Impalad PID 2709
E   Dumped minidump for Impalad PID 2714
E   Dumped minidump for Impalad PID 2721
E   Dumped minidump for Catalogd PID 2627

This also fixes various flake8 errors (unnecessary imports, etc),
so now impala_service.py is flake8 clean.

Testing:
 - Tried out the dump function on my developer machine
 - Verified the minidumps exist
 - Verified the JSON is readable

Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Reviewed-on: http://gerrit.cloudera.org:8080/16690
Reviewed-by: Qifan Chen <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M tests/common/impala_service.py
1 file changed, 89 insertions(+), 21 deletions(-)

Approvals:
  Qifan Chen: Looks good to me, but someone else must approve
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
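
Illustrative note: for readers without the patch open, the two diagnostic steps named in the commit message (writing readable JSON into a timestamped directory under the logs directory, then sending SIGUSR1 to each daemon) might be sketched roughly as below. This is not the code from tests/common/impala_service.py. The function names make_diag_dir, dump_debug_pages and request_minidumps are hypothetical, and the sketch assumes the debug pages have already been fetched and parsed into Python objects.

```python
import json
import os
import signal
import time


def make_diag_dir(logs_dir, now=None):
    """Create <logs_dir>/metric_timeout_diags_<timestamp>/json and return its path.

    Hypothetical helper; the directory layout mirrors the paths shown in the
    error message, the timestamp format is inferred from them.
    """
    stamp = time.strftime("%Y%m%d_%H:%M:%S", now or time.localtime())
    json_dir = os.path.join(logs_dir, "metric_timeout_diags_" + stamp, "json")
    os.makedirs(json_dir)
    return json_dir


def dump_debug_pages(json_dir, pages):
    """Write each page as indented JSON to <json_dir>/<name>.json.

    `pages` maps a page name (e.g. "memz") to already-parsed JSON content.
    Returns one status message per file written.
    """
    messages = []
    for name, content in sorted(pages.items()):
        path = os.path.join(json_dir, name + ".json")
        with open(path, "w") as f:
            # indent/sort_keys make the output readable and easy to diff
            json.dump(content, f, indent=2, sort_keys=True)
        messages.append("Dumped {0} JSON to {1}".format(name, path))
    return messages


def request_minidumps(processes, kill=os.kill):
    """Send SIGUSR1 to each (role, pid) pair to request a minidump.

    The signal only asks the daemon to write a minidump; this sketch does not
    wait for or verify the resulting file. `kill` is injectable for testing.
    """
    messages = []
    for role, pid in processes:
        kill(pid, signal.SIGUSR1)
        messages.append("Dumped minidump for {0} PID {1}".format(role, pid))
    return messages
```

A caller would typically join the returned messages into the assertion text raised on timeout, so the file paths and PIDs appear next to the failed metric.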
