openQA crash reporting

Adam Williamson Thu, 19 Mar 2015 01:16:33 -0700

hey folks! I mentioned this to jskladan on IRC, but just for the 
permanent record, I'm working on optional crash report submission for 
openQA.


at first I had the workers clicking through the graphical report 
submission process, but that has several problems:

a) it needs needles and keypresses for every step of the dialog
b) workers don't actually know the job ID or URL, so can't include it 
in the bug report
c) requires inventing some kind of way to get a BZ username and 
password into the workers without it being logged (doable, but just 
unnecessary work, when libreport-plugin-bugzilla already has this set 
up)

so instead I'm doing it in report_job_results.py in 
openqa_fedora_tools. It actually builds off D310, Jan's improvement to 
upload the contents of /var/tmp after a crash.

Given a job_id, we check if there's a var_tmp.tar.gz for that job, and 
if there is, we look for libreport 'problem directories' inside it. If 
we find any, we extract them from the tarball and run
'reporter-bugzilla -d (directory)' on them.
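
The lookup described above can be sketched roughly like this. It follows the patch in treating any directory that contains a 'duphash' file as a problem directory, and in skipping ones that already have a 'reported_to' file. The function names and the example paths in the demo archive are made up for illustration, and it's written for Python 3 rather than the Python 2 the patch targets.

```python
import io
import posixpath
import tarfile


def find_problem_dirs(archive):
    """Return sorted names of directories in an open tarfile that hold a
    'duphash' file and have not already been reported."""
    names = set(archive.getnames())
    probdirs = []
    for member in archive.getmembers():
        if member.isfile() and posixpath.basename(member.name) == 'duphash':
            dirname = posixpath.dirname(member.name)
            # A 'reported_to' file means the problem was already reported.
            if posixpath.join(dirname, 'reported_to') not in names:
                probdirs.append(dirname)
    return sorted(probdirs)


def build_demo_archive():
    """Build a tiny in-memory tar.gz with made-up paths, for trying it out."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        for name in ('var/tmp/abrt/ccpp-1/duphash',
                     'var/tmp/abrt/ccpp-2/duphash',
                     'var/tmp/abrt/ccpp-2/reported_to',
                     'var/tmp/other.log'):
            data = b'x'
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode='r:gz')
```

Calling find_problem_dirs(build_demo_archive()) should return only the first directory, since the second one is marked as already reported.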

That's really it in a nutshell, the rest is just error checks and glue 
and frills. There's an attempt to include the web UI job URL in the 
bug report for new crash reports (though so far I've been testing with 
a problem directory that shows up as a dupe of an existing report, so I
haven't tested this yet), and we capture the IDs of the bugs reported.
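
For the bug ID capture, here's a minimal sketch using the same regex as the patch. The sample output line in the usage note is an assumption about what reporter-bugzilla prints, not something copied from a real run, and extract_bug_ids is just an illustrative name (Python 3 again).

```python
import re

# Same pattern as in the patch: a 'Status' line containing a bug URL.
BUG_REGEX = re.compile(r'Status.*show_bug\.cgi\?id=(\d{6,8})')


def extract_bug_ids(output):
    """Return the bug IDs found in reporter-bugzilla output (a str)."""
    bugs = []
    for line in output.splitlines():
        match = BUG_REGEX.search(line)
        if match:
            bugs.append(int(match.group(1)))
    return bugs
```

Given a line like 'Status: NEW https://bugzilla.example.com/show_bug.cgi?id=1234567', this picks out 1234567. Lines that mention a bug URL without 'Status' are ignored.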

I also refactored the reporting functions a bit to avoid code 
duplication between calling report_job_results directly and using it 
from openqa_trigger, and made it possible to specify the openQA URL in 
a config file (so you can do result reporting from a system other than 
the openQA host itself - like, fr'instance, a Fedora system with 
libreport-plugin-bugzilla installed...)

To test it out you need a job in some openQA instance which has a 
var_tmp.tar.gz with a crash directory inside it: I've been testing 
with https://openqa.happyassassin.net/tests/2736 . You also need to 
put a valid BZ username and password in 
/etc/libreport/plugins/bugzilla.conf and, unless you're running on the 
openQA host itself (there *are* libreport packages for openSUSE in 
some OBS repository, but I haven't tried them), you'll want to create 
/etc/openqa_fedora.conf with this content:

[site]
url = https://openqa.happyassassin.net

(or whatever URL is appropriate).
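
The lookup of that file works roughly as in this sketch: try /etc and the home directory, and fall back to http://localhost if there's no [site] section or no url key. It's a Python 3 rendering (configparser rather than the Python 2 ConfigParser the patch uses), and read_site_url is a name made up here.

```python
import configparser
import os

DEFAULT_URL = 'http://localhost'
DEFAULT_PATHS = ['/etc/openqa_fedora.conf',
                 os.path.expanduser('~/openqa_fedora.conf')]


def read_site_url(paths=DEFAULT_PATHS):
    """Return the openQA URL from the first config that sets it, or the
    localhost default if none does."""
    config = configparser.ConfigParser()
    # read() silently skips files that don't exist or can't be opened.
    config.read(paths)
    return config.get('site', 'url', fallback=DEFAULT_URL)
```

If both files exist, the later one in the list wins, since read() merges them in order.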

Then you can do this:

python report_job_results.py --crashes 2736

(or whatever the job ID is).

This probably still needs a bit more testing and polish before I 
submit it as a differential, but I wanted to give people a heads-up 
that I was working on it and explain the general design. My current 
patch (against 'develop' branch, to which I've merged the 'live' work 
now) is attached.

In case you're wondering what happens with duplicate reports: I tested 
and it seems like 'not a lot'. When reporter-bugzilla is called this 
way and the crash has already been reported, it only generates BZ 
activity if the BZ account in question isn't already on the CC list, 
in which case it adds it. If the account is already on the CC list, 
it doesn't change the bug at all; it doesn't even add the extra comment 
saying 'another user encountered this issue'. I checked libreport, and 
it only adds that comment when some comment text has been provided. 
We aren't providing any, so it gets skipped.

If we're still worried about noise on dupes it *is* possible to test 
if a bug is a dupe by checking the output of:

reporter-bugzilla -h $(cat duphash)

and completely skip the report submission step if it is, and I 
actually had that written, but took it out as it seemed unnecessary. 
Easy enough to put it back if we want to, though.
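
If we did put it back, it might look something like this sketch. This is not the code I took out. It assumes 'reporter-bugzilla -h' prints the existing bug's ID as a bare number when it finds a match and nothing numeric otherwise; check that against real output before relying on it. find_existing_bug and the 'run' parameter (there so it can be tried without libreport installed) are made up for illustration.

```python
import subprocess


def find_existing_bug(duphash, run=subprocess.check_output):
    """Return the ID of an existing bug for this duphash, or None.

    Note that None is also returned if the command is missing or fails,
    so 'None' means 'no known dupe', not 'definitely new'.
    """
    try:
        output = run(['reporter-bugzilla', '-h', duphash])
    except (OSError, subprocess.CalledProcessError):
        return None
    # ASSUMPTION: a found dupe shows up as a bare numeric token.
    for token in output.decode().split():
        if token.isdigit():
            return int(token)
    return None
```

The submission step would then be skipped for any problem directory where this returns a bug ID.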

In the current version of the patch, things are set up so that runs of 
openqa_trigger current, openqa_trigger all or openqa_trigger compose 
with --submit-results will try to report all crashes, but it's 
absolutely trivial to change that if we only want to report crashes 
via a separate invocation.

Comments welcome!
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

From aa20bc32c79004269c2e10fdaf0fc69ed3ae4d81 Mon Sep 17 00:00:00 2001
From: Adam Williamson <[email protected]>
Date: Wed, 18 Mar 2015 20:58:43 -0700
Subject: [PATCH] allow reporting of crashes

This provides report_job_results with the ability to file bugs
for crashes encountered during tests. It requires a pending
change to openqa_fedora which uploads the contents of /var/tmp
for failed tests; libreport 'problem directories' are located
here.

report_crash() grabs the var_tmp.tar.gz for the given job (if it's
available), extracts any problem directories found, and runs
reporter-bugzilla (with a slightly tweaked config file) on
them. It then finds the bug URL(s) and returns a list of them.
---
 tools/openqa_trigger/report_job_results.py | 186 +++++++++++++++++++++++++----
 1 file changed, 163 insertions(+), 23 deletions(-)

diff --git a/tools/openqa_trigger/report_job_results.py b/tools/openqa_trigger/report_job_results.py
index ef58fd7..d53c450 100644
--- a/tools/openqa_trigger/report_job_results.py
+++ b/tools/openqa_trigger/report_job_results.py
@@ -1,18 +1,123 @@
 import requests
 import argparse
+import ConfigParser
 import os
+import pprint
+import re
+import subprocess32 as subprocess
+import tarfile
+import tempfile
 import time
 import conf_test_suites
 
+# Allow openQA URL to be specified in a config file, for submitting
+# reports from a system other than the openQA host.
+CONFIG = ConfigParser.SafeConfigParser()
+CONFIG.read('{0}/openqa_fedora.conf'.format(path) for path in ('/etc', os.path.expanduser('~')))
+try:
+    SITEURL = CONFIG.get('site', 'url')
+except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
+    SITEURL = 'http://localhost'
 
-API_ROOT = "http://localhost/api/v1"
+API_ROOT = "{0}/api/v1".format(SITEURL)
 SLEEPTIME = 60
 
+def report_crash(job_id):
+    """
+    job_id ~ int (job id)
+    Returns ~ list of int - bug IDs, if new reports are successfully
+    submitted or dupes are found. List will be empty if no dupes are
+    found and report submission fails.
 
-def get_passed_testcases(job_ids):
+    Report each problem directory found in the var_tmp tarball for a
+    job to Bugzilla, via the command reporter-bugzilla.
+    """
+    # reporter-bugzilla always prints a line identifying the 'final' bug
+    # ID (after dupe detection etc) in this form. Captures the bug ID.
+    bug_regex = re.compile(r'Status.*show_bug\.cgi\?id=(\d{6,8})')
+    # openqa_fedora uploads this when a crash is detected.
+    tarurl = "{0}/tests/{1}/file/var_tmp.tar.gz".format(SITEURL, job_id)
+    probdirs = []
+    bugs = []
+    tmpdir = tempfile.mkdtemp()
+    # Check BZ username and password are configured.
+    try:
+        user = pw = ''
+        bzconf = open('/etc/libreport/plugins/bugzilla.conf', 'r')
+        for line in bzconf:
+            if line.startswith('Login'):
+                user = line.split('=')[-1].strip()
+            if line.startswith('Password'):
+                pw = line.split('=')[-1].strip()
+    except IOError:
+        pass
+    if not user or not pw:
+        print("Bugzilla user name and password must be set in "
+              "/etc/libreport/plugins/bugzilla.conf and readable by the "
+              "user running this command! Cannot report crashes!")
+        return bugs
+
+    try:
+        with open('{0}/var_tmp.gz'.format(tmpdir), 'wb') as varfile:
+            varfile.write(requests.get(tarurl).content)
+        vartmp = tarfile.open('{0}/var_tmp.gz'.format(tmpdir))
+    except (IOError, tarfile.ReadError):
+        # This job has no var_tmp archive. Abort.
+        return bugs
+    members = vartmp.getmembers()
+    for member in members:
+        if member.isfile() and member.name.endswith('/duphash'):
+            # The directory this file is inside is a 'problem directory'.
+            dirname = member.name.replace('/duphash', '')
+            try:
+                # Skip any problem dir that has somehow already been reported.
+                vartmp.getmember('{0}/reported_to'.format(dirname))
+                continue
+            except KeyError:
+                probdirs.append(dirname)
+    if not probdirs:
+        # Job has a var_tmp archive, but we didn't find any problem dirs.
+        return bugs
+
+    # Extract all problem directories and their contents from the archive.
+    toget = (mem for mem in members if
+             any(mem.name.startswith(pd) for pd in probdirs))
+    vartmp.extractall(path=tmpdir, members=toget)
+
+    for probdir in probdirs:
+        path = '{0}/{1}'.format(tmpdir, probdir)
+        # Write the job's URL into the problem directory, so reporter-bugzilla
+        # can include it in the bug report later.
+        jobfile = open('{0}/openqajob'.format(path), 'w')
+        jobfile.write('{0}/tests/{1}'.format(SITEURL, job_id))
+        jobfile.close()
+        # This is our slightly tweaked bug format config file which marks
+        # the report as coming from openQA and includes the URL.
+        conf = '{0}/bugzilla_format.conf'.format(os.path.dirname(os.path.realpath(__file__)))
+        args = ('reporter-bugzilla', '-d', path, '-F', conf)
+        try:
+            output = subprocess.check_output(args, stderr=subprocess.STDOUT).decode().splitlines()
+        except OSError:
+            # probably means the command isn't available.
+            print("reporter-bugzilla not installed? Cannot report crashes!")
+            return bugs
+        # Find the bug ID from the output. Should only ever be one, but
+        # let's handle more just in case.
+        for line in output:
+            match = bug_regex.search(line)
+            if match:
+                bugs.append(int(match.group(1)))
+
+    return bugs
+
+def _wait_for_jobs(job_ids):
     """
     job_ids ~ list of int (job ids)
-    Returns ~ list of str - names of passed testcases
+    Returns ~ dict, keys int job id, values dict of job state
+
+    Wait for all jobs to finish, then return a dict keyed on the job
+    IDs with the value for each job being the dict produced by parsing
+    the JSON state information provided by the API for that job.
     """
     running_jobs = dict([(job_id, "%s/jobs/%s" % (API_ROOT, job_id)) for job_id in job_ids])
     finished_jobs = {}
@@ -27,7 +132,18 @@ def get_passed_testcases(job_ids):
         if running_jobs:
            time.sleep(SLEEPTIME)
 
+    return finished_jobs
+
+def get_passed_testcases(job_ids):
+    """
+    job_ids ~ list of int (job ids)
+    Returns ~ dict, keys str VERSION_BUILD_ARCH, values list of str - names of passed testcases
+
+    Wait for all jobs to finish, then derive a dict providing information
+    on which Wikitcms test cases / 'test instances' we have passes for.
+    """
     passed_testcases = {} # key = VERSION_BUILD_ARCH
+    finished_jobs = _wait_for_jobs(job_ids)
     for job_id in job_ids:
         job = finished_jobs[job_id]
         if job['result'] =='passed':
@@ -39,6 +155,16 @@ def get_passed_testcases(job_ids):
         passed_testcases[key] = sorted(list(set(value)))
     return passed_testcases
 
+def get_failed_jobs(job_ids):
+    """
+    job_ids ~ list of int (job ids)
+    Returns ~ list of int - ids of only jobs which failed
+
+    Wait for all jobs to finish, then return a list of only the IDs of
+    jobs which failed.
+    """
+    finished_jobs = _wait_for_jobs(job_ids)
+    return [jid for jid in job_ids if finished_jobs[jid]['result'] == 'failed']
 
 def get_relval_commands(passed_testcases):
     relval_template = "relval report-auto"
@@ -60,32 +186,46 @@ def get_relval_commands(passed_testcases):
 
     return commands
 
+def report_crashes(job_ids):
+    """
+    job_ids ~ list of int (job ids)
 
-def report_results(job_ids):
-    commands = get_relval_commands(get_passed_testcases(job_ids))
-    print "Running relval commands:"
-    for command in commands:
-        print command
-        os.system(command)
+    For each job specified, try and report any crashes that happened to
+    Bugzilla.
+    """
+    bugs = []
+    for job_id in job_ids:
+        bugs.extend(report_crash(job_id))
+    if bugs:
+        print "Reported bugs:"
+        for bug in bugs:
+            print "https://bugzilla.redhat.com/show_bug.cgi?id={0}".format(bug)
+
+def report_passes(job_ids, printcases=False, report=True):
+    passed_testcases = get_passed_testcases(job_ids)
+    if printcases:
+        pprint.pprint(passed_testcases)
+    commands = get_relval_commands(passed_testcases)
+    if report:
+        print "Reporting test passes:"
+        for command in commands:
+            print command
+            os.system(command)
+    else:
+        print "\n\n### No reporting is done! ###\n\n"
+        pprint.pprint(commands)
 
+def report_results(job_ids):
+    report_passes(job_ids)
+    report_crashes(job_ids)
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Evaluate per-testcase results from OpenQA job runs")
     parser.add_argument('jobs', type=int, nargs='+')
     parser.add_argument('--report', default=False, action='store_true')
+    parser.add_argument('--crashes', default=False, action='store_true')
 
     args = parser.parse_args()
-
-    passed_testcases = get_passed_testcases(args.jobs)
-    commands = get_relval_commands(passed_testcases)
-
-    import pprint
-    pprint.pprint(passed_testcases)
-    if not args.report:
-        print "\n\n### No reporting is done! ###\n\n"
-        pprint.pprint(commands)
-    else:
-        for command in commands:
-            print command
-            os.system(command)
-
+    report_passes(args.jobs, printcases=True, report=args.report)
+    if args.crashes:
+        report_crashes(args.jobs)
-- 
2.3.2

_______________________________________________
qa-devel mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/qa-devel
