[PATCH v3 1/9] tests: introduce tree-wide code style checking

Daniel P . Berrangé Thu, 07 Jul 2022 09:41:27 -0700

Historically QEMU has used the 'scripts/checkpatch.pl' script to
validate various style rules but there are a number of issues:


 - Contributors / maintainers are reluctant to add new
   tests to it, nor fix existint rules, because the Perl
   code is much too hard to understand for most people.

 - It requires the contributor to remember to run it as it
   is not wired into 'make check'

 - While it can be told to check whole files, in practice
   it is usually only used to check patch diffs, because the
   former would flag up pages of pre-existing violations that
   have never been fixed

 - It is said to be OK to ignore some things reported by the
   script, but don't record these exceptional cases anywere.
   Thus contributors looking at existing violations in tree
   are never sure whether they are intentional or historical
   unfixed technical debt.

 - There is no distinct reporting for each type of check
   performed and as a consequence there is also no way to
   mark particular files to be skipped for particular checks

This commit aims to give us a better approach to checking many
types of code style problems by introducing a flexible and simple
way to define whole tree style checks.

The logic provide is inspired by the GNULIB 'top/maint.mk' file,
but has been re-implemented in a simple Python script, using a
YAML config file, in an attempt to make it easier to understand
than the make rules.

This commit does the bare minimum introducing the basic infra:

 - tests/style.py - the script for evaluating coding style rules
 - tests/style.yml - the config defining the coding style rules

The concept behind the style checking is to perform simple regular
expression matches across the source file content.

The key benefit of regular expression matching is that it is very
fast, and can match against all types of source file in the repository,
regardless of whether it is used in the current build, or the language
the source is written in.

The downside, compared to a compiler based approach (eg libclang) is
that it does not have semantic understanding of the code, which makes
some checks difficult to implement.

As such this style matching framework is not proposed as a solution for
all possible coding style rules. It is general enough that it can
accomplish many useful checks, and is intended to be complimentary to
any style checkers with semantic knowledge of the code like libclang,
or pylint/flake8.

It would be possible to use Python's regular expression engine to
perform this matching directly, it instead calls out to 'grep' (for
single line matches) and 'perl' (for multiline matches). These are
highly optimized regex engines, so should maximise performance. They
also avoid problems with python's charset encoding causing it to
throw exceptions when encountering invalid utf8, rather than continue
on a best effort basis.

In terms of defining checks, a short bit of yaml is all that is
required. For example, consider we want to stop people using the
'bool' type entirely in C source files. A rule in tests/style.yml
would say

  prohibit_bool:
        files: \.c$
        prohibit: \bbool\b
        message: do not use the bool type

The 'prohibit' rule is matched line-wise across every .c source
file. If any violation is found, the contents of that line are
printed, and 'message' is shown as a error message.

There are many more advanced options, which are documented in
comments in the style.yml file in this commit.

The tool can be invoked directly

   ./tests/style.py --config test/style.yml check

Or for individual checks

   ./tests/style.py --config test/style.yml check --rule prohibit_bool

If a file is known to intentionally violate a style check rule
this can be recorded in the style.yml and will result in it being
ignored.  The '--ignored' flag can be used to see ignored failures.

This is all wired up into meson, such that a 'style' test suite is
defined and each individual style check is exposed as a test case.

This results in creation of a 'make check-style' target that is
triggerd by 'make check' by default.

Note that the checks require the use of 'git' to detect the list of
source files to search. Thus the check is skipped when not running
from a git repository.

Signed-off-by: Daniel P. Berrangé <berra...@redhat.com>
---
 meson.build            |   2 +
 tests/Makefile.include |   3 +-
 tests/meson.build      |  17 ++++
 tests/style.py         | 218 +++++++++++++++++++++++++++++++++++++++++
 tests/style.yml        |  88 +++++++++++++++++
 5 files changed, 327 insertions(+), 1 deletion(-)
 create mode 100755 tests/style.py
 create mode 100644 tests/style.yml

diff --git a/meson.build b/meson.build
index 65a885ea69..d8ef24bacb 100644
--- a/meson.build
+++ b/meson.build
@@ -18,6 +18,8 @@ config_host = keyval.load(meson.current_build_dir() / 
'config-host.mak')
 enable_modules = 'CONFIG_MODULES' in config_host
 enable_static = 'CONFIG_STATIC' in config_host
 
+in_gitrepo = run_command('test', '-d', '.git', check: false).returncode() == 0
+
 # Allow both shared and static libraries unless --enable-static
 static_kwargs = enable_static ? {'static': true} : {}
 
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 3accb83b13..f7c1d2644e 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -3,12 +3,13 @@
 .PHONY: check-help
 check-help:
        @echo "Regression testing targets:"
-       @echo " $(MAKE) check                  Run block, qapi-schema, unit, 
softfloat, qtest and decodetree tests"
+       @echo " $(MAKE) check                  Run block, qapi-schema, unit, 
style, softfloat, qtest and decodetree tests"
        @echo " $(MAKE) bench                  Run speed tests"
        @echo
        @echo "Individual test suites:"
        @echo " $(MAKE) check-qtest-TARGET     Run qtest tests for given target"
        @echo " $(MAKE) check-qtest            Run qtest tests"
+       @echo " $(MAKE) check-style            Run style checks"
        @echo " $(MAKE) check-unit             Run qobject tests"
        @echo " $(MAKE) check-qapi-schema      Run QAPI schema tests"
        @echo " $(MAKE) check-block            Run block tests"
diff --git a/tests/meson.build b/tests/meson.build
index 8e318ec513..f3140428c3 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -89,6 +89,23 @@ if get_option('tcg').allowed()
   endif
 endif
 
+if in_gitrepo
+  stylecmd = files('style.py')
+  stylecfg = files('style.yml')
+
+  checks = run_command(
+      stylecmd, '--config', stylecfg, 'list',
+      check: true)
+
+  foreach check: checks.stdout().strip().split()
+     test(check,
+          stylecmd,
+          args: [ '--config', stylecfg, 'check', '--rule', check ],
+          workdir: meson.project_source_root(),
+          suite: 'style')
+  endforeach
+endif
+
 subdir('unit')
 subdir('qapi-schema')
 subdir('qtest')
diff --git a/tests/style.py b/tests/style.py
new file mode 100755
index 0000000000..a6c05bbb32
--- /dev/null
+++ b/tests/style.py
@@ -0,0 +1,218 @@
+#!/usr/bin/python
+
+import argparse
+import re
+import subprocess
+import sys
+import time
+import yaml
+
+
+def source_files():
+    src = subprocess.check_output(
+        ["git", "ls-tree", "--name-only", "-r", "HEAD:"])
+
+    return src.decode("utf8").strip().split("\n")
+
+
+# Expand a regular expression from the config file which
+# can be in several formats
+#
+#  - a plain string - used as-is as a regular expression
+#  - a list of strings - each element is joined with '|'
+#  - a dict containing
+#      - 'terms' - interpreted as a string / list of strings
+#      - 'prefix' - added to the front of the regular
+#      - 'prefix' - added to the end of the regular
+#
+# Returns: a regulare expression string
+def expand_re(restr):
+    if restr is None:
+        return None
+
+    if type(restr) == list:
+        return "|".join(restr)
+
+    if type(restr) == dict:
+        terms = "(?:" + expand_re(restr["terms"]) + ")"
+
+        return restr.get("prefix", "") + terms + restr.get("suffix", "")
+
+    return restr
+
+
+# Expand the regular expression and then compile it
+#
+# Returns: a compiled regular expresison object for matching
+def compile_re(restr):
+    if restr is None:
+        return None
+
+    return re.compile(expand_re(restr))
+
+
+# Take a list of source files and filter it returning a subset
+#
+# If @match is non-NULL, it is expanded as a regular expression
+# and the source file name is included if-and-only-if it matches
+# the regex.
+#
+# If @nonmatch is non-NULL, it is expanded as a regular expression
+# and the source file name is excluded if-and-only-if it matches
+# the regex.
+#
+# Returns: the filtered list of soruces
+def filtered_sources(sources, match, nonmatch):
+    matchre = compile_re(match)
+    nonmatchre = compile_re(nonmatch)
+
+    filtered = []
+    for name in sources:
+        if ((matchre is None or matchre.search(name)) and
+            (nonmatchre is None or not nonmatchre.search(name))):
+            filtered.append(name)
+    return filtered
+
+
+# Sanity check the configuration of a rule
+#
+# Returns: true if the rule is valid
+def validate(name, rule):
+    if "prohibit" not in rule and "require" not in rule:
+        raise Exception("Either 'prohibit' or 'require' regex is needed")
+
+    if "prohibit" in rule and "require" in rule:
+        raise Exception("Only one of 'prohibit' or 'require' regex is needed")
+
+
+# Evalate the rule against the designated sources
+#
+# Returns: 1 if the rule failed against one or more sources, 0 otherwise
+def evaluate(sources, name, rule, ignored=False):
+    if not rule.get("enabled", True):
+        return
+
+    validate(name, rule)
+
+    ignorere = None
+    if not ignored:
+        ignorere = rule.get("ignore")
+
+    print("CHECK: %s: " % name, end='')
+    sources = filtered_sources(sources,
+                               rule.get("files"),
+                               ignorere)
+
+    input = "\n".join(sources)
+
+    then = time.time()
+
+    # For single line matching, 'grep' is most efficient,
+    # but it can't do the required multi-line matching
+    # so for the latter we turn to 'perl'
+    if not rule.get("multiline", False):
+        if "prohibit" in rule:
+            # The output is the list of lines that have invalid content
+            proc = subprocess.run(["xargs", "grep", "-nE",
+                                   expand_re(rule["prohibit"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        elif "require" in rule:
+            # The output is the list of filenames which don't have
+            # the required content
+            proc = subprocess.run(["xargs", "grep", "-LE",
+                                   expand_re(rule["require"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        else:
+            raise Exception("Unexpected rule config")
+    else:
+        if "prohibit" in rule:
+            # The output is the list of lines that have invalid content
+            proc = subprocess.run(["xargs", "perl", "-0777", "-ne",
+                                   (r'while (m,%s,gs) {' + \
+                                    r'    $n = ($` =~ tr/\n/\n/ + 1);' + \
+                                    r'    ($v = $&) =~ s/\n/\\n/g;' + \
+                                    r'    print "$ARGV:$n:$v\n";' +\
+                                    r'}') % expand_re(rule["prohibit"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        elif "require" in rule:
+            # The output is the list of filenames which don't have
+            # the required content
+            proc = subprocess.run(["xargs", "perl", "-0777", "-ne",
+                                   ("unless (m,%s,s) {" + \
+                                    "    print \"$ARGV\n\";" + \
+                                    "}") % expand_re(rule["require"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        else:
+            raise Exception("Unexpected rule config")
+
+        if proc.returncode != 0:
+            raise Exception(proc.stderr)
+
+    now = time.time()
+    delta = now - then
+
+    if len(proc.stdout) > 0:
+        print("\033[31;1mFAIL\033[0m ❌ (%0.2f secs)" % delta)
+        print(proc.stdout.strip())
+        print("\033[31;1mERROR\033[0m: %s: %s ❌" % (name, rule["message"]))
+        return 1
+    else:
+        print("\033[32;1mPASS\033[0m ✅ (%0.2f secs)" % delta)
+        return 0
+
+
+def parse_args():
+    parser = argparse.ArgumentParser("Code style checker")
+    parser.add_argument("--config",
+                        default="tests/style.yml",
+                        help="Path to style rules file")
+
+    subparsers = parser.add_subparsers(dest="command")
+    subparsers.required = True
+
+    list = subparsers.add_parser("list", help="list rules")
+
+    check = subparsers.add_parser("check", help="check rules")
+    check.add_argument("--rule",
+                       help="Name of rule to check")
+    check.add_argument("--ignored",
+                       action="store_true",
+                       help="Show intentionally ignored violations")
+
+    return parser.parse_args()
+
+
+def main():
+    args = parse_args()
+
+    sources = source_files()
+
+    with open(args.config, "r") as fh:
+        rules = yaml.safe_load(fh)
+        if rules is None:
+            rules = {}
+
+    if args.command == "list":
+        for name, rule in rules.items():
+            if rule.get("enabled", True):
+                print(name)
+    elif args.command == "check":
+        errs = 0
+        for name, rule in rules.items():
+            if args.rule == None or args.rule == name:
+                errs += evaluate(sources, name, rule, args.ignored)
+        if errs:
+            return 1
+    else:
+        raise Exception("unknown command '%s'" % args.command)
+    return 0
+
+try:
+    sys.exit(main())
+except Exception as e:
+    print("ERROR: %s: %s" % (sys.argv[0], str(e)))
+    sys.exit(2)
diff --git a/tests/style.yml b/tests/style.yml
new file mode 100644
index 0000000000..b4e7c6111f
--- /dev/null
+++ b/tests/style.yml
@@ -0,0 +1,88 @@
+# Source code style checking rules
+#
+# Each top level key defines a new check, that is
+# exposed as a test case in the meson 'style' test
+# suite.
+#
+# Within each check, the following keys are valid
+#
+#  * files
+#
+#    A regular expression matching filenames that
+#    are to be checked. Typically used to filter
+#    based on file extension. If omitted all files
+#    managed by git will be checked.
+#
+#  * prohibit
+#
+#    A regular expression matching content that is
+#    not allowed to be present in source files. Matches
+#    against single lines of text, unless 'multiline'
+#    option overrides. Either this option or 'require'
+#    must be present
+#
+#  * require
+#
+#    A regular expression matching content that must
+#    always be present in source files. Matches against
+#    single lines of text, unless 'multiline' option
+#    overrides. Either this option of 'prohibit' must
+#    be present
+#
+#  * multiline
+#
+#    A boolean controlling whether 'prohibit' and 'require'
+#    regular expressions match single lines or the entire
+#    file contents. Defaults to 'false', matching single
+#    lines at a time.
+#
+#  * ignore
+#
+#    A regular expression matching files to exclude from
+#    the check. This is typically used when certain files
+#    otherwise checked have known acceptable violations
+#    of the test.
+#
+#  * message
+#
+#    A string providing a message to emit when the test
+#    condition fails. Must be present
+#
+#  * enabled
+#
+#    A boolean providing a way to temporarily disable
+#    a check. Defaults to 'true' if omitted.
+#
+# For all the keys above which accept a regular expression,
+# one of three syntaxes are permitted
+#
+#  * string
+#
+#    The full regular expression to match
+#
+#  * list of strings
+#
+#    Each element of the list will be combined with '|'
+#    to form the final regular expression. This is typically
+#    useful to keep line length short when specifying matches
+#    across many filenames
+#
+#  * dict
+#
+#    Contains the keys:
+#
+#      * terms
+#
+#        Either a string or list of strings interpreted as above
+#
+#      * prefix
+#
+#        A match added to the front of the regex. Useful when
+#        'terms' is a list of strings and a common prefix is
+#        desired
+#
+#      * suffix
+#
+#        A match added to the front of the regex. Useful when
+#        'terms' is a list of strings and a common prefix is
+#        desired
-- 
2.36.1

[PATCH v3 1/9] tests: introduce tree-wide code style checking

Reply via email to