On 31.10.22 at 08:52, Thomas Huth wrote:

On 31/10/2022 08.43, Stefan Weil wrote:
`make check-spelling` can now be used to get a list of spelling errors.
It uses the latest version of codespell, a spell checker implemented in Python.

Signed-off-by: Stefan Weil <s...@weilnetz.de>
---

This RFC can already be used for manual tests, but still reports false
positives, mostly because some variable names are interpreted as words.
These words can either be ignored in the check, or in some cases the code
might be changed to use different variable names.

The check currently only skips a few directories and files, so for example
checked-out submodules are also checked.

The rule can be extended to allow user-provided ignore and skip lists,
for example by introducing Makefile variables CODESPELL_SKIP=userfile
or CODESPELL_IGNORE=userfile. A limited check could be implemented by
providing a base directory CODESPELL_START=basedirectory, for example
CODESPELL_START=docs.
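
A minimal sketch of how such variables could be turned into a codespell
command line. This is not part of the patch: the helper name codespell_cmd
is hypothetical, and it assumes that the Makefile hands the three variables
to the shell as environment variables and that CODESPELL_SKIP names a file
with one path or glob per line. Only the documented codespell options
--skip (comma-separated list) and --ignore-words (file of words) are used.

```shell
# Hypothetical helper: print the codespell command line that the proposed
# variables would select. Nothing is executed, the command is only printed.
codespell_cmd() {
    start=${CODESPELL_START:-.}          # default: check the whole tree
    set -- codespell
    if [ -n "${CODESPELL_SKIP:-}" ]; then
        # --skip expects a comma-separated list, so join the file's lines.
        skip=$(paste -s -d, "$CODESPELL_SKIP")
        set -- "$@" "--skip=$skip"
    fi
    if [ -n "${CODESPELL_IGNORE:-}" ]; then
        # --ignore-words takes a file with one word per line.
        set -- "$@" "--ignore-words=$CODESPELL_IGNORE"
    fi
    set -- "$@" "$start"
    printf '%s\n' "$*"
}

# Example (assumed file names):
#   CODESPELL_START=docs CODESPELL_IGNORE=ignore.txt codespell_cmd
```

Printing the command instead of running it keeps the sketch easy to try
without codespell installed; a real rule would run the command directly.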

Regards,
Stefan
[...]
I like the idea, but I think it's unlikely that we can make this work for the whole source tree any time soon. So maybe it makes more sense to start with just a few directories (e.g. docs/); the maintainers could then opt in by cleaning up their directories and adding them to this target?

 Thomas


Even without implementing CODESPELL_START as described above, the script can already be used and integrated into CI scripts.

It takes about 60 seconds to check the whole source tree including submodules on my (slow) virtual machine.

The resulting output has about 20000 lines or 1272 KiB. It can be filtered for relevant parts of the source tree or used for a summary.

Sample script: grep "^[.]" spellcheck.log | sed 's/^..//' | sed 's/\/.*//' | sed 's/:.*//' | sort | uniq -c

This produces a summary for the top-level hierarchy of files and directories:

      3 accel
      1 audio
      1 backends
     77 block
      7 block.c
     20 bsd-user
    386 capstone
     12 chardev
      1 configure
      8 contrib
      6 crypto
     64 disas
     32 docs
     31 dtc
      8 fpu
      1 gdbstub
      1 gdb-xml
      1 .github
    537 hw
      7 inc
    114 include
      1 libdecnumber
     33 linux-user
      1 MAINTAINERS
    150 meson
      6 meson.build
     16 migration
      1 nbd
      5 net
     12 pc-bios
      7 python
      3 qapi
      2 qemu
      5 qemu-options.hx
     22 qga
  14175 roms
     43 scripts
      3 semihosting
     18 slirp
      2 softmmu
     59 subprojects
    504 target
      6 tcg
      3 test.rb
    175 tests
      6 tools
     20 ui
      8 util
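
The same summary can also be produced in a single awk pass instead of three
sed calls. The following is only a sketch: the function name summarize_log
is made up for illustration, and it assumes that every relevant log line
starts with "./" followed by the path, as in the codespell output used above.

```shell
# Hypothetical helper: count log lines per top-level file or directory.
# $1 is the path of a codespell log such as spellcheck.log.
summarize_log() {
    awk '
        /^\./ {
            name = substr($0, 3)      # drop the leading "./"
            sub(/\/.*/, "", name)     # keep only the first path component
            sub(/:.*/, "", name)      # top-level files: drop ":line: typo"
            count[name]++
        }
        END {
            for (n in count)
                printf "%7d %s\n", count[n], n
        }
    ' "$1" | sort -k2
}

# Example: summarize_log spellcheck.log
```

The final sort orders the result by name, so the output has the same shape
as the "sort | uniq -c" list above.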

It shows that "roms" contributes by far the most typos. Omitting it would cut the required time to 22 seconds and greatly reduce the number of typos found (2947 lines of output).
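
For an existing log, a directory can also be dropped after the fact. A small
sketch, with the hypothetical helper name without_dir; it only filters the
log and therefore does not save any checking time, which would require
skipping the directory in codespell itself (for example with its --skip
option).

```shell
# Hypothetical helper: print a codespell log without the hits below one
# top-level directory.
# $1: directory name (plain name without regex metacharacters, e.g. roms)
# $2: path of the log file
without_dir() {
    grep -v "^\./$1/" "$2"
}

# Example: count the remaining hits when roms is ignored
#   without_dir roms spellcheck.log | wc -l
```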

"capstone" (which has no entry in MAINTAINERS), "target" and "hw" also contribute more than 300 hits each, therefore cc'ing Richard.

Stefan

