Re: [PATCH v8 00/35] fix CONFIG_DRM_USE_DYNAMIC_DEBUG=y regression
On Mon, Apr 29, 2024 at 1:32 PM Jim Cromie wrote: > > hi Greg, Jason, DRM-folk, > > This patchset fixes the CONFIG_DRM_USE_DYNAMIC_DEBUG=y regression, > Fixes: bb2ff6c27bc9 ("drm: Disable dynamic debug as broken") > > this is v8. > Its also here: > https://github.com/jimc/linux/tree/dd-classmap-fix-8a > This patchset didnt get picked up by drm patchwork maybe its tired my stochastic renaming, sorry about that https://patchwork.freedesktop.org/project/intel-gfx/series/?ordering=-last_updated 125063fix DRM_USE_DYNAMIC_DEBUG=y regressionwarningNew23jim.cromie@gmail.comNone2023-11-01 123572fix DRM_USE_DYNAMIC_DEBUG regressionfailureNew22jim.cromie@gmail.comNone2023-09-11 113363fix DRM_USE_DYNAMIC_DEBUG regressionwarningIn progress 22jim.cromie@gmail.comNone2023-08-01 111651DRM_USE_DYNAMIC_DEBUG regressionfailureNew20jim.cromie@gmail.comNone2023-01-13 111631DRM - avoid regression in -rc, fix comment is there something missing for a DRM patchwork run ? does it kick out because theres non-drm subsystem stuff in there ? thanks > v7 had at least 2 problems: > > https://lore.kernel.org/lkml/20231101002609.3533731-1-jim.cro...@gmail.com/ > https://patchwork.freedesktop.org/series/125066/ > > 1. missing __align(8) in METATDATA macro, giving too much placement > freedom to linker, caused weird segvs following non-ptr vals, but for > builtin modules only. found by lkp-test. > > 2. the main patch changed both the dyndbg API, and the drm/drivers. > This was a flag-day annoyance, and not practical. Fix by preserving > old API macro until "later", and splitting the patch and set into 2 > sequential subsets. removal can wait. > > What was broken ? > > Booting a modular kernel with drm.debug=0x1ff, this enabled pr_debugs > only in drm itself, not the yet-to-be loaded driver + helpers. I had > tested with scripts doing lots of modprobes with dyndbg=<> options > permuting. I didn't notice that I didn't really test without them. > > The deeper cause was my design error, a violation of the K rule: > "define once, refer many times". > > DECLARE_DYNDBG_CLASSMAP defined the classmap, and was used everywhere, > re-declaring the same static classmap repeatedly. Jani Nikula actually > picked up on this at the time, but didn't scream loudly enough for > anyone to notice, I know I didn't get it then. One patchset across 2 > trees didn't help either. > > The revised classmap API "splits" it to def & ref. > > DYNDBG_CLASSMAP_DEFINE fixes & updates the busted macro, EXPORTing the > classmap instead. It gets invoked once per subsystem, by the > parent/builtin, drm.ko for DRM. > > DYNDBG_CLASSMAP_USE in drivers and helpers refer to the classmap by > name, which links the 2 modules (like __drm_debug already does). > > These 2 tell dyndbg to map "class FOO" to the defined FOO_ID, which > allows it to make those changes via >control. > > DYNDBG_CLASSMAP_PARAM*, defines the controlling kparam, and binds it > to both the _var, and the _DEFINEd classmap. So drm uses this to bind > the classmap to __drm_debug. > > It provides the common control-point for the sub-system; it is applied > to the classmaps during modprobe of both _DEFINEr and USErs. It also > enforces the relative nature of LEVEL classmaps, ie V3>V2. > > DECLARE_DYNDBG_CLASSMAP is preserved to decouple the DRM patches. > > A new struct and elf section contain the _USEs; on modprobe, these are > scanned similarly to the _DEFINEs, but follow the references to their > defining modules, find the kparam wired to the classmap, and apply its > classmap settings to the USEr. This action is what V1 missed, which > is why drivers failed to enable debug during modprobe. > > In order to recapitulate the regression scenario without involving > DRM, the patchset (v6 I think) adds test_dynamic_debug_submod, which > is a duplicate of its parent; _submod.c #defines _SUBMOD, and then > includes parent. > > This puts _DEFINE and _USE close together in the same file, for > obviousness. It also guarantees that the submod always has the same > complement of debug()s, giving consistent output from both when > classmaps are working properly, as tested when changing callsites via > both param and >control. > > To provide a turn-key selftest, the patchset also adds > ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh, pilfered > from a debug-to-trace patchset I and Lukasz Bartozik have been working > out. It starts with basic_tests, then to test 2 new parsing > delimiters, which simplify testing of the classmap functionality. > > It works nicely from virtme-ng: > > [jimc@frodo vx]$ vrun_ -- > ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh > doing: vng --verbose --name v6.9-rc5-34-g2f1ace6e1c68 \ >--user root --cwd ../.. \ >-a dynamic_debug.verbose=2 -p 4 \ >-- ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh > virtme: waiting for virtiofsd to start > ... > [3.546739] ip (260) used greatest stack depth: 12368 bytes left > [
[PATCH v8 00/35] fix CONFIG_DRM_USE_DYNAMIC_DEBUG=y regression
hi Greg, Jason, DRM-folk, This patchset fixes the CONFIG_DRM_USE_DYNAMIC_DEBUG=y regression, Fixes: bb2ff6c27bc9 ("drm: Disable dynamic debug as broken") this is v8. Its also here: https://github.com/jimc/linux/tree/dd-classmap-fix-8a v7 had at least 2 problems: https://lore.kernel.org/lkml/20231101002609.3533731-1-jim.cro...@gmail.com/ https://patchwork.freedesktop.org/series/125066/ 1. missing __align(8) in METATDATA macro, giving too much placement freedom to linker, caused weird segvs following non-ptr vals, but for builtin modules only. found by lkp-test. 2. the main patch changed both the dyndbg API, and the drm/drivers. This was a flag-day annoyance, and not practical. Fix by preserving old API macro until "later", and splitting the patch and set into 2 sequential subsets. removal can wait. What was broken ? Booting a modular kernel with drm.debug=0x1ff, this enabled pr_debugs only in drm itself, not the yet-to-be loaded driver + helpers. I had tested with scripts doing lots of modprobes with dyndbg=<> options permuting. I didn't notice that I didn't really test without them. The deeper cause was my design error, a violation of the K rule: "define once, refer many times". DECLARE_DYNDBG_CLASSMAP defined the classmap, and was used everywhere, re-declaring the same static classmap repeatedly. Jani Nikula actually picked up on this at the time, but didn't scream loudly enough for anyone to notice, I know I didn't get it then. One patchset across 2 trees didn't help either. The revised classmap API "splits" it to def & ref. DYNDBG_CLASSMAP_DEFINE fixes & updates the busted macro, EXPORTing the classmap instead. It gets invoked once per subsystem, by the parent/builtin, drm.ko for DRM. DYNDBG_CLASSMAP_USE in drivers and helpers refer to the classmap by name, which links the 2 modules (like __drm_debug already does). These 2 tell dyndbg to map "class FOO" to the defined FOO_ID, which allows it to make those changes via >control. DYNDBG_CLASSMAP_PARAM*, defines the controlling kparam, and binds it to both the _var, and the _DEFINEd classmap. So drm uses this to bind the classmap to __drm_debug. It provides the common control-point for the sub-system; it is applied to the classmaps during modprobe of both _DEFINEr and USErs. It also enforces the relative nature of LEVEL classmaps, ie V3>V2. DECLARE_DYNDBG_CLASSMAP is preserved to decouple the DRM patches. A new struct and elf section contain the _USEs; on modprobe, these are scanned similarly to the _DEFINEs, but follow the references to their defining modules, find the kparam wired to the classmap, and apply its classmap settings to the USEr. This action is what V1 missed, which is why drivers failed to enable debug during modprobe. In order to recapitulate the regression scenario without involving DRM, the patchset (v6 I think) adds test_dynamic_debug_submod, which is a duplicate of its parent; _submod.c #defines _SUBMOD, and then includes parent. This puts _DEFINE and _USE close together in the same file, for obviousness. It also guarantees that the submod always has the same complement of debug()s, giving consistent output from both when classmaps are working properly, as tested when changing callsites via both param and >control. To provide a turn-key selftest, the patchset also adds ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh, pilfered from a debug-to-trace patchset I and Lukasz Bartozik have been working out. It starts with basic_tests, then to test 2 new parsing delimiters, which simplify testing of the classmap functionality. It works nicely from virtme-ng: [jimc@frodo vx]$ vrun_ -- ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh doing: vng --verbose --name v6.9-rc5-34-g2f1ace6e1c68 \ --user root --cwd ../.. \ -a dynamic_debug.verbose=2 -p 4 \ -- ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh virtme: waiting for virtiofsd to start ... [3.546739] ip (260) used greatest stack depth: 12368 bytes left [3.609288] virtme-init: starting script test_dynamic_debug_submod not there test_dynamic_debug not there # BASIC_TESTS ... # Done on: Fri Apr 26 20:45:08 MDT 2024 [4.765751] virtme-init: script returned {0} Powering off. [4.805790] ACPI: PM: Preparing to enter system sleep state S5 [4.806223] kvm: exiting hardware virtualization [4.806564] reboot: Power down [jimc@frodo vx]$ I've been running the kernel on my x86 desktop & laptop, booting with drm.debug=0x1f, then turning it all-off after sleep 15. a few highlights from a bare-metal boot: here modprobe amdgpu; dyndbg applies last bit/class/category, and finishes init, then drm and amdgpu start logging as they execute ... [9.019696] gandalf kernel: dyndbg: query 0: "class DRM_UT_ATOMIC +p" mod:amdgpu [9.019704] gandalf kernel: dyndbg: class-ref: amdgpu.DRM_UT_ATOMIC module:amdgpu nd:4754 nc:0 nu:1 [9.020012] gandalf kernel: dyndbg: processed 1 queries, with 21