Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
On 2022-07-01 08:19, Baolu Lu wrote:
> On 2022/6/29 21:03, Robin Murphy wrote:
>> On 2019-06-12 01:28, Lu Baolu wrote:
>>> The drhd and device scope list should be iterated with the
>>> iommu global lock held. Otherwise, a suspicious RCU usage
>>> message will be displayed.
>>>
>>> [    3.695886] =============================
>>> [    3.695917] WARNING: suspicious RCU usage
>>> [    3.695950] 5.2.0-rc2+ #2467 Not tainted
>>> [    3.695981] -----------------------------
>>> [    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage!
>>> [    3.696069] other info that might help us debug this:
>>> [    3.696126] rcu_scheduler_active = 2, debug_locks = 1
>>> [    3.696173] no locks held by swapper/0/1.
>>> [    3.696204] stack backtrace:
>>> [    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
>>> [    3.696370] Call Trace:
>>> [    3.696404]  dump_stack+0x85/0xcb
>>> [    3.696441]  intel_iommu_init+0x128c/0x13ce
>>> [    3.696478]  ? kmem_cache_free+0x16b/0x2c0
>>> [    3.696516]  ? __fput+0x14b/0x270
>>> [    3.696550]  ? __call_rcu+0xb7/0x300
>>> [    3.696583]  ? get_max_files+0x10/0x10
>>> [    3.696631]  ? set_debug_rodata+0x11/0x11
>>> [    3.696668]  ? e820__memblock_setup+0x60/0x60
>>> [    3.696704]  ? pci_iommu_init+0x16/0x3f
>>> [    3.696737]  ? set_debug_rodata+0x11/0x11
>>> [    3.696770]  pci_iommu_init+0x16/0x3f
>>> [    3.696805]  do_one_initcall+0x5d/0x2e4
>>> [    3.696844]  ? set_debug_rodata+0x11/0x11
>>> [    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
>>> [    3.696924]  kernel_init_freeable+0x1f0/0x27c
>>> [    3.696961]  ? rest_init+0x260/0x260
>>> [    3.696997]  kernel_init+0xa/0x110
>>> [    3.697028]  ret_from_fork+0x3a/0x50
>>>
>>> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
>>> Signed-off-by: Lu Baolu
>>> ---
>>>  drivers/iommu/intel-iommu.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 19c4c387a3f6..84e650c6a46d 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>>>  		cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
>>>  				  intel_iommu_cpu_dead);
>>> +	down_read(&dmar_global_lock);
>>>  	if (probe_acpi_namespace_devices())
>>>  		pr_warn("ACPI name space devices didn't probe correctly\n");
>>> +	up_read(&dmar_global_lock);
>>
>> Doing a bit of archaeology here, is this actually broken? If any ANDD
>> entries exist, we'd end up doing:
>>
>>   down_read(&dmar_global_lock)
>>   probe_acpi_namespace_devices()
>>   -> iommu_probe_device()
>>     -> iommu_create_device_direct_mappings()
>>       -> iommu_get_resv_regions()
>>         -> intel_iommu_get_resv_regions()
>>           -> down_read(&dmar_global_lock)
>>
>> I'm wondering whether this might explain why my bus_set_iommu series
>> prevented Baolu's machine from booting, since "iommu: Move bus setup to
>> IOMMU device registration" creates the same condition where we end up in
>> get_resv_regions (via bus_iommu_probe() this time) from the same task
>> that already holds dmar_global_lock. Of course that leaves me wondering
>> how it *did* manage to boot OK on my Xeon box, but maybe there's a
>> config difference or dumb luck at play?
>
> This is really problematic. Where does the latest bus_set_iommu series
> locate? I'd like to take a closer look at what happened here. Perhaps
> two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
> these days.

I've prepared an up-to-date series here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/bus-set-iommu-v3

but I've been hesitant to post it without trying to make *some* progress
on your breakage. I think last time I was just testing with
x86_64_defconfig, so I'll double-check it with lockdep this afternoon.

Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
On 2022/6/29 21:03, Robin Murphy wrote:
> On 2019-06-12 01:28, Lu Baolu wrote:
>> The drhd and device scope list should be iterated with the
>> iommu global lock held. Otherwise, a suspicious RCU usage
>> message will be displayed.
>>
>> [    3.695886] =============================
>> [    3.695917] WARNING: suspicious RCU usage
>> [    3.695950] 5.2.0-rc2+ #2467 Not tainted
>> [    3.695981] -----------------------------
>> [    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage!
>> [    3.696069] other info that might help us debug this:
>> [    3.696126] rcu_scheduler_active = 2, debug_locks = 1
>> [    3.696173] no locks held by swapper/0/1.
>> [    3.696204] stack backtrace:
>> [    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
>> [    3.696370] Call Trace:
>> [    3.696404]  dump_stack+0x85/0xcb
>> [    3.696441]  intel_iommu_init+0x128c/0x13ce
>> [    3.696478]  ? kmem_cache_free+0x16b/0x2c0
>> [    3.696516]  ? __fput+0x14b/0x270
>> [    3.696550]  ? __call_rcu+0xb7/0x300
>> [    3.696583]  ? get_max_files+0x10/0x10
>> [    3.696631]  ? set_debug_rodata+0x11/0x11
>> [    3.696668]  ? e820__memblock_setup+0x60/0x60
>> [    3.696704]  ? pci_iommu_init+0x16/0x3f
>> [    3.696737]  ? set_debug_rodata+0x11/0x11
>> [    3.696770]  pci_iommu_init+0x16/0x3f
>> [    3.696805]  do_one_initcall+0x5d/0x2e4
>> [    3.696844]  ? set_debug_rodata+0x11/0x11
>> [    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
>> [    3.696924]  kernel_init_freeable+0x1f0/0x27c
>> [    3.696961]  ? rest_init+0x260/0x260
>> [    3.696997]  kernel_init+0xa/0x110
>> [    3.697028]  ret_from_fork+0x3a/0x50
>>
>> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
>> Signed-off-by: Lu Baolu
>> ---
>>  drivers/iommu/intel-iommu.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 19c4c387a3f6..84e650c6a46d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>>  		cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
>>  				  intel_iommu_cpu_dead);
>> +	down_read(&dmar_global_lock);
>>  	if (probe_acpi_namespace_devices())
>>  		pr_warn("ACPI name space devices didn't probe correctly\n");
>> +	up_read(&dmar_global_lock);
>
> Doing a bit of archaeology here, is this actually broken? If any ANDD
> entries exist, we'd end up doing:
>
>   down_read(&dmar_global_lock)
>   probe_acpi_namespace_devices()
>   -> iommu_probe_device()
>     -> iommu_create_device_direct_mappings()
>       -> iommu_get_resv_regions()
>         -> intel_iommu_get_resv_regions()
>           -> down_read(&dmar_global_lock)
>
> I'm wondering whether this might explain why my bus_set_iommu series
> prevented Baolu's machine from booting, since "iommu: Move bus setup to
> IOMMU device registration" creates the same condition where we end up in
> get_resv_regions (via bus_iommu_probe() this time) from the same task
> that already holds dmar_global_lock. Of course that leaves me wondering
> how it *did* manage to boot OK on my Xeon box, but maybe there's a
> config difference or dumb luck at play?

This is really problematic. Where does the latest bus_set_iommu series
locate? I'd like to take a closer look at what happened here. Perhaps
two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
these days.

Best regards,
baolu
Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
On 2019-06-12 01:28, Lu Baolu wrote:
> The drhd and device scope list should be iterated with the
> iommu global lock held. Otherwise, a suspicious RCU usage
> message will be displayed.
>
> [    3.695886] =============================
> [    3.695917] WARNING: suspicious RCU usage
> [    3.695950] 5.2.0-rc2+ #2467 Not tainted
> [    3.695981] -----------------------------
> [    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage!
> [    3.696069] other info that might help us debug this:
> [    3.696126] rcu_scheduler_active = 2, debug_locks = 1
> [    3.696173] no locks held by swapper/0/1.
> [    3.696204] stack backtrace:
> [    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
> [    3.696370] Call Trace:
> [    3.696404]  dump_stack+0x85/0xcb
> [    3.696441]  intel_iommu_init+0x128c/0x13ce
> [    3.696478]  ? kmem_cache_free+0x16b/0x2c0
> [    3.696516]  ? __fput+0x14b/0x270
> [    3.696550]  ? __call_rcu+0xb7/0x300
> [    3.696583]  ? get_max_files+0x10/0x10
> [    3.696631]  ? set_debug_rodata+0x11/0x11
> [    3.696668]  ? e820__memblock_setup+0x60/0x60
> [    3.696704]  ? pci_iommu_init+0x16/0x3f
> [    3.696737]  ? set_debug_rodata+0x11/0x11
> [    3.696770]  pci_iommu_init+0x16/0x3f
> [    3.696805]  do_one_initcall+0x5d/0x2e4
> [    3.696844]  ? set_debug_rodata+0x11/0x11
> [    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
> [    3.696924]  kernel_init_freeable+0x1f0/0x27c
> [    3.696961]  ? rest_init+0x260/0x260
> [    3.696997]  kernel_init+0xa/0x110
> [    3.697028]  ret_from_fork+0x3a/0x50
>
> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/intel-iommu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 19c4c387a3f6..84e650c6a46d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>  		cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
>  				  intel_iommu_cpu_dead);
> +	down_read(&dmar_global_lock);
>  	if (probe_acpi_namespace_devices())
>  		pr_warn("ACPI name space devices didn't probe correctly\n");
> +	up_read(&dmar_global_lock);

Doing a bit of archaeology here, is this actually broken? If any ANDD
entries exist, we'd end up doing:

  down_read(&dmar_global_lock)
  probe_acpi_namespace_devices()
  -> iommu_probe_device()
    -> iommu_create_device_direct_mappings()
      -> iommu_get_resv_regions()
        -> intel_iommu_get_resv_regions()
          -> down_read(&dmar_global_lock)

I'm wondering whether this might explain why my bus_set_iommu series
prevented Baolu's machine from booting, since "iommu: Move bus setup to
IOMMU device registration" creates the same condition where we end up in
get_resv_regions (via bus_iommu_probe() this time) from the same task
that already holds dmar_global_lock. Of course that leaves me wondering
how it *did* manage to boot OK on my Xeon box, but maybe there's a
config difference or dumb luck at play?

Robin.

>  	/* Finally, we enable the DMA remapping hardware. */
>  	for_each_iommu(iommu, drhd) {