Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()

2022-07-01 Robin Murphy

On 2022-07-01 08:19, Baolu Lu wrote:

On 2022/6/29 21:03, Robin Murphy wrote:

On 2019-06-12 01:28, Lu Baolu wrote:

The drhd and device scope list should be iterated with the
iommu global lock held. Otherwise, a suspicious RCU usage
message will be displayed.

[    3.695886] =
[    3.695917] WARNING: suspicious RCU usage
[    3.695950] 5.2.0-rc2+ #2467 Not tainted
[    3.695981] -
[    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage!

[    3.696069]
    other info that might help us debug this:

[    3.696126]
    rcu_scheduler_active = 2, debug_locks = 1
[    3.696173] no locks held by swapper/0/1.
[    3.696204]
    stack backtrace:
[    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
[    3.696370] Call Trace:
[    3.696404]  dump_stack+0x85/0xcb
[    3.696441]  intel_iommu_init+0x128c/0x13ce
[    3.696478]  ? kmem_cache_free+0x16b/0x2c0
[    3.696516]  ? __fput+0x14b/0x270
[    3.696550]  ? __call_rcu+0xb7/0x300
[    3.696583]  ? get_max_files+0x10/0x10
[    3.696631]  ? set_debug_rodata+0x11/0x11
[    3.696668]  ? e820__memblock_setup+0x60/0x60
[    3.696704]  ? pci_iommu_init+0x16/0x3f
[    3.696737]  ? set_debug_rodata+0x11/0x11
[    3.696770]  pci_iommu_init+0x16/0x3f
[    3.696805]  do_one_initcall+0x5d/0x2e4
[    3.696844]  ? set_debug_rodata+0x11/0x11
[    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
[    3.696924]  kernel_init_freeable+0x1f0/0x27c
[    3.696961]  ? rest_init+0x260/0x260
[    3.696997]  kernel_init+0xa/0x110
[    3.697028]  ret_from_fork+0x3a/0x50

Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 19c4c387a3f6..84e650c6a46d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
 	cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
 			  intel_iommu_cpu_dead);
 
+	down_read(&dmar_global_lock);
 	if (probe_acpi_namespace_devices())
 		pr_warn("ACPI name space devices didn't probe correctly\n");
+	up_read(&dmar_global_lock);
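For context on the splat above: the DMAR unit and device-scope lists are walked with RCU-checked list helpers, and at that point the task is neither inside an RCU read-side critical section nor holding dmar_global_lock ("no locks held by swapper/0/1"). The toy model below is only a sketch of that rule, not kernel code. It assumes the check boils down to "RCU read lock held, or the global lock held", which is what the patch description implies, and every name in it (toy_ctx, toy_deref_check() and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the rule behind rcu_dereference_check(p, c):
 * the pointer load is legal inside rcu_read_lock()/rcu_read_unlock(),
 * or when the extra condition c holds (assumed here to be "the global
 * lock is held"). Any other load would produce a lockdep splat.
 */
struct toy_ctx {
	int rcu_read_depth;    /* > 0 while inside an RCU read-side section */
	bool global_lock_held; /* stands in for holding dmar_global_lock */
	int warnings;          /* loads that would have triggered a splat */
};

struct toy_unit {
	int id;
	struct toy_unit *next;
};

/* Checked pointer load. The kernel warns once per call site; we count every hit. */
static struct toy_unit *toy_deref_check(struct toy_ctx *ctx, struct toy_unit *p)
{
	if (ctx->rcu_read_depth == 0 && !ctx->global_lock_held)
		ctx->warnings++;
	return p;
}

/* Walk the list the way an RCU list iterator would, checking every load. */
int toy_count_units(struct toy_ctx *ctx, struct toy_unit *head)
{
	struct toy_unit *u;
	int n = 0;

	assert(ctx != NULL);
	for (u = toy_deref_check(ctx, head); u != NULL;
	     u = toy_deref_check(ctx, u->next))
		n++;
	return n;
}
```

Walking a three-entry list with a bare context records four unchecked loads (three entries plus the terminating NULL), while the same walk with global_lock_held set, or with rcu_read_depth above zero, records none. The locked case corresponds to what the patch does by wrapping probe_acpi_namespace_devices() in down_read()/up_read().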


Doing a bit of archaeology here, is this actually broken? If any ANDD 
entries exist, we'd end up doing:


    down_read(&dmar_global_lock)
    probe_acpi_namespace_devices()
    -> iommu_probe_device()
       -> iommu_create_device_direct_mappings()
          -> iommu_get_resv_regions()
             -> intel_iommu_get_resv_regions()
                -> down_read(&dmar_global_lock)
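Why the nested down_read() in this chain matters: a read/write semaphore that protects writers from starvation may make a new reader wait once a writer is queued. If another task queues as a writer between the outer and the inner acquisition, the inner down_read() waits for that writer, and the writer waits for the outer read hold, which is only released after the inner call returns. Lockdep treats a recursive down_read() on the same rwsem as a possible deadlock for this reason, even when the timing never lines up in practice. The single-threaded model below is a sketch under that writer-fairness assumption only; it does not reproduce the real rw_semaphore (no optimistic spinning, no handoff), and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical writer-fair read/write semaphore model. Only the property
 * relevant here is modelled: once a writer is queued, new readers are held
 * back so the writer cannot be starved.
 */
struct toy_rwsem {
	int readers;         /* read holds currently granted */
	int waiting_writers; /* writers queued behind those readers */
	bool writer_active;  /* a writer currently owns the lock */
};

/* Returns true if granted, false if the caller would have to sleep. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->waiting_writers > 0)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* A writer arriving while readers hold the lock has to queue. */
static bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->readers > 0 || sem->writer_active) {
		sem->waiting_writers++;
		return false;
	}
	sem->writer_active = true;
	return true;
}

/*
 * Replays the call chain above from one task's point of view: an outer
 * read hold (probe path), optionally a writer queuing from another task,
 * then an inner read acquisition (get_resv_regions path). Returns true
 * when the inner acquisition would block, i.e. the task deadlocks.
 */
bool toy_nested_read_deadlocks(bool writer_interleaves)
{
	struct toy_rwsem sem = { 0, 0, false };

	(void)toy_down_read(&sem);            /* outer hold: always granted on a fresh lock */
	if (writer_interleaves)
		(void)toy_down_write(&sem);   /* another task queues as a writer */

	if (!toy_down_read(&sem))             /* inner, nested hold */
		return true;

	toy_up_read(&sem);                    /* inner release */
	toy_up_read(&sem);                    /* outer release */
	return false;
}
```

Without an interleaving writer the nested acquisition simply succeeds, so whether anything visibly goes wrong depends on timing rather than on the code path alone.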

I'm wondering whether this might explain why my bus_set_iommu series 
prevented Baolu's machine from booting, since "iommu: Move bus setup 
to IOMMU device registration" creates the same condition where we end 
up in get_resv_regions (via bus_iommu_probe() this time) from the same 
task that already holds dmar_global_lock. Of course that leaves me 
wondering how it *did* manage to boot OK on my Xeon box, but maybe 
there's a config difference or dumb luck at play?
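One generic way to avoid this kind of nesting, offered only as a sketch and not as the fix this thread settled on, is to walk the protected list under the lock, snapshot the entries, drop the lock, and only then call into code that may take the same lock again. That keeps the list walk covered (so the RCU warning above stays fixed) without holding the lock across the probe. All names below are hypothetical, the probe step merely stands in for the iommu_probe_device() chain, and real code would have to pin each entry (for example with a reference count) before dropping the lock, since the list can change once it is released.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVS 8

/* Hypothetical read-lock stand-in that records the deepest nesting seen. */
struct toy_lock {
	int read_holds;
	int max_read_holds;
};

static void toy_read_lock(struct toy_lock *l)
{
	l->read_holds++;
	if (l->read_holds > l->max_read_holds)
		l->max_read_holds = l->read_holds;
}

static void toy_read_unlock(struct toy_lock *l)
{
	assert(l->read_holds > 0);
	l->read_holds--;
}

struct toy_dev {
	int id;
};

/*
 * Stand-in for the probe chain: like intel_iommu_get_resv_regions() in
 * the call chain above, it takes the same lock internally.
 */
static void toy_probe_device(struct toy_lock *l, const struct toy_dev *dev,
			     int *probed)
{
	toy_read_lock(l);
	(void)dev;     /* a real probe would use the device here */
	(*probed)++;
	toy_read_unlock(l);
}

/* Snapshot the protected table under the lock, then probe unlocked. */
int toy_probe_all(struct toy_lock *l, const struct toy_dev *table, size_t n)
{
	const struct toy_dev *todo[TOY_MAX_DEVS];
	size_t count = 0, i;
	int probed = 0;

	toy_read_lock(l);
	/* Entries beyond TOY_MAX_DEVS are ignored in this sketch. */
	for (i = 0; i < n && count < TOY_MAX_DEVS; i++)
		todo[count++] = &table[i];  /* real code would pin the entry here */
	toy_read_unlock(l);

	for (i = 0; i < count; i++)
		toy_probe_device(l, todo[i], &probed);
	return probed;
}
```

Every device is still probed, but the lock is never held more than once at a time by the probing task, which is the property the nested call chain violates.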


This is really problematic. Where is the latest bus_set_iommu series
located? I'd like to take a closer look at what happened here. Perhaps
in two weeks or so? I'm busy preparing Intel IOMMU patches for v5.20
these days.


I've prepared an up-to-date series here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/bus-set-iommu-v3

but I've been hesitant to post it without trying to make *some* progress 
on your breakage. I think last time I was just testing with 
x86_64_defconfig, so I'll double-check it with lockdep this afternoon.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()

2022-07-01 Baolu Lu

On 2022/6/29 21:03, Robin Murphy wrote:

On 2019-06-12 01:28, Lu Baolu wrote:

The drhd and device scope list should be iterated with the
iommu global lock held. Otherwise, a suspicious RCU usage
message will be displayed.

[    3.695886] =
[    3.695917] WARNING: suspicious RCU usage
[    3.695950] 5.2.0-rc2+ #2467 Not tainted
[    3.695981] -
[    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage!

[    3.696069]
    other info that might help us debug this:

[    3.696126]
    rcu_scheduler_active = 2, debug_locks = 1
[    3.696173] no locks held by swapper/0/1.
[    3.696204]
    stack backtrace:
[    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
[    3.696370] Call Trace:
[    3.696404]  dump_stack+0x85/0xcb
[    3.696441]  intel_iommu_init+0x128c/0x13ce
[    3.696478]  ? kmem_cache_free+0x16b/0x2c0
[    3.696516]  ? __fput+0x14b/0x270
[    3.696550]  ? __call_rcu+0xb7/0x300
[    3.696583]  ? get_max_files+0x10/0x10
[    3.696631]  ? set_debug_rodata+0x11/0x11
[    3.696668]  ? e820__memblock_setup+0x60/0x60
[    3.696704]  ? pci_iommu_init+0x16/0x3f
[    3.696737]  ? set_debug_rodata+0x11/0x11
[    3.696770]  pci_iommu_init+0x16/0x3f
[    3.696805]  do_one_initcall+0x5d/0x2e4
[    3.696844]  ? set_debug_rodata+0x11/0x11
[    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
[    3.696924]  kernel_init_freeable+0x1f0/0x27c
[    3.696961]  ? rest_init+0x260/0x260
[    3.696997]  kernel_init+0xa/0x110
[    3.697028]  ret_from_fork+0x3a/0x50

Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 19c4c387a3f6..84e650c6a46d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
 	cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
 			  intel_iommu_cpu_dead);
 
+	down_read(&dmar_global_lock);
 	if (probe_acpi_namespace_devices())
 		pr_warn("ACPI name space devices didn't probe correctly\n");
+	up_read(&dmar_global_lock);


Doing a bit of archaeology here, is this actually broken? If any ANDD 
entries exist, we'd end up doing:


    down_read(&dmar_global_lock)
    probe_acpi_namespace_devices()
    -> iommu_probe_device()
       -> iommu_create_device_direct_mappings()
          -> iommu_get_resv_regions()
             -> intel_iommu_get_resv_regions()
                -> down_read(&dmar_global_lock)

I'm wondering whether this might explain why my bus_set_iommu series 
prevented Baolu's machine from booting, since "iommu: Move bus setup to 
IOMMU device registration" creates the same condition where we end up in 
get_resv_regions (via bus_iommu_probe() this time) from the same task 
that already holds dmar_global_lock. Of course that leaves me wondering 
how it *did* manage to boot OK on my Xeon box, but maybe there's a 
config difference or dumb luck at play?


This is really problematic. Where is the latest bus_set_iommu series
located? I'd like to take a closer look at what happened here. Perhaps
in two weeks or so? I'm busy preparing Intel IOMMU patches for v5.20
these days.

Best regards,
baolu

Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()

2022-06-29 Robin Murphy

On 2019-06-12 01:28, Lu Baolu wrote:

The drhd and device scope list should be iterated with the
iommu global lock held. Otherwise, a suspicious RCU usage
message will be displayed.

[    3.695886] =
[    3.695917] WARNING: suspicious RCU usage
[    3.695950] 5.2.0-rc2+ #2467 Not tainted
[    3.695981] -
[    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage!

[    3.696069]
    other info that might help us debug this:

[    3.696126]
    rcu_scheduler_active = 2, debug_locks = 1
[    3.696173] no locks held by swapper/0/1.
[    3.696204]
    stack backtrace:
[    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
[    3.696370] Call Trace:
[    3.696404]  dump_stack+0x85/0xcb
[    3.696441]  intel_iommu_init+0x128c/0x13ce
[    3.696478]  ? kmem_cache_free+0x16b/0x2c0
[    3.696516]  ? __fput+0x14b/0x270
[    3.696550]  ? __call_rcu+0xb7/0x300
[    3.696583]  ? get_max_files+0x10/0x10
[    3.696631]  ? set_debug_rodata+0x11/0x11
[    3.696668]  ? e820__memblock_setup+0x60/0x60
[    3.696704]  ? pci_iommu_init+0x16/0x3f
[    3.696737]  ? set_debug_rodata+0x11/0x11
[    3.696770]  pci_iommu_init+0x16/0x3f
[    3.696805]  do_one_initcall+0x5d/0x2e4
[    3.696844]  ? set_debug_rodata+0x11/0x11
[    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
[    3.696924]  kernel_init_freeable+0x1f0/0x27c
[    3.696961]  ? rest_init+0x260/0x260
[    3.696997]  kernel_init+0xa/0x110
[    3.697028]  ret_from_fork+0x3a/0x50

Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 19c4c387a3f6..84e650c6a46d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
 	cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
 			  intel_iommu_cpu_dead);
 
+	down_read(&dmar_global_lock);
 	if (probe_acpi_namespace_devices())
 		pr_warn("ACPI name space devices didn't probe correctly\n");
+	up_read(&dmar_global_lock);


Doing a bit of archaeology here, is this actually broken? If any ANDD 
entries exist, we'd end up doing:


    down_read(&dmar_global_lock)
    probe_acpi_namespace_devices()
    -> iommu_probe_device()
       -> iommu_create_device_direct_mappings()
          -> iommu_get_resv_regions()
             -> intel_iommu_get_resv_regions()
                -> down_read(&dmar_global_lock)

I'm wondering whether this might explain why my bus_set_iommu series 
prevented Baolu's machine from booting, since "iommu: Move bus setup to 
IOMMU device registration" creates the same condition where we end up in 
get_resv_regions (via bus_iommu_probe() this time) from the same task 
that already holds dmar_global_lock. Of course that leaves me wondering 
how it *did* manage to boot OK on my Xeon box, but maybe there's a 
config difference or dumb luck at play?


Robin.

 
 	/* Finally, we enable the DMA remapping hardware. */
 	for_each_iommu(iommu, drhd) {
