Re: [RFC PATCH] iommu/vt-d: avoid panic in __dmar_remove_one_dev_info

2019-12-17 Thread Jerry Snitselaar

On Tue Dec 17 19, Jerry Snitselaar wrote:

On Tue Dec 17 19, Jerry Snitselaar wrote:

In addition to checking for a null pointer, verify that
info does not have the value DEFER_DEVICE_DOMAIN_INFO or
DUMMY_DEVICE_DOMAIN_INFO. If info has one of those values
__dmar_remove_one_dev_info will panic when trying to access
a member of the device_domain_info struct.

  [1.464241] BUG: unable to handle kernel NULL pointer dereference at 004e
  [1.464241] PGD 0 P4D 0
  [1.464241] Oops:  [#1] SMP PTI
  [1.464241] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW - -  - 4.18.0-160.el8.x86_64 #1
  [1.464241] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 07/21/2019
  [1.464241] RIP: 0010:__dmar_remove_one_dev_info+0x27/0x250
  [1.464241] Code: 00 00 00 0f 1f 44 00 00 8b 05 35 ec 75 01 41 56 41 55 41 54 55 53 85 c0 0f 84 99 01 00 00 48 85 ff 0f 84 92 01 00 00 48 89 fb <4c> 8b 67 50 48 8b 6f 58 $
  [1.464241] RSP: :c90dfd10 EFLAGS: 00010082
  [1.464241] RAX: 0001 RBX: fffe RCX:
  [1.464241] RDX: 0001 RSI: 0004 RDI: fffe
  [1.464241] RBP: 88ec7a72f368 R08: 0457 R09: 0039
  [1.464241] R10:  R11: c90dfa58 R12: 88ec7a0eec20
  [1.464241] R13: 88ec6fd0eab0 R14: 81eae980 R15:
  [1.464241] FS:  () GS:88ec7a60() knlGS:
  [1.464241] CS:  0010 DS:  ES:  CR0: 80050033
  [1.464241] CR2: 004e CR3: 006c7900a001 CR4: 001606b0
  [1.464241] Call Trace:
  [1.464241]  dmar_remove_one_dev_info.isra.68+0x27/0x40
  [1.464241]  intel_iommu_add_device+0x124/0x180
  [1.464241]  ? iommu_probe_device+0x40/0x40
  [1.464241]  add_iommu_group+0xa/0x20
  [1.464241]  bus_for_each_dev+0x77/0xc0
  [1.464241]  ? down_write+0xe/0x40
  [1.464241]  bus_set_iommu+0x85/0xc0
  [1.464241]  intel_iommu_init+0x4b4/0x777
  [1.464241]  ? e820__memblock_setup+0x63/0x63
  [1.464241]  ? do_early_param+0x91/0x91
  [1.464241]  pci_iommu_init+0x19/0x45
  [1.464241]  do_one_initcall+0x46/0x1c3
  [1.464241]  ? do_early_param+0x91/0x91
  [1.464241]  kernel_init_freeable+0x1af/0x258
  [1.464241]  ? rest_init+0xaa/0xaa
  [1.464241]  kernel_init+0xa/0x107
  [1.464241]  ret_from_fork+0x35/0x40
  [1.464241] Modules linked in:
  [1.464241] CR2: 004e
  [1.464241] ---[ end trace 0927d2ba8b8032b5 ]---
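
A note on reading the trace above: the faulting address in CR2 (004e) is
consistent with dereferencing a sentinel value rather than a true NULL
pointer. The sketch below works through that arithmetic under three
assumptions that are not stated in the thread: that DEFER_DEVICE_DOMAIN_INFO
is encoded as the integer -2 cast to a pointer, that the RDI value ending in
fffe is that sentinel with its leading digits lost in the archive, and that
the marked instruction bytes <4c> 8b 67 50 decode to a load from 0x50(%rdi).
The names DEFER_SENTINEL, MEMBER_OFFSET, fault_address and
check_fault_address are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the DEFER sentinel is the integer -2 cast to a pointer. */
#define DEFER_SENTINEL ((uintptr_t)-2)

/* Assumption: displacement decoded from "<4c> 8b 67 50", i.e. 0x50(%rdi). */
#define MEMBER_OFFSET ((uintptr_t)0x50)

/* Hypothetical helper: address touched by a base + displacement access. */
static uintptr_t fault_address(uintptr_t base, uintptr_t offset)
{
	/* uintptr_t is unsigned, so the sum wraps modulo 2^N. */
	return base + offset;
}

/* -2 + 0x50 wraps around to 0x4e, matching the CR2 value in the trace. */
static void check_fault_address(void)
{
	assert(fault_address(DEFER_SENTINEL, MEMBER_OFFSET) == 0x4e);
}
```

If DUMMY_DEVICE_DOMAIN_INFO is -1 under the same encoding, the equivalent
access would fault at 0x4f instead, so the trace points at the DEFER value.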

Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: David Woodhouse 
Cc: sta...@vger.kernel.org # v5.3+
Cc: iommu@lists.linux-foundation.org
Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one")
Signed-off-by: Jerry Snitselaar 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0c8d81f56a30..e42a09794fa2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5163,7 +5163,8 @@ static void dmar_remove_one_dev_info(struct device *dev)

 	spin_lock_irqsave(&device_domain_lock, flags);
 	info = dev->archdata.iommu;
-	if (info)
+	if (info && info != DEFER_DEVICE_DOMAIN_INFO
+	    && info != DUMMY_DEVICE_DOMAIN_INFO)
 		__dmar_remove_one_dev_info(info);
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 }
--
2.24.0
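
The condition added by the hunk above can also be read as a standalone
predicate. This is only a sketch, not part of the patch:
info_is_dereferenceable and check_info_predicate are hypothetical helpers,
the struct is a one-member stand-in for the driver's real
device_domain_info, and the two sentinel definitions are assumed to be
equivalent to the driver's (-1 and -2 cast to a pointer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's struct; the real one has many more members. */
struct device_domain_info {
	int placeholder;
};

/* Assumed to be equivalent to the sentinel values used by the driver. */
#define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(intptr_t)-1)
#define DEFER_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(intptr_t)-2)

/* Hypothetical helper: true only when info points at a real object. */
static bool info_is_dereferenceable(const struct device_domain_info *info)
{
	return info != NULL &&
	       info != DEFER_DEVICE_DOMAIN_INFO &&
	       info != DUMMY_DEVICE_DOMAIN_INFO;
}

/* Self-check of the predicate against each case the patch distinguishes. */
static void check_info_predicate(void)
{
	struct device_domain_info real = { 0 };

	assert(info_is_dereferenceable(&real));
	assert(!info_is_dereferenceable(NULL));
	assert(!info_is_dereferenceable(DEFER_DEVICE_DOMAIN_INFO));
	assert(!info_is_dereferenceable(DUMMY_DEVICE_DOMAIN_INFO));
}
```

A plain "if (info)" test passes both sentinels through because they are
non-NULL, which is the case the patch is trying to close.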

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu



Nack this.

Apparently the issue is just being seen with the kdump kernel.  I'm
wondering if it is already solved by 6c3a44ed3c55 ("iommu/vt-d: Turn
off translations at shutdown").  Testing a 5.5 build now.


And a minute later I got a response. The 5.5 kernel hits the original
panic when booting into the kdump kernel.

I need to test with this patch on 5.5, but with a test build of our
kernel with this patch the problem just moves to:

[3.742317] pci :01:00.0: Using iommu dma mapping
[3.744020] pci :01:00.1: Adding to iommu group 86
[3.746697] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0Modules linked in:
[3.746697] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-167.el8.iommu6.x86_64 #1
[3.746697] Hardware name: HP ProLiant DL560 Gen9/ProLiant DL560 Gen9, BIOS P85 07/21/2019
[3.746697] RIP: 0010:native_queued_spin_lock_slowpath+0x5d/0x1d0
[3.746697] Code: 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c$
[3.746697] RSP: :c90f3bd8 EFLAGS: 0002
[3.746697] RAX: 0101 RBX: 0046 RCX: 7f17
[3.746697] RDX:  RSI:  RDI: 82e8a600
[3.746697] RBP: 6fd0ec00 R08: 000

Re: [RFC PATCH] iommu/vt-d: avoid panic in __dmar_remove_one_dev_info

2019-12-17 Thread Jerry Snitselaar
On Tue, Dec 17, 2019 at 10:56 AM Jerry Snitselaar  wrote:
>
> In addition to checking for a null pointer, verify that
> info does not have the value DEFER_DEVICE_DOMAIN_INFO or
> DUMMY_DEVICE_DOMAIN_INFO. If info has one of those values
> __dmar_remove_one_dev_info will panic when trying to access
> a member of the device_domain_info struct.
>
> @@ -5163,7 +5163,8 @@ static void dmar_remove_one_dev_info(struct device *dev)
>
>  	spin_lock_irqsave(&device_domain_lock, flags);
>  	info = dev->archdata.iommu;
> -	if (info)
> +	if (info && info != DEFER_DEVICE_DOMAIN_INFO
> +	    && info != DUMMY_DEVICE_DOMAIN_INFO)
>  		__dmar_remove_one_dev_info(info);
>  	spin_unlock_irqrestore(&device_domain_lock, flags);
>  }
> --
> 2.24.0
>

I'm not positive that the DUMMY_DEVICE_DOMAIN_INFO check is needed.
It seemed like there were checks for that in most places before
dmar_remove_one_dev_info would be called, but I wasn't certain.


