Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-07-06 Thread Joerg Roedel
On Fri, Jun 24, 2022 at 02:12:28PM +0800, Baolu Lu wrote:
> It makes sense as far as I am aware. By putting IOMMUs in pass-through
> mode, there will be no run-time costs and things could be simplified a
> lot.
> 
> Besides the refactoring efforts, we still need this quick fix so that
> the fix could be propagated to various stable and vendors' downstream trees.

Patch is applied now for 5.19.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-06-24 Thread Baolu Lu

Hi Joerg,

On 2022/6/24 13:45, Joerg Roedel wrote:
> Hi Baolu,
>
> On Wed, May 25, 2022 at 09:40:26AM +0800, Baolu Lu wrote:
>> How do you like it? If you agree, I can queue it in my next pull request
>> for fixes.
>
> Would it help to tie DMAR and IOMMU components together, so that
> selecting DMAR for IRQ remapping also selects IOMMU? The IOMMU can be in
> PT mode and I think it would simplify a lot of things.


It makes sense as far as I am aware. By putting IOMMUs in pass-through
mode, there will be no run-time costs and things could be simplified a
lot.

Besides the refactoring efforts, we still need this quick fix so that
the fix could be propagated to various stable and vendors' downstream trees.

Best regards,
baolu


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-06-23 Thread Joerg Roedel
Hi Baolu,

On Wed, May 25, 2022 at 09:40:26AM +0800, Baolu Lu wrote:
> How do you like it? If you agree, I can queue it in my next pull request
> for fixes.

Would it help to tie DMAR and IOMMU components together, so that
selecting DMAR for IRQ remapping also selects IOMMU? The IOMMU can be in
PT mode and I think it would simplify a lot of things.

Regards,

Joerg


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-05-24 Thread Baolu Lu

Hi Joerg,

On 2022/5/21 08:21, Yian Chen wrote:
> Notifier calling chain uses priority to determine the execution
> order of the notifiers or listeners registered to the chain.
> PCI bus device hot add utilizes the notification mechanism.
>
> The current code sets low priority (INT_MIN) to Intel
> dmar_pci_bus_notifier and postpones DMAR decoding after adding
> new device into IOMMU. The result is that struct device pointer
> cannot be found in DRHD search for the new device's DMAR/IOMMU.
> Subsequently, the device is put under the "catch-all" IOMMU
> instead of the correct one. This could cause system hang when
> device TLB invalidation is sent to the wrong IOMMU. Invalidation
> timeout error and hard lockup have been observed and data
> inconsistency/crash may occur as well.
>
> This patch fixes the issue by setting a positive priority(1) for
> dmar_pci_bus_notifier while the priority of IOMMU bus notifier
> uses the default value(0), therefore DMAR decoding will be in
> advance of DRHD search for a new device to find the correct IOMMU.
>
> Following is a 2-step example that triggers the bug by simulating
> PCI device hot add behavior in Intel Sapphire Rapids server.
>
> echo 1 > /sys/bus/pci/devices/:6a:01.0/remove
> echo 1 > /sys/bus/pci/rescan
>
> Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> Cc: sta...@vger.kernel.org # v3.15+
> Reported-by: Zhang, Bernice
> Signed-off-by: Jacob Pan
> Signed-off-by: Yian Chen
> ---
> This is a quick fix for the bug reported. Intel internally evaluated
> another redesigned solution that eliminates dmar pci bus notifier to
> simplify the workflow of pci hotplug and improve its runtime efficiency.
>
> While considering the fix could apply to downstream and the complexity
> of pci hotplug workflow change may significantly increase the
> engineering effort to downstream the patch, the choice is to submit this
> simple patch to help the deployment of this bug fix.


Yian has been working on using the IOMMU bus notifier to solve this problem.
It turns out that, due to the following facts, we need to refactor the
IOMMU core and Intel DMAR code:

- Interrupt remapping also requires Intel DMAR code. Therefore, when
  IOMMU is not enabled, the PCI bus notifier in DMAR is still required.
- The IOMMU PCI bus notifier calls .probe_device(), which has no
  information about whether the device was hot-added or present at boot.

The problem described here is serious, because users can easily damage
the system by writing sysfs files on some platforms, so we need a quick
fix for both upstream and stable kernels. The refactoring code will be
discussed in a separate series.

How do you like it? If you agree, I can queue it in my next pull request
for fixes.

Best regards,
baolu


> ---
>   drivers/iommu/intel/dmar.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 4de960834a1b..497c5bd95caf 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -383,7 +383,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
>
>   static struct notifier_block dmar_pci_bus_nb = {
>          .notifier_call = dmar_pci_bus_notifier,
> -       .priority = INT_MIN,
> +       .priority = 1,
>   };
>
>   static struct dmar_drhd_unit *




Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-14 Thread Jacob Pan
Hi Lu,

On Fri, 14 Jan 2022 11:12:45 +0800, Lu Baolu 
wrote:

> On 1/14/22 11:11 AM, Jacob Pan wrote:
> > On Fri, 14 Jan 2022 08:58:53 +0800, Lu Baolu
> > wrote:
> >   
> >> Hi Jacob,
> >>
> >> On 1/13/22 9:23 PM, Jacob Pan wrote:  
> >>> During PCI bus rescan, adding new devices involve two notifiers.
> >>> 1. dmar_pci_bus_notifier()
> >>> 2. iommu_bus_notifier()
> >>> The current code sets #1 as low priority (INT_MIN) which resulted in
> >>> #2 being invoked first. The result is that struct device pointer
> >>> cannot be found in DRHD search for the new device's DMAR/IOMMU.
> >>> Subsequently, the device is put under the "catch-all" IOMMU instead
> >>> of the correct one.
> >>>
> >>> This could cause system hang when device TLB invalidation is sent to
> >>> the wrong IOMMU. Invalidation timeout error or hard lockup can be
> >>> observed.
> >>>
> >>> This patch fixes the issue by setting a higher priority for
> >>> dmar_pci_bus_notifier. DRHD search for a new device will find the
> >>> correct IOMMU.
> >>>
> >>> Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> >>> Reported-by: Zhang, Bernice
> >>> Signed-off-by: Jacob Pan
> >>> ---
> >>>drivers/iommu/intel/dmar.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >>> index 915bff76fe96..5d07e5b89c2e 100644
> >>> --- a/drivers/iommu/intel/dmar.c
> >>> +++ b/drivers/iommu/intel/dmar.c
> >>> @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct
> >>> notifier_block *nb,
> >>>static struct notifier_block dmar_pci_bus_nb = {
> >>>   .notifier_call = dmar_pci_bus_notifier,
> >>> - .priority = INT_MIN,
> >>> + .priority = INT_MAX,
> >>>};
> >>>
> >>>static struct dmar_drhd_unit *
> >>>  
> >> Nice catch! dmar_pci_bus_add_dev() should take place*before*
> >> iommu_probe_device(). This change enforces this with a higher notifier
> >> priority for dmar callback.
> >>
> >> Comparably, dmar_pci_bus_del_dev() should take place*after*
> >> iommu_release_device(). Perhaps we can use two notifiers, one for
> >> ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
> >> (with .priority=INT_MIN)?
> >>  
> > Since device_to_iommu() lookup in intel_iommu_release_device() only
> > checks if device is under "an" IOMMU, not "the" IOMMU. Then the remove
> > path order is not needed, right?
> > 
> > I know this is not robust, but having so many notifiers with implicit
> > priority is not clean either.
> > 
> > Perhaps, we should have explicit priority defined around iommu_bus
> > notifier? i.e.
> > 
> > @@ -1841,6 +1841,7 @@ static int iommu_bus_init(struct bus_type *bus,
> > const struct iommu_ops *ops) return -ENOMEM;
> >  nb->notifier_call = iommu_bus_notifier;
> > 
> > +   nb->priority = IOMMU_BUS_NOTIFY_PRIORITY;
> > 
> > 
> >   static struct notifier_block dmar_pci_bus_add_nb = {
> >  .notifier_call = dmar_pci_bus_notifier,
> > -   .priority = INT_MIN,
> > +   .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,
> >   };
> > 
> >   static struct notifier_block dmar_pci_bus_remove_nb = {
> >  .notifier_call = dmar_pci_bus_notifier,
> > -   .priority = INT_MIN,
> > +   .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,
> >   };  
> 
> IOMMU_BUS_NOTIFY_PRIORITY by default is 0. So you can simply use 1 and
> -1? Adding a comment around it will be helpful.
> 
Yeah, I will add comment.
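
For readers following the 1 / 0 / -1 suggestion, here is a minimal userspace sketch of how commented priorities could order the add and remove paths. It is a model only, not kernel code: the macro names, the callbacks, and the choice to give the add and remove blocks separate callbacks (the diff above reuses dmar_pci_bus_notifier for both) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum bus_event { EV_ADD_DEVICE, EV_REMOVE_DEVICE };

/* Hypothetical names; the values follow the 1 / 0 / -1 suggestion. */
#define IOMMU_PRIO         0   /* default priority of the IOMMU bus notifier */
#define DMAR_ADD_PRIO      1   /* run before the IOMMU probe on add */
#define DMAR_REMOVE_PRIO (-1)  /* run after the IOMMU release on remove */

struct notifier {
    int priority;
    /* Returns the name of the step performed, or NULL if the event is ignored. */
    const char *(*call)(enum bus_event ev);
};

static const char *dmar_add(enum bus_event ev)
{
    return ev == EV_ADD_DEVICE ? "dmar_add " : NULL;
}

static const char *iommu_cb(enum bus_event ev)
{
    return ev == EV_ADD_DEVICE ? "iommu_probe " : "iommu_release ";
}

static const char *dmar_remove(enum bus_event ev)
{
    return ev == EV_REMOVE_DEVICE ? "dmar_del " : NULL;
}

/* qsort comparator: higher priority sorts first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct notifier *x = a, *y = b;
    return (x->priority < y->priority) - (x->priority > y->priority);
}

/* Deliver one event and write the resulting call sequence into buf. */
static void deliver(enum bus_event ev, char *buf, size_t cap)
{
    struct notifier chain[] = {
        { DMAR_REMOVE_PRIO, dmar_remove },
        { IOMMU_PRIO, iommu_cb },
        { DMAR_ADD_PRIO, dmar_add },
    };
    size_t n = sizeof(chain) / sizeof(chain[0]);

    assert(cap > 0);
    qsort(chain, n, sizeof(chain[0]), by_priority_desc);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *step = chain[i].call(ev);
        if (!step)
            continue;
        assert(strlen(buf) + strlen(step) < cap);
        strcat(buf, step);
    }
}
```

Calling deliver() with EV_ADD_DEVICE and then EV_REMOVE_DEVICE from a small main() shows the DMAR step running before the IOMMU step on add and after it on remove, which is the ordering asked for earlier in the thread.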


Thanks,

Jacob


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-13 Thread Lu Baolu

On 1/14/22 11:11 AM, Jacob Pan wrote:
> On Fri, 14 Jan 2022 08:58:53 +0800, Lu Baolu
> wrote:
>
>> Hi Jacob,
>>
>> On 1/13/22 9:23 PM, Jacob Pan wrote:
>>> During PCI bus rescan, adding new devices involve two notifiers.
>>> 1. dmar_pci_bus_notifier()
>>> 2. iommu_bus_notifier()
>>> The current code sets #1 as low priority (INT_MIN) which resulted in
>>> #2 being invoked first. The result is that struct device pointer
>>> cannot be found in DRHD search for the new device's DMAR/IOMMU.
>>> Subsequently, the device is put under the "catch-all" IOMMU instead
>>> of the correct one.
>>>
>>> This could cause system hang when device TLB invalidation is sent to
>>> the wrong IOMMU. Invalidation timeout error or hard lockup can be
>>> observed.
>>>
>>> This patch fixes the issue by setting a higher priority for
>>> dmar_pci_bus_notifier. DRHD search for a new device will find the
>>> correct IOMMU.
>>>
>>> Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
>>> Reported-by: Zhang, Bernice
>>> Signed-off-by: Jacob Pan
>>> ---
>>>   drivers/iommu/intel/dmar.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>> index 915bff76fe96..5d07e5b89c2e 100644
>>> --- a/drivers/iommu/intel/dmar.c
>>> +++ b/drivers/iommu/intel/dmar.c
>>> @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
>>>   static struct notifier_block dmar_pci_bus_nb = {
>>>          .notifier_call = dmar_pci_bus_notifier,
>>> -       .priority = INT_MIN,
>>> +       .priority = INT_MAX,
>>>   };
>>>
>>>   static struct dmar_drhd_unit *
>>>
>> Nice catch! dmar_pci_bus_add_dev() should take place *before*
>> iommu_probe_device(). This change enforces this with a higher notifier
>> priority for dmar callback.
>>
>> Comparably, dmar_pci_bus_del_dev() should take place *after*
>> iommu_release_device(). Perhaps we can use two notifiers, one for
>> ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
>> (with .priority=INT_MIN)?
>>
> Since device_to_iommu() lookup in intel_iommu_release_device() only
> checks if device is under "an" IOMMU, not "the" IOMMU. Then the remove
> path order is not needed, right?
>
> I know this is not robust, but having so many notifiers with implicit
> priority is not clean either.
>
> Perhaps, we should have explicit priority defined around iommu_bus
> notifier? i.e.
>
> @@ -1841,6 +1841,7 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
>                 return -ENOMEM;
>         nb->notifier_call = iommu_bus_notifier;
> +       nb->priority = IOMMU_BUS_NOTIFY_PRIORITY;
>
>
>   static struct notifier_block dmar_pci_bus_add_nb = {
>          .notifier_call = dmar_pci_bus_notifier,
> -       .priority = INT_MIN,
> +       .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,
>   };
>
>   static struct notifier_block dmar_pci_bus_remove_nb = {
>          .notifier_call = dmar_pci_bus_notifier,
> -       .priority = INT_MIN,
> +       .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,
>   };


IOMMU_BUS_NOTIFY_PRIORITY by default is 0. So you can simply use 1 and
-1? Adding a comment around it will be helpful.

Best regards,
baolu


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-13 Thread Jacob Pan
Hi BaoLu,

On Fri, 14 Jan 2022 08:58:53 +0800, Lu Baolu 
wrote:

> Hi Jacob,
> 
> On 1/13/22 9:23 PM, Jacob Pan wrote:
> > During PCI bus rescan, adding new devices involve two notifiers.
> > 1. dmar_pci_bus_notifier()
> > 2. iommu_bus_notifier()
> > The current code sets #1 as low priority (INT_MIN) which resulted in #2
> > being invoked first. The result is that struct device pointer cannot be
> > found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
> > device is put under the "catch-all" IOMMU instead of the correct one.
> > 
> > This could cause system hang when device TLB invalidation is sent to the
> > wrong IOMMU. Invalidation timeout error or hard lockup can be observed.
> > 
> > This patch fixes the issue by setting a higher priority for
> > dmar_pci_bus_notifier. DRHD search for a new device will find the
> > correct IOMMU.
> > 
> > Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> > Reported-by: Zhang, Bernice 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/dmar.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index 915bff76fe96..5d07e5b89c2e 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct
> > notifier_block *nb, 
> >   static struct notifier_block dmar_pci_bus_nb = {
> > .notifier_call = dmar_pci_bus_notifier,
> > -   .priority = INT_MIN,
> > +   .priority = INT_MAX,
> >   };
> >   
> >   static struct dmar_drhd_unit *
> >   
> 
> Nice catch! dmar_pci_bus_add_dev() should take place *before*
> iommu_probe_device(). This change enforces this with a higher notifier
> priority for dmar callback.
> 
> Comparably, dmar_pci_bus_del_dev() should take place *after*
> iommu_release_device(). Perhaps we can use two notifiers, one for
> ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
> (with .priority=INT_MIN)?
> 

Since device_to_iommu() lookup in intel_iommu_release_device() only
checks if device is under "an" IOMMU, not "the" IOMMU. Then the remove path
order is not needed, right?
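
The "an" IOMMU versus "the" IOMMU distinction can be illustrated with a small userspace model. This is a sketch under assumptions, not the driver's device_to_iommu(): struct unit, lookup_unit() and scope_add() are hypothetical names, and the model only captures the idea that a device missing from every explicit scope falls back to a catch-all unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SCOPE 4

/* Simplified, hypothetical model of a DRHD unit and its device scope. */
struct unit {
    const char *name;
    bool include_all;              /* true for the "catch-all" unit */
    const void *scope[MAX_SCOPE];  /* device pointers recorded by the DMAR notifier */
    size_t nscope;
};

/* Return the unit whose scope lists dev, else the catch-all unit, else NULL. */
static const struct unit *lookup_unit(const struct unit *units, size_t n,
                                      const void *dev)
{
    const struct unit *fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < units[i].nscope; j++)
            if (units[i].scope[j] == dev)
                return &units[i];
        if (units[i].include_all)
            fallback = &units[i];
    }
    return fallback;
}

/* Record dev in a unit's scope, as the DMAR notifier would on hot add. */
static void scope_add(struct unit *u, const void *dev)
{
    assert(u->nscope < MAX_SCOPE);
    u->scope[u->nscope++] = dev;
}
```

In this model a lookup made before scope_add() returns the catch-all unit, which is non-NULL and therefore enough for a check that only asks whether some IOMMU covers the device. A lookup made after scope_add() returns the specific unit, which is what the add path needs.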

I know this is not robust, but having so many notifiers with implicit
priority is not clean either.

Perhaps, we should have explicit priority defined around iommu_bus
notifier? i.e.

@@ -1841,6 +1841,7 @@ static int iommu_bus_init(struct bus_type *bus, const
struct iommu_ops *ops) return -ENOMEM; 
nb->notifier_call = iommu_bus_notifier;
   
+   nb->priority = IOMMU_BUS_NOTIFY_PRIORITY;
   

 static struct notifier_block dmar_pci_bus_add_nb = {  
.notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,   
+   .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,   
 };

 static struct notifier_block dmar_pci_bus_remove_nb = {  
.notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,   
+   .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,   
 };   
   

> Best regards,
> baolu


Thanks,

Jacob


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-13 Thread Lu Baolu

Hi Jacob,

On 1/13/22 9:23 PM, Jacob Pan wrote:
> During PCI bus rescan, adding new devices involve two notifiers.
> 1. dmar_pci_bus_notifier()
> 2. iommu_bus_notifier()
> The current code sets #1 as low priority (INT_MIN) which resulted in #2
> being invoked first. The result is that struct device pointer cannot be
> found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
> device is put under the "catch-all" IOMMU instead of the correct one.
>
> This could cause system hang when device TLB invalidation is sent to the
> wrong IOMMU. Invalidation timeout error or hard lockup can be observed.
>
> This patch fixes the issue by setting a higher priority for
> dmar_pci_bus_notifier. DRHD search for a new device will find the
> correct IOMMU.
>
> Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> Reported-by: Zhang, Bernice
> Signed-off-by: Jacob Pan
> ---
>   drivers/iommu/intel/dmar.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 915bff76fe96..5d07e5b89c2e 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
>
>   static struct notifier_block dmar_pci_bus_nb = {
>          .notifier_call = dmar_pci_bus_notifier,
> -       .priority = INT_MIN,
> +       .priority = INT_MAX,
>   };
>
>   static struct dmar_drhd_unit *




Nice catch! dmar_pci_bus_add_dev() should take place *before*
iommu_probe_device(). This change enforces this with a higher notifier
priority for dmar callback.

Comparably, dmar_pci_bus_del_dev() should take place *after*
iommu_release_device(). Perhaps we can use two notifiers, one for
ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
(with .priority=INT_MIN)?

Best regards,
baolu
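
The ordering argument in this message can be checked with a small userspace model of a priority-ordered notifier chain. This is a sketch only: struct notifier, chain_register() and dmar_runs_first() are hypothetical names, and the insertion rule (higher priority first, equal priorities in registration order) is an assumption about how such a chain behaves, not a copy of the kernel implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Simplified userspace model of a notifier block; names are hypothetical. */
struct notifier {
    const char *name;
    int priority;            /* higher value is called earlier */
    struct notifier *next;
};

/*
 * Insert n before the first entry with a lower priority, so the list stays
 * in descending priority order and equal priorities keep registration order.
 */
static void chain_register(struct notifier **head, struct notifier *n)
{
    while (*head && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Walk the chain and record the callback names in call order. */
static size_t chain_call_order(const struct notifier *head,
                               const char **out, size_t cap)
{
    size_t n = 0;

    for (; head; head = head->next) {
        assert(n < cap);
        out[n++] = head->name;
    }
    return n;
}

/*
 * Register the IOMMU notifier at the default priority (0) and the DMAR
 * notifier at dmar_priority, then report whether the DMAR one is called first.
 */
static int dmar_runs_first(int dmar_priority)
{
    struct notifier iommu = { "iommu_bus_notifier", 0, NULL };
    struct notifier dmar = { "dmar_pci_bus_notifier", dmar_priority, NULL };
    struct notifier *head = NULL;
    const char *order[2];

    chain_register(&head, &iommu);
    chain_register(&head, &dmar);
    chain_call_order(head, order, 2);
    return order[0] == dmar.name;
}
```

With this model, dmar_runs_first(INT_MIN) is false, which corresponds to the reported bug, while dmar_runs_first(INT_MAX) and dmar_runs_first(1) are both true. Any priority above the default 0 is enough to get the DMAR scope update ahead of the IOMMU probe.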