Re: [PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2017-01-04 Thread Joerg Roedel
On Mon, Dec 05, 2016 at 08:09:07PM +0800, Xunlei Pang wrote:

> v2->v3:
> Flush context cache only and add Fixes-tag, according to Joerg's comments.
> 
>  drivers/iommu/intel-iommu.c | 19 +++
>  1 file changed, 19 insertions(+)

Applied to the fixes branch, thanks.



Re: [PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2017-01-03 Thread Joerg Roedel
On Tue, Jan 03, 2017 at 08:23:23AM -0700, Myron Stowe wrote:
> This didn't seem to make the 4.10 merge window.  Was that on purpose?
> Any idea when you might include this in a pull request for Linus?

I will collect fixes this week and send them to Linus; they will also
include this patch.



Joerg



Re: [PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2017-01-03 Thread Myron Stowe
On Tue, Dec 6, 2016 at 9:03 AM, Joerg Roedel wrote:
> On Mon, Dec 05, 2016 at 08:09:07PM +0800, Xunlei Pang wrote:
>>  drivers/iommu/intel-iommu.c | 19 +++
>>  1 file changed, 19 insertions(+)
>
> Applied, thanks.

Joerg:

This didn't seem to make the 4.10 merge window.  Was that on purpose?
Any idea when you might include this in a pull request for Linus?

Myron


RE: [PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2016-12-06 Thread Don Brace
> -----Original Message-----
> From: Xunlei Pang [mailto:xlp...@redhat.com]
> Sent: Monday, December 05, 2016 6:09 AM
> To: Joerg Roedel; David Woodhouse
> Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Xunlei
> Pang; Myron Stowe; Joseph Szczypek; Don Brace; Baoquan He; Dave Young
> Subject: [PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when
> the device gets context mapped
> 
> We hit a DMAR fault on both hpsa P420i and P421 SmartArray controllers
> under kdump. It reproduces reliably on several different machines, and
> the dmesg log looks like this:
> HP HPSA Driver (v 3.4.16-0)
> hpsa :02:00.0: using doorbell to reset controller
> hpsa :02:00.0: board ready after hard reset.
> hpsa :02:00.0: Waiting for controller to respond to no-op
> DMAR: Setting identity map for device :02:00.0 [0xe8000 - 0xe8fff]
> DMAR: Setting identity map for device :02:00.0 [0xf4000 - 0xf4fff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf6e000 - 0xbdf6efff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf6f000 - 0xbdf7efff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf7f000 - 0xbdf82fff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf83000 - 0xbdf84fff]
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
> hpsa :02:00.0: controller message 03:00 timed out
> hpsa :02:00.0: no-op failed; re-trying
> 
> After some debugging, we found that the faulting address comes from DMA
> initiated at the driver probe stage after the reset (not from in-flight
> DMA), and that the corresponding PTE value is correct. The fault is
> therefore likely caused by stale IOMMU cache entries left over from the
> earlier in-flight DMA.
> 
> Thus we need to flush the old cache once context mapping is set up for the
> device; by then the device is supposed to have finished its reset during
> driver probe, so no in-flight DMA exists from that point on.
> 
> I'm not sure whether the hardware is responsible for invalidating all the
> related entries previously cached in the IOMMU, but that does not seem to
> be the case for hpsa; in fact, many device drivers have problems resetting
> their hardware properly. In any case, flushing (again) in software in the
> kdump kernel when the device gets context mapped, which is quite an
> infrequent operation, does little harm.
> 
> With this patch, the problematic machine survives the kdump tests.
> 
> CC: Myron Stowe <myron.st...@gmail.com>
> CC: Joseph Szczypek <jszcz...@redhat.com>
> CC: Don Brace <don.br...@microsemi.com>
> CC: Baoquan He <b...@redhat.com>
> CC: Dave Young <dyo...@redhat.com>
> Fixes: 091d42e43d21 ("iommu/vt-d: Copy translation tables from old kernel")
> Fixes: dbcd861f252d ("iommu/vt-d: Do not re-use domain-ids from the old kernel")
> Fixes: cf484d0e6939 ("iommu/vt-d: Mark copied context entries")
> Signed-off-by: Xunlei Pang <xlp...@redhat.com>
> ---
> v2->v3:
> Flush context cache only and add Fixes-tag, according to Joerg's comments.
> 
>  drivers/iommu/intel-iommu.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3965e73..624eac9 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>  	if (context_present(context))
>  		goto out_unlock;
>  
> +	/*
> +	 * For kdump cases, old valid entries may be cached due to the
> +	 * in-flight DMA and copied pgtable, but there is no unmapping
> +	 * behaviour for them, thus we need an explicit cache flush for
> +	 * the newly-mapped device. For kdump, at this point, the device
> +	 * is supposed to finish reset at its driver probe stage, so no
> +	 * in-flight DMA will exist, and we don't need to worry anymore
> +	 * hereafter.
> +	 */
> +	if (context_copied(context)) {
> +		u16 did_old = context_domain_id(context);
> +
> +		if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
> +			iommu->flush.flush_context(iommu, did_old,
> +						   (((u16)bus) << 8) | devfn,
> +						   DMA_CCMD_MASK_NOBIT,
> +						   DMA_CCMD_DEVICE_INVL);
> +	}
> +
>  	pgd = domain->pgd;
>  
>  	context_clear_entry(context);
> --
> 1.8.3.1

Tested-by: Don Brace <don.br...@microsemi.com>

Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation





Re: [PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2016-12-06 Thread Joerg Roedel
On Mon, Dec 05, 2016 at 08:09:07PM +0800, Xunlei Pang wrote:
>  drivers/iommu/intel-iommu.c | 19 +++
>  1 file changed, 19 insertions(+)

Applied, thanks.



[PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2016-12-05 Thread Xunlei Pang
We hit a DMAR fault on both hpsa P420i and P421 SmartArray controllers
under kdump. It reproduces reliably on several different machines, and
the dmesg log looks like this:
HP HPSA Driver (v 3.4.16-0)
hpsa :02:00.0: using doorbell to reset controller
hpsa :02:00.0: board ready after hard reset.
hpsa :02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device :02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device :02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device :02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device :02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device :02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device :02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
hpsa :02:00.0: controller message 03:00 timed out
hpsa :02:00.0: no-op failed; re-trying

After some debugging, we found that the faulting address comes from DMA
initiated at the driver probe stage after the reset (not from in-flight
DMA), and that the corresponding PTE value is correct. The fault is
therefore likely caused by stale IOMMU cache entries left over from the
earlier in-flight DMA.

Thus we need to flush the old cache once context mapping is set up for the
device; by then the device is supposed to have finished its reset during
driver probe, so no in-flight DMA exists from that point on.

I'm not sure whether the hardware is responsible for invalidating all the
related entries previously cached in the IOMMU, but that does not seem to
be the case for hpsa; in fact, many device drivers have problems resetting
their hardware properly. In any case, flushing (again) in software in the
kdump kernel when the device gets context mapped, which is quite an
infrequent operation, does little harm.

With this patch, the problematic machine survives the kdump tests.

CC: Myron Stowe 
CC: Joseph Szczypek 
CC: Don Brace 
CC: Baoquan He 
CC: Dave Young 
Fixes: 091d42e43d21 ("iommu/vt-d: Copy translation tables from old kernel")
Fixes: dbcd861f252d ("iommu/vt-d: Do not re-use domain-ids from the old kernel")
Fixes: cf484d0e6939 ("iommu/vt-d: Mark copied context entries")
Signed-off-by: Xunlei Pang 
---
v2->v3:
Flush context cache only and add Fixes-tag, according to Joerg's comments.

 drivers/iommu/intel-iommu.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..624eac9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	if (context_present(context))
 		goto out_unlock;
 
+	/*
+	 * For kdump cases, old valid entries may be cached due to the
+	 * in-flight DMA and copied pgtable, but there is no unmapping
+	 * behaviour for them, thus we need an explicit cache flush for
+	 * the newly-mapped device. For kdump, at this point, the device
+	 * is supposed to finish reset at its driver probe stage, so no
+	 * in-flight DMA will exist, and we don't need to worry anymore
+	 * hereafter.
+	 */
+	if (context_copied(context)) {
+		u16 did_old = context_domain_id(context);
+
+		if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
+			iommu->flush.flush_context(iommu, did_old,
+						   (((u16)bus) << 8) | devfn,
+						   DMA_CCMD_MASK_NOBIT,
+						   DMA_CCMD_DEVICE_INVL);
+	}
+
 	pgd = domain->pgd;
 
 	context_clear_entry(context);
-- 
1.8.3.1
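
For readers following the hunk above, here is a minimal user-space sketch of
the two small computations the patch performs before calling flush_context():
building the 16-bit source-id from bus and devfn, and range-checking the old
domain id. The helper names (sketch_devfn, sketch_source_id,
sketch_did_in_range) are hypothetical and are not part of intel-iommu.c; the
ndoms parameter merely stands in for cap_ndoms(iommu->cap). Because did_old
is an unsigned 16-bit value, the "did_old >= 0" half of the patch's test is
always true, so the sketch keeps only the upper bound.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack a PCI slot (0-31) and function (0-7) into a
 * devfn byte, using the same bit layout as the kernel's PCI_DEVFN() macro.
 */
static inline uint8_t sketch_devfn(uint8_t slot, uint8_t func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/*
 * Hypothetical helper: build the 16-bit source-id the same way the patch
 * does with (((u16)bus) << 8) | devfn. Bus goes in the high byte, devfn in
 * the low byte.
 */
static inline uint16_t sketch_source_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/*
 * Hypothetical helper: check that the domain id copied from the old kernel
 * is below the number of domains the IOMMU supports. ndoms is an assumed
 * stand-in for cap_ndoms(iommu->cap).
 */
static inline bool sketch_did_in_range(uint16_t did_old, unsigned long ndoms)
{
	return did_old < ndoms;
}
```

For the device in the log, 02:00.0, sketch_source_id(0x02, sketch_devfn(0, 0))
gives 0x0200, which is the value a device-selective context-cache flush would
be keyed on under these assumptions.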


