Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict

2019-02-28 Thread Leizhen (ThunderTown)



On 2019/2/26 20:36, Hanjun Guo wrote:
> Hi Jean,
> 
> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On 31/01/2019 13:52, Zhen Lei wrote:
>>> Currently, many peripherals are faster than before. For example, the top
>>> speed of an older NIC was 10Gb/s, while now it's more than 25Gb/s. But
>>> when iommu page-table mapping is enabled, it's hard to reach the top speed
>>> in strict mode because of the frequent map and unmap operations. To keep
>>> up with the times, I think it's better to make non-strict the default.
>>
>> Most users won't be aware of this relaxation and will have their system
>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>> Invalidation in
>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
Hi Jean,

   In fact, we discussed the vulnerability of deferred invalidation before upstreaming
the non-strict patches. The attacks may be possible because of an untrusted device or
a mistake in the device driver, and we limited VFIO to still use strict mode.
   As mentioned in the pdf, limiting the freed memory under deferred invalidation to
be reused only by the same device can mitigate the vulnerability, but that is too hard
to implement now.
   A compromise may be to apply non-strict mode only to: (1) dma_free_coherent(),
because the memory is controlled by the common DMA module, so we can make the memory
be freed only after the global invalidation in the timer handler; (2) some new APIs
related to iommu_unmap_page/sg that perform deferred invalidation (a rough sketch
follows below), which candidate device drivers switch to if they want to improve
performance; (3) making sure that only trusted devices and trusted drivers can apply
(1) and (2), for example by requiring the driver to be built into the kernel Image.
   That way some high-end trusted devices use non-strict mode, while all the others
keep using strict mode. Drivers that want to use non-strict mode should switch to the
new APIs by themselves.
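
To make item (2) concrete, here is a minimal and purely hypothetical sketch of what
such a deferred-invalidation unmap helper could look like. None of these names
(iommu_unmap_page_deferred, struct deferred_flush_queue) exist in the kernel;
iommu_unmap_fast() is the only existing API used, and a timer handler is assumed to
drain the queue after its global invalidation:

#include <linux/iommu.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical per-domain deferred-flush state; not an existing kernel API. */
struct deferred_flush_queue {
	spinlock_t		lock;
	struct list_head	pending;	/* drained by the timer handler */
};

struct deferred_unmap {
	struct list_head	list;
	unsigned long		iova;
	size_t			size;
};

/*
 * Unmap the range now, but only queue the IOTLB invalidation; the timer
 * handler later issues one global invalidation and frees the queued ranges.
 */
int iommu_unmap_page_deferred(struct iommu_domain *domain,
			      struct deferred_flush_queue *fq,
			      unsigned long iova, size_t size)
{
	struct deferred_unmap *d = kmalloc(sizeof(*d), GFP_ATOMIC);

	if (!d)
		return -ENOMEM;

	/* Tear down the mapping without a per-range IOTLB flush. */
	iommu_unmap_fast(domain, iova, size);

	d->iova = iova;
	d->size = size;

	spin_lock(&fq->lock);
	list_add_tail(&d->list, &fq->pending);
	spin_unlock(&fq->lock);

	return 0;
}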


>>
>> Why not keep the policy to secure by default, as we do for
>> iommu.passthrough? And maybe add something similar to
>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>> command-line argument or change the default config.
> 
> Sorry for the late reply, it was Chinese new year, and we had a long 
> discussion
> internally, we are fine to add a Kconfig but not sure OS vendors will set it
> to default y.
> 
> OS vendors seems not happy to pass a command-line argument, to be honest,
> this is our motivation to enable non-strict as default. Hope OS vendors
> can see this email thread, and give some input here.
> 
> Thanks
> Hanjun
> 
> 
> .
> 

-- 
Thanks!
Best Regards



[PATCH 2/4] iommu/vt-d: Set context field after value initialized

2019-02-28 Thread Lu Baolu
Otherwise, the translation type field of a context entry for
a PCI device will always be 0. All translated DMA requests
will be blocked by the IOMMU. As a result, PCI devices with
PCI ATS (device IOTLB) support won't work as expected.

Cc: Ashok Raj 
Cc: Jacob Pan 
Suggested-by: Kevin Tian 
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index abdd165a829c..c968b3c7bae0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2056,7 +2056,6 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 		int agaw;
 
 		context_set_domain_id(context, did);
-		context_set_translation_type(context, translation);
 
 		if (translation != CONTEXT_TT_PASS_THROUGH) {
 			/*
@@ -2086,6 +2085,8 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 			 */
 			context_set_address_width(context, iommu->msagaw);
 		}
+
+		context_set_translation_type(context, translation);
 	}
 
 	context_set_fault_enable(context);
-- 
2.17.1



[PATCH 3/4] iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()

2019-02-28 Thread Lu Baolu
The Intel IOMMU can be turned off with intel_iommu=off. If the Intel
IOMMU is off, the intel_iommu struct will not be initialized. When
device drivers call intel_svm_bind_mm(), a NULL pointer dereference
will happen there.

Add a dmar_disabled check to avoid the NULL pointer dereference.

Cc: Ashok Raj 
Cc: Jacob Pan 
Reported-by: Dave Jiang 
Fixes: 2f26e0a9c9860 ("iommu/vt-d: Add basic SVM PASID support")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index c79540deaf00..3a4b09ae8561 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -234,7 +234,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 	int pasid_max;
 	int ret;
 
-	if (!iommu)
+	if (!iommu || dmar_disabled)
 		return -EINVAL;
 
 	if (dev_is_pci(dev)) {
-- 
2.17.1



[PATCH 4/4] iommu/vt-d: Get domain ID before clear pasid entry

2019-02-28 Thread Lu Baolu
After tearing down a pasid entry, the domain id is used to
invalidate the translation caches. Retrieve the domain id
from the pasid entry value before clearing the pasid entry.
Otherwise, we will always use domain id 0.

Cc: Ashok Raj 
Cc: Jacob Pan 
Signed-off-by: Liu Yi L 
Fixes: 6f7db75e1c469 ("iommu/vt-d: Add second level page table interface")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-pasid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 53fe5248d8f1..03b12d2ee213 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -466,8 +466,8 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 	if (WARN_ON(!pte))
 		return;
 
-	intel_pasid_clear_entry(dev, pasid);
 	did = pasid_get_domain_id(pte);
+	intel_pasid_clear_entry(dev, pasid);
 
 	if (!ecap_coherent(iommu->ecap))
 		clflush_cache_range(pte, sizeof(*pte));
-- 
2.17.1



[PATCH 0/4] iommu/vt-d: Several fixes for 5.1

2019-02-28 Thread Lu Baolu
Hi Joerg,

This includes several small fixes; none of them is relevant for stable kernels.
Please consider them for 5.1-rc1.

Best regards,
Lu Baolu

Lu Baolu (4):
  iommu/vt-d: Disable ATS support on untrusted devices
  iommu/vt-d: Set context field after value initialized
  iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()
  iommu/vt-d: Get domain ID before clear pasid entry

 drivers/iommu/intel-iommu.c | 6 ++++--
 drivers/iommu/intel-pasid.c | 2 +-
 drivers/iommu/intel-svm.c   | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

-- 
2.17.1



[PATCH 1/4] iommu/vt-d: Disable ATS support on untrusted devices

2019-02-28 Thread Lu Baolu
Commit fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted
devices") disables ATS support on the devices which have been marked
as untrusted. Unfortunately this is not enough to fix the DMA attack
vulnerabilities, because the IOMMU driver allows translated requests
as long as a device advertises the ATS capability. Hence a malicious
peripheral device could use this to bypass the IOMMU.

This disables the ATS support on untrusted devices by clearing the
internal per-device ATS mark. As a result, the IOMMU driver will block
any translated requests from any device marked as untrusted.

Cc: Jacob Pan 
Cc: Mika Westerberg 
Suggested-by: Kevin Tian 
Suggested-by: Ashok Raj 
Fixes: fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f8f6d46c60f4..abdd165a829c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2484,7 +2484,8 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 	if (dev && dev_is_pci(dev)) {
 		struct pci_dev *pdev = to_pci_dev(info->dev);
 
-		if (!pci_ats_disabled() &&
+		if (!pdev->untrusted &&
+		    !pci_ats_disabled() &&
 		    ecap_dev_iotlb_support(iommu->ecap) &&
 		    pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS) &&
 		    dmar_find_matched_atsr_unit(pdev))
-- 
2.17.1



Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-28 Thread Pingfan Liu
On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu  wrote:
>
> Hi Borislav,
>
> Do you think the following patch is good at present?
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81f9d23..9213073 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -460,7 +460,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_LOW_MAX	(512 << 20)
>  # define CRASH_ADDR_HIGH_MAX	(512 << 20)
>  #else
> -# define CRASH_ADDR_LOW_MAX	(896UL << 20)
> +# define CRASH_ADDR_LOW_MAX	(1 << 32)
>  # define CRASH_ADDR_HIGH_MAX	MAXMEM
>  #endif
>
Or the patch could look like:
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..ed0def5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -459,7 +459,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX	(512 << 20)
 # define CRASH_ADDR_HIGH_MAX	(512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX	(896UL << 20)
+# define CRASH_ADDR_LOW_MAX	(1 << 32)
 # define CRASH_ADDR_HIGH_MAX	MAXMEM
 #endif

@@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
 					    high ? CRASH_ADDR_HIGH_MAX
 						 : CRASH_ADDR_LOW_MAX,
 					    crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+	/*
+	 * crashkernel=X reserve below 4G fails? Try MAXMEM
+	 */
+	if (!high && !crash_base)
+		crash_base = memblock_find_in_range(CRASH_ALIGN,
+						    CRASH_ADDR_HIGH_MAX,
+						    crash_size, CRASH_ALIGN);
+#endif

which tries 0-4G first, then falls back to above 4G.

> For documentation, I will send another patch to improve the description.
>
> Thanks,
> Pingfan
>
> On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov  wrote:
> >
> > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > > If we move to high as default, it will allocate 160M high + 256M low. It
> >
> > We won't move to high by default - we will *fall* back to high if the
> > default allocation fails.
> >
> > > To make the process less fragile maybe we can remove the 896M limitation
> > > and only try <4G then go to high.
> >
> > Sure, the more robust for the user, the better.
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-28 Thread Pingfan Liu
Hi Borislav,

Do you think the following patch is good at present?
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81f9d23..9213073 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -460,7 +460,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX	(512 << 20)
 # define CRASH_ADDR_HIGH_MAX	(512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX	(896UL << 20)
+# define CRASH_ADDR_LOW_MAX	(1 << 32)
 # define CRASH_ADDR_HIGH_MAX	MAXMEM
 #endif

For documentation, I will send another patch to improve the description.

Thanks,
Pingfan

On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov  wrote:
>
> On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > If we move to high as default, it will allocate 160M high + 256M low. It
>
> We won't move to high by default - we will *fall* back to high if the
> default allocation fails.
>
> > To make the process less fragile maybe we can remove the 896M limitation
> > and only try <4G then go to high.
>
> Sure, the more robust for the user, the better.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 1/1] iommu: Bind process address spaces to devices

2019-02-28 Thread Raj, Ashok
On Thu, Feb 28, 2019 at 01:15:49PM -0800, Jacob Pan wrote:
> On Thu, 28 Feb 2019 15:09:50 +0100
> Joerg Roedel  wrote:
> 
> > Hi Jacob,
> > 
> > On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> > > On Tue, 26 Feb 2019 12:17:43 +0100
> > > Joerg Roedel  wrote:  
> > 
> > > Just trying to understand how to use this API.
> > > So if we bind the same mm to two different devices, we should get
> > > two different iommu_sva handle, right?
> > > I think intel-svm still needs a flag argument for supervisor pasid
> > > etc. Other than that, I think both interface should work for vt-d.  
> > 
> > I second Jean's question here, is supervisor pasid still needed with
> > scalable mode? What is the use-case and which mm_struct will be used
> > for supervisor accesses?
> > 
> I will delegate this to Ashok.

Supervisor PASID is still required for some kernel clients. Some of our
IB folks had asked for it. The current implementation uses init_mm, but
we know this is dangerous. Also, since the kernel has no mmu_notifier
support for kernel memory, we were not able to invalidate the device TLB
after memory was freed.

I suppose we could just build regular page tables, much like the map/unmap
path does today, but bind them to a supervisor PASID. This way we don't
expose the entire kernel address space to the device, but only open select
portions on request.

We haven't spent enough time on it lately, but will focus on it once the
core pieces are completed for the baseline Scalable Mode support.
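
For reference, a rough sketch of how a kernel client requests a supervisor PASID
through the current interface. intel_svm_bind_mm(), intel_svm_unbind_mm() and
SVM_FLAG_SUPERVISOR_MODE come from the existing include/linux/intel-svm.h; the
example_* wrappers are made up, and backing the PASID with dedicated page tables
instead of init_mm is the part still under discussion:

#include <linux/device.h>
#include <linux/intel-svm.h>

/* Made-up wrappers, only to show the calling convention. */
static int example_bind_supervisor_pasid(struct device *dev, int *pasid)
{
	/* No per-device fault callbacks (svm_dev_ops) in this sketch. */
	return intel_svm_bind_mm(dev, pasid, SVM_FLAG_SUPERVISOR_MODE, NULL);
}

static void example_unbind_pasid(struct device *dev, int pasid)
{
	intel_svm_unbind_mm(dev, pasid);
}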


Re: [PATCH 1/1] iommu: Bind process address spaces to devices

2019-02-28 Thread Jacob Pan
On Thu, 28 Feb 2019 15:09:50 +0100
Joerg Roedel  wrote:

> Hi Jacob,
> 
> On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> > On Tue, 26 Feb 2019 12:17:43 +0100
> > Joerg Roedel  wrote:  
> 
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handle, right?
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interface should work for vt-d.  
> 
> I second Jean's question here, is supervisor pasid still needed with
> scalable mode? What is the use-case and which mm_struct will be used
> for supervisor accesses?
> 
I will delegate this to Ashok.

> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.  
> 
> I think a separate API makes more sense. It could be somehow fit into
> this as well, but having it separate is cleaner. And we already have
> separate API for aux-domains, so this would be just another extension
> of the IOMMU-API for using PASIDs.
> 
Agreed.
> 
> > >   int iommu_sva_get_pasid(struct iommu_sva *handle);  
> > If multiple bind to the same mm gets multiple handles, this API
> > should retrieve the same pasid for different handle?  
> 
> It can return the same handle if we store the pasid in the mm_struct,
> for example ...
> > Just curious why making the handle private instead of returning the
> > pasid value in the handle?  
> 
> ... which is also the reason why I prefer the accessor function, it
> allows to have the pasid not in the iommu_sva handle, but to retrieve
> it from somewhere else (like the mm_struct).
Makes sense, more flexible storage and controlled access too. Thanks for
explaining.


Jacob


Re: [PATCH 1/1] iommu: Bind process address spaces to devices

2019-02-28 Thread Jacob Pan
On Thu, 28 Feb 2019 01:10:55 +
"Tian, Kevin"  wrote:

> > From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> > Sent: Thursday, February 28, 2019 5:41 AM
> > 
> > On Tue, 26 Feb 2019 12:17:43 +0100
> > Joerg Roedel  wrote:
> >   
> > >
> > > How about a 'struct iommu_sva' with an iommu-private definition
> > > that is returned by this function:
> > >
> > >   struct iommu_sva *iommu_sva_bind_device(struct device
> > > *dev, struct mm_struct *mm);
> > >  
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handle, right?
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interface should work for vt-d.
> > 
> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.
> >   
> 
> It has to be different. Host doesn't know guest mm.
> 
> Also note that from virtualization p.o.v we just focus on 'nested
> translation' in host side. The 1st level may point to guest CPU
> page table (SVA), or IOVA page table. In that manner, the API
> (as currently defined in your series) is purely about setting up
> nested translation on VFIO assigned device. 
>
Sounds good, will keep them separate.


Re: [PATCH 1/1] iommu: Bind process address spaces to devices

2019-02-28 Thread Jacob Pan
On Thu, 28 Feb 2019 12:19:22 +
Jean-Philippe Brucker  wrote:

> On 27/02/2019 21:41, Jacob Pan wrote:
> > On Tue, 26 Feb 2019 12:17:43 +0100
> > Joerg Roedel  wrote:
> >   
> >> Hi Jean-Philippe,
> >>
> >> Thanks for the patch! I think this is getting close to be applied
> >> after the next merge window.
> >>
> >> On Wed, Feb 20, 2019 at 02:27:59PM +, Jean-Philippe Brucker
> >> wrote:  
> >>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct
> >>> *mm, int *pasid,
> >>> +   iommu_mm_exit_handler_t mm_exit, void
> >>> *drvdata)
> >>
> >> I think we are better of with introducing a sva-bind handle which
> >> can be used to extend and further configure the binding done with
> >> this function.
> >>
> >> How about a 'struct iommu_sva' with an iommu-private definition
> >> that is returned by this function:
> >>
> >>struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> >>struct mm_struct
> >> *mm); 
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handle, right?  
> 
> Yes, the iommu_sva handle is the bond between one mm and one device,
> so you will get two different handles.
> 
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interface should work for vt-d.  
> 
> Is supervisor PASID still needed now that we have auxiliary domains,
> and now that VT-d supports nested IOVA? You could have private kernel
> address spaces through auxiliary domains, or simply use DMA API as
> usual with PASID#0. I've been reluctant to make that feature common
> because it looks risky - gives full access to the kernel address
> space to devices and no notification on mapping change.
> 
It is still in the VT-d spec. Ashok will be able to answer this
better :)
> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.  
> 
> I also think it should be separate. That bind() operation is performed
> on an auxiliary domain, I guess?
> 
Yes, the 2nd level is retrieved from the aux domain for mdev, but for pdev
the 2nd level comes from the rid2pasid/default domain.
> >> and the corresponding unbind function:
> >>
> >>int iommu_sva_unbind_device(struct iommu_sva* *handle);
> >>
> >> (Btw, does this need to return and int? Can unbinding fail?).
> >>
> >> With that in place we can implement and extentable API base on the
> >> handle:
> >>
> >>int iommu_sva_get_pasid(struct iommu_sva *handle);  
> > If multiple bind to the same mm gets multiple handles, this API
> > should retrieve the same pasid for different handle?  
> 
> Yes
> 
> > Just curious why making
> > the handle private instead of returning the pasid value in the
> > handle?  
> 
> I don't have a strong objection against that. One reason to have an
> accessor is that the PASID is freed on mm_exit, so until the device
> driver calls unbind(), the PASID contained in the handle is stale (and
> the accessor returns PASID_INVALID). But that's a bit pedantic, the
> device driver should know that the handle is stale since it got
> notified of it. Having an accessor might also help tracking uses of
> the handle in the kernel, and make future API modifications easier.
> 
Makes sense.
> Thanks,
> Jean
> 
> >>void iommu_sva_set_exit_handler(struct iommu_sva *handle,
> >>iommu_mm_exit_handler_t
> >> mm_exit);
> >>
> >> I think at least the AMD IOMMU driver needs more call-backs like a
> >> handler that is invoked when a fault can not be resolved. And there
> >> might be others in the future, putting them all in the parameter
> >> list of the bind function doesn't scale well.
> >>  
> >   
> >> Regards,
> >>
> >>Joerg  
> > 
> >   
> 

[Jacob Pan]


Re: [PATCH 1/1] iommu: Bind process address spaces to devices

2019-02-28 Thread Joerg Roedel
Hi Jacob,

On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> On Tue, 26 Feb 2019 12:17:43 +0100
> Joerg Roedel  wrote:

> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handle, right?
> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interface should work for vt-d.

I second Jean's question here, is supervisor pasid still needed with
scalable mode? What is the use-case and which mm_struct will be used for
supervisor accesses?

> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.

I think a separate API makes more sense. It could be somehow fit into
this as well, but having it separate is cleaner. And we already have
separate API for aux-domains, so this would be just another extension of
the IOMMU-API for using PASIDs.


> > int iommu_sva_get_pasid(struct iommu_sva *handle);
> If multiple bind to the same mm gets multiple handles, this API should
> retrieve the same pasid for different handle?

It can return the same handle if we store the pasid in the mm_struct,
for example ...
> Just curious why making the handle private instead of returning the
> pasid value in the handle?

... which is also the reason why I prefer the accessor function, it
allows to have the pasid not in the iommu_sva handle, but to retrieve it
from somewhere else (like the mm_struct).

Regards,

Joerg
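
For illustration, a minimal sketch of how a device driver might consume the
handle-based interface discussed in this thread. The iommu_sva_* prototypes below
are the ones proposed here, not a merged API; returning an ERR_PTR on bind failure
is an assumption, and my_dev_program_pasid() is a made-up driver hook:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/mm_types.h>

/* Proposed interface from this thread, not yet part of <linux/iommu.h>: */
struct iommu_sva;
struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm);
int iommu_sva_get_pasid(struct iommu_sva *handle);
int iommu_sva_unbind_device(struct iommu_sva *handle);

/* Made-up device-side hook, only so the sketch is self-contained. */
static void my_dev_program_pasid(struct device *dev, int pasid) { }

static int my_driver_start_sva(struct device *dev, struct mm_struct *mm)
{
	struct iommu_sva *handle;
	int pasid;

	handle = iommu_sva_bind_device(dev, mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);

	/* Tell the device which PASID to tag its DMA transactions with. */
	my_dev_program_pasid(dev, pasid);

	/* Keep the handle around; call iommu_sva_unbind_device(handle) when done. */
	return 0;
}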


Re: MT76x2U crashes XHCI driver on AMD Ryzen system

2019-02-28 Thread Joerg Roedel
On Thu, Feb 28, 2019 at 01:19:48PM +0100, Stanislaw Gruszka wrote:
> Nevermind, the patch is wrong, s->dma_address is initialized in sg_num_pages().

Yes, it is. In sg_num_pages() the offset into the IOMMU mapping is
stored in s->dma_address, also taking the segment boundary mask into
account. map_sg() later only adds the base address to that.

Regards,

Joerg
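
For readers following the thread, a simplified sketch of that two-pass scheme,
paraphrasing the description above rather than quoting the real amd_iommu.c code;
the example_* domain type and helpers are made up, and the segment boundary mask
handling is omitted:

#include <linux/iommu-helper.h>	/* iommu_num_pages() */
#include <linux/mm.h>
#include <linux/scatterlist.h>

/* Made-up domain type and helpers, only to keep the sketch self-contained. */
struct example_domain;
unsigned long example_alloc_iova(struct example_domain *dom, unsigned long npages);
int example_map_page(struct example_domain *dom, unsigned long iova, phys_addr_t phys);

static int example_map_sg(struct example_domain *dom,
			  struct scatterlist *sglist, int nelems)
{
	unsigned long npages = 0, base;
	struct scatterlist *s;
	int i, j;

	/* Pass 1: record each segment's page offset in s->dma_address. */
	for_each_sg(sglist, s, nelems, i) {
		s->dma_address = npages << PAGE_SHIFT;
		npages += iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
	}

	/* One contiguous IOVA range for the whole scatterlist. */
	base = example_alloc_iova(dom, npages);
	if (!base)
		return 0;

	/* Pass 2: map the pages, then turn the offsets into bus addresses. */
	for_each_sg(sglist, s, nelems, i) {
		int pages = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);

		for (j = 0; j < pages; j++)
			example_map_page(dom,
					 base + s->dma_address + (j << PAGE_SHIFT),
					 (sg_phys(s) & PAGE_MASK) + (j << PAGE_SHIFT));

		/* Base IOVA plus the data's offset within its first page. */
		s->dma_address += base + (sg_phys(s) & ~PAGE_MASK);
		s->dma_length   = s->length;
	}

	return nelems;
}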


Re: [PATCH 1/1] iommu: Bind process address spaces to devices

2019-02-28 Thread Jean-Philippe Brucker
On 27/02/2019 21:41, Jacob Pan wrote:
> On Tue, 26 Feb 2019 12:17:43 +0100
> Joerg Roedel  wrote:
> 
>> Hi Jean-Philippe,
>>
>> Thanks for the patch! I think this is getting close to be applied
>> after the next merge window.
>>
>> On Wed, Feb 20, 2019 at 02:27:59PM +, Jean-Philippe Brucker wrote:
>>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>>> +			  iommu_mm_exit_handler_t mm_exit, void *drvdata)
>>
>> I think we are better of with introducing a sva-bind handle which can
>> be used to extend and further configure the binding done with this
>> function.
>>
>> How about a 'struct iommu_sva' with an iommu-private definition that
>> is returned by this function:
>>
>>  struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>>  struct mm_struct *mm);
>>
> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handle, right?

Yes, the iommu_sva handle is the bond between one mm and one device, so
you will get two different handles.

> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interface should work for vt-d.

Is supervisor PASID still needed now that we have auxiliary domains, and
now that VT-d supports nested IOVA? You could have private kernel
address spaces through auxiliary domains, or simply use DMA API as usual
with PASID#0. I've been reluctant to make that feature common because it
looks risky - gives full access to the kernel address space to devices
and no notification on mapping change.

> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.

I also think it should be separate. That bind() operation is performed
on an auxiliary domain, I guess?

>> and the corresponding unbind function:
>>
>>  int iommu_sva_unbind_device(struct iommu_sva* *handle);
>>
>> (Btw, does this need to return and int? Can unbinding fail?).
>>
>> With that in place we can implement and extentable API base on the
>> handle:
>>
>>  int iommu_sva_get_pasid(struct iommu_sva *handle);
> If multiple bind to the same mm gets multiple handles, this API should
> retrieve the same pasid for different handle?

Yes

> Just curious why making
> the handle private instead of returning the pasid value in the handle?

I don't have a strong objection against that. One reason to have an
accessor is that the PASID is freed on mm_exit, so until the device
driver calls unbind(), the PASID contained in the handle is stale (and
the accessor returns PASID_INVALID). But that's a bit pedantic, the
device driver should know that the handle is stale since it got notified
of it. Having an accessor might also help tracking uses of the handle in
the kernel, and make future API modifications easier.
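
A small illustration of the accessor semantics described above follows. Everything
here is hypothetical: in the proposal struct iommu_sva stays private to the IOMMU
core, IOMMU_PASID_INVALID is a made-up name, and where the PASID is actually stored
is still open:

#include <linux/device.h>
#include <linux/mm_types.h>

#define IOMMU_PASID_INVALID	(-1)	/* made-up constant for this sketch */

/* Hypothetical layout; the real definition would be private to the core. */
struct iommu_sva {
	struct device		*dev;
	struct mm_struct	*mm;	/* cleared once mm_exit has run */
	int			pasid;
};

int iommu_sva_get_pasid(struct iommu_sva *handle)
{
	/* After mm_exit the PASID has been freed, so report it as stale. */
	if (!handle->mm)
		return IOMMU_PASID_INVALID;

	return handle->pasid;
}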

Thanks,
Jean

>>  void iommu_sva_set_exit_handler(struct iommu_sva *handle,
>>                                  iommu_mm_exit_handler_t mm_exit);
>>
>> I think at least the AMD IOMMU driver needs more call-backs like a
>> handler that is invoked when a fault can not be resolved. And there
>> might be others in the future, putting them all in the parameter list
>> of the bind function doesn't scale well.
>>
> 
>> Regards,
>>
>>  Joerg
> 
> 



Re: MT76x2U crashes XHCI driver on AMD Ryzen system

2019-02-28 Thread Stanislaw Gruszka
On Thu, Feb 28, 2019 at 11:42:24AM +0100, Stanislaw Gruszka wrote:
> On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > > > > alignment.
> > > > 
> > > > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > > > handles the sg->page + sg->offset calculation fine.
> > > > 
> > > > > Note that the issue is with dma_map_sg(); switching to dma_map_single()
> > > > > by using urb->transfer_buffer instead of urb->sg makes things work
> > > > > on AMD IOMMU.
> > > > 
> > > > On the other hand this points to a bug in the driver, I'll look further
> > > > if I can spot something there.
> > > 
> > > I think so too. And I have done some changes that avoid strange allocation
> > > scheme and use usb synchronous messages instead of allocating buffers
> > > with unaligned sizes. However things work ok on Intel IOMMU and 
> > > there is no documentation what are dma_map_sg() requirement versus
> > > dma_map_single() which works. I think there are some unwritten
> > > requirements and things can work on some platforms and fails on others
> > > (different IOMMUs, no-IOMMU on some ARCHes)  
> > 
> > For the record: we have another bug report with this issue:
> > https://bugzilla.kernel.org/show_bug.cgi?id=202673
> > 
> > I provided there patch that change alignment for page_frag_alloc() and
> > it did not fixed the problem. So this is not alignment issue.
> > Now I think it could be page->refcount issue ...
> 
> I looked at the map_sg() code in amd_iommu.c and one line looks suspicious
> to me: it seems we can use a not correctly initialized s->dma_address (it
> should be 0, but I think it can be non-zero if the SG was reused). The code
> also does not seem to do the correct thing if there is more than one SG with
> multiple pages in individual segments. Something like the patch below seems
> more appropriate to me (not tested nor compiled).

Nevermind, the patch is wrong, s->dma_address is initialized in sg_num_pages().

Stanislaw


Re: MT76x2U crashes XHCI driver on AMD Ryzen system

2019-02-28 Thread Stanislaw Gruszka
On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > > > alignment.
> > > 
> > > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > > handles the sg->page + sg->offset calculation fine.
> > > 
> > > > Note that the issue is with dma_map_sg(); switching to dma_map_single()
> > > > by using urb->transfer_buffer instead of urb->sg makes things work
> > > > on AMD IOMMU.
> > > 
> > > On the other hand this points to a bug in the driver, I'll look further
> > > if I can spot something there.
> > 
> > I think so too. And I have done some changes that avoid strange allocation
> > scheme and use usb synchronous messages instead of allocating buffers
> > with unaligned sizes. However things work ok on Intel IOMMU and 
> > there is no documentation what are dma_map_sg() requirement versus
> > dma_map_single() which works. I think there are some unwritten
> > requirements and things can work on some platforms and fails on others
> > (different IOMMUs, no-IOMMU on some ARCHes)  
> 
> For the record: we have another bug report with this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=202673
> 
> I provided there patch that change alignment for page_frag_alloc() and
> it did not fixed the problem. So this is not alignment issue.
> Now I think it could be page->refcount issue ...

I looked at the map_sg() code in amd_iommu.c and one line looks suspicious
to me: it seems we can use a not correctly initialized s->dma_address (it
should be 0, but I think it can be non-zero if the SG was reused). The code
also does not seem to do the correct thing if there is more than one SG with
multiple pages in individual segments. Something like the patch below seems
more appropriate to me (not tested nor compiled).

Stanislaw

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 34c9aa76a7bd..9c8887250b82 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2517,6 +2517,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 	prot = dir2prot(direction);
 
 	/* Map all sg entries */
+	npages = 0;
 	for_each_sg(sglist, s, nelems, i) {
 		int j, pages = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
 
@@ -2524,7 +2525,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 			unsigned long bus_addr, phys_addr;
 			int ret;
 
-			bus_addr  = address + s->dma_address + (j << PAGE_SHIFT);
+			bus_addr  = address + ((npages + j) << PAGE_SHIFT);
 			phys_addr = (sg_phys(s) & PAGE_MASK) + (j << PAGE_SHIFT);
 			ret = iommu_map_page(domain, bus_addr, phys_addr, PAGE_SIZE, prot, GFP_ATOMIC);
 			if (ret)
@@ -2532,6 +2533,8 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 
 			mapped_pages += 1;
 		}
+
+		npages += mapped_pages;
 	}
 
 	/* Everything is mapped - write the right values into s->dma_address */




Re: [PATCH V6 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode

2019-02-28 Thread Joerg Roedel
On Wed, Feb 27, 2019 at 10:54:02PM +0800, lantianyu1...@gmail.com wrote:
> Lan Tianyu (3):
>   x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available
>   HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
>   MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope

Okay, I have replaced the patches in the iommu tree with these patches. Please
send future fixes on top of the hyper-v branch in the iommu tree. Also note
that I am going to exclude this branch from my next pull request in case more
issues are found.

Regards,

Joerg



Re: MT76x2U crashes XHCI driver on AMD Ryzen system

2019-02-28 Thread Stanislaw Gruszka
On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > > alignment.
> > 
> > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > handles the sg->page + sg->offset calculation fine.
> > 
> > > Note that the issue is with dma_map_sg(); switching to dma_map_single()
> > > by using urb->transfer_buffer instead of urb->sg makes things work
> > > on AMD IOMMU.
> > 
> > On the other hand this points to a bug in the driver, I'll look further
> > if I can spot something there.
> 
> I think so too. And I have done some changes that avoid strange allocation
> scheme and use usb synchronous messages instead of allocating buffers
> with unaligned sizes. However things work ok on Intel IOMMU and 
> there is no documentation what are dma_map_sg() requirement versus
> dma_map_single() which works. I think there are some unwritten
> requirements and things can work on some platforms and fails on others
> (different IOMMUs, no-IOMMU on some ARCHes)  

For the record: we have another bug report with this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=202673

I provided there patch that change alignment for page_frag_alloc() and
it did not fixed the problem. So this is not alignment issue.
Now I think it could be page->refcount issue ...

Stanislaw