Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict
On 2019/2/26 20:36, Hanjun Guo wrote:
> Hi Jean,
>
> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On 31/01/2019 13:52, Zhen Lei wrote:
>>> Currently, many peripherals are faster than before. For example, the top
>>> speed of an older netcard was 10Gb/s, and now it's more than 25Gb/s. But
>>> with IOMMU page-table mapping enabled, it's hard to reach the top speed
>>> in strict mode, because of frequent map and unmap operations. In order
>>> to keep up with the times, I think it's better to make non-strict the
>>> default.
>>
>> Most users won't be aware of this relaxation and will have their system
>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>> Invalidation in
>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf

Hi Jean,

In fact, we discussed the vulnerability of deferred invalidation before
upstreaming the non-strict patches. The attacks may be possible because of
an untrusted device or a mistake in the device driver, and we limited VFIO
to still use strict mode. As mentioned in the pdf, limiting memory freed
under deferred invalidation to be reused only by the device can mitigate
the vulnerability, but it's too hard to implement now.

A compromise may be to apply non-strict mode only to:
(1) dma_free_coherent: because the memory is controlled by the DMA common
    module, we can arrange for the memory to be freed only after the global
    invalidation in the timer handler.
(2) New APIs alongside iommu_unmap_page/sg which defer invalidation;
    candidate device drivers switch to the new APIs if they want to improve
    performance.
(3) Make sure that only trusted devices and trusted drivers can apply (1)
    and (2). For example, the driver must be built into the kernel Image.

So some high-end trusted devices use non-strict mode, and the others keep
using strict mode. Drivers that want non-strict mode should change to the
new APIs by themselves.

>> Why not keep the policy secure by default, as we do for
>> iommu.passthrough? And maybe add something similar to
>> CONFIG_IOMMU_DEFAULT_PASSTHROUGH? It's easy enough for experts to pass a
>> command-line argument or change the default config.
>
> Sorry for the late reply, it was Chinese new year, and we had a long
> discussion internally. We are fine with adding a Kconfig option, but not
> sure OS vendors will set it to default y.
>
> OS vendors seem not happy to pass a command-line argument; to be honest,
> this is our motivation to enable non-strict as default. Hope OS vendors
> can see this email thread, and give some input here.
>
> Thanks
> Hanjun

--
Thanks!
Best Regards

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
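For reference, the knob under discussion already exists per device tree of options on the kernel command line; a minimal sketch of how an expert would opt a single machine in or out (option names as documented in kernel-parameters.txt; check your kernel version's documentation before relying on them):

```shell
# iommu.strict=1  -> invalidate IOTLB on every unmap (strict, the safe default)
# iommu.strict=0  -> lazy/deferred invalidation (the non-strict mode debated here)
# Example bootloader entry opting one machine into non-strict mode:
linux /vmlinuz root=/dev/sda1 iommu.strict=0
```

This is exactly the "command-line argument" path Jean mentions; the thread's disagreement is only about which value should be the compiled-in default.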
[PATCH 2/4] iommu/vt-d: Set context field after value initialized
Otherwise, the translation type field of a context entry for a PCI device
will always be 0, and all translated DMA requests will be blocked by the
IOMMU. As a result, PCI devices with PCI ATS (device IOTLB) support won't
work as expected.

Cc: Ashok Raj
Cc: Jacob Pan
Suggested-by: Kevin Tian
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index abdd165a829c..c968b3c7bae0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2056,7 +2056,6 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 		int agaw;

 		context_set_domain_id(context, did);
-		context_set_translation_type(context, translation);

 		if (translation != CONTEXT_TT_PASS_THROUGH) {
 			/*
@@ -2086,6 +2085,8 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 			 */
 			context_set_address_width(context, iommu->msagaw);
 		}
+
+		context_set_translation_type(context, translation);
 	}

 	context_set_fault_enable(context);
--
2.17.1
[PATCH 3/4] iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()
The Intel IOMMU can be turned off with intel_iommu=off. If the Intel IOMMU
is off, the intel_iommu struct will not be initialized, so when device
drivers call intel_svm_bind_mm(), a NULL pointer dereference happens there.
Add a dmar_disabled check to avoid the NULL pointer dereference.

Cc: Ashok Raj
Cc: Jacob Pan
Reported-by: Dave Jiang
Fixes: 2f26e0a9c9860 ("iommu/vt-d: Add basic SVM PASID support")
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index c79540deaf00..3a4b09ae8561 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -234,7 +234,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 	int pasid_max;
 	int ret;

-	if (!iommu)
+	if (!iommu || dmar_disabled)
 		return -EINVAL;

 	if (dev_is_pci(dev)) {
--
2.17.1
[PATCH 4/4] iommu/vt-d: Get domain ID before clear pasid entry
After tearing down a pasid entry, the domain id is used to invalidate the
translation caches. Retrieve the domain id from the pasid entry value
before clearing the pasid entry; otherwise, we will always use domain id 0.

Cc: Ashok Raj
Cc: Jacob Pan
Signed-off-by: Liu Yi L
Fixes: 6f7db75e1c469 ("iommu/vt-d: Add second level page table interface")
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-pasid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 53fe5248d8f1..03b12d2ee213 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -466,8 +466,8 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 	if (WARN_ON(!pte))
 		return;

-	intel_pasid_clear_entry(dev, pasid);
 	did = pasid_get_domain_id(pte);
+	intel_pasid_clear_entry(dev, pasid);

 	if (!ecap_coherent(iommu->ecap))
 		clflush_cache_range(pte, sizeof(*pte));
--
2.17.1
[PATCH 0/4] iommu/vt-d: Several fixes for 5.1
Hi Joerg,

This includes several small fixes. None of them is relevant for stable
kernels. Please consider them for 5.1-rc1.

Best regards,
Lu Baolu

Lu Baolu (4):
  iommu/vt-d: Disable ATS support on untrusted devices
  iommu/vt-d: Set context field after value initialized
  iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()
  iommu/vt-d: Get domain ID before clear pasid entry

 drivers/iommu/intel-iommu.c | 6 ++++--
 drivers/iommu/intel-pasid.c | 2 +-
 drivers/iommu/intel-svm.c   | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

--
2.17.1
[PATCH 1/4] iommu/vt-d: Disable ATS support on untrusted devices
Commit fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices")
disables ATS support on devices which have been marked as untrusted.
Unfortunately this is not enough to fix the DMA attack vulnerabilities,
because the IOMMU driver allows translated requests as long as a device
advertises the ATS capability. Hence a malicious peripheral device could
use this to bypass the IOMMU.

This disables ATS support on untrusted devices by clearing the internal
per-device ATS mark. As a result, the IOMMU driver will block any
translated requests from any device marked as untrusted.

Cc: Jacob Pan
Cc: Mika Westerberg
Suggested-by: Kevin Tian
Suggested-by: Ashok Raj
Fixes: fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices")
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f8f6d46c60f4..abdd165a829c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2484,7 +2484,8 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 	if (dev && dev_is_pci(dev)) {
 		struct pci_dev *pdev = to_pci_dev(info->dev);

-		if (!pci_ats_disabled() &&
+		if (!pdev->untrusted &&
+		    !pci_ats_disabled() &&
 		    ecap_dev_iotlb_support(iommu->ecap) &&
 		    pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS) &&
 		    dmar_find_matched_atsr_unit(pdev))
--
2.17.1
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu wrote:
>
> Hi Borislav,
>
> Do you think the following patch is good at present?
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81f9d23..9213073 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -460,7 +460,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_LOW_MAX	(512 << 20)
>  # define CRASH_ADDR_HIGH_MAX	(512 << 20)
>  #else
> -# define CRASH_ADDR_LOW_MAX	(896UL << 20)
> +# define CRASH_ADDR_LOW_MAX	(1 << 32)
>  # define CRASH_ADDR_HIGH_MAX	MAXMEM
>  #endif

Or the patch looks like:

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..ed0def5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -459,7 +459,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX	(512 << 20)
 # define CRASH_ADDR_HIGH_MAX	(512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX	(896UL << 20)
+# define CRASH_ADDR_LOW_MAX	(1 << 32)
 # define CRASH_ADDR_HIGH_MAX	MAXMEM
 #endif
@@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
 			high ? CRASH_ADDR_HIGH_MAX : CRASH_ADDR_LOW_MAX,
 			crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+	/*
+	 * crashkernel=X reserve below 4G fails? Try MAXMEM
+	 */
+	if (!high && !crash_base)
+		crash_base = memblock_find_in_range(CRASH_ALIGN,
+						    CRASH_ADDR_HIGH_MAX,
+						    crash_size, CRASH_ALIGN);
+#endif

which tries 0-4G first, then falls back to above 4G.

> For documentation, I will send another patch to improve the description.
>
> Thanks,
> Pingfan
>
> On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov wrote:
> >
> > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > > If we move to high as default, it will allocate 160M high + 256M low. It
> >
> > We won't move to high by default - we will *fall* back to high if the
> > default allocation fails.
> >
> > > To make the process less fragile maybe we can remove the 896M limitation
> > > and only try <4G then go to high.
> >
> > Sure, the more robust for the user, the better.
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
Hi Borislav,

Do you think the following patch is good at present?

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81f9d23..9213073 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -460,7 +460,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX	(512 << 20)
 # define CRASH_ADDR_HIGH_MAX	(512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX	(896UL << 20)
+# define CRASH_ADDR_LOW_MAX	(1 << 32)
 # define CRASH_ADDR_HIGH_MAX	MAXMEM
 #endif

For documentation, I will send another patch to improve the description.

Thanks,
Pingfan

On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov wrote:
>
> On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > If we move to high as default, it will allocate 160M high + 256M low. It
>
> We won't move to high by default - we will *fall* back to high if the
> default allocation fails.
>
> > To make the process less fragile maybe we can remove the 896M limitation
> > and only try <4G then go to high.
>
> Sure, the more robust for the user, the better.
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCH 1/1] iommu: Bind process address spaces to devices
On Thu, Feb 28, 2019 at 01:15:49PM -0800, Jacob Pan wrote:
> On Thu, 28 Feb 2019 15:09:50 +0100 Joerg Roedel wrote:
>
> > Hi Jacob,
> >
> > On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> > > On Tue, 26 Feb 2019 12:17:43 +0100 Joerg Roedel wrote:
> > > Just trying to understand how to use this API.
> > > So if we bind the same mm to two different devices, we should get
> > > two different iommu_sva handles, right?
> > > I think intel-svm still needs a flag argument for supervisor pasid
> > > etc. Other than that, I think both interfaces should work for vt-d.
> >
> > I second Jean's question here: is supervisor pasid still needed with
> > scalable mode? What is the use-case and which mm_struct will be used
> > for supervisor accesses?
> >
> I will delegate this to Ashok.

Supervisor PASID is still required for some kernel clients; some of our IB
folks had asked for it. The current implementation uses init_mm, but we
know this is dangerous. Plus, since the kernel has no support for
mmu_notifiers on kernel memory, we were not able to invalidate the device
TLB after memory was freed.

I suppose we could just build regular page tables, much like map/unmap does
today, but bind them with a supervisor PASID. This way we don't open up the
kimono to the device, but only open select portions on request. We haven't
spent enough time on it lately, but will focus on it once the core pieces
are completed for the baseline Scalable Mode support.
Re: [PATCH 1/1] iommu: Bind process address spaces to devices
On Thu, 28 Feb 2019 15:09:50 +0100 Joerg Roedel wrote:
> Hi Jacob,
>
> On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> > On Tue, 26 Feb 2019 12:17:43 +0100 Joerg Roedel wrote:
> > > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handles, right?
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interfaces should work for vt-d.
>
> I second Jean's question here: is supervisor pasid still needed with
> scalable mode? What is the use-case and which mm_struct will be used
> for supervisor accesses?
>
I will delegate this to Ashok.

> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.
>
> I think a separate API makes more sense. It could somehow be fit into
> this as well, but having it separate is cleaner. And we already have a
> separate API for aux-domains, so this would be just another extension
> of the IOMMU-API for using PASIDs.
>
Agreed.

> > > 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> > If multiple binds to the same mm get multiple handles, this API
> > should retrieve the same pasid for different handles?
>
> It can return the same pasid if we store the pasid in the mm_struct,
> for example ...
>
> > Just curious why make the handle private instead of returning the
> > pasid value in the handle?
>
> ... which is also the reason why I prefer the accessor function: it
> allows having the pasid not in the iommu_sva handle, but retrieving
> it from somewhere else (like the mm_struct).

Makes sense - more flexible storage and controlled access too. Thanks for
explaining.

Jacob
Re: [PATCH 1/1] iommu: Bind process address spaces to devices
On Thu, 28 Feb 2019 01:10:55 +0000 "Tian, Kevin" wrote:
> > From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> > Sent: Thursday, February 28, 2019 5:41 AM
> >
> > On Tue, 26 Feb 2019 12:17:43 +0100 Joerg Roedel wrote:
> > >
> > > How about a 'struct iommu_sva' with an iommu-private definition
> > > that is returned by this function:
> > >
> > > 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> > > 						struct mm_struct *mm);
> > >
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handles, right?
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interfaces should work for vt-d.
> >
> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.
>
> It has to be different. Host doesn't know guest mm.
>
> Also note that from a virtualization p.o.v. we just focus on 'nested
> translation' on the host side. The 1st level may point to a guest CPU
> page table (SVA), or an IOVA page table. In that manner, the API (as
> currently defined in your series) is purely about setting up nested
> translation on a VFIO-assigned device.
>
Sounds good, will keep them separate.
Re: [PATCH 1/1] iommu: Bind process address spaces to devices
On Thu, 28 Feb 2019 12:19:22 +0000 Jean-Philippe Brucker wrote:
> On 27/02/2019 21:41, Jacob Pan wrote:
> > On Tue, 26 Feb 2019 12:17:43 +0100 Joerg Roedel wrote:
> >
> >> Hi Jean-Philippe,
> >>
> >> Thanks for the patch! I think this is getting close to being applied
> >> after the next merge window.
> >>
> >> On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker wrote:
> >>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> >>> +			  int *pasid, iommu_mm_exit_handler_t mm_exit,
> >>> +			  void *drvdata)
> >>
> >> I think we are better off introducing a sva-bind handle which can be
> >> used to extend and further configure the binding done with this
> >> function.
> >>
> >> How about a 'struct iommu_sva' with an iommu-private definition that
> >> is returned by this function:
> >>
> >>	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> >>						struct mm_struct *mm);
> >
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get two
> > different iommu_sva handles, right?
>
> Yes, the iommu_sva handle is the bond between one mm and one device, so
> you will get two different handles.
>
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interfaces should work for vt-d.
>
> Is supervisor PASID still needed now that we have auxiliary domains,
> and now that VT-d supports nested IOVA? You could have private kernel
> address spaces through auxiliary domains, or simply use the DMA API as
> usual with PASID#0. I've been reluctant to make that feature common
> because it looks risky - it gives full access to the kernel address
> space to devices and no notification on mapping change.
>
It is still in the VT-d spec. Ashok will be able to answer this better :)

> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.
>
> I also think it should be separate. That bind() operation is performed
> on an auxiliary domain, I guess?
>
Yes, the 2nd level is retrieved from the aux domain for mdev, but for
pdev the 2nd level comes from the rid2pasid/default domain.

> >> and the corresponding unbind function:
> >>
> >>	int iommu_sva_unbind_device(struct iommu_sva *handle);
> >>
> >> (Btw, does this need to return an int? Can unbinding fail?)
> >>
> >> With that in place we can implement an extendable API based on the
> >> handle:
> >>
> >>	int iommu_sva_get_pasid(struct iommu_sva *handle);
> > If multiple binds to the same mm get multiple handles, this API
> > should retrieve the same pasid for different handles?
>
> Yes
>
> > Just curious why make the handle private instead of returning the
> > pasid value in the handle?
>
> I don't have a strong objection against that. One reason to have an
> accessor is that the PASID is freed on mm_exit, so until the device
> driver calls unbind(), the PASID contained in the handle is stale (and
> the accessor returns PASID_INVALID). But that's a bit pedantic; the
> device driver should know that the handle is stale since it got
> notified of it. Having an accessor might also help track uses of the
> handle in the kernel, and make future API modifications easier.
>
Makes sense.

> Thanks,
> Jean
>
> >>	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
> >>					iommu_mm_exit_handler_t mm_exit);
> >>
> >> I think at least the AMD IOMMU driver needs more call-backs, like a
> >> handler that is invoked when a fault cannot be resolved. And there
> >> might be others in the future; putting them all in the parameter
> >> list of the bind function doesn't scale well.
> >>
> >> Regards,
> >>
> >>	Joerg

[Jacob Pan]
Re: [PATCH 1/1] iommu: Bind process address spaces to devices
Hi Jacob,

On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> On Tue, 26 Feb 2019 12:17:43 +0100 Joerg Roedel wrote:
> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handles, right?
> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interfaces should work for vt-d.

I second Jean's question here: is supervisor pasid still needed with
scalable mode? What is the use-case and which mm_struct will be used for
supervisor accesses?

> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.

I think a separate API makes more sense. It could somehow be fit into this
as well, but having it separate is cleaner. And we already have a separate
API for aux-domains, so this would be just another extension of the
IOMMU-API for using PASIDs.

> > 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> If multiple binds to the same mm get multiple handles, this API should
> retrieve the same pasid for different handles?

It can return the same pasid if we store the pasid in the mm_struct, for
example ...

> Just curious why make the handle private instead of returning the pasid
> value in the handle?

... which is also the reason why I prefer the accessor function: it allows
having the pasid not in the iommu_sva handle, but retrieving it from
somewhere else (like the mm_struct).

Regards,

	Joerg
Re: MT76x2U crashes XHCI driver on AMD Ryzen system
On Thu, Feb 28, 2019 at 01:19:48PM +0100, Stanislaw Gruszka wrote:
> Nevermind, the patch is wrong, s->dma_address is initialized in
> sg_num_pages().

Yes, it is. In sg_num_pages() the offset into the IOMMU mapping is stored
in s->dma_address, taking also the segment boundary mask into account.
map_sg() later only adds the base address to that.

Regards,

	Joerg
Re: [PATCH 1/1] iommu: Bind process address spaces to devices
On 27/02/2019 21:41, Jacob Pan wrote:
> On Tue, 26 Feb 2019 12:17:43 +0100 Joerg Roedel wrote:
>
>> Hi Jean-Philippe,
>>
>> Thanks for the patch! I think this is getting close to being applied
>> after the next merge window.
>>
>> On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker wrote:
>>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>>> +			  int *pasid, iommu_mm_exit_handler_t mm_exit,
>>> +			  void *drvdata)
>>
>> I think we are better off introducing a sva-bind handle which can be
>> used to extend and further configure the binding done with this
>> function.
>>
>> How about a 'struct iommu_sva' with an iommu-private definition that
>> is returned by this function:
>>
>> 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>> 						struct mm_struct *mm);
>>
> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handles, right?

Yes, the iommu_sva handle is the bond between one mm and one device, so
you will get two different handles.

> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interfaces should work for vt-d.

Is supervisor PASID still needed now that we have auxiliary domains, and
now that VT-d supports nested IOVA? You could have private kernel address
spaces through auxiliary domains, or simply use the DMA API as usual with
PASID#0. I've been reluctant to make that feature common because it looks
risky - it gives full access to the kernel address space to devices and no
notification on mapping change.

> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.

I also think it should be separate. That bind() operation is performed on
an auxiliary domain, I guess?

>> and the corresponding unbind function:
>>
>> 	int iommu_sva_unbind_device(struct iommu_sva *handle);
>>
>> (Btw, does this need to return an int? Can unbinding fail?)
>>
>> With that in place we can implement an extendable API based on the
>> handle:
>>
>> 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> If multiple binds to the same mm get multiple handles, this API should
> retrieve the same pasid for different handles?

Yes

> Just curious why make the handle private instead of returning the pasid
> value in the handle?

I don't have a strong objection against that. One reason to have an
accessor is that the PASID is freed on mm_exit, so until the device driver
calls unbind(), the PASID contained in the handle is stale (and the
accessor returns PASID_INVALID). But that's a bit pedantic; the device
driver should know that the handle is stale since it got notified of it.
Having an accessor might also help track uses of the handle in the kernel,
and make future API modifications easier.

Thanks,
Jean

>> 	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
>> 					iommu_mm_exit_handler_t mm_exit);
>>
>> I think at least the AMD IOMMU driver needs more call-backs, like a
>> handler that is invoked when a fault cannot be resolved. And there
>> might be others in the future; putting them all in the parameter list
>> of the bind function doesn't scale well.
>>
>> Regards,
>>
>> 	Joerg
Re: MT76x2U crashes XHCI driver on AMD Ryzen system
On Thu, Feb 28, 2019 at 11:42:24AM +0100, Stanislaw Gruszka wrote:
> On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > > If sg->offset > PAGE_SIZE is fine, then most likely we have a
> > > > > problem with alignment.
> > > >
> > > > The map_sg implementation in the AMD IOMMU driver uses sg_phys()
> > > > which handles the sg->page + sg->offset calculation fine.
> > > >
> > > > > Note that the issue is with dma_map_sg(); switching to
> > > > > dma_map_single() by using urb->transfer_buffer instead of
> > > > > urb->sg makes things work on the AMD IOMMU.
> > > >
> > > > On the other hand this points to a bug in the driver, I'll look
> > > > further if I can spot something there.
> > >
> > > I think so too. And I have done some changes that avoid the strange
> > > allocation scheme and use usb synchronous messages instead of
> > > allocating buffers with unaligned sizes. However, things work ok on
> > > the Intel IOMMU, and there is no documentation of what the
> > > dma_map_sg() requirements are versus dma_map_single(), which works.
> > > I think there are some unwritten requirements, and things can work
> > > on some platforms and fail on others (different IOMMUs, no IOMMU on
> > > some arches).
> >
> > For the record: we have another bug report with this issue:
> > https://bugzilla.kernel.org/show_bug.cgi?id=202673
> >
> > I provided a patch there that changes the alignment for
> > page_frag_alloc() and it did not fix the problem. So this is not an
> > alignment issue. Now I think it could be a page->refcount issue ...
>
> I looked at the map_sg() code in amd_iommu.c and one line looks
> suspicious to me: it seems we can use a not correctly initialized
> s->dma_address (it should be 0, but I think it can be non-zero if the
> SG was reused). The code also seems to not do the correct thing if
> there is more than one SG with multiple pages on individual segments.
> Something like the patch below seems more appropriate to me (not
> tested nor compiled).

Nevermind, the patch is wrong, s->dma_address is initialized in
sg_num_pages().

Stanislaw
Re: MT76x2U crashes XHCI driver on AMD Ryzen system
On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > If sg->offset > PAGE_SIZE is fine, then most likely we have a
> > > > problem with alignment.
> > >
> > > The map_sg implementation in the AMD IOMMU driver uses sg_phys()
> > > which handles the sg->page + sg->offset calculation fine.
> > >
> > > > Note that the issue is with dma_map_sg(); switching to
> > > > dma_map_single() by using urb->transfer_buffer instead of urb->sg
> > > > makes things work on the AMD IOMMU.
> > >
> > > On the other hand this points to a bug in the driver, I'll look
> > > further if I can spot something there.
> >
> > I think so too. And I have done some changes that avoid the strange
> > allocation scheme and use usb synchronous messages instead of
> > allocating buffers with unaligned sizes. However, things work ok on
> > the Intel IOMMU, and there is no documentation of what the
> > dma_map_sg() requirements are versus dma_map_single(), which works. I
> > think there are some unwritten requirements, and things can work on
> > some platforms and fail on others (different IOMMUs, no IOMMU on some
> > arches).
>
> For the record: we have another bug report with this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=202673
>
> I provided a patch there that changes the alignment for
> page_frag_alloc() and it did not fix the problem. So this is not an
> alignment issue. Now I think it could be a page->refcount issue ...

I looked at the map_sg() code in amd_iommu.c and one line looks suspicious
to me: it seems we can use a not correctly initialized s->dma_address (it
should be 0, but I think it can be non-zero if the SG was reused). The
code also seems to not do the correct thing if there is more than one SG
with multiple pages on individual segments. Something like the patch below
seems more appropriate to me (not tested nor compiled).

Stanislaw

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 34c9aa76a7bd..9c8887250b82 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2517,6 +2517,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 	prot = dir2prot(direction);

 	/* Map all sg entries */
+	npages = 0;
 	for_each_sg(sglist, s, nelems, i) {
 		int j, pages = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
@@ -2524,7 +2525,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 		unsigned long bus_addr, phys_addr;
 		int ret;

-		bus_addr = address + s->dma_address + (j << PAGE_SHIFT);
+		bus_addr = address + ((npages + j) << PAGE_SHIFT);
 		phys_addr = (sg_phys(s) & PAGE_MASK) + (j << PAGE_SHIFT);
 		ret = iommu_map_page(domain, bus_addr, phys_addr, PAGE_SIZE, prot, GFP_ATOMIC);
 		if (ret)
@@ -2532,6 +2533,8 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 			mapped_pages += 1;
 		}
+
+		npages += mapped_pages;
 	}

 	/* Everything is mapped - write the right values into s->dma_address */
Re: [PATCH V6 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
On Wed, Feb 27, 2019 at 10:54:02PM +0800, lantianyu1...@gmail.com wrote:
> Lan Tianyu (3):
>   x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
>     available
>   HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
>   MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
>     scope

Okay, I replaced the patches in the iommu tree with these. Please send
future fixes on top of the hyper-v branch in the iommu tree. Also note
that I am going to exclude this branch from my next pull request in case
more issues are found.

Regards,

	Joerg
Re: MT76x2U crashes XHCI driver on AMD Ryzen system
On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > If sg->offset > PAGE_SIZE is fine, then most likely we have a
> > > problem with alignment.
> >
> > The map_sg implementation in the AMD IOMMU driver uses sg_phys()
> > which handles the sg->page + sg->offset calculation fine.
> >
> > > Note that the issue is with dma_map_sg(); switching to
> > > dma_map_single() by using urb->transfer_buffer instead of urb->sg
> > > makes things work on the AMD IOMMU.
> >
> > On the other hand this points to a bug in the driver, I'll look
> > further if I can spot something there.
>
> I think so too. And I have done some changes that avoid the strange
> allocation scheme and use usb synchronous messages instead of
> allocating buffers with unaligned sizes. However, things work ok on the
> Intel IOMMU, and there is no documentation of what the dma_map_sg()
> requirements are versus dma_map_single(), which works. I think there
> are some unwritten requirements, and things can work on some platforms
> and fail on others (different IOMMUs, no IOMMU on some arches).

For the record: we have another bug report with this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=202673

I provided a patch there that changes the alignment for page_frag_alloc()
and it did not fix the problem. So this is not an alignment issue. Now I
think it could be a page->refcount issue ...

Stanislaw