Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote: > > I don't get why you need a rdmsr here, or why not having one would > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > > > +*/ > > > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > > > + return false; > > > > + > > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > > How much more expensive is the wrmsr over the rdmsr? Can't we just > unconditionally write the current PASID and call it a day? The reason to check the rdmsr() is because we are using a hueristic taking GP faults. If we already setup the MSR, but we get it a second time it means the reason is something other than PASID_MSR not being set. Ideally we should use the TIF_ to track this would be cheaper, but we were told those bits aren't easy to give out. Cheers, Ashok ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
Hi, Peter, On Mon, Jun 15, 2020 at 08:31:16PM +0200, Peter Zijlstra wrote: > On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote: > > > I don't get why you need a rdmsr here, or why not having one would > > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > > My concern is TIF flags are precious (only 3 slots available). Defining > > a new TIF flag may be not worth it while rdmsr() can check if PASID > > is valid in the MSR. And performance here might not be a big issue > > in #GP. > > > > But if you think using TIF flag is better, I can define a new TIF flag > > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()). > > Then we can avoid using rdmsr() to check valid PASID in the MSR. > > WHY ?!?! What do you need a TIF flag for? We need "a way" to check if the per thread MSR has a valid PASID. If yes, no need to fix up the MSR (wrmsr()), and let other handler to handle the #GP. Otherwise, apply the heuristics and fix up the MSR and exit the #GP. The way to check the valid PASID in the MSR is rdmsr() in this series. A TIF flag will be much faster than rdmsr() and seems a sutiable way to check valid PASID status per thread. That's why it could replace rdmsr() to check PASID in the MSR. Or do you suggest to add a random new flag in struct thread_info instead of a TIF flag? Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
Hi, Peter, On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote: > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: > > > Or do you suggest to add a random new flag in struct thread_info instead > > of a TIF flag? > > Why thread_info? What's wrong with something simple like the below. It > takes a bit from the 'strictly current' flags word. > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index b62e6aaf28f0..fca830b97055 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -801,6 +801,9 @@ struct task_struct { > /* Stalled due to lack of memory */ > unsignedin_memstall:1; > #endif > +#ifdef CONFIG_PCI_PASID > + unsignedhas_valid_pasid:1; > +#endif > > unsigned long atomic_flags; /* Flags requiring atomic > access. */ > > diff --git a/kernel/fork.c b/kernel/fork.c > index 142b23645d82..10b3891be99e 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct > task_struct *orig, int node) > tsk->use_memdelay = 0; > #endif > > +#ifdef CONFIG_PCI_PASID > + tsk->has_valid_pasid = 0; > +#endif > + > #ifdef CONFIG_MEMCG > tsk->active_memcg = NULL; > #endif The PASID MSR is x86 specific although PASID is PCIe concept and per-mm. Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work. The flag should be cleared in cloned()/forked() and is only set and read in fixup() in x86 #GP for heuristic. It's not used anywhere outside of x86. That's why we think the flag should be in x86 struct thread_info instead of in generice struct task_struct. Please advice. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 26/29] docs: fix references for DMA*.txt files
On Mon, Jun 15, 2020 at 08:47:05AM +0200, Mauro Carvalho Chehab wrote: > As we moved those files to core-api, fix references to point > to their newer locations. > > Signed-off-by: Mauro Carvalho Chehab > --- > Documentation/PCI/pci.rst | 6 +++--- > Documentation/block/biodoc.rst | 2 +- > Documentation/core-api/bus-virt-phys-mapping.rst | 2 +- > Documentation/core-api/dma-api.rst | 6 +++--- > Documentation/core-api/dma-isa-lpc.rst | 2 +- > Documentation/driver-api/usb/dma.rst | 6 +++--- > Documentation/memory-barriers.txt | 6 +++--- I grabbed this Documentation/memory-barriers.txt, but can drop it if Jon takes the whole batch, in which case: Acked-by: Paul E. McKenney # from LKMM perspective > .../translations/ko_KR/memory-barriers.txt | 6 +++--- > arch/ia64/hp/common/sba_iommu.c| 12 ++-- > arch/parisc/kernel/pci-dma.c | 2 +- > arch/x86/include/asm/dma-mapping.h | 4 ++-- > arch/x86/kernel/amd_gart_64.c | 2 +- > drivers/parisc/sba_iommu.c | 14 +++--- > include/linux/dma-mapping.h| 2 +- > include/media/videobuf-dma-sg.h| 2 +- > kernel/dma/debug.c | 2 +- > 16 files changed, 38 insertions(+), 38 deletions(-) > > diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst > index 8c016d8c9862..d10d3fe604c5 100644 > --- a/Documentation/PCI/pci.rst > +++ b/Documentation/PCI/pci.rst > @@ -265,7 +265,7 @@ Set the DMA mask size > - > .. note:: > If anything below doesn't make sense, please refer to > - Documentation/DMA-API.txt. This section is just a reminder that > + :doc:`/core-api/dma-api`. This section is just a reminder that > drivers need to indicate DMA capabilities of the device and is not > an authoritative source for DMA interfaces. > > @@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X > devices are > Setup shared control data > - > Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. > shared) > -memory. See Documentation/DMA-API.txt for a full description of > +memory. See :doc:`/core-api/dma-api` for a full description of > the DMA APIs. This section is just a reminder that it needs to be done > before enabling DMA on the device. > > @@ -421,7 +421,7 @@ owners if there is one. > > Then clean up "consistent" buffers which contain the control data. > > -See Documentation/DMA-API.txt for details on unmapping interfaces. > +See :doc:`/core-api/dma-api` for details on unmapping interfaces. > > > Unregister from other subsystems > diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst > index b964796ec9c7..ba7f45d0271c 100644 > --- a/Documentation/block/biodoc.rst > +++ b/Documentation/block/biodoc.rst > @@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of > virtual address > do not have a corresponding kernel virtual address space mapping) and > low-memory pages. > > -Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion > +Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion > on PCI high mem DMA aspects and mapping of scatter gather lists, and support > for 64 bit PCI. > > diff --git a/Documentation/core-api/bus-virt-phys-mapping.rst > b/Documentation/core-api/bus-virt-phys-mapping.rst > index 4bb07c2f3e7d..c7bc99cd2e21 100644 > --- a/Documentation/core-api/bus-virt-phys-mapping.rst > +++ b/Documentation/core-api/bus-virt-phys-mapping.rst > @@ -8,7 +8,7 @@ How to access I/O mapped memory from within device drivers > > The virt_to_bus() and bus_to_virt() functions have been > superseded by the functionality provided by the PCI DMA interface > - (see Documentation/DMA-API-HOWTO.txt). They continue > + (see :doc:`/core-api/dma-api-howto`). They continue > to be documented below for historical purposes, but new code > must not use them. --davidm 00/12/12 > > diff --git a/Documentation/core-api/dma-api.rst > b/Documentation/core-api/dma-api.rst > index 2d8d2fed7317..63b4a2f20867 100644 > --- a/Documentation/core-api/dma-api.rst > +++ b/Documentation/core-api/dma-api.rst > @@ -5,7 +5,7 @@ Dynamic DMA mapping using the generic device > :Author: James E.J. Bottomley > > This document describes the DMA API. For a more gentle introduction > -of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt. > +of the API (and actual examples), see :doc:`/core-api/dma-api-howto`. > > This API is split into two pieces. Part I describes the basic API. > Part II describes extensions for supporting non-consistent memory > @@ -471,7 +471,7 @@ without the _attrs suffixes, except that they pass an >
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> On Jun 15, 2020, at 1:56 PM, Luck, Tony wrote: > > >> >> Are we planning to keep PASID live once a task has used it once or are we >> going to swap it lazily? If the latter, a percpu variable might be better. > > Current plan is "touch it once and the task owns it until exit(2)" > > Maybe someday in the future when we have data on how applications > actually use accelerators we could look at something more complex > if usage patterns look like it would be beneficial. > > So what’s the RDMSR for? Surely you have some state somewhere that says “this task has a PASID.” Can’t you just make sure that stays in sync with the MSR? Then, on #GP, if the task already has a PASID, you know the MSR is set. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote: > On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote: > > Hi, Peter, > > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote: > > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > > > > +/* > > > > + * Apply some heuristics to see if the #GP fault was caused by a thread > > > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like > > > > that > > > > + * is the problem, try initializing the IA32_PASID MSR. If the > > > > heuristic > > > > + * guesses incorrectly, take one more #GP fault. > > > > > > How is that going to help? Aren't we then going to run this same > > > heuristic again and again and again? > > > > The heuristic always initializes the MSR with the per mm PASID IIF the > > mm has a valid PASID but the MSR doesn't have one. This heuristic usually > > happens only once on the first #GP in a thread. > > But it doesn't guarantee the PASID is the right one. Suppose both the mm > has a PASID and the MSR has a VALID one, but the MSR isn't the mm one. > Then we NO-OP. So if the exception was due to us having the wrong PASID, > we stuck. The MSR for each thread was cleared during fork() and clone(). The PASID was cleared during mm_init(). The per-mm PASID was assigned when fist bind_mm() is called and remains the same value until process exit(). The MSR is only fixed up when the first ENQCMD is executed in a thread: bit 31 in the MSR is 0 and the PASID in the mm is non-zero. The MSR remains the same PASID value once it's fixed up until the thread exits. So the work flow ensures the PASID goes from mm's PASID to the MSR. The PASID could be unbund from the mm. In this case, iommu will generate #PF and kernel oops instead of #GP. > > > If the next #GP still comes in, the heuristic finds out the MSR already > > has a valid PASID and thus will not fixup the MSR any more. The fixup() > > returns "false" and lets others to handle the #GP. > > > > So the heuristic will be executed once (at most) and won't be executed > > again and again. > > So I get that you want to set the PASID on-demand, but I find this all > really weird code to make that happen. We could keep PASID same in all threads sychronously by propogating the MSRs when the PASID is bound to the mm via IPIs or taskworks to all threads in the process. But the code is complex and error-prone and overhead could be high: 1. The user can call driver to do binding and unbinding multiple times. The IPIs or taskworks will be sent multiple times to make sure only the last IPIs or taskworks take action. 2. Even if a thread never executes ENQCMD and thus never uses the MSR, the MSR still needs to be updated whenever bind_mm() and needs to be context switched each time. This could cause high overhead. Setting the PASID on-demand is simpler and cleaner and was recommended by Thomas. > > > > > +bool __fixup_pasid_exception(void) > > > > +{ > > > > + u64 pasid_msr; > > > > + unsigned int pasid; > > > > + > > > > + /* > > > > +* This function is called only when this #GP was triggered > > > > from user > > > > +* space. So the mm cannot be NULL. > > > > +*/ > > > > + pasid = current->mm->pasid; > > > > + /* If the mm doesn't have a valid PASID, then can't help. */ > > > > + if (invalid_pasid(pasid)) > > > > + return false; > > > > + > > > > + /* > > > > +* Since IRQ is disabled now, the current task still owns the > > > > FPU on > > > > > > That's just weird and confusing. What you want to say is that you rely > > > on the exception disabling the interrupt. > > > > I checked SDM again. You are right. #GP can be interrupted by machine check > > or other interrupts. So I cannot assume the current task still owns the FPU. > > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access > > either the MSR on the processor or the PASID state in the memory. > > That's not in fact what I meant, but yes, you can take exceptions while > !IF just fine. > > > > > +* this CPU and the PASID MSR can be directly accessed. > > > > +* > > > > +* If the MSR has a valid PASID, the #GP must be for some other > > > > reason. > > > > +* > > > > +* If rdmsr() is really a performance issue, a TIF_ flag may be > > > > +* added to check if the thread has a valid PASID instead of > > > > rdmsr(). > > > > > > I don't understand any of this. Nobody except us writes to this MSR, we > > > should bloody well know what's in it. What gives? > > > > Patch 4 describes how to manage the MSR and patch 7 describes the format > > of the MSR (20-bit PASID value and bit 31 is valid bit). > > > > Are they sufficient to help? Or do you mean something else? > > I don't get why you need a rdmsr here, or why not having one would > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? My
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 11:19:21AM -0700, Raj, Ashok wrote: > On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote: > > > > I don't get why you need a rdmsr here, or why not having one would > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > > > > > + */ > > > > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > > > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > > > > + return false; > > > > > + > > > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > > > > How much more expensive is the wrmsr over the rdmsr? Can't we just > > unconditionally write the current PASID and call it a day? > > The reason to check the rdmsr() is because we are using a hueristic taking > GP faults. If we already setup the MSR, but we get it a second time it > means the reason is something other than PASID_MSR not being set. > > Ideally we should use the TIF_ to track this would be cheaper, but we were > told those bits aren't easy to give out. Why do you need a TIF flag? Why not any other random flag in current? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote: > > I don't get why you need a rdmsr here, or why not having one would > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > My concern is TIF flags are precious (only 3 slots available). Defining > a new TIF flag may be not worth it while rdmsr() can check if PASID > is valid in the MSR. And performance here might not be a big issue > in #GP. > > But if you think using TIF flag is better, I can define a new TIF flag > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()). > Then we can avoid using rdmsr() to check valid PASID in the MSR. WHY ?!?! What do you need a TIF flag for? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix event counter availability check
On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote: > Alexander > > On 6/1/20 4:01 PM, Alexander Monakov wrote: > > On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote: > > > > > > Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves > > > > the issue. This is the earliest point in amd_iommu_init_pci where the > > > > call succeeds on my laptop. > > > > > > According to your description, it should just need to be anywhere after > > > the > > > pci_enable_device() is called for the IOMMU device, isn't it? So, on your > > > system, what if we just move the init_iommu_perf_ctr() here: > > > > No, this doesn't work, as I already said in the paragraph you are responding > > to. See my last sentence in the quoted part. > > > > So the implication is init_device_table_dma together with subsequent cache > > flush is also setting up something that is necessary for counters to be > > writable. > > > > Alexander > > > > Instead of blindly moving the code around to a spot that would just work, > I am trying to understand what might be required here. In this case, > the init_device_table_dma()should not be needed. I suspect it's the IOMMU > invalidate all command that's also needed here. > > I'm also checking with the HW and BIOS team. Meanwhile, could you please give > the following change a try: Hello. Can you give any update please? Alexander > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c > index 5b81fd16f5fa..b07458cc1b0b 100644 > --- a/drivers/iommu/amd_iommu_init.c > +++ b/drivers/iommu/amd_iommu_init.c > @@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void) > ret = iommu_init_pci(iommu); > if (ret) > break; > + iommu_flush_all_caches(iommu); > + init_iommu_perf_ctr(iommu); > } > > /* > -- > 2.17.1 > > Thanks, > Suravee > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 01:17:35PM -0700, Fenghua Yu wrote: > Hi, Peter, > > On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote: > > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: > > > > > Or do you suggest to add a random new flag in struct thread_info instead > > > of a TIF flag? > > > > Why thread_info? What's wrong with something simple like the below. It > > takes a bit from the 'strictly current' flags word. > > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index b62e6aaf28f0..fca830b97055 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -801,6 +801,9 @@ struct task_struct { > > /* Stalled due to lack of memory */ > > unsignedin_memstall:1; > > #endif > > +#ifdef CONFIG_PCI_PASID > > + unsignedhas_valid_pasid:1; > > +#endif > > > > unsigned long atomic_flags; /* Flags requiring atomic > > access. */ > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 142b23645d82..10b3891be99e 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct > > task_struct *orig, int node) > > tsk->use_memdelay = 0; > > #endif > > > > +#ifdef CONFIG_PCI_PASID > > + tsk->has_valid_pasid = 0; > > +#endif > > + > > #ifdef CONFIG_MEMCG > > tsk->active_memcg = NULL; > > #endif > > The PASID MSR is x86 specific although PASID is PCIe concept and per-mm. > Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work. > The flag should be cleared in cloned()/forked() and is only set and > read in fixup() in x86 #GP for heuristic. It's not used anywhere outside > of x86. > > That's why we think the flag should be in x86 struct thread_info instead > of in generice struct task_struct. I don't think anybody really cares, it's just one bit, we have plenty left. On x86_64 there's a u32 sized alignment hole in thread_info, also we don't use the upper 32bit of thread_info::flags, however using those would still mean you have to use atomic ops, which you really don't need. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> On Jun 15, 2020, at 1:17 PM, Fenghua Yu wrote: > > Hi, Peter, > >> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote: >>> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: >>> >>> Or do you suggest to add a random new flag in struct thread_info instead >>> of a TIF flag? >> >> Why thread_info? What's wrong with something simple like the below. It >> takes a bit from the 'strictly current' flags word. >> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index b62e6aaf28f0..fca830b97055 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -801,6 +801,9 @@ struct task_struct { >>/* Stalled due to lack of memory */ >>unsignedin_memstall:1; >> #endif >> +#ifdef CONFIG_PCI_PASID >> +unsignedhas_valid_pasid:1; >> +#endif >> >>unsigned longatomic_flags; /* Flags requiring atomic access. >> */ >> >> diff --git a/kernel/fork.c b/kernel/fork.c >> index 142b23645d82..10b3891be99e 100644 >> --- a/kernel/fork.c >> +++ b/kernel/fork.c >> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct >> task_struct *orig, int node) >>tsk->use_memdelay = 0; >> #endif >> >> +#ifdef CONFIG_PCI_PASID >> +tsk->has_valid_pasid = 0; >> +#endif >> + >> #ifdef CONFIG_MEMCG >>tsk->active_memcg = NULL; >> #endif > > The PASID MSR is x86 specific although PASID is PCIe concept and per-mm. > Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work. > The flag should be cleared in cloned()/forked() and is only set and > read in fixup() in x86 #GP for heuristic. It's not used anywhere outside > of x86. > > That's why we think the flag should be in x86 struct thread_info instead > of in generice struct task_struct. > Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily? If the latter, a percpu variable might be better. > Please advice. > > Thanks. > > -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: > Or do you suggest to add a random new flag in struct thread_info instead > of a TIF flag? Why thread_info? What's wrong with something simple like the below. It takes a bit from the 'strictly current' flags word. diff --git a/include/linux/sched.h b/include/linux/sched.h index b62e6aaf28f0..fca830b97055 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -801,6 +801,9 @@ struct task_struct { /* Stalled due to lack of memory */ unsignedin_memstall:1; #endif +#ifdef CONFIG_PCI_PASID + unsignedhas_valid_pasid:1; +#endif unsigned long atomic_flags; /* Flags requiring atomic access. */ diff --git a/kernel/fork.c b/kernel/fork.c index 142b23645d82..10b3891be99e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) tsk->use_memdelay = 0; #endif +#ifdef CONFIG_PCI_PASID + tsk->has_valid_pasid = 0; +#endif + #ifdef CONFIG_MEMCG tsk->active_memcg = NULL; #endif ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> Are we planning to keep PASID live once a task has used it once or are we > going to swap it lazily? If the latter, a percpu variable might be better. Current plan is "touch it once and the task owns it until exit(2)" Maybe someday in the future when we have data on how applications actually use accelerators we could look at something more complex if usage patterns look like it would be beneficial. -Tony ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> So what’s the RDMSR for? Surely you > have some state somewhere that says “this task has a PASID.” > Can’t you just make sure that stays in sync with the MSR? Then, on #GP, if > the task already has a PASID, you know the MSR is set. We have state that says the process ("mm") has been allocated a PASID. But not for each task. E.g. a process may clone a bunch of tasks, then one of them opens a device that needs a PASID. We did internally have patches to go hunt down all those other tasks and force a PASID onto each. But the code was big and ugly. Also maybe the wrong thing to do if those threads didn't ever need to access the device. PeterZ suggested that we can add a bit to the task structure for this purpose. Fenghua is hesitant about adding an x86 only bit there. -Tony ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/4] pci: acs: Enable PCI_ACS_TB for untrusted/external-facing devices
When enabling ACS, currently the bit "translation blocking" was not getting changed at all. Set it to disable translation blocking too for all external facing or untrusted devices. This is OK because ATS is only allowed on internal devces. Signed-off-by: Rajat Jain --- drivers/pci/pci.c| 4 drivers/pci/quirks.c | 11 +++ 2 files changed, 15 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d2ff987585855..79853b52658a2 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3330,6 +3330,10 @@ static void pci_std_enable_acs(struct pci_dev *dev) /* Upstream Forwarding */ ctrl |= (cap & PCI_ACS_UF); + if (dev->external_facing || dev->untrusted) + /* Translation Blocking */ + ctrl |= (cap & PCI_ACS_TB); + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl); } diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index b341628e47527..6294adeac4049 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev) } } +/* + * Currently this quirk does the equivalent of + * PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_SV + * + * Currently missing, it also needs to do equivalent of PCI_ACS_TB, + * if dev->external_facing || dev->untrusted + */ static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev) { if (!pci_quirk_intel_pch_acs_match(dev)) @@ -4973,6 +4980,10 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev) ctrl |= (cap & PCI_ACS_CR); ctrl |= (cap & PCI_ACS_UF); + if (dev->external_facing || dev->untrusted) + /* Translation Blocking */ + ctrl |= (cap & PCI_ACS_TB); + pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl); pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n"); -- 2.27.0.290.gba653c62da-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/4] pci: Keep the ACS capability offset in device
Currently this is being looked up at a number of places. Read and store it once at bootup so that it can be used by all later. Signed-off-by: Rajat Jain --- drivers/pci/p2pdma.c | 2 +- drivers/pci/pci.c| 21 + drivers/pci/pci.h| 2 +- drivers/pci/probe.c | 2 +- drivers/pci/quirks.c | 8 include/linux/pci.h | 1 + 6 files changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index e8e444eeb1cd2..f29a48f8fa594 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -253,7 +253,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev) int pos; u16 ctrl; - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS); + pos = pdev->acs_cap; if (!pos) return 0; diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ce096272f52b1..d2ff987585855 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -51,6 +51,7 @@ EXPORT_SYMBOL(pci_pci_problems); unsigned int pci_pm_d3_delay; +static void pci_enable_acs(struct pci_dev *dev); static void pci_pme_list_scan(struct work_struct *work); static LIST_HEAD(pci_pme_list); @@ -3284,7 +3285,7 @@ static void pci_disable_acs_redir(struct pci_dev *dev) if (!pci_dev_specific_disable_acs_redir(dev)) return; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS); + pos = dev->acs_cap; if (!pos) { pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n"); return; @@ -3310,7 +3311,7 @@ static void pci_std_enable_acs(struct pci_dev *dev) u16 cap; u16 ctrl; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS); + pos = dev->acs_cap; if (!pos) return; @@ -3336,7 +3337,7 @@ static void pci_std_enable_acs(struct pci_dev *dev) * pci_enable_acs - enable ACS if hardware support it * @dev: the PCI device */ -void pci_enable_acs(struct pci_dev *dev) +static void pci_enable_acs(struct pci_dev *dev) { if (!pci_acs_enable) goto disable_acs_redir; @@ -3362,7 +3363,7 @@ static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags) int pos; u16 cap, ctrl; - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS); + pos = pdev->acs_cap; if (!pos) return false; @@ -3487,6 +3488,18 @@ bool pci_acs_path_enabled(struct pci_dev *start, return true; } +/** + * pci_acs_init - Initialize if hardware supports it + * @dev: the PCI device + */ +void pci_acs_init(struct pci_dev *dev) +{ + dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS); + + if (dev->acs_cap) + pci_enable_acs(dev); +} + /** * pci_rebar_find_pos - find position of resize ctrl reg for BAR * @pdev: PCI device diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 6d3f758671064..12fb79fbe29d3 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -532,7 +532,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev, return resource_alignment(res); } -void pci_enable_acs(struct pci_dev *dev); +void pci_acs_init(struct pci_dev *dev); #ifdef CONFIG_PCI_QUIRKS int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags); int pci_dev_specific_enable_acs(struct pci_dev *dev); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 2f66988cea257..6d87066a5ecc5 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2390,7 +2390,7 @@ static void pci_init_capabilities(struct pci_dev *dev) pci_ats_init(dev); /* Address Translation Services */ pci_pri_init(dev); /* Page Request Interface */ pci_pasid_init(dev);/* Process Address Space ID */ - pci_enable_acs(dev);/* Enable ACS P2P upstream forwarding */ + pci_acs_init(dev); /* Access Control Services */ pci_ptm_init(dev); /* Precision Time Measurement */ pci_aer_init(dev); /* Advanced Error Reporting */ pci_dpc_init(dev); /* Downstream Port Containment */ diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 812bfc32ecb82..b341628e47527 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct pci_dev *dev, u16 acs_flags) if (!pci_quirk_intel_spt_pch_acs_match(dev)) return -ENOTTY; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS); + pos = dev->acs_cap; if (!pos) return -ENOTTY; @@ -4961,7 +4961,7 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev) if (!pci_quirk_intel_spt_pch_acs_match(dev)) return -ENOTTY; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS); + pos =
Re: [PATCH 4/4] pci: export untrusted attribute in sysfs
On Mon, Jun 15, 2020 at 06:17:42PM -0700, Rajat Jain wrote: > This is needed to allow the userspace to determine when an untrusted > device has been added, and thus allowing it to bind the driver manually > to it, if it so wishes. This is being done as part of the approach > discussed at https://lkml.org/lkml/2020/6/9/1331 > > Signed-off-by: Rajat Jain > --- > drivers/pci/pci-sysfs.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 6d78df981d41a..574e9c613ba26 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -50,6 +50,7 @@ pci_config_attr(subsystem_device, "0x%04x\n"); > pci_config_attr(revision, "0x%02x\n"); > pci_config_attr(class, "0x%06x\n"); > pci_config_attr(irq, "%u\n"); > +pci_config_attr(untrusted, "%u\n"); > > static ssize_t broken_parity_status_show(struct device *dev, >struct device_attribute *attr, > @@ -608,6 +609,7 @@ static struct attribute *pci_dev_attrs[] = { > #endif > _attr_driver_override.attr, > _attr_ari_enabled.attr, > + _attr_untrusted.attr, > NULL, > }; You also need a Documentation/ABI/ update for this new file. thanks, greg k-h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch for-5.8 2/4] dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
On Thu, Jun 11, 2020 at 12:20:29PM -0700, David Rientjes wrote: > If arch_dma_set_uncached() fails after memory has been decrypted, it needs > to be re-encrypted before freeing. > > Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more > general") > Cc: sta...@vger.kernel.org # 5.7 > Signed-off-by: David Rientjes Note that this can't really happen in practice as CONFIG_ARCH_HAS_DMA_SET_UNCACHED and memory encryption are mutally exclusive in pracrie. Still looks ok and useful otherwise. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 00/29] Documentation fixes
Hi Jon, That's a bunch of files I have to be applied on the top of v5.8-rc1 fixing documentation warnings. I already removed some duplicated stuff. Regards, Mauro Mauro Carvalho Chehab (29): mm: vmalloc.c: remove a kernel-doc annotation from a removed parameter net: dev: add a missing kernel-doc annotation net: netdevice.h: add a description for napi_defer_hard_irqs scripts/kernel-doc: parse __ETHTOOL_DECLARE_LINK_MODE_MASK net: pylink.h: add kernel-doc descriptions for new fields at phylink_config scripts/kernel-doc: handle function pointer prototypes fs: fs.h: fix a kernel-doc parameter description gpio: driver.h: fix kernel-doc markup kcsan: fix a kernel-doc warning rcu: fix some kernel-doc warnings fs: docs: f2fs.rst: fix a broken table dt: update a reference for reneases pcar file renamed to yaml dt: fix broken links due to txt->yaml renames dt: Fix broken references to renamed docs dt: fix reference to olpc,xo1.75-ec.txt selftests/vm/keys: fix a broken reference at protection_keys.c docs: hugetlbpage.rst: fix some warnings docs: powerpc: fix some issues at vas-api.rst docs: driver-model: remove a duplicated markup at driver.rst docs: watch_queue.rst: supress some Sphinx warnings and move to core-api docs: device-mapper: add dm-ebs.rst to an index file docs: it_IT: add two missing references docs: ABI: fix a typo when pointing to w1-generic.rst docs: fs: locking.rst: fix a broken table docs: add bus-virt-phys-mapping.txt to core-api docs: fix references for DMA*.txt files docs: dt: minor adjustments at writing-schema.rst docs: fs: proc.rst: fix a warning due to a merge conflict docs: fs: proc.rst: convert a new chapter to ReST .../ABI/testing/sysfs-driver-w1_therm | 2 +- Documentation/PCI/pci.rst | 6 +- .../admin-guide/device-mapper/index.rst | 1 + Documentation/admin-guide/mm/hugetlbpage.rst | 25 ++- Documentation/block/biodoc.rst| 2 +- .../bus-virt-phys-mapping.rst}| 2 +- Documentation/core-api/dma-api.rst| 6 +- Documentation/core-api/dma-isa-lpc.rst| 2 +- Documentation/core-api/index.rst | 2 + Documentation/{ => core-api}/watch_queue.rst | 34 ++-- .../bindings/arm/freescale/fsl,scu.txt| 2 +- .../bindings/display/bridge/sii902x.txt | 2 +- .../bindings/display/imx/fsl-imx-drm.txt | 4 +- .../devicetree/bindings/display/imx/ldb.txt | 4 +- .../display/rockchip/rockchip-drm.yaml| 2 +- .../bindings/misc/olpc,xo1.75-ec.txt | 2 +- .../bindings/net/mediatek-bluetooth.txt | 2 +- .../bindings/pinctrl/renesas,pfc-pinctrl.txt | 2 +- .../bindings/sound/audio-graph-card.txt | 2 +- .../bindings/sound/st,sti-asoc-card.txt | 2 +- .../bindings/spi/qcom,spi-geni-qcom.txt | 2 +- Documentation/devicetree/writing-schema.rst | 9 +- .../driver-api/driver-model/driver.rst| 2 - Documentation/driver-api/usb/dma.rst | 6 +- Documentation/filesystems/f2fs.rst| 150 -- Documentation/filesystems/locking.rst | 6 +- Documentation/filesystems/proc.rst| 46 +++--- Documentation/memory-barriers.txt | 6 +- Documentation/mips/ingenic-tcu.rst| 2 +- Documentation/powerpc/vas-api.rst | 23 ++- Documentation/security/keys/core.rst | 2 +- .../it_IT/process/management-style.rst| 2 + .../it_IT/process/submitting-patches.rst | 2 + .../translations/ko_KR/memory-barriers.txt| 6 +- MAINTAINERS | 8 +- arch/ia64/hp/common/sba_iommu.c | 12 +- arch/parisc/kernel/pci-dma.c | 2 +- arch/x86/include/asm/dma-mapping.h| 4 +- arch/x86/kernel/amd_gart_64.c | 2 +- drivers/parisc/sba_iommu.c| 14 +- include/linux/dma-mapping.h | 2 +- include/linux/fs.h| 2 +- include/linux/gpio/driver.h | 2 +- include/linux/kcsan-checks.h | 10 +- include/linux/netdevice.h | 2 + include/linux/phylink.h | 4 + include/linux/rculist.h | 2 +- include/linux/watch_queue.h | 2 +- include/media/videobuf-dma-sg.h | 2 +- init/Kconfig | 2 +- kernel/dma/debug.c| 2 +- kernel/watch_queue.c | 2 +- mm/vmalloc.c | 1 - net/core/dev.c| 1 + scripts/kernel-doc| 7 + tools/testing/selftests/vm/protection_keys.c | 2 +- 56 files changed, 282 insertions(+), 175 deletions(-) rename Documentation/{bus-virt-phys-mapping.txt =>
[PATCH] dma-direct: enable mmap for !CONFIG_MMU
nommu configfs can trivially map the coherent allocations to user space, as no actual page table setup is required and the kernel and the user space programs share the same address space. Fixes: 62fcee9a3bd7 ("dma-mapping: remove CONFIG_ARCH_NO_COHERENT_DMA_MMAP") Signed-off-by: Christoph Hellwig Reported-by: dillon min Reviewed-by: Vladimir Murzin Tested-by: dillon min --- kernel/dma/Kconfig | 1 + kernel/dma/direct.c | 14 -- 2 files changed, 1 insertion(+), 14 deletions(-) diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index d006668c0027d2..e0dae570a51530 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -71,6 +71,7 @@ config SWIOTLB # in the pagetables # config DMA_NONCOHERENT_MMAP + default y if !MMU bool config DMA_REMAP diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 0a4881e59aa7d6..9ec6a5c3fc578c 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -459,7 +459,6 @@ int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt, return ret; } -#ifdef CONFIG_MMU bool dma_direct_can_mmap(struct device *dev) { return dev_is_dma_coherent(dev) || @@ -485,19 +484,6 @@ int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma, return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, user_count << PAGE_SHIFT, vma->vm_page_prot); } -#else /* CONFIG_MMU */ -bool dma_direct_can_mmap(struct device *dev) -{ - return false; -} - -int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size, - unsigned long attrs) -{ - return -ENXIO; -} -#endif /* CONFIG_MMU */ int dma_direct_supported(struct device *dev, u64 mask) { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Issue]platform/x86: iommu: System can't shutdown because iommu driver keeps checking the status of DMA_GSTS_TES
Hi Koba Ko, On 2020/6/15 11:19, Koba Ko wrote: hi All, I have a machine and there's only intel gpu. the secureboot and vt-d is enabled in BIOS. On the Ubuntu desktop, I do s2idle first and restart the machine. The machine can't restart successfully, so I need to press the power button to shutdown. I tried each of the following and the issue can't be triggered. 1. disable secure boot in BIOS. 2. intel_iommu=off. 3. intel_iomm=igfx_off. 4. nomodeset 5. i915.modeset=0. After I investigate further, find inte_iommu keeps checking the status of DMA_GSTS_TES. During the procedure of restart, the driver would disable iommu translation and check status of DMA_GSTS_TES until status of DMA_GSTS_TES is 0. If you need more information, I can provide it. Do you mind telling what platform is it? Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: arm64 iommu groups issue
On 12/06/2020 15:30, Lorenzo Pieralisi wrote: On Mon, Feb 17, 2020 at 12:08:48PM +, John Garry wrote: Right, and even worse is that it relies on the port driver even existing at all. All this iommu group assignment should be taken outside device driver probe paths. However we could still consider device links for sync'ing the SMMU and each device probing. Yes, we should get that for DT now thanks to the of_devlink stuff, but cooking up some equivalent for IORT might be worthwhile. It doesn't solve this problem, but at least we could remove the iommu_ops check in iort_iommu_xlate(). We would need to carve out a path from pci_device_add() or even device_add() to solve all cases. Another thought that crosses my mind is that when pci_device_group() walks up to the point of ACS isolation and doesn't find an existing group, it can still infer that everything it walked past *should* be put in the same group it's then eventually going to return. Unfortunately I can't see an obvious way for it to act on that knowledge, though, since recursive iommu_probe_device() is unlikely to end well. [...] And this looks to be the reason for which current iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also. Of course, just adding a 'correct' add_device replay without the of_xlate process doesn't help at all. No wonder this looked suspiciously simpler than where the first idea left off... (on reflection, the core of this idea seems to be recycling the existing iommu_bus_init walk rather than building up a separate "waiting list", while forgetting that that wasn't the difficult part of the original idea anyway) We could still use a bus walk to add the group per iommu, but we would need an additional check to ensure the device is associated with the IOMMU. On this current code mentioned, the principle of this seems wrong to me - we call bus_for_each_device(..., add_iommu_group) for the first SMMU in the system which probes, but we attempt to add_iommu_group() for all devices on the bus, even though the SMMU for that device may yet to have probed. Yes, iommu_bus_init() is one of the places still holding a deeply-ingrained assumption that the ops go live for all IOMMU instances at once, which is what warranted the further replay in of_iommu_configure() originally. Moving that out of of_platform_device_create() to support probe deferral is where the trouble really started. I'm not too familiar with the history here, but could this be reverted now with the introduction of of_devlink stuff? Hi John, Hi Lorenzo, have we managed to reach a consensus on this thread on how to solve the issue ? No, not really. So Robin and I tried a couple of quick things previously, but they came did not come to much, as above. Asking because this thread seems stalled - I am keen on getting it fixed. I haven't spent more time on this. But from what I was hearing last time, this issue was ticketed internally for arm, so I was waiting for that to be picked up to re-engage. Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch for-5.8 4/4] dma-direct: add missing set_memory_decrypted() for coherent mapping
On Thu, Jun 11, 2020 at 12:20:32PM -0700, David Rientjes wrote: > When a coherent mapping is created in dma_direct_alloc_pages(), it needs > to be decrypted if the device requires unencrypted DMA before returning. > > Fixes: 3acac065508f ("dma-mapping: merge the generic remapping helpers > into dma-direct") > Cc: sta...@vger.kernel.org # 5.5+ > Signed-off-by: David Rientjes > --- > kernel/dma/direct.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -195,6 +195,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t > size, > __builtin_return_address(0)); > if (!ret) > goto out_free_pages; > + if (force_dma_unencrypted(dev)) { > + err = set_memory_decrypted((unsigned long)ret, > +1 << get_order(size)); > + if (err) > + goto out_free_pages; > + } Note that ret is a vmalloc address here. Does set_memory_decrypted work for that case? Again this should be mostly theoretical, so I'm not too worried for now. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 26/29] docs: fix references for DMA*.txt files
As we moved those files to core-api, fix references to point to their newer locations. Signed-off-by: Mauro Carvalho Chehab --- Documentation/PCI/pci.rst | 6 +++--- Documentation/block/biodoc.rst | 2 +- Documentation/core-api/bus-virt-phys-mapping.rst | 2 +- Documentation/core-api/dma-api.rst | 6 +++--- Documentation/core-api/dma-isa-lpc.rst | 2 +- Documentation/driver-api/usb/dma.rst | 6 +++--- Documentation/memory-barriers.txt | 6 +++--- .../translations/ko_KR/memory-barriers.txt | 6 +++--- arch/ia64/hp/common/sba_iommu.c| 12 ++-- arch/parisc/kernel/pci-dma.c | 2 +- arch/x86/include/asm/dma-mapping.h | 4 ++-- arch/x86/kernel/amd_gart_64.c | 2 +- drivers/parisc/sba_iommu.c | 14 +++--- include/linux/dma-mapping.h| 2 +- include/media/videobuf-dma-sg.h| 2 +- kernel/dma/debug.c | 2 +- 16 files changed, 38 insertions(+), 38 deletions(-) diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst index 8c016d8c9862..d10d3fe604c5 100644 --- a/Documentation/PCI/pci.rst +++ b/Documentation/PCI/pci.rst @@ -265,7 +265,7 @@ Set the DMA mask size - .. note:: If anything below doesn't make sense, please refer to - Documentation/DMA-API.txt. This section is just a reminder that + :doc:`/core-api/dma-api`. This section is just a reminder that drivers need to indicate DMA capabilities of the device and is not an authoritative source for DMA interfaces. @@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are Setup shared control data - Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared) -memory. See Documentation/DMA-API.txt for a full description of +memory. See :doc:`/core-api/dma-api` for a full description of the DMA APIs. This section is just a reminder that it needs to be done before enabling DMA on the device. @@ -421,7 +421,7 @@ owners if there is one. Then clean up "consistent" buffers which contain the control data. -See Documentation/DMA-API.txt for details on unmapping interfaces. +See :doc:`/core-api/dma-api` for details on unmapping interfaces. Unregister from other subsystems diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst index b964796ec9c7..ba7f45d0271c 100644 --- a/Documentation/block/biodoc.rst +++ b/Documentation/block/biodoc.rst @@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address do not have a corresponding kernel virtual address space mapping) and low-memory pages. -Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion +Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion on PCI high mem DMA aspects and mapping of scatter gather lists, and support for 64 bit PCI. diff --git a/Documentation/core-api/bus-virt-phys-mapping.rst b/Documentation/core-api/bus-virt-phys-mapping.rst index 4bb07c2f3e7d..c7bc99cd2e21 100644 --- a/Documentation/core-api/bus-virt-phys-mapping.rst +++ b/Documentation/core-api/bus-virt-phys-mapping.rst @@ -8,7 +8,7 @@ How to access I/O mapped memory from within device drivers The virt_to_bus() and bus_to_virt() functions have been superseded by the functionality provided by the PCI DMA interface - (see Documentation/DMA-API-HOWTO.txt). They continue + (see :doc:`/core-api/dma-api-howto`). They continue to be documented below for historical purposes, but new code must not use them. --davidm 00/12/12 diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst index 2d8d2fed7317..63b4a2f20867 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -5,7 +5,7 @@ Dynamic DMA mapping using the generic device :Author: James E.J. Bottomley This document describes the DMA API. For a more gentle introduction -of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt. +of the API (and actual examples), see :doc:`/core-api/dma-api-howto`. This API is split into two pieces. Part I describes the basic API. Part II describes extensions for supporting non-consistent memory @@ -471,7 +471,7 @@ without the _attrs suffixes, except that they pass an optional dma_attrs. The interpretation of DMA attributes is architecture-specific, and -each attribute should be documented in Documentation/DMA-attributes.txt. +each attribute should be documented in :doc:`/core-api/dma-attributes`. If dma_attrs are 0, the semantics of each of these functions is identical to those of the corresponding function @@ -484,7 +484,7 @@ for DMA:: #include /* DMA_ATTR_FOO
Re: [patch for-5.8 1/4] dma-direct: always align allocation size in dma_direct_alloc_pages()
On Thu, Jun 11, 2020 at 12:20:28PM -0700, David Rientjes wrote: > dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted() > works at page granularity. It's necessary to page align the allocation > size in dma_direct_alloc_pages() for consistent behavior. > > This also fixes an issue when arch_dma_prep_coherent() is called on an > unaligned allocation size for dma_alloc_need_uncached() when > CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED > is enabled. > > Cc: sta...@vger.kernel.org > Signed-off-by: David Rientjes > --- > kernel/dma/direct.c | 17 ++--- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -112,11 +112,12 @@ static inline bool dma_should_free_from_pool(struct > device *dev, > struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, > gfp_t gfp, unsigned long attrs) > { > - size_t alloc_size = PAGE_ALIGN(size); > int node = dev_to_node(dev); > struct page *page = NULL; > u64 phys_limit; > > + VM_BUG_ON(!PAGE_ALIGNED(size)); This really should be a WARN_ON_ONCE, but I've fixed this up before applying. I've also added a prep patch to mark __dma_direct_alloc_pages static as part of auditing for other callers. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 02/15] iommu: Report domain nesting info
Hi Kevin, > From: Tian, Kevin > Sent: Monday, June 15, 2020 9:23 AM > > > From: Liu, Yi L > > Sent: Friday, June 12, 2020 5:05 PM > > > > Hi Alex, > > > > > From: Alex Williamson > > > Sent: Friday, June 12, 2020 3:30 AM > > > > > > On Thu, 11 Jun 2020 05:15:21 -0700 > > > Liu Yi L wrote: > > > > > > > IOMMUs that support nesting translation needs report the > > > > capability info to userspace, e.g. the format of first level/stage > > > > paging > structures. > > > > > > > > Cc: Kevin Tian > > > > CC: Jacob Pan > > > > Cc: Alex Williamson > > > > Cc: Eric Auger > > > > Cc: Jean-Philippe Brucker > > > > Cc: Joerg Roedel > > > > Cc: Lu Baolu > > > > Signed-off-by: Liu Yi L > > > > Signed-off-by: Jacob Pan > > > > --- > > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no > > > > actual user of it. right? So I'm wondering if we can reuse > > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your > > opinions? > > > > > > > > include/linux/iommu.h | 1 + > > > > include/uapi/linux/iommu.h | 34 > > ++ > > > > 2 files changed, 35 insertions(+) > > > > > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index > > > > 78a26ae..f6e4b49 100644 > > > > --- a/include/linux/iommu.h > > > > +++ b/include/linux/iommu.h > > > > @@ -126,6 +126,7 @@ enum iommu_attr { > > > > DOMAIN_ATTR_FSL_PAMUV1, > > > > DOMAIN_ATTR_NESTING,/* two stages of translation */ > > > > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, > > > > + DOMAIN_ATTR_NESTING_INFO, > > > > DOMAIN_ATTR_MAX, > > > > }; > > > > > > > > diff --git a/include/uapi/linux/iommu.h > > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644 > > > > --- a/include/uapi/linux/iommu.h > > > > +++ b/include/uapi/linux/iommu.h > > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data { > > > > }; > > > > }; > > > > > > > > +struct iommu_nesting_info { > > > > + __u32 size; > > > > + __u32 format; > > > > + __u32 features; > > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1) > > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2) > > > > + __u32 flags; > > > > + __u8data[]; > > > > +}; > > > > + > > > > +/* > > > > + * @flags: VT-d specific flags. Currently reserved for future > > > > + * extension. > > > > + * @addr_width:The output addr width of first level/stage > translation > > > > + * @pasid_bits:Maximum supported PASID bits, 0 represents no > > PASID > > > > + * support. > > > > + * @cap_reg: Describe basic capabilities as defined in VT-d > > capability > > > > + * register. > > > > + * @cap_mask: Mark valid capability bits in @cap_reg. > > > > + * @ecap_reg: Describe the extended capabilities as defined in VT-d > > > > + * extended capability register. > > > > + * @ecap_mask: Mark the valid capability bits in @ecap_reg. > > > > > > Please explain this a little further, why do we need to tell > > > userspace about cap/ecap register bits that aren't valid through this > > > interface? > > > Thanks, > > > > we only want to tell userspace about the bits marked in the cap/ecap_mask. > > cap/ecap_mask is kind of white-list of the cap/ecap register. > > userspace should only care about the bits in the white-list, for other > > bits, it should ignore. > > > > Regards, > > Yi Liu > > For invalid bits if kernel just clears them then do we still need additional > mask bits > to explicitly mark them out? I guess this might be the point that Alex > asked... For invalid bits, kernel will clear them. But I think the mask bits is still necessary. The mask bits tells user space the bits related to nesting. Without it, user space may have no idea about it. Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU vIOMMU decides cap/ecap bits according to QEMU cmdline. But not all of them are compatible with hardware support. Especially, vIOMMU built on nesting. So needs to sync the cap/ecap bits with host side. Based on the mask bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with the cap/ecap bits reported by this interface. This comparation is limited to the nesting related bits in cap/ecap, the other bits are not included and can use the configuration by QEMU cmdline. The link below show the current Intel vIOMMU usage on the cap/ecap bits. For each assigned device, vIOMMU will compare the nesting related bits in cap/ecap and mask out the bits which hardware doesn't support. After the machine is intilized, the vIOMMU cap/ecap bits are determined. If user hot-plug devices to VM, vIOMMU will fail it if the hardware cap/ecap bits behind hot-plug device are not compatible with determined vIOMMU cap/ecap bits. https://www.spinics.net/lists/kvm/msg218294.html Regards, Yi Liu > > > > > Alex > > > > > > > > > > +
[PATCH 2/4] pci: set "untrusted" flag for truly external devices only
The "ExternalFacing" devices (root ports) are still internal devices that sit on the internal system fabric and thus trusted. Currently they were being marked untrusted - likely as an unintended border case. This patch uses the platform flag to identify the external facing devices and then use it to mark any downstream devices as "untrusted". The external-facing devices themselves are left as "trusted". This was discussed here: https://lkml.org/lkml/2020/6/10/1049 Signed-off-by: Rajat Jain --- drivers/iommu/intel/iommu.c | 2 +- drivers/pci/of.c| 2 +- drivers/pci/pci-acpi.c | 13 +++-- drivers/pci/probe.c | 2 +- include/linux/pci.h | 8 5 files changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9129663a7406b..1256ca89fb519 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4735,7 +4735,7 @@ static inline bool has_untrusted_dev(void) struct pci_dev *pdev = NULL; for_each_pci_dev(pdev) - if (pdev->untrusted) + if (pdev->untrusted || pdev->external_facing) return true; return false; diff --git a/drivers/pci/of.c b/drivers/pci/of.c index 27839cd2459f6..22727fc9558df 100644 --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -42,7 +42,7 @@ void pci_set_bus_of_node(struct pci_bus *bus) } else { node = of_node_get(bus->self->dev.of_node); if (node && of_property_read_bool(node, "external-facing")) - bus->self->untrusted = true; + bus->self->external_facing = true; } bus->dev.of_node = node; diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index 7224b1e5f2a83..492c07805caf8 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -1213,22 +1213,23 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev, ACPI_FREE(obj); } -static void pci_acpi_set_untrusted(struct pci_dev *dev) +static void pci_acpi_set_external_facing(struct pci_dev *dev) { u8 val; - if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) + if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT && + pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM) return; if (device_property_read_u8(>dev, "ExternalFacingPort", )) return; /* -* These root ports expose PCIe (including DMA) outside of the -* system so make sure we treat them and everything behind as +* These root/down ports expose PCIe (including DMA) outside of the +* system so make sure we treat everything behind them as * untrusted. */ if (val) - dev->untrusted = 1; + dev->external_facing = 1; } static void pci_acpi_setup(struct device *dev) @@ -1240,7 +1241,7 @@ static void pci_acpi_setup(struct device *dev) return; pci_acpi_optimize_delay(pci_dev, adev->handle); - pci_acpi_set_untrusted(pci_dev); + pci_acpi_set_external_facing(pci_dev); pci_acpi_add_edr_notifier(pci_dev); pci_acpi_add_pm_notifier(adev, pci_dev); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 6d87066a5ecc5..8c40c00413e74 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1552,7 +1552,7 @@ static void set_pcie_untrusted(struct pci_dev *dev) * untrusted as well. */ parent = pci_upstream_bridge(dev); - if (parent && parent->untrusted) + if (parent && (parent->untrusted || parent->external_facing)) dev->untrusted = true; } diff --git a/include/linux/pci.h b/include/linux/pci.h index a26be5332bba6..fe1bc603fda40 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -432,6 +432,14 @@ struct pci_dev { * mappings to make sure they cannot access arbitrary memory. */ unsigned intuntrusted:1; + /* +* Devices are marked as external-facing using info from platform +* (ACPI / devicetree). An external-facing device is still an internal +* trusted device, but it faces external untrusted devices. Thus any +* devices enumerated downstream an external-facing device is marked +* as untrusted. +*/ + unsigned intexternal_facing:1; unsigned intbroken_intx_masking:1; /* INTx masking can't be used */ unsigned intio_window_1k:1; /* Intel bridge 1K I/O windows */ unsigned intirq_managed:1; -- 2.27.0.290.gba653c62da-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/4] pci: export untrusted attribute in sysfs
This is needed to allow the userspace to determine when an untrusted device has been added, and thus allowing it to bind the driver manually to it, if it so wishes. This is being done as part of the approach discussed at https://lkml.org/lkml/2020/6/9/1331 Signed-off-by: Rajat Jain --- drivers/pci/pci-sysfs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 6d78df981d41a..574e9c613ba26 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -50,6 +50,7 @@ pci_config_attr(subsystem_device, "0x%04x\n"); pci_config_attr(revision, "0x%02x\n"); pci_config_attr(class, "0x%06x\n"); pci_config_attr(irq, "%u\n"); +pci_config_attr(untrusted, "%u\n"); static ssize_t broken_parity_status_show(struct device *dev, struct device_attribute *attr, @@ -608,6 +609,7 @@ static struct attribute *pci_dev_attrs[] = { #endif _attr_driver_override.attr, _attr_ari_enabled.attr, + _attr_untrusted.attr, NULL, }; -- 2.27.0.290.gba653c62da-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 02/15] iommu: Report domain nesting info
> From: Liu, Yi L > Sent: Monday, June 15, 2020 2:05 PM > > Hi Kevin, > > > From: Tian, Kevin > > Sent: Monday, June 15, 2020 9:23 AM > > > > > From: Liu, Yi L > > > Sent: Friday, June 12, 2020 5:05 PM > > > > > > Hi Alex, > > > > > > > From: Alex Williamson > > > > Sent: Friday, June 12, 2020 3:30 AM > > > > > > > > On Thu, 11 Jun 2020 05:15:21 -0700 > > > > Liu Yi L wrote: > > > > > > > > > IOMMUs that support nesting translation needs report the > > > > > capability info to userspace, e.g. the format of first level/stage > > > > > paging > > structures. > > > > > > > > > > Cc: Kevin Tian > > > > > CC: Jacob Pan > > > > > Cc: Alex Williamson > > > > > Cc: Eric Auger > > > > > Cc: Jean-Philippe Brucker > > > > > Cc: Joerg Roedel > > > > > Cc: Lu Baolu > > > > > Signed-off-by: Liu Yi L > > > > > Signed-off-by: Jacob Pan > > > > > --- > > > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no > > > > > actual user of it. right? So I'm wondering if we can reuse > > > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your > > > opinions? > > > > > > > > > > include/linux/iommu.h | 1 + > > > > > include/uapi/linux/iommu.h | 34 > > > ++ > > > > > 2 files changed, 35 insertions(+) > > > > > > > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index > > > > > 78a26ae..f6e4b49 100644 > > > > > --- a/include/linux/iommu.h > > > > > +++ b/include/linux/iommu.h > > > > > @@ -126,6 +126,7 @@ enum iommu_attr { > > > > > DOMAIN_ATTR_FSL_PAMUV1, > > > > > DOMAIN_ATTR_NESTING,/* two stages of translation */ > > > > > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, > > > > > + DOMAIN_ATTR_NESTING_INFO, > > > > > DOMAIN_ATTR_MAX, > > > > > }; > > > > > > > > > > diff --git a/include/uapi/linux/iommu.h > > > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644 > > > > > --- a/include/uapi/linux/iommu.h > > > > > +++ b/include/uapi/linux/iommu.h > > > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data { > > > > > }; > > > > > }; > > > > > > > > > > +struct iommu_nesting_info { > > > > > + __u32 size; > > > > > + __u32 format; > > > > > + __u32 features; > > > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > > > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL(1 << 1) > > > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << > 2) > > > > > + __u32 flags; > > > > > + __u8data[]; > > > > > +}; > > > > > + > > > > > +/* > > > > > + * @flags: VT-d specific flags. Currently reserved for future > > > > > + * extension. > > > > > + * @addr_width: The output addr width of first level/stage > > translation > > > > > + * @pasid_bits: Maximum supported PASID bits, 0 represents > no > > > PASID > > > > > + * support. > > > > > + * @cap_reg: Describe basic capabilities as defined in VT-d > > > capability > > > > > + * register. > > > > > + * @cap_mask:Mark valid capability bits in @cap_reg. > > > > > + * @ecap_reg:Describe the extended capabilities as defined > > > > > in VT-d > > > > > + * extended capability register. > > > > > + * @ecap_mask: Mark the valid capability bits in @ecap_reg. > > > > > > > > Please explain this a little further, why do we need to tell > > > > userspace about cap/ecap register bits that aren't valid through this > interface? > > > > Thanks, > > > > > > we only want to tell userspace about the bits marked in the > cap/ecap_mask. > > > cap/ecap_mask is kind of white-list of the cap/ecap register. > > > userspace should only care about the bits in the white-list, for other > > > bits, it should ignore. > > > > > > Regards, > > > Yi Liu > > > > For invalid bits if kernel just clears them then do we still need additional > mask bits > > to explicitly mark them out? I guess this might be the point that Alex > > asked... > > For invalid bits, kernel will clear them. But I think the mask bits is > still necessary. The mask bits tells user space the bits related to > nesting. Without it, user space may have no idea about it. userspace should know which bit is related to nesting and then should check that bit explicitly... > > Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU > vIOMMU > decides cap/ecap bits according to QEMU cmdline. But not all of them are > compatible with hardware support. Especially, vIOMMU built on nesting. > So needs to sync the cap/ecap bits with host side. Based on the mask > bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with > the cap/ecap bits reported by this interface. This comparation is limited > to the nesting related bits in cap/ecap, the other bits are not included > and can use the configuration by QEMU cmdline. I didn't get this explanation. Based on patch [15/15], nesting capabilities are defined as: +/* Nesting Support Capability Alignment */ +#define
Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU
On Sat, Jun 13, 2020 at 10:30:56PM +0800, Zhangfei Gao wrote: > On 2020/6/11 下午9:44, Bjorn Helgaas wrote: > > +++ b/drivers/iommu/iommu.c > > > > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device > > > > > > > > > > *dev, struct > > > > > > > > > > fwnode_handle *iommu_fwnode, > > > > > > > > > > fwspec->iommu_fwnode = iommu_fwnode; > > > > > > > > > > fwspec->ops = ops; > > > > > > > > > > dev_iommu_fwspec_set(dev, fwspec); > > > > > > > > > > + > > > > > > > > > > + if (dev_is_pci(dev)) > > > > > > > > > > + pci_fixup_device(pci_fixup_final, > > > > > > > > > > to_pci_dev(dev)); > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > Then pci_fixup_final will be called twice, the first in > > > > > > > > > > pci_bus_add_device. > > > > > > > > > > Here in iommu_fwspec_init is the second time, specifically > > > > > > > > > > for iommu_fwspec. > > > > > > > > > > Will send this when 5.8-rc1 is open. > > > > > > > > > Wait, this whole fixup approach seems wrong to me. No matter > > > > > > > > > how you > > > > > > > > > do the fixup, it's still a fixup, which means it requires > > > > > > > > > ongoing > > > > > > > > > maintenance. Surely we don't want to have to add the > > > > > > > > > Vendor/Device ID > > > > > > > > > for every new AMBA device that comes along, do we? > > > > > > > > > > > > > > > > > Here the fake pci device has standard PCI cfg space, but > > > > > > > > physical > > > > > > > > implementation is base on AMBA > > > > > > > > They can provide pasid feature. > > > > > > > > However, > > > > > > > > 1, does not support tlp since they are not real pci devices. > > > > > > > > 2. does not support pri, instead support stall (provided by > > > > > > > > smmu) > > > > > > > > And stall is not a pci feature, so it is not described in > > > > > > > > struct pci_dev, > > > > > > > > but in struct iommu_fwspec. > > > > > > > > So we use this fixup to tell pci system that the devices can > > > > > > > > support stall, > > > > > > > > and hereby support pasid. > > > > > > > This did not answer my question. Are you proposing that we > > > > > > > update a > > > > > > > quirk every time a new AMBA device is released? I don't think > > > > > > > that > > > > > > > would be a good model. > > > > > > Yes, you are right, but we do not have any better idea yet. > > > > > > Currently we have three fake pci devices, which support stall and > > > > > > pasid. > > > > > > We have to let pci system know the device can support pasid, > > > > > > because of > > > > > > stall feature, though not support pri. > > > > > > Do you have any other ideas? > > > > > It sounds like the best way would be to allocate a PCI capability for > > > > > it, so > > > > > detection can be done through config space, at least in future > > > > > devices, > > > > > or possibly after a firmware update if the config space in your system > > > > > is controlled by firmware somewhere. Once there is a proper mechanism > > > > > to do this, using fixups to detect the early devices that don't use > > > > > that > > > > > should be uncontroversial. I have no idea what the process or timeline > > > > > is to add new capabilities into the PCIe specification, or if this one > > > > > would be acceptable to the PCI SIG at all. > > > > That sounds like a possibility. The spec already defines a > > > > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might > > > > be a candidate. > > > Will investigate this, thanks Bjorn > > FWIW, there's also a Vendor-Specific Capability that can appear in the > > first 256 bytes of config space (the Vendor-Specific Extended > > Capability must appear in the "Extended Configuration Space" from > > 0x100-0xfff). > Unfortunately our silicon does not have either Vendor-Specific Capability or > Vendor-Specific Extended Capability. > > Studied commit 8531e283bee66050734fb0e89d53e85fd5ce24a4 > Looks this method requires adding member (like can_stall) to struct pci_dev, > looks difficult. The problem is that we don't want to add device IDs every time a new chip comes out. Adding one or two device IDs for silicon that's already released is not a problem as long as you have a strategy for *future* devices so they don't require a quirk. > > > > > If detection cannot be done through PCI config space, the next best > > > > > alternative is to pass auxiliary data through firmware. On DT based > > > > > machines, you can list non-hotpluggable PCIe devices and add custom > > > > > properties that could be read during device enumeration. I assume > > > > > ACPI has something similar, but I have not done that. > > > Yes, thanks Arnd > > > > ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate. I > > > > like this better than a PCI capability because the property you need > > > > to expose is not a PCI property. > > > _DSM may not workable, since it is working in runtime. > > > We need
RE: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Stefan Hajnoczi > Sent: Monday, June 15, 2020 6:02 PM > > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote: > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on > > Intel platforms allows address space sharing between device DMA and > > applications. SVA can reduce programming complexity and enhance > security. > > > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing > > guest application address space with passthru devices. This is called > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU > > changes. For IOMMU and QEMU changes, they are in separate series (listed > > in the "Related series"). > > > > The high-level architecture for SVA virtualization is as below, the key > > design of vSVA support is to utilize the dual-stage IOMMU translation ( > > also known as IOMMU nesting translation) capability in host IOMMU. > > > > > > .-. .---. > > | vIOMMU| | Guest process CR3, FL only| > > | | '---' > > ./ > > | PASID Entry |--- PASID cache flush - > > '-' | > > | | V > > | |CR3 in GPA > > '-' > > Guest > > --| Shadow |--| > > vv v > > Host > > .-. .--. > > | pIOMMU| | Bind FL for GVA-GPA | > > | | '--' > > ./ | > > | PASID Entry | V (Nested xlate) > > '\.--. > > | | |SL for GPA-HPA, default domain| > > | | '--' > > '-' > > Where: > > - FL = First level/stage one page tables > > - SL = Second level/stage two page tables > > Hi, > Looks like an interesting feature! > > To check I understand this feature: can applications now pass virtual > addresses to devices instead of translating to IOVAs? > > If yes, can guest applications restrict the vSVA address space so the > device only has access to certain regions? > > On one hand replacing IOVA translation with virtual addresses simplifies > the application programming model, but does it give up isolation if the > device can now access all application memory? > with SVA each application is allocated with a unique PASID to tag its virtual address space. The device that claims SVA support must guarantee that one application can only program the device to access its own virtual address space (i.e. all DMAs triggered by this application are tagged with the application's PASID, and are translated by IOMMU's PASID-granular page table). So, isolation is not sacrificed in SVA. Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
Hi, Baolu, On Sat, Jun 13, 2020 at 08:17:40PM +0800, Lu Baolu wrote: > Hi Fenghua, > > On 2020/6/13 8:41, Fenghua Yu wrote: > >+implement implement fairness or ensure forward progress can be made. > > Repeated "implement". Will fix this. > >+For example, the Intel Data Streaming Accelerator (DSA) uses > >+intel_svm_bind_mm(), which will do the following. > > The Intel SVM APIs have been deprecated. Drivers should use > iommu_sva_bind_device() instead. Please also update other places in > this document. Will fix this. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > @@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs > *regs, long error_code) > int ret; > > RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); > + > + /* > + * Perform the check for a user mode PASID exception before enable > + * interrupts. Doing this here ensures that the PASID MSR can be simply > + * accessed because the contents are known to be still associated > + * with the current process. > + */ > + if (user_mode(regs) && fixup_pasid_exception()) { > + cond_local_irq_enable(regs); > + return; OK, so we're done with the exception, lets enable interrupts? > + } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 00/12] x86: tag application address space for devices
On Fri, Jun 12, 2020 at 05:41:21PM -0700, Fenghua Yu wrote: > This series only provides simple and basic support for ENQCMD and the MSR: > 1. Clean up type definitions (patch 1-3). These patches can be in a >separate series. >- Define "pasid" as "unsigned int" consistently (patch 1 and 2). >- Define "flags" as "unsigned int" > 2. Explain different various technical terms used in the series (patch 4). > 3. Enumerate support for ENQCMD in the processor (patch 5). > 4. Handle FPU PASID state and the MSR during context switch (patches 6-7). > 5. Define "pasid" in mm_struct (patch 8). > 5. Clear PASID state for new mm and forked and cloned thread (patch 9-10). > 6. Allocate and free PASID for a process (patch 11). > 7. Fix up the PASID MSR in #GP handler when one thread in a process >executes ENQCMD for the first time (patches 12). If this is per mm, should not switch_mm() update the MSR ? I'm not seeing that, nor do I see it explained why not. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > +/* > + * Apply some heuristics to see if the #GP fault was caused by a thread > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic > + * guesses incorrectly, take one more #GP fault. How is that going to help? Aren't we then going to run this same heuristic again and again and again? > + */ > +bool __fixup_pasid_exception(void) > +{ > + u64 pasid_msr; > + unsigned int pasid; > + > + /* > + * This function is called only when this #GP was triggered from user > + * space. So the mm cannot be NULL. > + */ > + pasid = current->mm->pasid; > + /* If the mm doesn't have a valid PASID, then can't help. */ > + if (invalid_pasid(pasid)) > + return false; > + > + /* > + * Since IRQ is disabled now, the current task still owns the FPU on That's just weird and confusing. What you want to say is that you rely on the exception disabling the interrupt. > + * this CPU and the PASID MSR can be directly accessed. > + * > + * If the MSR has a valid PASID, the #GP must be for some other reason. > + * > + * If rdmsr() is really a performance issue, a TIF_ flag may be > + * added to check if the thread has a valid PASID instead of rdmsr(). I don't understand any of this. Nobody except us writes to this MSR, we should bloody well know what's in it. What gives? > + */ > + rdmsrl(MSR_IA32_PASID, pasid_msr); > + if (pasid_msr & MSR_IA32_PASID_VALID) > + return false; > + > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > + > + return true; > +} > -- > 2.19.1 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Stefan Hajnoczi > Sent: Monday, June 15, 2020 6:02 PM > > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote: > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on > > Intel platforms allows address space sharing between device DMA and > > applications. SVA can reduce programming complexity and enhance security. > > > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing > > guest application address space with passthru devices. This is called > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU > > changes. For IOMMU and QEMU changes, they are in separate series (listed > > in the "Related series"). > > > > The high-level architecture for SVA virtualization is as below, the key > > design of vSVA support is to utilize the dual-stage IOMMU translation ( > > also known as IOMMU nesting translation) capability in host IOMMU. > > > > > > .-. .---. > > | vIOMMU| | Guest process CR3, FL only| > > | | '---' > > ./ > > | PASID Entry |--- PASID cache flush - > > '-' | > > | | V > > | |CR3 in GPA > > '-' > > Guest > > --| Shadow |--| > > vv v > > Host > > .-. .--. > > | pIOMMU| | Bind FL for GVA-GPA | > > | | '--' > > ./ | > > | PASID Entry | V (Nested xlate) > > '\.--. > > | | |SL for GPA-HPA, default domain| > > | | '--' > > '-' > > Where: > > - FL = First level/stage one page tables > > - SL = Second level/stage two page tables > > Hi, > Looks like an interesting feature! thanks for the interest. Stefan :-) > To check I understand this feature: can applications now pass virtual > addresses to devices instead of translating to IOVAs? yes, application could pass virtual addresses to device directly. As long as the virtual address is mapped in cpu page table, then IOMMU would get it translated to physical address. > If yes, can guest applications restrict the vSVA address space so the > device only has access to certain regions? do you mean restrict the access of certain virtual address regions of guest application ? or certain guest memory? :-) > On one hand replacing IOVA translation with virtual addresses simplifies > the application programming model, but does it give up isolation if the > device can now access all application memory? yeah, you are right, SVA simplifies application programming model. And today, we do allow access all application memory by SVA. this is also another benefit of SVA. e.g. say an accelerator gets a copy of data from a buffer written by cpu. If there is some other data which is directed by a pointer (a virtual address) within the data got from memory, accelerator could do another DMA to fetch it without cpu's involvement. Regards, Yi Liu > Thanks, > Stefan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[Issue]platform/x86: iommu: System can't shutdown because iommu driver keeps checking the status of DMA_GSTS_TES
hi All, I have a machine and there's only intel gpu. the secureboot and vt-d is enabled in BIOS. On the Ubuntu desktop, I do s2idle first and restart the machine. The machine can't restart successfully, so I need to press the power button to shutdown. I tried each of the following and the issue can't be triggered. 1. disable secure boot in BIOS. 2. intel_iommu=off. 3. intel_iomm=igfx_off. 4. nomodeset 5. i915.modeset=0. After I investigate further, find inte_iommu keeps checking the status of DMA_GSTS_TES. During the procedure of restart, the driver would disable iommu translation and check status of DMA_GSTS_TES until status of DMA_GSTS_TES is 0. If you need more information, I can provide it. Thanks *Koba Ko* ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 14/15] vfio: Document dual stage control
On Thu, Jun 11, 2020 at 05:15:33AM -0700, Liu Yi L wrote: > From: Eric Auger > > The VFIO API was enhanced to support nested stage control: a bunch of > new iotcls and usage guideline. > > Let's document the process to follow to set up nested mode. > > Cc: Kevin Tian > CC: Jacob Pan > Cc: Alex Williamson > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Joerg Roedel > Cc: Lu Baolu > Signed-off-by: Eric Auger > Signed-off-by: Liu Yi L > --- > v1 -> v2: > *) new in v2, compared with Eric's original version, pasid table bind >and fault reporting is removed as this series doesn't cover them. >Original version from Eric. >https://lkml.org/lkml/2020/3/20/700 > > Documentation/driver-api/vfio.rst | 64 > +++ > 1 file changed, 64 insertions(+) > > diff --git a/Documentation/driver-api/vfio.rst > b/Documentation/driver-api/vfio.rst > index f1a4d3c..06224bd 100644 > --- a/Documentation/driver-api/vfio.rst > +++ b/Documentation/driver-api/vfio.rst > @@ -239,6 +239,70 @@ group and can access them as follows:: > /* Gratuitous device reset and go... */ > ioctl(device, VFIO_DEVICE_RESET); > > +IOMMU Dual Stage Control > + > + > +Some IOMMUs support 2 stages/levels of translation. Stage corresponds to > +the ARM terminology while level corresponds to Intel's VTD terminology. > +In the following text we use either without distinction. > + > +This is useful when the guest is exposed with a virtual IOMMU and some > +devices are assigned to the guest through VFIO. Then the guest OS can use > +stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for > +VM isolation (GPA -> HPA). > + > +Under dual stage translation, the guest gets ownership of the stage 1 page > +tables and also owns stage 1 configuration structures. The hypervisor owns > +the root configuration structure (for security reason), including stage 2 > +configuration. This works as long configuration structures and page table s/as long configuration/as long as configuration/ > +format are compatible between the virtual IOMMU and the physical IOMMU. s/format/formats/ > + > +Assuming the HW supports it, this nested mode is selected by choosing the > +VFIO_TYPE1_NESTING_IOMMU type through: > + > +ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU); > + > +This forces the hypervisor to use the stage 2, leaving stage 1 available > +for guest usage. The guest stage 1 format depends on IOMMU vendor, and > +it is the same with the nesting configuration method. User space should > +check the format and configuration method after setting nesting type by > +using: > + > +ioctl(container->fd, VFIO_IOMMU_GET_INFO, _info); > + > +Details can be found in Documentation/userspace-api/iommu.rst. For Intel > +VT-d, each stage 1 page table is bound to host by: > + > +nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL; > +memcpy(_op->data, _data, sizeof(bind_data)); > +ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op); > + > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA. > +GVA->GPA page tables are available when PASID (Process Address Space ID) > +is exposed to guest. e.g. guest with PASID-capable devices assigned. For > +such page table binding, the bind_data should include PASID info, which > +is allocated by guest itself or by host. This depends on hardware vendor > +e.g. Intel VT-d requires to allocate PASID from host. This requirement is > +available by VFIO_IOMMU_GET_INFO. User space could allocate PASID from > +host by: > + > +req.flags = VFIO_IOMMU_ALLOC_PASID; > +ioctl(container, VFIO_IOMMU_PASID_REQUEST, ); It is not clear how the userspace application determines whether PASIDs must be allocated from the host via VFIO_IOMMU_PASID_REQUEST or if the guest itself can allocate PASIDs. The text mentions VFIO_IOMMU_GET_INFO but what exactly should the userspace application check? signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[Issue]platform/x86: iommu: System can't shutdown because iommu driver keeps checking the status of DMA_GSTS_TES
hi All, I have a machine and there's only intel gpu. the secureboot and vt-d is enabled in BIOS. On the Ubuntu desktop, I do s2idle first and restart the machine. The machine can't restart successfully, so I need to press the power button to shutdown. I tried each of the following and the issue can't be triggered. 1. disable secure boot in BIOS. 2. intel_iommu=off. 3. intel_iomm=igfx_off. 4. nomodeset 5. i915.modeset=0. After I investigate further, find inte_iommu keeps checking the status of DMA_GSTS_TES. During the procedure of restart, the driver would disable iommu translation and check the status of DMA_GSTS_TES until status of DMA_GSTS_TES is 0. If you need more information, I can provide it. Thanks Koba Ko ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote: > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on > Intel platforms allows address space sharing between device DMA and > applications. SVA can reduce programming complexity and enhance security. > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing > guest application address space with passthru devices. This is called > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU > changes. For IOMMU and QEMU changes, they are in separate series (listed > in the "Related series"). > > The high-level architecture for SVA virtualization is as below, the key > design of vSVA support is to utilize the dual-stage IOMMU translation ( > also known as IOMMU nesting translation) capability in host IOMMU. > > > .-. .---. > | vIOMMU| | Guest process CR3, FL only| > | | '---' > ./ > | PASID Entry |--- PASID cache flush - > '-' | > | | V > | |CR3 in GPA > '-' > Guest > --| Shadow |--| > vv v > Host > .-. .--. > | pIOMMU| | Bind FL for GVA-GPA | > | | '--' > ./ | > | PASID Entry | V (Nested xlate) > '\.--. > | | |SL for GPA-HPA, default domain| > | | '--' > '-' > Where: > - FL = First level/stage one page tables > - SL = Second level/stage two page tables Hi, Looks like an interesting feature! To check I understand this feature: can applications now pass virtual addresses to devices instead of translating to IOVAs? If yes, can guest applications restrict the vSVA address space so the device only has access to certain regions? On one hand replacing IOVA translation with virtual addresses simplifies the application programming model, but does it give up isolation if the device can now access all application memory? Thanks, Stefan signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] uacce: remove uacce_vma_fault
Fix NULL pointer error if removing uacce's parent module during app's running. SIGBUS is already reported by do_page_fault, so uacce_vma_fault is not needed. If providing vma_fault, vmf->page has to be filled as well, required by __do_fault. Reported-by: Jean-Philippe Brucker Signed-off-by: Zhangfei Gao --- drivers/misc/uacce/uacce.c | 9 - 1 file changed, 9 deletions(-) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index 107028e..aa91f69 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -179,14 +179,6 @@ static int uacce_fops_release(struct inode *inode, struct file *filep) return 0; } -static vm_fault_t uacce_vma_fault(struct vm_fault *vmf) -{ - if (vmf->flags & (FAULT_FLAG_MKWRITE | FAULT_FLAG_WRITE)) - return VM_FAULT_SIGBUS; - - return 0; -} - static void uacce_vma_close(struct vm_area_struct *vma) { struct uacce_queue *q = vma->vm_private_data; @@ -199,7 +191,6 @@ static void uacce_vma_close(struct vm_area_struct *vma) } static const struct vm_operations_struct uacce_vm_ops = { - .fault = uacce_vma_fault, .close = uacce_vma_close, }; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote: > Hi, Peter, > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote: > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > > > +/* > > > + * Apply some heuristics to see if the #GP fault was caused by a thread > > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that > > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic > > > + * guesses incorrectly, take one more #GP fault. > > > > How is that going to help? Aren't we then going to run this same > > heuristic again and again and again? > > The heuristic always initializes the MSR with the per mm PASID IIF the > mm has a valid PASID but the MSR doesn't have one. This heuristic usually > happens only once on the first #GP in a thread. But it doesn't guarantee the PASID is the right one. Suppose both the mm has a PASID and the MSR has a VALID one, but the MSR isn't the mm one. Then we NO-OP. So if the exception was due to us having the wrong PASID, we stuck. > If the next #GP still comes in, the heuristic finds out the MSR already > has a valid PASID and thus will not fixup the MSR any more. The fixup() > returns "false" and lets others to handle the #GP. > > So the heuristic will be executed once (at most) and won't be executed > again and again. So I get that you want to set the PASID on-demand, but I find this all really weird code to make that happen. > > > +bool __fixup_pasid_exception(void) > > > +{ > > > + u64 pasid_msr; > > > + unsigned int pasid; > > > + > > > + /* > > > + * This function is called only when this #GP was triggered from user > > > + * space. So the mm cannot be NULL. > > > + */ > > > + pasid = current->mm->pasid; > > > + /* If the mm doesn't have a valid PASID, then can't help. */ > > > + if (invalid_pasid(pasid)) > > > + return false; > > > + > > > + /* > > > + * Since IRQ is disabled now, the current task still owns the FPU on > > > > That's just weird and confusing. What you want to say is that you rely > > on the exception disabling the interrupt. > > I checked SDM again. You are right. #GP can be interrupted by machine check > or other interrupts. So I cannot assume the current task still owns the FPU. > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access > either the MSR on the processor or the PASID state in the memory. That's not in fact what I meant, but yes, you can take exceptions while !IF just fine. > > > + * this CPU and the PASID MSR can be directly accessed. > > > + * > > > + * If the MSR has a valid PASID, the #GP must be for some other reason. > > > + * > > > + * If rdmsr() is really a performance issue, a TIF_ flag may be > > > + * added to check if the thread has a valid PASID instead of rdmsr(). > > > > I don't understand any of this. Nobody except us writes to this MSR, we > > should bloody well know what's in it. What gives? > > Patch 4 describes how to manage the MSR and patch 7 describes the format > of the MSR (20-bit PASID value and bit 31 is valid bit). > > Are they sufficient to help? Or do you mean something else? I don't get why you need a rdmsr here, or why not having one would require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > + */ > > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > > + return false; > > > + > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); How much more expensive is the wrmsr over the rdmsr? Can't we just unconditionally write the current PASID and call it a day? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/tegra-smmu: Add missing locks around mapping operations
25.05.2020 22:54, Dmitry Osipenko пишет: > The mapping operations of the Tegra SMMU driver are subjected to a race > condition issues because SMMU Address Space isn't allocated and freed > atomically, while it should be. This patch makes the mapping operations > atomic, it fixes an accidentally released Host1x Address Space problem > which happens while running multiple graphics tests in parallel on > Tegra30, i.e. by having multiple threads racing with each other in the > Host1x's submission and completion code paths, performing IOVA mappings > and unmappings in parallel. > > Cc: > Signed-off-by: Dmitry Osipenko > --- > > Changelog: > > v2: - Now using mutex instead of spinlock. > > - The _locked postfix is replaced with the underscores prefix. > Hello Thierry and Joerg! Guys, are you okay with the v2 variant? Will be great if we could fix the issue ASAP since it's quite unpleasant. Thanks in advance! ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/amd: Move Kconfig and Makefile bits down into amd directory
Reviewed-by: Suravee Suthikulpanit Thanks, Suravee On 6/13/20 6:11 AM, Jerry Snitselaar wrote: Move AMD Kconfig and Makefile bits down into the amd directory with the rest of the AMD specific files. Cc: Joerg Roedel Cc: Suravee Suthikulpanit Signed-off-by: Jerry Snitselaar --- drivers/iommu/Kconfig | 45 +- drivers/iommu/Makefile | 5 + drivers/iommu/amd/Kconfig | 44 + drivers/iommu/amd/Makefile | 4 4 files changed, 50 insertions(+), 48 deletions(-) create mode 100644 drivers/iommu/amd/Kconfig create mode 100644 drivers/iommu/amd/Makefile diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index b12d4ec124f6..78a8be0053b3 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -132,50 +132,7 @@ config IOMMU_PGTABLES_L2 def_bool y depends on MSM_IOMMU && MMU && SMP && CPU_DCACHE_DISABLE=n -# AMD IOMMU support -config AMD_IOMMU - bool "AMD IOMMU support" - select SWIOTLB - select PCI_MSI - select PCI_ATS - select PCI_PRI - select PCI_PASID - select IOMMU_API - select IOMMU_IOVA - select IOMMU_DMA - depends on X86_64 && PCI && ACPI - ---help--- - With this option you can enable support for AMD IOMMU hardware in - your system. An IOMMU is a hardware component which provides - remapping of DMA memory accesses from devices. With an AMD IOMMU you - can isolate the DMA memory of different devices and protect the - system from misbehaving device drivers or hardware. - - You can find out if your system has an AMD IOMMU if you look into - your BIOS for an option to enable it or if you have an IVRS ACPI - table. - -config AMD_IOMMU_V2 - tristate "AMD IOMMU Version 2 driver" - depends on AMD_IOMMU - select MMU_NOTIFIER - ---help--- - This option enables support for the AMD IOMMUv2 features of the IOMMU - hardware. Select this option if you want to use devices that support - the PCI PRI and PASID interface. - -config AMD_IOMMU_DEBUGFS - bool "Enable AMD IOMMU internals in DebugFS" - depends on AMD_IOMMU && IOMMU_DEBUGFS - ---help--- - !!!WARNING!!! !!!WARNING!!! !!!WARNING!!! !!!WARNING!!! - - DO NOT ENABLE THIS OPTION UNLESS YOU REALLY, -REALLY- KNOW WHAT YOU ARE DOING!!! - Exposes AMD IOMMU device internals in DebugFS. - - This option is -NOT- intended for production environments, and should - not generally be enabled. - +source "drivers/iommu/amd/Kconfig" source "drivers/iommu/intel/Kconfig" config IRQ_REMAP diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 71dd2f382e78..f356bc12b1c7 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -obj-y += intel/ +obj-y += amd/ intel/ obj-$(CONFIG_IOMMU_API) += iommu.o obj-$(CONFIG_IOMMU_API) += iommu-traces.o obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o @@ -12,9 +12,6 @@ obj-$(CONFIG_IOASID) += ioasid.o obj-$(CONFIG_IOMMU_IOVA) += iova.o obj-$(CONFIG_OF_IOMMU)+= of_iommu.o obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o -obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o -obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o -obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig new file mode 100644 index ..1f061d91e0b8 --- /dev/null +++ b/drivers/iommu/amd/Kconfig @@ -0,0 +1,44 @@ +# SPDX-License-Identifier: GPL-2.0-only +# AMD IOMMU support +config AMD_IOMMU + bool "AMD IOMMU support" + select SWIOTLB + select PCI_MSI + select PCI_ATS + select PCI_PRI + select PCI_PASID + select IOMMU_API + select IOMMU_IOVA + select IOMMU_DMA + depends on X86_64 && PCI && ACPI + help + With this option you can enable support for AMD IOMMU hardware in + your system. An IOMMU is a hardware component which provides + remapping of DMA memory accesses from devices. With an AMD IOMMU you + can isolate the DMA memory of different devices and protect the + system from misbehaving device drivers or hardware. + + You can find out if your system has an AMD IOMMU if you look into + your BIOS for an option to enable it or if you have an IVRS ACPI + table. + +config AMD_IOMMU_V2 + tristate "AMD IOMMU Version 2 driver" + depends on AMD_IOMMU + select MMU_NOTIFIER + help + This option enables support for the AMD IOMMUv2 features of the IOMMU + hardware. Select this option if you want to use devices that support + the PCI PRI and
Re: [PATCH v2 00/12] x86: tag application address space for devices
Hi, Peter, On Mon, Jun 15, 2020 at 09:52:02AM +0200, Peter Zijlstra wrote: > On Fri, Jun 12, 2020 at 05:41:21PM -0700, Fenghua Yu wrote: > > > This series only provides simple and basic support for ENQCMD and the MSR: > > 1. Clean up type definitions (patch 1-3). These patches can be in a > >separate series. > >- Define "pasid" as "unsigned int" consistently (patch 1 and 2). > >- Define "flags" as "unsigned int" > > 2. Explain different various technical terms used in the series (patch 4). > > 3. Enumerate support for ENQCMD in the processor (patch 5). > > 4. Handle FPU PASID state and the MSR during context switch (patches 6-7). > > 5. Define "pasid" in mm_struct (patch 8). > > 5. Clear PASID state for new mm and forked and cloned thread (patch 9-10). > > 6. Allocate and free PASID for a process (patch 11). > > 7. Fix up the PASID MSR in #GP handler when one thread in a process > >executes ENQCMD for the first time (patches 12). > > If this is per mm, should not switch_mm() update the MSR ? I'm not > seeing that, nor do I see it explained why not. PASID value is per mm and all threads in a process have the same PASID value in the MSR. However, the MSR is per thread and is context switched by XSAVES/XRSTROS in patches 6-7. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
Hi, Peter, On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote: > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > > +/* > > + * Apply some heuristics to see if the #GP fault was caused by a thread > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic > > + * guesses incorrectly, take one more #GP fault. > > How is that going to help? Aren't we then going to run this same > heuristic again and again and again? The heuristic always initializes the MSR with the per mm PASID IIF the mm has a valid PASID but the MSR doesn't have one. This heuristic usually happens only once on the first #GP in a thread. If the next #GP still comes in, the heuristic finds out the MSR already has a valid PASID and thus will not fixup the MSR any more. The fixup() returns "false" and lets others to handle the #GP. So the heuristic will be executed once (at most) and won't be executed again and again. > > > + */ > > +bool __fixup_pasid_exception(void) > > +{ > > + u64 pasid_msr; > > + unsigned int pasid; > > + > > + /* > > +* This function is called only when this #GP was triggered from user > > +* space. So the mm cannot be NULL. > > +*/ > > + pasid = current->mm->pasid; > > + /* If the mm doesn't have a valid PASID, then can't help. */ > > + if (invalid_pasid(pasid)) > > + return false; > > + > > + /* > > +* Since IRQ is disabled now, the current task still owns the FPU on > > That's just weird and confusing. What you want to say is that you rely > on the exception disabling the interrupt. I checked SDM again. You are right. #GP can be interrupted by machine check or other interrupts. So I cannot assume the current task still owns the FPU. Instead of directly rdmsr() and wrmsr(), I will add helpers that can access either the MSR on the processor or the PASID state in the memory. > > > +* this CPU and the PASID MSR can be directly accessed. > > +* > > +* If the MSR has a valid PASID, the #GP must be for some other reason. > > +* > > +* If rdmsr() is really a performance issue, a TIF_ flag may be > > +* added to check if the thread has a valid PASID instead of rdmsr(). > > I don't understand any of this. Nobody except us writes to this MSR, we > should bloody well know what's in it. What gives? Patch 4 describes how to manage the MSR and patch 7 describes the format of the MSR (20-bit PASID value and bit 31 is valid bit). Are they sufficient to help? Or do you mean something else? > > +*/ > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > + return false; > > + > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > > + > > + return true; > > +} > > -- > > 2.19.1 > > Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
>> The heuristic always initializes the MSR with the per mm PASID IIF the >> mm has a valid PASID but the MSR doesn't have one. This heuristic usually >> happens only once on the first #GP in a thread. > > But it doesn't guarantee the PASID is the right one. Suppose both the mm > has a PASID and the MSR has a VALID one, but the MSR isn't the mm one. > Then we NO-OP. So if the exception was due to us having the wrong PASID, > we stuck. ENQCMD only checks the 'valid' bit of the IA32_PASID MSR to decide whether to #GP or not. H/W has no concept of the "right" pasid value. If IA32_PASID is valid with the wrong value ... then the system is about to see some major corruption because the operations in the accelerator are not going to translate to the physical addresses for pages owned by the process that issued the ENQCMD. -Tony ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu