Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
> On Aug 3, 2020, at 10:34 AM, Dave Hansen wrote: > > On 8/3/20 10:16 AM, Andy Lutomirski wrote: >> - TILE: genuinely per-thread, but it's expensive so it's >> lazy-loadable. But the lazy-load mechanism reuses #NM, and it's not >> fully disambiguated from the other use of #NM. So it sort of works, >> but it's gross. > > For those playing along at home, there's a new whitepaper out from Intel > about some new CPU features which are going to be fun: > >> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf > > Which part were you worried about? I thought it was fully disambuguated > from this: > >> When XFD causes an instruction to generate #NM, the processor loads >> the IA32_XFD_ERR MSR to identify the disabled state component(s). >> Specifically, the MSR is loaded with the logical AND of the IA32_XFD >> MSR and the bitmap corresponding to the state components required by >> the faulting instruction. >> >> Device-not-available exceptions that are not due to XFD — those >> resulting from setting CR0.TS to 1 — do not modify the IA32_XFD_ERR >> MSR. > > So if you always make sure to *clear* IA32_XFD_ERR after handing and XFD > exception, any #NM's with a clear IA32_XFD_ERR are from "legacy" > CR0.TS=1. Any bits set in IA32_XFD_ERR mean a new-style XFD exception. > > Am I missing something? I don’t think you’re missing anything, but this mechanism seems to be solidly in the category of “just barely works”, kind of like #DB and DR6, which also just barely works. And this PASID vs #GP mess is just sad. We’re trying to engineer around an issue that has no need to exist in the first place. For some reason we have two lazy-loading-fault mechanisms showing up in x86 in rapid succession, one of them is maybe 3/4-baked, and the other is totally separate and maybe 1/4 baked. If ENQCMD instead raise #NM, this would be trivial. (And it even makes sense — the error is, literally, “an instruction tried to use something in XSTATE that isn’t initialized”.). Or if someone had noticed that, kind of like PKRU, PASID didn’t really belong in XSTATE, we wouldn’t have this mess. Yes, obviously Linux can get all this stuff to work, but maybe Intel could aspire to design features that are straightforward to use well instead of merely possible to use well? ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On 8/3/20 8:12 AM, Andy Lutomirski wrote: > I could easily be convinced that the PASID fixup is so trivial and so > obviously free of misfiring in a way that causes an infinite loop that > this code is fine. But I think we first need to answer the bigger > question of why we're doing a lazy fixup in the first place. There was an (internal to Intel) implementation of this about a year ago that used smp_call_function_many() to force the MSR state into all threads of a process. I took one look at it, decided there was a 0% chance of it actually functioning and recommended we find another way. While I'm sure it could be done more efficiently, the implementation I looked at took ~200 lines of code and comments. It started to look too much like another instance of mm->cpumask for my comfort. The only other option I could think of would be an ABI where threads were required to call into the kernel at least once after creation before calling ENQCMD. All ENQCMDs would be required to be "wrapped" by code doing this syscall. Something like this: if (!thread_local(did_pasid_init)) sys_pasid_init(); // new syscall or prctl thread_local(did_pasid_init) = 1; // ENQCMD here ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On Mon, Aug 03, 2020 at 08:12:18AM -0700, Andy Lutomirski wrote: > On Mon, Aug 3, 2020 at 8:03 AM Dave Hansen wrote: > > > > On 7/31/20 4:34 PM, Andy Lutomirski wrote: > > >> Thomas suggested to provide a reason for the #GP caused by executing > > >> ENQCMD > > >> without a valid PASID value programmed. #GP error codes are 16 bits and > > >> all > > >> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The > > >> other > > >> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP > > >> when loading from the source operand, so its not fully comprehending all > > >> the reasons. Rather than special case the ENQCMD, in future Intel may > > >> choose a different fault mechanism for such cases if recovery is needed > > >> on > > >> #GP. > > > Decoding the user instruction is ugly and sets a bad architecture > > > precedent, but we already do it in #GP for UMIP. So I'm unconvinced. > > > > I'll try to do one more bit of convincing. :) > > > > In the end, we need a way to figure out if the #GP was from a known "OK" > > source that we can fix up. You're right that we could fire up the > > instruction decoder to help answer that question. But, it (also) > > doesn't easily yield a perfect answer as to the source of the #GP, it > > always involves a user copy, and it's a larger code impact than what > > we've got. > > > > I think I went and looked at fixup_umip_exception(), and compared it to > > the alternative which is essentially just these three lines of code: > > > > > + /* > > > + * If the current task already has a valid PASID in the MSR, > > > + * the #GP must be for some other reason. > > > + */ > > > + if (current->has_valid_pasid) > > > + return false; > > ...> + /* Now the current task has a valid PASID in the MSR. */ > > > + current->has_valid_pasid = 1; > > > > and *I* was convinced that instruction decoding wasn't worth it. > > > > There's a lot of stuff that fixup_umip_exception() does which we don't > > have to duplicate, but it's going to be really hard to get it anywhere > > near as compact as what we've got. > > > > I could easily be convinced that the PASID fixup is so trivial and so > obviously free of misfiring in a way that causes an infinite loop that > this code is fine. But I think we first need to answer the bigger > question of why we're doing a lazy fixup in the first place. We choose lazy fixup for 2 reasons. - If some threads were already created before the MSR is programmed, then we need to fixup those in a race free way. scheduling some task-work etc. We did do that early on, but decided it was ugly. - Not all threads need to submit ENQCMD, force feeding the MSR probably isn't even required for all. Yes the overhead isn't probably big, but might not even be required for all threads. We needed to fixup MSR in two different way. To keep the code simple, the choice was to only fixup on #GP, that eliminated the extra code we need to support case1. Cheers, Ashok ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On 8/3/20 10:16 AM, Andy Lutomirski wrote: > - TILE: genuinely per-thread, but it's expensive so it's > lazy-loadable. But the lazy-load mechanism reuses #NM, and it's not > fully disambiguated from the other use of #NM. So it sort of works, > but it's gross. For those playing along at home, there's a new whitepaper out from Intel about some new CPU features which are going to be fun: > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf Which part were you worried about? I thought it was fully disambuguated from this: > When XFD causes an instruction to generate #NM, the processor loads > the IA32_XFD_ERR MSR to identify the disabled state component(s). > Specifically, the MSR is loaded with the logical AND of the IA32_XFD > MSR and the bitmap corresponding to the state components required by > the faulting instruction. > > Device-not-available exceptions that are not due to XFD — those > resulting from setting CR0.TS to 1 — do not modify the IA32_XFD_ERR > MSR. So if you always make sure to *clear* IA32_XFD_ERR after handing and XFD exception, any #NM's with a clear IA32_XFD_ERR are from "legacy" CR0.TS=1. Any bits set in IA32_XFD_ERR mean a new-style XFD exception. Am I missing something? ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
Hi, Andy, On Fri, Jul 31, 2020 at 06:28:37PM -0700, Andy Lutomirski wrote: > On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu wrote: > > > > A #GP fault is generated when ENQCMD instruction is executed without > > a valid PASID value programmed in the current thread's PASID MSR. The > > #GP fault handler will initialize the MSR if a PASID has been allocated > > for this process. > > Let's take a step back here. Why are we trying to avoid IPIs? If you > call munmap(), you IPI other CPUs running tasks in the current mm. If > you do perf_event_open() and thus acquire RDPMC permission, you IPI > other CPUs running tasks in the current mm. If you call modify_ldt(), > you IPI other CPUs running tasks in the current mm. These events can > all happen more than once per process. > > Now we have ENQCMD. An mm can be assigned a PASID *once* in the model > that these patches support. Why not just send an IPI using > essentially identical code to the LDT sync or the CR4.PCE sync? ldt (or the other two cases) is different from ENQCMD: the PASID MSR is per-task and is supported by xsaves. The per-task PASID MSR needs to updated to ALL tasks. That means IPI, which only updates running tasks' MSRs, is not enough. All tasks' MSRs need to be updated when a PASID is allocated. This difference increases the complexity of sending IPI to running tasks and updating sleeping tasks's MSRs with locking etc. Of course, it's doable not to update the MSRs in all task when a new PASID is allocated to the mm. But that means we need to discard xsaves support for the MSR and create our own switch function to load the MSR. That increases complexity. We tried similar IPI way to update the PASID in about 200 lines of code. As Dave Hansen pointed, it's too complex. The current lazy updating the MSR only takes essential 3 lines of code in #GP. Does it make sense to still use the current fix up method to update the MSR? Thanks. -Fenghua ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On 7/31/20 4:34 PM, Andy Lutomirski wrote: >> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD >> without a valid PASID value programmed. #GP error codes are 16 bits and all >> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other >> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP >> when loading from the source operand, so its not fully comprehending all >> the reasons. Rather than special case the ENQCMD, in future Intel may >> choose a different fault mechanism for such cases if recovery is needed on >> #GP. > Decoding the user instruction is ugly and sets a bad architecture > precedent, but we already do it in #GP for UMIP. So I'm unconvinced. I'll try to do one more bit of convincing. :) In the end, we need a way to figure out if the #GP was from a known "OK" source that we can fix up. You're right that we could fire up the instruction decoder to help answer that question. But, it (also) doesn't easily yield a perfect answer as to the source of the #GP, it always involves a user copy, and it's a larger code impact than what we've got. I think I went and looked at fixup_umip_exception(), and compared it to the alternative which is essentially just these three lines of code: > + /* > + * If the current task already has a valid PASID in the MSR, > + * the #GP must be for some other reason. > + */ > + if (current->has_valid_pasid) > + return false; ...> + /* Now the current task has a valid PASID in the MSR. */ > + current->has_valid_pasid = 1; and *I* was convinced that instruction decoding wasn't worth it. There's a lot of stuff that fixup_umip_exception() does which we don't have to duplicate, but it's going to be really hard to get it anywhere near as compact as what we've got. I guess Fenghua could go give instruction decoding a shot and see how it looks, though. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On Mon, Aug 3, 2020 at 9:37 AM Dave Hansen wrote: > > On 8/3/20 8:12 AM, Andy Lutomirski wrote: > > I could easily be convinced that the PASID fixup is so trivial and so > > obviously free of misfiring in a way that causes an infinite loop that > > this code is fine. But I think we first need to answer the bigger > > question of why we're doing a lazy fixup in the first place. > > There was an (internal to Intel) implementation of this about a year ago > that used smp_call_function_many() to force the MSR state into all > threads of a process. I took one look at it, decided there was a 0% > chance of it actually functioning and recommended we find another way. > While I'm sure it could be done more efficiently, the implementation I > looked at took ~200 lines of code and comments. It started to look too > much like another instance of mm->cpumask for my comfort. If I were implementing this, I would try making switch_mm_irqs_off() do, roughly: void load_mm_pasid(...) { if (cpu_feature_enabled(X86_FEATURE_ENQCMD)) tsk->xstate[offset] = READ_ONCE(next->context.pasid); } This costs one cache miss, although the cache line in question is about to be read anyway. It might be faster to, instead, do: void load_mm_pasid(...) { u32 pasid = READ_ONCE(next->context.pasid); if (tsk->xstate[offset] != pasid) tsk->state[offset] = pasid; } so we don't dirty the cache line in the common case. The actual generated code ought to be pretty good -- surely the offset of PASID in XSTATE is an entry in an array somewhere that can be found with a single read, right? The READ_ONCE is because this could race against a write to context.pasid, so this code needs to be at the end of the function where it's protected by mm_cpumask. With all this done, the pasid update is just on_each_cpu_mask(mm_cpumask(mm), load_mm_pasid, mm, true). This looks like maybe 20 lines of code. As an added bonus, it lets us free PASIDs early if we ever decide we want to. May I take this opportunity to ask Intel to please put some real thought into future pieces of CPU state? Here's a summary of some things we have: - Normal extended state (FPU, XMM, etc): genuinely per thread and only ever used explicitly. Actually makes sense with XSAVE(C/S). - PKRU: affects CPL0-originated memory accesses, so it needs to be eagerly loaded in the kernel. Does not make sense with XRSTOR(C/S), but it's in there anyway. - CR3: per-mm state. Makes some sense in CR3, but it's tangled up with CR4 in nasty ways. - LDTR: per-mm on Linux and mostly obsolete everyone. In it's own register, so it's not a big deal. - PASID: per-mm state (surely Intel always intended it to be per-mm, since it's for shared _virtual memory_!). But for some reason it's in an MSR (which is slow), and it's cleverly, but not that cleverly, accessible with XSAVES/XRSTORS. Doesn't actually make sense. Also, PASID is lazy-loadable, but the mechanism for telling the kernel that a lazy load is needed got flubbed. - TILE: genuinely per-thread, but it's expensive so it's lazy-loadable. But the lazy-load mechanism reuses #NM, and it's not fully disambiguated from the other use of #NM. So it sort of works, but it's gross. - "KERNEL_GS_BASE", i.e. the shadow GS base. This is logically per-user-thread state, but it's only accessible in MSRs. For some reason this is *not* in XSAVES/XRSTORS state, nor is there any efficient way to access it at all. - Segment registers: can't be properly saved except by hypervisors, and can almost, but not quite, be properly loaded (assuming the state was sane to begin with) by normal kernels. Just don't try to load 1, 2, or 3 into any of them. Sometimes I think that this is all intended to be as confusing as possible and that it's all a ploy to keep context switches slow and complicated. Maybe Intel doesn't actually want to compete with other architectures that can context switch quickly? It would be really nice if we had a clean way to load per-mm state (see my private emails about this), a clean way to load CPL3 register state, and a clean way to load per-user-thread *kernel* register state (e.g. PKRU and probably PKRS). And there should be an exception that says "user code accessed a lazy-loaded resource that isn't loaded, and this is the resource it tried to access". ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On Mon, Aug 3, 2020 at 8:03 AM Dave Hansen wrote: > > On 7/31/20 4:34 PM, Andy Lutomirski wrote: > >> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD > >> without a valid PASID value programmed. #GP error codes are 16 bits and all > >> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other > >> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP > >> when loading from the source operand, so its not fully comprehending all > >> the reasons. Rather than special case the ENQCMD, in future Intel may > >> choose a different fault mechanism for such cases if recovery is needed on > >> #GP. > > Decoding the user instruction is ugly and sets a bad architecture > > precedent, but we already do it in #GP for UMIP. So I'm unconvinced. > > I'll try to do one more bit of convincing. :) > > In the end, we need a way to figure out if the #GP was from a known "OK" > source that we can fix up. You're right that we could fire up the > instruction decoder to help answer that question. But, it (also) > doesn't easily yield a perfect answer as to the source of the #GP, it > always involves a user copy, and it's a larger code impact than what > we've got. > > I think I went and looked at fixup_umip_exception(), and compared it to > the alternative which is essentially just these three lines of code: > > > + /* > > + * If the current task already has a valid PASID in the MSR, > > + * the #GP must be for some other reason. > > + */ > > + if (current->has_valid_pasid) > > + return false; > ...> + /* Now the current task has a valid PASID in the MSR. */ > > + current->has_valid_pasid = 1; > > and *I* was convinced that instruction decoding wasn't worth it. > > There's a lot of stuff that fixup_umip_exception() does which we don't > have to duplicate, but it's going to be really hard to get it anywhere > near as compact as what we've got. > I could easily be convinced that the PASID fixup is so trivial and so obviously free of misfiring in a way that causes an infinite loop that this code is fine. But I think we first need to answer the bigger question of why we're doing a lazy fixup in the first place. --Andy ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
Hi, Andy, On Fri, Jul 31, 2020 at 04:34:11PM -0700, Andy Lutomirski wrote: > On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu wrote: > > > > A #GP fault is generated when ENQCMD instruction is executed without > > a valid PASID value programmed in the current thread's PASID MSR. The > > #GP fault handler will initialize the MSR if a PASID has been allocated > > for this process. > > > > Decoding the user instruction is ugly and sets a bad architecture > > precedent. It may not function if the faulting instruction is modified > > after #GP. > > > > Thomas suggested to provide a reason for the #GP caused by executing ENQCMD > > without a valid PASID value programmed. #GP error codes are 16 bits and all > > 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other > > choice was to reflect the error code in an MSR. ENQCMD can also cause #GP > > when loading from the source operand, so its not fully comprehending all > > the reasons. Rather than special case the ENQCMD, in future Intel may > > choose a different fault mechanism for such cases if recovery is needed on > > #GP. > > Decoding the user instruction is ugly and sets a bad architecture > precedent, but we already do it in #GP for UMIP. So I'm unconvinced. Maybe just remove the "Decoding the user instruction ... bad architecture precedent" sentence? The sentence is vague. As described in the following "It may not function ..." sentence, the real issue of parsing the instruction is the instruction may be modified by another processor before it's parsed in the #GP handler. If just keep the "It may not function ..." sentence, is that good enough to explain why we don't parse the faulting instruction? > > Memo to Intel, though: you REALLY need to start thinking about what > the heck an OS is supposed to do with all these new faults you're > coming up with. The new #NM for TILE is utterly nonsensical. Sure, > it works for an OS that does not use CR0.TS and as long as no one > tries to extend the same mechanism for some new optional piece of > state, but as soon as Intel tries to use the same mechanism for > anything else, it falls apart. > > Please do better. Internally we did discuss the error code in #GP for PASID with HW architects. But due to some uarch reason, it's not simple to report the error code for PASID:( Please see previous discussion on the error code for PASID: https://lore.kernel.org/lkml/20200427224646.GA103955@otc-nc-03/ It's painful for our SW guys to check exception reasons if hardware doesn't explicitly tell us. Hopefully the heuristics (fixup the PASID MSR if the process already has a valid PASID but the MSR doesn't have one yet) in this patch is acceptable. > > > + > > +/* > > + * Write the current task's PASID MSR/state. This is called only when PASID > > + * is enabled. > > + */ > > +static void fpu__pasid_write(u32 pasid) > > +{ > > + u64 msr_val = pasid | MSR_IA32_PASID_VALID; > > + > > + fpregs_lock(); > > + > > + /* > > +* If the MSR is active and owned by the current task's FPU, it can > > +* be directly written. > > +* > > +* Otherwise, write the fpstate. > > +*/ > > + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) { > > + wrmsrl(MSR_IA32_PASID, msr_val); > > + } else { > > + struct ia32_pasid_state *ppasid_state; > > + > > + ppasid_state = > > get_xsave_addr(>thread.fpu.state.xsave, > > + XFEATURE_PASID); > > + /* > > +* ppasid_state shouldn't be NULL because XFEATURE_PASID > > +* is enabled. > > +*/ > > + WARN_ON_ONCE(!ppasid_state); > > + ppasid_state->pasid = msr_val; > > WARN instead of BUG is nice, but you'll immediate oops if this fails. > How about: > > if (!WARN_ON_ONCE(!ppasid_state)) > ppasid_state->pasid = msr_val; OK. I will fix this issue. Thank you very much for your review! -Fenghua ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu wrote: > > A #GP fault is generated when ENQCMD instruction is executed without > a valid PASID value programmed in the current thread's PASID MSR. The > #GP fault handler will initialize the MSR if a PASID has been allocated > for this process. Let's take a step back here. Why are we trying to avoid IPIs? If you call munmap(), you IPI other CPUs running tasks in the current mm. If you do perf_event_open() and thus acquire RDPMC permission, you IPI other CPUs running tasks in the current mm. If you call modify_ldt(), you IPI other CPUs running tasks in the current mm. These events can all happen more than once per process. Now we have ENQCMD. An mm can be assigned a PASID *once* in the model that these patches support. Why not just send an IPI using essentially identical code to the LDT sync or the CR4.PCE sync? --Andy ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu wrote: > > A #GP fault is generated when ENQCMD instruction is executed without > a valid PASID value programmed in the current thread's PASID MSR. The > #GP fault handler will initialize the MSR if a PASID has been allocated > for this process. > > Decoding the user instruction is ugly and sets a bad architecture > precedent. It may not function if the faulting instruction is modified > after #GP. > > Thomas suggested to provide a reason for the #GP caused by executing ENQCMD > without a valid PASID value programmed. #GP error codes are 16 bits and all > 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other > choice was to reflect the error code in an MSR. ENQCMD can also cause #GP > when loading from the source operand, so its not fully comprehending all > the reasons. Rather than special case the ENQCMD, in future Intel may > choose a different fault mechanism for such cases if recovery is needed on > #GP. Decoding the user instruction is ugly and sets a bad architecture precedent, but we already do it in #GP for UMIP. So I'm unconvinced. Memo to Intel, though: you REALLY need to start thinking about what the heck an OS is supposed to do with all these new faults you're coming up with. The new #NM for TILE is utterly nonsensical. Sure, it works for an OS that does not use CR0.TS and as long as no one tries to extend the same mechanism for some new optional piece of state, but as soon as Intel tries to use the same mechanism for anything else, it falls apart. Please do better. > + > +/* > + * Write the current task's PASID MSR/state. This is called only when PASID > + * is enabled. > + */ > +static void fpu__pasid_write(u32 pasid) > +{ > + u64 msr_val = pasid | MSR_IA32_PASID_VALID; > + > + fpregs_lock(); > + > + /* > +* If the MSR is active and owned by the current task's FPU, it can > +* be directly written. > +* > +* Otherwise, write the fpstate. > +*/ > + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) { > + wrmsrl(MSR_IA32_PASID, msr_val); > + } else { > + struct ia32_pasid_state *ppasid_state; > + > + ppasid_state = > get_xsave_addr(>thread.fpu.state.xsave, > + XFEATURE_PASID); > + /* > +* ppasid_state shouldn't be NULL because XFEATURE_PASID > +* is enabled. > +*/ > + WARN_ON_ONCE(!ppasid_state); > + ppasid_state->pasid = msr_val; WARN instead of BUG is nice, but you'll immediate oops if this fails. How about: if (!WARN_ON_ONCE(!ppasid_state)) ppasid_state->pasid = msr_val; ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v6 12/12] x86/traps: Fix up invalid PASID
A #GP fault is generated when ENQCMD instruction is executed without a valid PASID value programmed in the current thread's PASID MSR. The #GP fault handler will initialize the MSR if a PASID has been allocated for this process. Decoding the user instruction is ugly and sets a bad architecture precedent. It may not function if the faulting instruction is modified after #GP. Thomas suggested to provide a reason for the #GP caused by executing ENQCMD without a valid PASID value programmed. #GP error codes are 16 bits and all 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other choice was to reflect the error code in an MSR. ENQCMD can also cause #GP when loading from the source operand, so its not fully comprehending all the reasons. Rather than special case the ENQCMD, in future Intel may choose a different fault mechanism for such cases if recovery is needed on #GP. The following heuristic is used to avoid decoding the user instructions to determine the precise reason for the #GP fault: 1) If the mm for the process has not been allocated a PASID, this #GP cannot be fixed. 2) If the PASID MSR is already initialized, then the #GP was for some other reason 3) Try initializing the PASID MSR and returning. If the #GP was from an ENQCMD this will fix it. If not, the #GP fault will be repeated and will hit case "2". Suggested-by: Thomas Gleixner Signed-off-by: Fenghua Yu Reviewed-by: Tony Luck Reviewed-by: Lu Baolu --- v5: - Use cpu_feature_enabled() and remove redundant CONFIG_INTEL_IOMMU_SVM check which is included in cpu_feature_enabled() in fixup_pasid_exception() (PeterZ and Dave Hansen) - Reviewed by Lu Baolu v4: - Change PASID type to u32 (Christoph) v3: - Check and set current->has_valid_pasid in fixup() (PeterZ) - Add fpu__pasid_write() to update the MSR (PeterZ) - Add ioasid_find() sanity check in fixup() v2: - Update the first paragraph of the commit message (Thomas) - Add reasons why don't decode the user instruction and don't use #GP error code (Thomas) - Change get_task_mm() to current->mm (Thomas) - Add comments on why IRQ is disabled during PASID fixup (Thomas) - Add comment in fixup() that the function is called when #GP is from user (so mm is not NULL) (Dave Hansen) arch/x86/include/asm/iommu.h | 1 + arch/x86/kernel/traps.c | 12 ++ drivers/iommu/intel/svm.c| 78 3 files changed, 91 insertions(+) diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h index ed41259fe7ac..e9365a5d6f7d 100644 --- a/arch/x86/include/asm/iommu.h +++ b/arch/x86/include/asm/iommu.h @@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr) } void __free_pasid(struct mm_struct *mm); +bool __fixup_pasid_exception(void); #endif /* _ASM_X86_IOMMU_H */ diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index f58679e487f6..fe0f7d00523b 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -59,6 +59,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -518,6 +519,14 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs, return GP_CANONICAL; } +static bool fixup_pasid_exception(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_ENQCMD)) + return false; + + return __fixup_pasid_exception(); +} + #define GPFSTR "general protection fault" DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) @@ -530,6 +539,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) cond_local_irq_enable(regs); + if (user_mode(regs) && fixup_pasid_exception()) + goto exit; + if (static_cpu_has(X86_FEATURE_UMIP)) { if (user_mode(regs) && fixup_umip_exception(regs)) goto exit; diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 4c70b037..4a84c82a4f8c 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -1105,3 +1105,81 @@ void __free_pasid(struct mm_struct *mm) */ ioasid_free(pasid); } + +/* + * Write the current task's PASID MSR/state. This is called only when PASID + * is enabled. + */ +static void fpu__pasid_write(u32 pasid) +{ + u64 msr_val = pasid | MSR_IA32_PASID_VALID; + + fpregs_lock(); + + /* +* If the MSR is active and owned by the current task's FPU, it can +* be directly written. +* +* Otherwise, write the fpstate. +*/ + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) { + wrmsrl(MSR_IA32_PASID, msr_val); + } else { + struct ia32_pasid_state *ppasid_state; + + ppasid_state = get_xsave_addr(>thread.fpu.state.xsave, + XFEATURE_PASID); + /* +* ppasid_state shouldn't be NULL because XFEATURE_PASID +* is enabled. +*/ +