Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state
For AMX, we must still reserve the space, but we are not going to write zeros for clean state. We do this in software by checking XINUSE=0, and clearing the xstate_bv for the XSAVE. As a result, for XINUSE=0, we can skip writing the zeros, even though we can't compress the space.

So my understanding is that clearing xstate_bv will not help prevent saving zeros; only masking the bits off in EDX:EAX will, given the following logic. Not sure if this is just what you mean. :)

FWIW, PATCH21 [1] uses the instruction mask to skip writing zeros on the sigframe. Then, XSAVE will clear the xstate_bv for the xtile data state bit.

[1] https://lore.kernel.org/lkml/20210221185637.19281-22-chang.seok@intel.com/

Yes, no mask in EDX:EAX works in such a case. Thanks for pointing out the patch.

BRs, Jing

Thanks, Chang
Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state
On 3/25/2021 5:09 AM, Len Brown wrote:
On Tue, Mar 23, 2021 at 11:15 PM Liu, Jing2 wrote:

IMO, the problem with AVX512 state is that we guaranteed it will be zero for XINUSE=0. That means we have to write 0's on saves.

Why do "we have to write 0's on saves" when XINUSE=0? According to the SDM, if XINUSE=0, XSAVES will *not* save the data and the xstate_bv bit is 0; with XSAVE, the state is saved but the xstate_bv bit is also 0.

It would be better to be able to skip the write -- even if we can't save the space we can save the data transfer. (This is what we did for AMX.)

With the XFD feature, when XFD=1, the XSAVE instruction still has to save the INIT state to the area. So it seems that with XINUSE=0 and XFD=1, XSAVE(S) behave the same in that both can save the data transfer.

Hi Jing, Good observation! There are 3 cases.

Hi Len, thanks for your reply.

1. Task context switch saves into the context switch buffer. Here we use XSAVES, and as you point out, XSAVES includes the compaction optimization feature tracked by XINUSE. So when AMX is enabled, but clean, XSAVES doesn't write zeros. Further, it omits the buffer space for AMX in the destination altogether! However, since XINUSE=1 is possible, we have to *allocate* a buffer large enough to handle the dirty data for when XSAVES cannot employ that optimization.

Yes, I agree with you about the first case.

2. Entry into a user signal handler saves into the user space sigframe. Here we use XSAVE, and so the hardware will write zeros for XINUSE=0, and for AVX512, we save neither time nor space. My understanding is that, for application compatibility, we can *not* compact the destination buffer that user-space sees. This is because existing code may have adopted fixed-size offsets (which is unfortunate). And so, for AVX512, we both reserve the space, and we write zeros for clean AVX512 state.

With XSAVE, I think this is true if we assume the EDX:EAX AVX512 bits are set to 1, which means XSAVE will write zeros when XINUSE=0.
Is this the same assumption as yours?...

For AMX, we must still reserve the space, but we are not going to write zeros for clean state. We do this in software by checking XINUSE=0, and clearing the xstate_bv for the XSAVE. As a result, for XINUSE=0, we can skip writing the zeros, even though we can't compress the space.

So my understanding is that clearing xstate_bv will not help prevent saving zeros; only masking the bits off in EDX:EAX will, given the following logic. Not sure if this is just what you mean. :)

RFBM ← XCR0 AND EDX:EAX; /* bitwise logical AND */
OLD_BV ← XSTATE_BV field from XSAVE header;
...
FOR i ← 2 TO 62
    IF RFBM[i] = 1
        THEN save XSAVE state component i at offset n from base of XSAVE area;
    FI;
ENDFOR;
XSTATE_BV field in XSAVE header ← (OLD_BV AND NOT RFBM) OR (XINUSE AND RFBM);

3. user space always uses fully uncompacted XSAVE buffers.

The reason I'm interested in the XINUSE denotation is that it might be helpful for the XFD MSR context switch cost during vmexit and vmenter.

As the guest OS may be using XFD, the VMM cannot use it for itself. Rather, the VMM must context-switch it when it switches between guests. (or not expose it to guests at all)

My understanding is that KVM cannot assume whether userspace qemu uses XFD or not, so KVM needs to context-switch XFD between vcpu threads on vmexit/vmenter. That's why I am thinking about detecting XINUSE on vmexit; otherwise, a wrongly armed IA32_XFD will impact XSAVES/XRSTORS and cause guest AMX state to be lost.

Thanks, Jing

cheers, -Len

cheers, Len Brown, Intel Open Source Technology Center
Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state
On 3/24/2021 5:01 AM, Len Brown wrote:

I have an obnoxious question: do we really want to use the XFD mechanism?

Obnoxious questions are often the most valuable! :-) [...]

cheers, Len Brown, Intel Open Source Technology Center

ps. I agree that unnecessary XINUSE=1 is possible. Notwithstanding the issues initially deploying AVX512, I am skeptical that it is common today.

Sorry, I'm trying to understand from...

IMO, the problem with AVX512 state is that we guaranteed it will be zero for XINUSE=0. That means we have to write 0's on saves.

Why do "we have to write 0's on saves" when XINUSE=0? According to the SDM, if XINUSE=0, XSAVES will *not* save the data and the xstate_bv bit is 0; with XSAVE, the state is saved but the xstate_bv bit is also 0.

It would be better to be able to skip the write -- even if we can't save the space we can save the data transfer. (This is what we did for AMX.)

With the XFD feature, when XFD=1, the XSAVE instruction still has to save the INIT state to the area. So it seems that with XINUSE=0 and XFD=1, XSAVE(S) behave the same in that both can save the data transfer.

The reason I'm interested in the XINUSE denotation is that it might be helpful for the XFD MSR context switch cost during vmexit and vmenter.

Thanks, Jing

pps. your idea of requiring the user to allocate their own signal stack is interesting. It isn't really about allocating the stack, though -- the stack of the task that uses the feature is generally fine already. The opportunity is to allow tasks that do *not* use the new feature to get away with minimal data transfer and stack size. As we don't have the 0's guarantee for AMX, we bought the important part of that back.
Re: [PATCH v2] KVM: x86: Revise guest_fpu xcomp_bv field
On 3/2/2021 7:59 AM, Sean Christopherson wrote:
On Thu, Feb 25, 2021, Jing Liu wrote:

XCOMP_BV[63] field indicates that the save area is in the compacted format and XCOMP_BV[62:0] indicates the states that have space allocated in the save area, including both XCR0 and XSS bits enabled by the host kernel. Use xfeatures_mask_all for calculating xcomp_bv and reuse XCOMP_BV_COMPACTED_FORMAT defined by kernel.

Signed-off-by: Jing Liu
---
 arch/x86/kvm/x86.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b404e4d7dd8..f115493f577d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4435,8 +4435,6 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	/* Set XSTATE_BV and possibly XCOMP_BV.  */
 	xsave->header.xfeatures = xstate_bv;
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
+					 xfeatures_mask_all;

Doesn't fill_xsave also need to be updated? Not with xfeatures_mask_all, but to account for arch.ia32_xss? I believe it's a nop with the current code, since supported_xss is zero, but it should be fixed, no?

Yes. For the arch.ia32_xss, I noticed CET (https://lkml.org/lkml/2020/7/15/1412) has posted a related change, so I didn't touch xstate_bv for fill_xsave for now. Finally, fill_xsave() needs e.g. arch.guest_supported_xss for xstate_bv; for xcomp_bv, xfeatures_mask_all is OK.
/* * Copy each region from the non-compacted offset to the

@@ -9912,9 +9911,6 @@ static void fx_init(struct kvm_vcpu *vcpu)
 		return;
 
 	fpstate_init(&vcpu->arch.guest_fpu->state);
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
-			host_xcr0 | XSTATE_COMPACTION_ENABLED;

Ugh, this _really_ needs a comment in the changelog. It took me a while to realize fpstate_init() does exactly what the new fill_xsave() is doing.

How about introducing "fx_init()->fpstate_init() initializes xcomp_bv of guest_fpu so no need to set again in later fill_xsave() and load_xsave()" in the commit message?

And isn't the code in load_xsave() redundant and can be removed?

Oh, yes. Keeping fx_init() initializing xcomp_bv for guest_fpu is enough. Let me remove it in load_xsave() later. And for fill_xsave(), I think there is no need to set xcomp_bv there.

Any code that uses get_xsave_addr() would have a dependency on load_xsave() if it's not redundant, and I can't see how that would work.

Sorry, I didn't quite understand why get_xsave_addr() has a dependency on load_xsave(); do you mean the xstate_bv instead of xcomp_bv, which load_xsave() uses to get the addr?

Thanks, Jing

 /*
  * Ensure guest xcr0 is valid for loading
--
2.18.4
Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field
On 2/25/2021 4:40 AM, Sean Christopherson wrote:
On Tue, Feb 23, 2021, Liu, Jing2 wrote:

XCOMP_BV[63] field indicates that the save area is in the compacted format and XCOMP_BV[62:0] indicates the states that have space allocated in the save area, including both XCR0 and XSS bits enabled by the host kernel. Use xfeatures_mask_all for calculating xcomp_bv and reuse XCOMP_BV_COMPACTED_FORMAT defined by kernel.

Works for me, just please wrap at ~73-75 chars, not ~64. Thanks!

Sure, let me update v2.

BRs, Jing
Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field
On 2/23/2021 12:06 AM, Sean Christopherson wrote:
On Mon, Feb 22, 2021, Liu, Jing2 wrote:
On 2/9/2021 1:24 AM, Sean Christopherson wrote:
On Mon, Feb 08, 2021, Dave Hansen wrote:
On 2/8/21 8:16 AM, Jing Liu wrote:

-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	/* Set XSTATE_BV and possibly XCOMP_BV.  */
 	xsave->header.xfeatures = xstate_bv;
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
+					 xfeatures_mask_all;

This is wrong, xfeatures_mask_all also tracks supervisor states.

When looking at the SDM Vol. 2 XSAVES instruction Operation part, it says the following:

RFBM ← (XCR0 OR IA32_XSS) AND EDX:EAX;
COMPMASK ← RFBM OR 80000000_00000000H;
...
XCOMP_BV field in XSAVE header ← COMPMASK;

So it seems xcomp_bv also tracks supervisor states?

Yes, sorry, I got distracted by Dave's question and didn't read the changelog closely. Now that I have, I find "Since fpstate_init() has initialized xcomp_bv, let's just use that." confusing. I think what you intend to say is that we can use the same _logic_ as fpstate_init_xstate() for calculating xcomp_bv.

Yes, that's the idea.

That said, it would be helpful for the changelog to explain why it's correct to use xfeatures_mask_all, e.g. just a short comment stating that the variable holds all XCR0 and XSS bits enabled by the host kernel. Justifying a change with "because other code does it" is sketchy, because there's no guarantee that what something else does is also correct for KVM, or that the existing code itself is even correct.

Got it, thanks for the details on this.
Then how about making the commit message like:

XCOMP_BV[63] field indicates that the save area is in the compacted format and XCOMP_BV[62:0] indicates the states that have space allocated in the save area, including both XCR0 and XSS bits enabled by the host kernel. Use xfeatures_mask_all for calculating xcomp_bv and reuse XCOMP_BV_COMPACTED_FORMAT defined by kernel.

Thanks, Jing
Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch
On 2/9/2021 2:55 AM, Konrad Rzeszutek Wilk wrote: On Mon, Feb 08, 2021 at 07:12:22PM +0100, Paolo Bonzini wrote: [...] However, running the host with _more_ bits set than necessary in XFD should not be a problem as long as the host doesn't use the AMX instructions. So perhaps Jing can look into keeping XFD=0 for as little time as possible, and XFD=host_XFD|guest_XFD as much as possible. This sounds like the lazy-fpu (eagerfpu?) that used to be part of the kernel? I recall that we had a CVE for that - so it may also be worth double-checking that we don't reintroduce that one. Not sure if this means lazy restore, but the spec mentions that "System software should not use XFD to implement a 'lazy restore' approach to management of the XTILEDATA state component." One reason is XSAVE(S) will lose the xTILEDATA when XFD[i] is nonzero. Thanks, Jing
Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch
On 2/9/2021 2:12 AM, Paolo Bonzini wrote:
On 08/02/21 19:04, Sean Christopherson wrote:

That said, the case where we saw MSR autoload as faster involved EFER, and we decided that it was due to TLB flushes (commit f6577a5fa15d, "x86, kvm, vmx: Always use LOAD_IA32_EFER if available", 2014-11-12). Do you know if RDMSR/WRMSR is always slower than MSR autoload?

RDMSR/WRMSR may be marginally slower, but only because the autoload stuff avoids serializing the pipeline after every MSR.

That's probably adding up quickly...

The autoload paths are effectively just wrappers around the WRMSR ucode, plus some extra VM-Enter specific checks, as ucode needs to perform all the normal fault checks on the index and value. On the flip side, if the load lists are dynamically constructed, I suspect the code overhead of walking the lists negates any advantages of the load lists.

... but yeah this is not very encouraging.

Thanks for reviewing the patches.

Context switch time is a problem for XFD. In a VM that uses AMX, most threads in the guest will have nonzero XFD but the vCPU thread itself will have zero XFD. So as soon as one thread in the VM forces the vCPU thread to clear XFD, you pay a price on all vmexits and vmentries.

The spec says, "If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i, the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1; instead, it saves bit i of XSTATE_BV field of the XSAVE header as 0 (indicating that the state component is in its initialized state). With the exception of XSAVE, no data is saved for the state component (XSAVE saves the initial value of the state component..."

Thus, the key point is not losing the non-initial AMX state on vmexit and vmenter. If AMX state is in its initialized state, it doesn't matter. Otherwise, XFD[i] should not be armed with a nonzero value. If we don't want to unconditionally set XFD=0 on every vmexit, it would be useful to first detect whether guest AMX state is initial or not.
How about using the XINUSE notation here? (Details in SDM Vol. 1, 13.6 PROCESSOR TRACKING OF XSAVE-MANAGED STATE, and Vol. 2, XRSTOR/XRSTORS instruction operation part.) The main idea is that the processor tracks the status of the various state components via XINUSE, which shows whether a state component is in use or not. When XINUSE[i]=0, state component i is in its initial configuration. Otherwise, kvm should take care of XFD on vmexit.

However, running the host with _more_ bits set than necessary in XFD should not be a problem as long as the host doesn't use the AMX instructions.

Does "running the host" mean running in kvm? Why are more bits (host_XFD|guest_XFD) needed? I'm trying to think of a case where guest_XFD is not enough. e.g.:

In the guest, only bit i is needed, when the guest supports it and the guest uses the passthrough XFD[i] for detecting dynamic usage;

In kvm, kvm doesn't use AMX instructions; and "system software should not use XFD to implement a 'lazy restore' approach to management of the XTILEDATA state component."

Outside of kvm, the kernel ensures setting the correct XFD for threads when scheduling;

Thanks, Jing

So perhaps Jing can look into keeping XFD=0 for as little time as possible, and XFD=host_XFD|guest_XFD as much as possible.

Paolo
Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field
On 2/9/2021 1:24 AM, Sean Christopherson wrote:
On Mon, Feb 08, 2021, Dave Hansen wrote:
On 2/8/21 8:16 AM, Jing Liu wrote:

-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	/* Set XSTATE_BV and possibly XCOMP_BV.  */
 	xsave->header.xfeatures = xstate_bv;
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
+					 xfeatures_mask_all;

This is wrong, xfeatures_mask_all also tracks supervisor states.

When looking at the SDM Vol. 2 XSAVES instruction Operation part, it says the following:

RFBM ← (XCR0 OR IA32_XSS) AND EDX:EAX;
COMPMASK ← RFBM OR 80000000_00000000H;
...
XCOMP_BV field in XSAVE header ← COMPMASK;

So it seems xcomp_bv also tracks supervisor states?

BRs, Jing

Are 'host_xcr0' and 'xfeatures_mask_all' really interchangeable? If so, shouldn't we just remove 'host_xcr0' everywhere?

I think so? But use xfeatures_mask_user(). In theory, host_xss can also be replaced with the _supervisor() and _dynamic() variants. That code needs a good hard look at the _dynamic() features, which is currently just architectural LBRs. E.g. I wouldn't be surprised if KVM currently fails to save/restore arch LBRs due to the bit not being set in host_xss.
Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch
On 2/7/2021 7:49 PM, Borislav Petkov wrote: On Sun, Feb 07, 2021 at 10:42:52AM -0500, Jing Liu wrote: diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index 7e0c68043ce3..fbb761fc13ec 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -145,6 +145,7 @@ EXPORT_SYMBOL_GPL(fpu_kernel_xstate_min_size); * can be dynamically expanded to include some states up to this size. */ unsigned int fpu_kernel_xstate_max_size; +EXPORT_SYMBOL_GPL(fpu_kernel_xstate_max_size); /* Get alignment of the TYPE. */ #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 080f3be9a5e6..9c471a0364e2 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -77,12 +77,14 @@ static struct xfeature_capflag_info xfeature_capflags[] __initdata = { * XSAVE buffer, both supervisor and user xstates. */ u64 xfeatures_mask_all __read_mostly; +EXPORT_SYMBOL_GPL(xfeatures_mask_all); /* * This represents user xstates, a subset of xfeatures_mask_all, saved in a * dynamic kernel XSAVE buffer. */ u64 xfeatures_mask_user_dynamic __read_mostly; +EXPORT_SYMBOL_GPL(xfeatures_mask_user_dynamic); static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1}; static unsigned int xstate_sizes[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1}; Make sure you Cc x...@kernel.org when touching code outside of kvm. There's this script called scripts/get_maintainer.pl which will tell you who to Cc. Use it before you send next time please. Thx. Thank you for the reminder. I'll cc that next time. BRs, Jing
Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
On 1/15/2021 12:59 PM, Bae, Chang Seok wrote:
On Jan 11, 2021, at 18:52, Liu, Jing2 wrote:
On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:
On Jan 7, 2021, at 17:41, Liu, Jing2 wrote:

static void kvm_save_current_fpu(struct fpu *fpu)
{
+	struct fpu *src_fpu = &current->thread.fpu;
+
	/*
	 * If the target FPU state is not resident in the CPU registers, just
	 * memcpy() from current, else save CPU state directly to the target.
	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
		       fpu_kernel_xstate_min_size);
-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic feature is not supported in kvm now, this function still needs to consider more things for fpu->state_mask.

Can you elaborate on this? Which path might be affected by fpu->state_mask without dynamic state supported in KVM?

I suggest that we can set it before if...else (for both cases) and not change other code.

I tried a minimum change here. The fpu->state_mask value does not impact the memcpy(). So, why do we need to change it for both?

Sure, what I'm considering is that "mask" is introduced into "fpu" for the first time, representing the usage, so not only set it when needed, but also make it a consistent representation, in case anything uses it (especially in the interval between this series and a future kvm series). Thank you for the feedback.

Sorry, I don't see any logical reason to set the mask always here.

Sure, I'd like to see if fx_init()->memset is the case, though maybe there is no hurt so far in testing.

Thanks, Jing

Perhaps, KVM code can be updated like you mentioned when supporting the dynamic states there. Please let me know if I'm missing any functional issues.

Thanks, Chang
Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:
On Jan 7, 2021, at 17:41, Liu, Jing2 wrote:

static void kvm_save_current_fpu(struct fpu *fpu)
{
+	struct fpu *src_fpu = &current->thread.fpu;
+
	/*
	 * If the target FPU state is not resident in the CPU registers, just
	 * memcpy() from current, else save CPU state directly to the target.
	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
		       fpu_kernel_xstate_min_size);

For kvm, if we assume that it does not support dynamic features until this series, memcpy for only fpu->state is correct. I think this kind of assumption is reasonable and we only make the original xstate work.

-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic feature is not supported in kvm now, this function still needs to consider more things for fpu->state_mask.

Can you elaborate on this? Which path might be affected by fpu->state_mask without dynamic state supported in KVM?

I suggest that we can set it before if...else (for both cases) and not change other code.

I tried a minimum change here. The fpu->state_mask value does not impact the memcpy(). So, why do we need to change it for both?

Sure, what I'm considering is that "mask" is introduced into "fpu" for the first time, representing the usage, so not only set it when needed, but also make it a consistent representation, in case anything uses it (especially in the interval between this series and a future kvm series).

Thanks, Jing

Thanks, Chang
RE: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
-----Original Message-----
From: Bae, Chang Seok
Sent: Wednesday, December 23, 2020 11:57 PM
To: b...@suse.de; l...@kernel.org; t...@linutronix.de; mi...@kernel.org; x...@kernel.org
Cc: Brown, Len ; Hansen, Dave ; Liu, Jing2 ; Shankar, Ravi V ; linux-kernel@vger.kernel.org; Bae, Chang Seok ; k...@vger.kernel.org
Subject: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate

copy_xregs_to_kernel() used to save all user states in a kernel buffer. When the dynamic user state is enabled, it becomes conditional which state to be saved. fpu->state_mask can indicate which state components are reserved to be saved in the XSAVE buffer. Use it as XSAVE's instruction mask to select states.

KVM used to save all xstate via copy_xregs_to_kernel(). Update KVM to set a valid fpu->state_mask, which will be necessary to correctly handle dynamic state buffers.

See comments together below.

No functional change until the kernel supports dynamic user states.

Signed-off-by: Chang S. Bae
Reviewed-by: Len Brown
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: k...@vger.kernel.org

[...]

 /*
  * AVX512 state is tracked here because its use is

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4aecfba04bd3..93b5bacad67a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9214,15 +9214,20 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 
 static void kvm_save_current_fpu(struct fpu *fpu)
 {
+	struct fpu *src_fpu = &current->thread.fpu;
+
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
 		       fpu_kernel_xstate_min_size);

For kvm, if we assume that it does not support dynamic features until this series, memcpy for only fpu->state is correct. I think this kind of assumption is reasonable and we only make the original xstate work.

-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic feature is not supported in kvm now, this function still needs to consider more things for fpu->state_mask. I suggest that we can set it before if...else (for both cases) and not change other code.

Thanks, Jing

 		copy_fpregs_to_fpstate(fpu);
+	}
 }
 
 /* Swap (qemu) user FPU context for the guest FPU context. */
--
2.17.1
Re: [PATCH RFC] kvm: x86: Expose AVX512_BF16 feature to guest
Hi Paolo,

On 6/20/2019 8:16 PM, Paolo Bonzini wrote:
On 20/06/19 13:21, Jing Liu wrote:

+	for (i = 1; i <= times; i++) {
+		if (*nent >= maxnent)
+			goto out;
+		do_cpuid_1_ent(&entry[i], function, i);
+		entry[i].eax &= F(AVX512_BF16);
+		entry[i].ebx = 0;
+		entry[i].ecx = 0;
+		entry[i].edx = 0;
+		entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+		++*nent;

This would be wrong for i > 1, so instead make this

	if (entry->eax >= 1)

I am confused about the @index parameter. @index seems not to be used for every case except 0x07. Since the caller function only has @index=0, all other cases except 0x07 put cpuid info from subleaf=0 to the max subleaf. What do you think about @index in the current function? Does it mean we need to put cpuid from @index to the max subleaf into @entry[i]? If so, the logic seems as follows:

if (index == 0) {
	// Put subleaf 0 into @entry
	// Put subleaf 1 into @entry[1]
} else if (index < entry->eax) {
	// Put subleaf 1 into @entry
} else {
	// Put all zero into @entry
}

But this does not seem consistent with the other cases, for the current caller function. Or can we simply ignore @index in 0x07 and just put all possible subleaf info back?

and define F(AVX512_BF16) as a new constant kvm_cpuid_7_1_eax_features.

Got it.

Thanks, Jing

Paolo