Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state

2021-03-25 Thread Liu, Jing2





For AMX, we must still reserve the space, but we are not going to write zeros
for clean state.  We do this in software by checking XINUSE=0, and clearing
the xstate_bv for the XSAVE.  As a result, for XINUSE=0, we can skip
writing the zeros, even though we can't compress the space.

So my understanding is that clearing xstate_bv will not by itself prevent
saving zeros; only masking the bit out of EDX:EAX will, given the XSAVE
operation in the SDM. Not sure if this is just what you mean. :)

FWIW, PATCH21 [1] uses the instruction mask to skip writing zeros on the sigframe.
Then XSAVE will clear the xstate_bv bit for the XTILEDATA state.

[1] 
https://lore.kernel.org/lkml/20210221185637.19281-22-chang.seok@intel.com/
Yes, not setting the bit in EDX:EAX works in that case. Thanks for pointing
out the patch.


BRs,
Jing


Thanks,
Chang




Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state

2021-03-24 Thread Liu, Jing2




On 3/25/2021 5:09 AM, Len Brown wrote:

On Tue, Mar 23, 2021 at 11:15 PM Liu, Jing2  wrote:


IMO, the problem with AVX512 state
is that we guaranteed it will be zero for XINUSE=0.
That means we have to write 0's on saves.

why "we have to write 0's on saves" when XINUSE=0.

According to the SDM, if XINUSE=0, XSAVES will *not* save the data and the
xstate_bv bit is 0; XSAVE, by contrast, still has to save the state, but the
xstate_bv bit is also 0.

   It would be better
to be able to skip the write -- even if we can't save the space
we can save the data transfer.  (This is what we did for AMX).

With the XFD feature, when XFD=1 the XSAVE instruction still has to save the
INIT state to the area. So it seems that with XINUSE=0 and XFD=1, the
XSAVE(S) instructions do the same thing, in that both can help save the data
transfer.

Hi Jing, Good observation!

There are 3 cases.

Hi Len, thanks for your reply.


1. Task context switch save into the context switch buffer.
Here we use XSAVES, and as you point out, XSAVES includes
the compaction optimization feature tracked by XINUSE.
So when AMX is enabled, but clean, XSAVES doesn't write zeros.
Further, it omits the buffer space for AMX in the destination altogether!
However, since XINUSE=1 is possible, we have to *allocate* a buffer
large enough to handle the dirty data for when XSAVES can not
employ that optimization.

Yes, I agree with you about the first case.


2. Entry into user signal handler saves into the user space sigframe.
Here we use XSAVE, and so the hardware will write zeros for XINUSE=0,
and for AVX512, we save neither time nor space.

My understanding is that for application compatibility, we can *not* compact
the destination buffer that user-space sees.  This is because existing code
may have adopted fixed size offsets.  (which is unfortunate).



And so, for AVX512, we both reserve the space, and we write zeros
for clean AVX512 state.
For XSAVE, I think this is true if we assume the EDX:EAX AVX512 bits are set
to 1, which means XSAVE will write zeros when XINUSE=0. Is this the same
assumption as yours?...

For AMX, we must still reserve the space, but we are not going to write zeros
for clean state.  We do this in software by checking XINUSE=0, and clearing
the xstate_bv for the XSAVE.  As a result, for XINUSE=0, we can skip
writing the zeros, even though we can't compress the space.

So my understanding is that clearing xstate_bv will not by itself prevent
saving zeros; only masking the bit out of EDX:EAX will, given the logic
quoted below. Not sure if this is just what you mean. :)

RFBM ← XCR0 AND EDX:EAX; /* bitwise logical AND */
OLD_BV ← XSTATE_BV field from XSAVE header;
...
FOR i ← 2 TO 62
    IF RFBM[i] = 1
        THEN save XSAVE state component i at offset n from base of XSAVE area;
    FI;
ENDFOR;

XSTATE_BV field in XSAVE header ← (OLD_BV AND NOT RFBM) OR (XINUSE AND RFBM);
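The effect of masking a component out of EDX:EAX can be modeled in a few lines of C. This is only an illustrative software model of the SDM pseudocode quoted above; the bit positions used in the tests (e.g. bit 18 for XTILEDATA) are examples, not kernel definitions:

```c
#include <stdint.h>

/*
 * Software model of the XSTATE_BV update performed by XSAVE, following
 * the SDM pseudocode quoted above.
 */
uint64_t xsave_new_xstate_bv(uint64_t xcr0, uint64_t edx_eax,
                             uint64_t old_bv, uint64_t xinuse)
{
	uint64_t rfbm = xcr0 & edx_eax;	/* RFBM <- XCR0 AND EDX:EAX */

	/* XSTATE_BV <- (OLD_BV AND NOT RFBM) OR (XINUSE AND RFBM) */
	return (old_bv & ~rfbm) | (xinuse & rfbm);
}
```

With a component masked out of EDX:EAX, its RFBM bit is 0, so that component is neither written nor has its XSTATE_BV bit modified, which matches the sigframe optimization in PATCH21.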



3. user space always uses fully uncompacted XSAVE buffers.


The reason I'm interested in the XINUSE tracking is that it might help reduce
the XFD MSR context-switch cost across vmexit and vmenter.

As the guest OS may be using XFD, the VMM can not use it for itself.
Rather, the VMM must context switch it when it switches between guests.
(or not expose it to guests at all)


My understanding is that KVM cannot assume whether userspace QEMU uses XFD or
not, so KVM needs to context-switch XFD between vCPU threads on vmexit/vmenter.

That's why I am thinking about detecting XINUSE at vmexit; otherwise a
wrongly armed IA32_XFD will affect XSAVES/XRSTORS and cause guest AMX state
to be lost.

Thanks,
Jing


cheers,
-Len


cheers,
Len Brown, Intel Open Source Technology Center




Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state

2021-03-23 Thread Liu, Jing2




On 3/24/2021 5:01 AM, Len Brown wrote:

I have an obnoxious question: do we really want to use the XFD mechanism?

Obnoxious questions are often the most valuable! :-)

[...]
cheers,
Len Brown, Intel Open Source Technology Center

ps. I agree that unnecessary XINUSE=1 is possible.
Notwithstanding the issues initially deploying AVX512, I am skeptical
that it is common today.

Sorry, I'm trying to understand from...

IMO, the problem with AVX512 state
is that we guaranteed it will be zero for XINUSE=0.
That means we have to write 0's on saves.

why "we have to write 0's on saves" when XINUSE=0.

According to the SDM, if XINUSE=0, XSAVES will *not* save the data and the
xstate_bv bit is 0; XSAVE, by contrast, still has to save the state, but the
xstate_bv bit is also 0.

  It would be better
to be able to skip the write -- even if we can't save the space
we can save the data transfer.  (This is what we did for AMX).

With the XFD feature, when XFD=1 the XSAVE instruction still has to save the
INIT state to the area. So it seems that with XINUSE=0 and XFD=1, the
XSAVE(S) instructions do the same thing, in that both can help save the data
transfer.

The reason I'm interested in the XINUSE tracking is that it might help reduce
the XFD MSR context-switch cost across vmexit and vmenter.

Thanks,
Jing


pps. your idea of requiring the user to allocate their own signal stack
is interesting.   It isn't really about allocating the stack, though --
the stack of the task that uses the feature is generally fine already.
The opportunity is to allow tasks that do *not* use the new feature to
get away with minimal data transfer and stack size.  As we don't
have the 0's guarantee for AMX, we bought the important part
of that back.




Re: [PATCH v2] KVM: x86: Revise guest_fpu xcomp_bv field

2021-03-08 Thread Liu, Jing2




On 3/2/2021 7:59 AM, Sean Christopherson wrote:

On Thu, Feb 25, 2021, Jing Liu wrote:

XCOMP_BV[63] field indicates that the save area is in the compacted
format and XCOMP_BV[62:0] indicates the states that have space allocated
in the save area, including both XCR0 and XSS bits enabled by the host
kernel. Use xfeatures_mask_all for calculating xcomp_bv and reuse
XCOMP_BV_COMPACTED_FORMAT defined by kernel.

Signed-off-by: Jing Liu 
---
  arch/x86/kvm/x86.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b404e4d7dd8..f115493f577d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4435,8 +4435,6 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	/* Set XSTATE_BV and possibly XCOMP_BV.  */
 	xsave->header.xfeatures = xstate_bv;
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
+					 xfeatures_mask_all;

Doesn't fill_xsave also need to be updated?  Not with xfeatures_mask_all, but
to account for arch.ia32_xss?  I believe it's a nop with the current code, since
supported_xss is zero, but it should be fixed, no?
Yes. For arch.ia32_xss, I noticed the CET series
(https://lkml.org/lkml/2020/7/15/1412) has posted a related change, so I
didn't touch xstate_bv in fill_xsave() for now.

Eventually, fill_xsave() needs e.g. arch.guest_supported_xss for xstate_bv;
for xcomp_bv, xfeatures_mask_all is OK.


  
 	/*
 	 * Copy each region from the non-compacted offset to the
@@ -9912,9 +9911,6 @@ static void fx_init(struct kvm_vcpu *vcpu)
 		return;
 
 	fpstate_init(&vcpu->arch.guest_fpu->state);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
-			host_xcr0 | XSTATE_COMPACTION_ENABLED;

Ugh, this _really_ needs a comment in the changelog.  It took me a while to
realize fpstate_init() does exactly what the new fill_xsave() is doing.

How about adding "fx_init()->fpstate_init() initializes xcomp_bv of
guest_fpu, so there is no need to set it again in the later fill_xsave() and
load_xsave()" to the commit message?


And isn't the code in load_xsave() redundant and can be removed?

Oh, yes. Keeping fx_init() initializing xcomp_bv for guest_fpu is enough.
Let me remove it from load_xsave() later.
And for fill_xsave(), I think there is no need to set xcomp_bv there.


Any code that
uses get_xsave_addr() would have a dependency on load_xsave() if it's not
redundant, and I can't see how that would work.

Sorry, I didn't quite understand why get_xsave_addr() has a dependency on
load_xsave(). Do you mean xstate_bv instead of xcomp_bv, which load_xsave()
uses to get the address?

Thanks,
Jing


  
  	/*

 * Ensure guest xcr0 is valid for loading
--
2.18.4





Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field

2021-02-24 Thread Liu, Jing2




On 2/25/2021 4:40 AM, Sean Christopherson wrote:

On Tue, Feb 23, 2021, Liu, Jing2 wrote:

XCOMP_BV[63] field indicates that the save area is in the
compacted format and XCOMP_BV[62:0] indicates the states that
have space allocated in the save area, including both XCR0
and XSS bits enabled by the host kernel. Use xfeatures_mask_all
for calculating xcomp_bv and reuse XCOMP_BV_COMPACTED_FORMAT
defined by kernel.

Works for me, just please wrap at ~73-75 chars, not ~64.

Thanks!

Sure, let me update v2.

BRs,
Jing



Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field

2021-02-22 Thread Liu, Jing2




On 2/23/2021 12:06 AM, Sean Christopherson wrote:

On Mon, Feb 22, 2021, Liu, Jing2 wrote:

On 2/9/2021 1:24 AM, Sean Christopherson wrote:

On Mon, Feb 08, 2021, Dave Hansen wrote:

On 2/8/21 8:16 AM, Jing Liu wrote:

-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	/* Set XSTATE_BV and possibly XCOMP_BV.  */
 	xsave->header.xfeatures = xstate_bv;
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
+					 xfeatures_mask_all;

This is wrong, xfeatures_mask_all also tracks supervisor states.

Looking at the SDM Vol. 2 XSAVES instruction Operation section, it says the
following:

RFBM ← (XCR0 OR IA32_XSS) AND EDX:EAX;
COMPMASK ← RFBM OR 8000000000000000H;
...
XCOMP_BV field in XSAVE header ← COMPMASK;


So it seems xcomp_bv also tracks supervisor states?
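For reference, the COMPMASK computation above can be modeled directly in C. This is only an illustrative sketch; the XCOMP_BV_COMPACTED_FORMAT define mirrors the kernel's bit-63 value:

```c
#include <stdint.h>

#define XCOMP_BV_COMPACTED_FORMAT (1ULL << 63)	/* same bit as the kernel define */

/*
 * Model of the SDM's COMPMASK computation for XSAVES:
 * RFBM <- (XCR0 OR IA32_XSS) AND EDX:EAX; COMPMASK <- RFBM OR bit 63.
 */
uint64_t xsaves_xcomp_bv(uint64_t xcr0, uint64_t ia32_xss, uint64_t edx_eax)
{
	uint64_t rfbm = (xcr0 | ia32_xss) & edx_eax;

	return rfbm | XCOMP_BV_COMPACTED_FORMAT;
}
```

Because IA32_XSS is OR-ed in, supervisor states do land in XCOMP_BV, which is why a combined user+supervisor mask like xfeatures_mask_all fits here.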

Yes, sorry, I got distracted by Dave's question and didn't read the changelog
closely.

Now that I have, I find "Since fpstate_init() has initialized xcomp_bv, let's
just use that." confusing.  I think what you intend to say is that we can use
the same _logic_ as fpstate_init_xstate() for calculating xcomp_bv.

Yes, that's the idea.


That said, it would be helpful for the changelog to explain why it's correct to
use xfeatures_mask_all, e.g. just a short comment stating that the variable 
holds
all XCR0 and XSS bits enabled by the host kernel.  Justifying a change with
"because other code does it" is sketchy, because there's no guarantee that what
something else does is also correct for KVM, or that the existing code itself is
even correct.

Got it, thanks for the details on this.
Then how about making the commit message like,

XCOMP_BV[63] field indicates that the save area is in the
compacted format and XCOMP_BV[62:0] indicates the states that
have space allocated in the save area, including both XCR0
and XSS bits enabled by the host kernel. Use xfeatures_mask_all
for calculating xcomp_bv and reuse XCOMP_BV_COMPACTED_FORMAT
defined by kernel.

Thanks,
Jing




Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch

2021-02-22 Thread Liu, Jing2




On 2/9/2021 2:55 AM, Konrad Rzeszutek Wilk wrote:

On Mon, Feb 08, 2021 at 07:12:22PM +0100, Paolo Bonzini wrote:

[...]


However, running the host with _more_ bits set than necessary in XFD should
not be a problem as long as the host doesn't use the AMX instructions.  So
perhaps Jing can look into keeping XFD=0 for as little time as possible, and
XFD=host_XFD|guest_XFD as much as possible.

This sounds like the lazy-fpu (eagerfpu?) that used to be part of the
kernel? I recall that we had a CVE for that - so it may also be worth
double-checking that we don't reintroduce that one.

Not sure if this means lazy restore, but the spec mentions that
"System software should not use XFD to implement a 'lazy restore' approach
to management of the XTILEDATA state component." One reason is that XSAVE(S)
will lose the XTILEDATA when XFD[i] is nonzero.

Thanks,
Jing


Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch

2021-02-22 Thread Liu, Jing2



On 2/9/2021 2:12 AM, Paolo Bonzini wrote:

On 08/02/21 19:04, Sean Christopherson wrote:
That said, the case where we saw MSR autoload as faster involved EFER, and
we decided that it was due to TLB flushes (commit f6577a5fa15d, "x86, kvm,
vmx: Always use LOAD_IA32_EFER if available", 2014-11-12). Do you know if
RDMSR/WRMSR is always slower than MSR autoload?

RDMSR/WRMSR may be marginally slower, but only because the autoload stuff
avoids serializing the pipeline after every MSR.

That's probably adding up quickly...

The autoload paths are effectively just wrappers around the WRMSR ucode,
plus some extra VM-Enter specific checks, as ucode needs to perform all the
normal fault checks on the index and value. On the flip side, if the load
lists are dynamically constructed, I suspect the code overhead of walking
the lists negates any advantages of the load lists.

... but yeah this is not very encouraging.

Thanks for reviewing the patches.

Context switch time is a problem for XFD.  In a VM that uses AMX, most
threads in the guest will have nonzero XFD but the vCPU thread itself will
have zero XFD.  So as soon as one thread in the VM forces the vCPU thread to
clear XFD, you pay a price on all vmexits and vmentries.




Spec says,
"If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it saves bit i of XSTATE_BV field of the XSAVE header as 0
(indicating that the state component is in its initialized state).
With the exception of XSAVE, no data is saved for the state
component (XSAVE saves the initial value of the state component..."

Thus, the key point is not losing the non-initial AMX state on vmexit and
vmenter. If the AMX state is in its initialized state, it doesn't matter.
Otherwise, XFD[i] should not be armed with a nonzero value.

If we don't want to unconditionally set XFD=0 on every vmexit, it would
be useful to first detect whether the guest AMX state is initial or not.
How about using the XINUSE tracking here?
(Details in SDM Vol. 1, 13.6 PROCESSOR TRACKING OF XSAVE-MANAGED STATE,
and Vol. 2, the XRSTOR/XRSTORS instruction operation parts.)
The main idea is that the processor tracks the status of the various state
components via XINUSE, which shows whether a state component is in use or not.
When XINUSE[i]=0, state component i is in its initial configuration.
Otherwise, KVM should take care of XFD on vmexit.
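The XINUSE-based decision being proposed could be sketched as below. This is a hypothetical pure-logic model, not KVM code: the function name is invented, the XTILEDATA bit position is used only as an example, and on real hardware the xinuse value would come from XGETBV with ECX=1:

```c
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)	/* AMX TILEDATA component */

/*
 * Hypothetical vmexit-time check: keeping IA32_XFD[i]=1 armed is only
 * safe while state component i is in its init state (XINUSE[i]=0);
 * otherwise XSAVES would record non-init XTILEDATA as init and the
 * guest state would be lost.
 */
bool xfd_may_stay_armed(uint64_t xinuse)
{
	return !(xinuse & XFEATURE_MASK_XTILE_DATA);
}
```

If the check fails, the model says KVM would have to clear XFD before any XSAVES/XRSTORS of the guest FPU state.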


However, running the host with _more_ bits set than necessary in XFD should
not be a problem as long as the host doesn't use the AMX instructions.
Does "running the host" mean running in KVM? Why are more bits
(host_XFD | guest_XFD) needed?

I'm trying to think about a case where guest_XFD alone is not enough, e.g.:
In the guest, only bit i is needed, when the guest supports it and uses the
passed-through XFD[i] for detecting dynamic usage;
In KVM, KVM doesn't use AMX instructions, and "system software should not
use XFD to implement a 'lazy restore' approach to management of the
XTILEDATA state component";
Outside of KVM, the kernel ensures that the correct XFD is set for threads
when scheduling.
Thanks,
Jing

So perhaps Jing can look into keeping XFD=0 for as little time as possible,
and XFD=host_XFD|guest_XFD as much as possible.


Paolo





Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field

2021-02-21 Thread Liu, Jing2




On 2/9/2021 1:24 AM, Sean Christopherson wrote:

On Mon, Feb 08, 2021, Dave Hansen wrote:

On 2/8/21 8:16 AM, Jing Liu wrote:

-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	/* Set XSTATE_BV and possibly XCOMP_BV.  */
 	xsave->header.xfeatures = xstate_bv;
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
+					 xfeatures_mask_all;

This is wrong, xfeatures_mask_all also tracks supervisor states.
Looking at the SDM Vol. 2 XSAVES instruction Operation section, it says the
following:


RFBM ← (XCR0 OR IA32_XSS) AND EDX:EAX;
COMPMASK ← RFBM OR 8000000000000000H;
...
XCOMP_BV field in XSAVE header ← COMPMASK;


So it seems xcomp_bv also tracks supervisor states?

BRs,
Jing



Are 'host_xcr0' and 'xfeatures_mask_all' really interchangeable?  If so,
shouldn't we just remove 'host_xcr0' everywhere?

I think so?  But use xfeatures_mask_user().

In theory, host_xss can also be replaced with the _supervisor() and _dynamic()
variants.  That code needs a good hard look at the _dynamic() features, which is
currently just architectural LBRs.  E.g. I wouldn't be surprised if KVM 
currently
fails to save/restore arch LBRs due to the bit not being set in host_xss.




Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch

2021-02-07 Thread Liu, Jing2




On 2/7/2021 7:49 PM, Borislav Petkov wrote:

On Sun, Feb 07, 2021 at 10:42:52AM -0500, Jing Liu wrote:

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 7e0c68043ce3..fbb761fc13ec 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -145,6 +145,7 @@ EXPORT_SYMBOL_GPL(fpu_kernel_xstate_min_size);
   * can be dynamically expanded to include some states up to this size.
   */
  unsigned int fpu_kernel_xstate_max_size;
+EXPORT_SYMBOL_GPL(fpu_kernel_xstate_max_size);
  
  /* Get alignment of the TYPE. */

  #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 080f3be9a5e6..9c471a0364e2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -77,12 +77,14 @@ static struct xfeature_capflag_info xfeature_capflags[] __initdata = {
  * XSAVE buffer, both supervisor and user xstates.
  */
 u64 xfeatures_mask_all __read_mostly;
+EXPORT_SYMBOL_GPL(xfeatures_mask_all);
 
 /*
  * This represents user xstates, a subset of xfeatures_mask_all, saved in a
  * dynamic kernel XSAVE buffer.
  */
 u64 xfeatures_mask_user_dynamic __read_mostly;
+EXPORT_SYMBOL_GPL(xfeatures_mask_user_dynamic);
 
 static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};

Make sure you Cc x...@kernel.org when touching code outside of kvm.

There's this script called scripts/get_maintainer.pl which will tell you who to
Cc. Use it before you send next time please.

Thx.

Thank you for the reminder. I'll cc that next time.

BRs,
Jing



Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate

2021-01-14 Thread Liu, Jing2



On 1/15/2021 12:59 PM, Bae, Chang Seok wrote:

On Jan 11, 2021, at 18:52, Liu, Jing2  wrote:

On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:

On Jan 7, 2021, at 17:41, Liu, Jing2  wrote:

static void kvm_save_current_fpu(struct fpu *fpu)
{
+	struct fpu *src_fpu = &current->thread.fpu;
+
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
 		       fpu_kernel_xstate_min_size);
-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic features are not supported in KVM now, this function still
needs to consider more things for fpu->state_mask.

Can you elaborate this? Which path might be affected by fpu->state_mask
without dynamic state supported in KVM?


I suggest that we set it before the if...else (for both cases) and not change
anything else.

I tried a minimum change here.  The fpu->state_mask value does not impact the
memcpy(). So, why do we need to change it for both?

Sure, my thinking is that "state_mask" is being introduced into "fpu" for the
first time, representing the buffer's usage, so it should not only be set
when needed but also kept as a faithful representation, in case anything else
uses it (especially in the interval between this series and the future KVM
series).

Thank you for the feedback. Sorry, I don't see any logical reason to always
set the mask here.


Sure, I'd like to check whether fx_init()->memset() is such a case, though it
may cause no harm so far in testing.

Thanks,
Jing


  Perhaps, KVM code can be updated like you mentioned when
supporting the dynamic states there.

Please let me know if I’m missing any functional issues.

Thanks,
Chang


Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate

2021-01-11 Thread Liu, Jing2



On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:

On Jan 7, 2021, at 17:41, Liu, Jing2  wrote:

static void kvm_save_current_fpu(struct fpu *fpu)
{
+	struct fpu *src_fpu = &current->thread.fpu;
+
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
 		       fpu_kernel_xstate_min_size);
For KVM, if we assume that it does not support dynamic features until this
series, a memcpy() of only fpu->state is correct.
I think this kind of assumption is reasonable, and we only make the original
xstate work.

-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic features are not supported in KVM now, this function still
needs to consider more things for fpu->state_mask.

Can you elaborate this? Which path might be affected by fpu->state_mask
without dynamic state supported in KVM?


I suggest that we set it before the if...else (for both cases) and not change
anything else.

I tried a minimum change here.  The fpu->state_mask value does not impact the
memcpy(). So, why do we need to change it for both?


Sure, my thinking is that "state_mask" is being introduced into "fpu" for the
first time, representing the buffer's usage, so it should not only be set
when needed but also kept as a faithful representation, in case anything else
uses it (especially in the interval between this series and the future KVM
series).

Thanks,

Jing


Thanks,
Chang


RE: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate

2021-01-07 Thread Liu, Jing2



-Original Message-
From: Bae, Chang Seok  
Sent: Wednesday, December 23, 2020 11:57 PM
To: b...@suse.de; l...@kernel.org; t...@linutronix.de; mi...@kernel.org; 
x...@kernel.org
Cc: Brown, Len ; Hansen, Dave ; 
Liu, Jing2 ; Shankar, Ravi V ; 
linux-kernel@vger.kernel.org; Bae, Chang Seok ; 
k...@vger.kernel.org
Subject: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to 
support dynamic xstate

copy_xregs_to_kernel() used to save all user states in a kernel buffer.
When the dynamic user state is enabled, which states get saved becomes
conditional.

fpu->state_mask can indicate which state components are reserved to be saved
in the XSAVE buffer. Use it as the XSAVE instruction mask to select states.

KVM used to save all xstate via copy_xregs_to_kernel(). Update KVM to set a
valid fpu->state_mask, which will be necessary to correctly handle dynamic
state buffers.

See my comments inline below.

No functional change until the kernel supports dynamic user states.

Signed-off-by: Chang S. Bae 
Reviewed-by: Len Brown 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: k...@vger.kernel.org
[...]
/*
 * AVX512 state is tracked here because its use is diff --git 
a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4aecfba04bd3..93b5bacad67a 
100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9214,15 +9214,20 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 
 static void kvm_save_current_fpu(struct fpu *fpu)
 {
+	struct fpu *src_fpu = &current->thread.fpu;
+
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
 		       fpu_kernel_xstate_min_size);
For KVM, if we assume that it does not support dynamic features until this
series, a memcpy() of only fpu->state is correct.
I think this kind of assumption is reasonable, and we only make the original
xstate work.

-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic features are not supported in KVM now, this function still
needs to consider more things for fpu->state_mask.
I suggest that we set it before the if...else (for both cases) and not change
anything else.

Thanks,
Jing

copy_fpregs_to_fpstate(fpu);
+   }

 }

 
 /* Swap (qemu) user FPU context for the guest FPU context. */
--
2.17.1
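The review suggestion above (propagate fpu->state_mask before the if...else, for both branches) can be illustrated with a small user-space model. All names here are toy stand-ins for the kernel structures, and the second memcpy() merely stands in for copy_fpregs_to_fpstate():

```c
#include <stdint.h>
#include <string.h>

#define STATE_SIZE 64	/* toy buffer size; real buffers are xstate-sized */

/* Toy stand-in for the kernel's struct fpu; names are illustrative only. */
struct fpu_model {
	uint64_t state_mask;	/* which xstate components the buffer holds */
	uint8_t  state[STATE_SIZE];
};

/*
 * User-space model of kvm_save_current_fpu() with the review suggestion
 * applied: propagate state_mask before the branch, so both paths agree.
 */
void save_current_fpu_model(struct fpu_model *dst, const struct fpu_model *src,
			    const uint8_t *live_regs, int need_fpu_load)
{
	dst->state_mask = src->state_mask;	/* set for both cases */
	if (need_fpu_load)
		memcpy(dst->state, src->state, STATE_SIZE);	/* buffer copy */
	else
		memcpy(dst->state, live_regs, STATE_SIZE);	/* stand-in for copy_fpregs_to_fpstate() */
}
```

Either way the branch goes, the destination's mask now describes what its buffer is expected to hold.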



Re: [PATCH RFC] kvm: x86: Expose AVX512_BF16 feature to guest

2019-06-20 Thread Liu, Jing2

Hi Paolo,

On 6/20/2019 8:16 PM, Paolo Bonzini wrote:

On 20/06/19 13:21, Jing Liu wrote:

+	for (i = 1; i <= times; i++) {
+		if (*nent >= maxnent)
+			goto out;
+		do_cpuid_1_ent(&entry[i], function, i);
+		entry[i].eax &= F(AVX512_BF16);
+		entry[i].ebx = 0;
+		entry[i].ecx = 0;
+		entry[i].edx = 0;
+		entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+		++*nent;


This would be wrong for i > 1, so instead make this

if (entry->eax >= 1)



I am confused about the @index parameter. @index seems to be unused in every
case except 0x07. Since the caller function only passes @index=0, all the
other cases except 0x07 put cpuid info from subleaf=0 to the max subleaf.

What do you think about @index in the current function? Does it mean we need
to put cpuid info from @index up to the max subleaf into @entry[i]? If so,
the logic seems as follows:

if (index == 0) {
	// Put subleaf 0 into @entry
	// Put subleaf 1 into @entry[1]
} else if (index < entry->eax) {
	// Put subleaf 1 into @entry
} else {
	// Put all zero into @entry
}

But this doesn't seem consistent with the other cases for the current caller.
Or should we simply ignore @index for 0x07 and just put all possible subleaf
info back?
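The subleaf handling being discussed can be sketched as a small model. This is hypothetical, not KVM's actual do_cpuid_ent(): the struct, function name, and the "subleaf 1 only" simplification are illustrative, with only the AVX512_BF16 bit position (CPUID.(EAX=7,ECX=1):EAX[5]) taken from the architecture:

```c
#include <stdint.h>

struct cpuid_entry_model {
	uint32_t eax, ebx, ecx, edx;
};

#define F_AVX512_BF16 (1u << 5)	/* CPUID.(EAX=7,ECX=1):EAX[5] */

/*
 * Hypothetical model of the leaf-0x7 subleaf handling: subleaf 0 reports
 * the max subleaf in EAX, subleaf 1 exposes only the AVX512_BF16 bit, and
 * subleaves past the max report all zeros.
 */
void fill_leaf7_model(struct cpuid_entry_model *e, uint32_t index,
		      uint32_t max_subleaf, uint32_t host_subleaf1_eax)
{
	e->ebx = e->ecx = e->edx = 0;
	if (index == 0)
		e->eax = max_subleaf;				/* max supported subleaf */
	else if (index == 1 && index <= max_subleaf)
		e->eax = host_subleaf1_eax & F_AVX512_BF16;	/* mask to exposed bits */
	else
		e->eax = 0;					/* out-of-range subleaf */
}
```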


and define F(AVX512_BF16) as a new constant kvm_cpuid_7_1_eax_features.


Got it.


Thanks,
Jing


Paolo