Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-16 Thread Andrew Cooper
On 07/01/2020 16:17, Wei Liu wrote:
> On Sun, Jan 05, 2020 at 10:06:08PM +, Andrew Cooper wrote:
>> On 05/01/2020 21:22, Wei Liu wrote:
>>> On Sun, Jan 05, 2020 at 07:08:28PM +, Andrew Cooper wrote:
> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
> paddr_t output)
> +{
> +uint64_t status;
> +
> +asm volatile ("mov %[output], %%r8\n"
> +  "call hv_hypercall_page"
> +  : "=a" (status), "+c" (control),
> +"+d" (input) ASM_CALL_CONSTRAINT
> +  : [output] "rm" (output)
> +  : "cc", "memory", "r8", "r9", "r10", "r11");
 I think you want:

 register unsigned long r8 asm("r8") = output;

 and "+r" (r8) as an output constraint.
>>> Although it is named `output`, it is really just an input parameter from
>>> Hyper-V's PoV.
>> Yes, but it is also clobbered.
>>
>> This is an awkward corner case of gnu inline assembly.
>>
>> It is not permitted to have a clobber list overlap with any input/output
>> operations, and because r8 doesn't have a unique letter, you can't do
>> the usual trick of "=r8" (discard) : "r8" (input).
>>
>> The only available option is to mark it as read and written (which is
>> "+r" in the output list), and not use the C variable r8 at any point later.
> But r8 is only listed in clobber list, so it certainly doesn't overlap
> with any input register. I fail to see what the bug (if there is any) is
> here.

%r8 is an input parameter.  You have "mov %[output], %%r8" in the asm.

The way this is written causes the compiler to construct %[output] in a
register, pass it in via the sole input parameter, and behind the scenes
move it into %r8.

There is a small chance of the emitted asm being "mov %r8, %r8", and a
much larger chance of the compiler being forced to spill a different
register when it could have used %r8 in the first place.

> I think what you're asking for here is an optimisation. Is that correct?
> I don't mind changing the code. What I need is clarification here.

I'm on the fence as to whether to put in the category of optimisation,
or buggy asm.

I think the generated code will DTRT, but it is a suspicious looking
piece of assembly, so "optimisation" I guess?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-16 Thread Wei Liu
On Sun, Jan 05, 2020 at 07:08:28PM +, Andrew Cooper wrote:
> 
> > +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
> > paddr_t output)
> > +{
> > +uint64_t status;
> > +
> > +asm volatile ("mov %[output], %%r8\n"
> > +  "call hv_hypercall_page"
> > +  : "=a" (status), "+c" (control),
> > +"+d" (input) ASM_CALL_CONSTRAINT
> > +  : [output] "rm" (output)
> > +  : "cc", "memory", "r8", "r9", "r10", "r11");
> 
> I think you want:
> 
> register unsigned long r8 asm("r8") = output;
> 
> and "+r" (r8) as an output constraint.
> 
> In particular, that doesn't force the compiler to put output into a
> register other than r8 (or worse, spill it to the stack) to have the
> opaque blob of asm move it back into r8.  What it will do in practice is
> cause the compiler to construct output directly in r8.
> 
> As for the other clobbers, I can't find anything at all in the spec
> which even mentions those registers.  There will be a decent improvement
> to code generation if we don't force them to be spilled around a hypercall.

This is actually from Windows 2012 R2's TLFS.

Current version of Hyper-V doesn't clobber those registers anymore. I
think just putting "memory" there should be fine.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-07 Thread Wei Liu
On Mon, Jan 06, 2020 at 10:38:23AM +0100, Jan Beulich wrote:
[...]
> > +
> > +static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t 
> > rep_count,
> > +   uint16_t varhead_size,
> > +   paddr_t input, paddr_t output)
> > +{
> > +uint64_t control = code;
> > +uint64_t status;
> > +uint16_t rep_comp;
> > +
> > +control |= (uint64_t)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET;
> > +control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
> > +
> > +do {
> > +status = hv_do_hypercall(control, input, output);
> > +if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS )
> > +break;
> > +
> > +rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >>
> > +HV_HYPERCALL_REP_COMP_OFFSET;
> 
> MASK_EXTR()? (I then also wonder whether MASK_INSR() would better be
> used with some of the other constructs here.)

Sure, I can see if that can be used.

> 
> What's worse though - looking at the definition of
> HV_HYPERCALL_REP_COMP_MASK I notice that it and a few others use
> GENMASK_ULL(), when it was clearly said during review (perhaps of
> another but related patch) that this macro should not be used
> outside of Arm-specific code until it gets put into better shape:
> https://lists.xenproject.org/archives/html/xen-devel/2019-12/msg00705.html

That's a straight import from Linux. I only made the header build
without further inspection.

That can be fixed, of course.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-07 Thread Wei Liu
On Sun, Jan 05, 2020 at 10:06:08PM +, Andrew Cooper wrote:
> On 05/01/2020 21:22, Wei Liu wrote:
> > On Sun, Jan 05, 2020 at 07:08:28PM +, Andrew Cooper wrote:
> >>> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
> >>> paddr_t output)
> >>> +{
> >>> +uint64_t status;
> >>> +
> >>> +asm volatile ("mov %[output], %%r8\n"
> >>> +  "call hv_hypercall_page"
> >>> +  : "=a" (status), "+c" (control),
> >>> +"+d" (input) ASM_CALL_CONSTRAINT
> >>> +  : [output] "rm" (output)
> >>> +  : "cc", "memory", "r8", "r9", "r10", "r11");
> >> I think you want:
> >>
> >> register unsigned long r8 asm("r8") = output;
> >>
> >> and "+r" (r8) as an output constraint.
> > Although it is named `output`, it is really just an input parameter from
> > Hyper-V's PoV.
> 
> Yes, but it is also clobbered.
> 
> This is an awkward corner case of gnu inline assembly.
> 
> It is not permitted to have a clobber list overlap with any input/output
> operations, and because r8 doesn't have a unique letter, you can't do
> the usual trick of "=r8" (discard) : "r8" (input).
> 
> The only available option is to mark it as read and written (which is
> "+r" in the output list), and not use the C variable r8 at any point later.

But r8 is only listed in clobber list, so it certainly doesn't overlap
with any input register. I fail to see what the bug (if there is any) is
here.

I think what you're asking for here is an optimisation. Is that correct?
I don't mind changing the code. What I need is clarification here.

> 
> 
> Having looked through the spec a bit more, is this a wise API to have in
> the first place?  input and output (perhaps better named input_addr and
> output_addr) are fixed per CPU, and control is semantically linked to
> the hypercall and its particular ABI.
> 
> I suppose the answer ultimately depends on what the callers look like.

The call sites will be like

struct hv_input_arg *input_arg;
input_arg = per_cpu_input_page;
input_arg.foo = xxx;
input_arg.bar = xxx;

hv_do_hypercall(control, virt_to_maddr(input_arg), NULL);

.

(Alternatively, we can put virt_to_maddr in hv_do_hypercall now that
we're sure the input page is from xenheap)

> 
> >
> >> In particular, that doesn't force the compiler to put output into a
> >> register other than r8 (or worse, spill it to the stack) to have the
> >> opaque blob of asm move it back into r8.  What it will do in practice is
> >> cause the compiler to construct output directly in r8.
> >>
> >> As for the other clobbers, I can't find anything at all in the spec
> >> which even mentions those registers.  There will be a decent improvement
> >> to code generation if we don't force them to be spilled around a hypercall.
> >>
> > Neither can I. But Linux's commit says that's needed, so I chose to err
> > on the safe side.
> 
> That's dull.  Is there any qualifying information?

See Linux commit fc53662f13b.

I will also ask my contact in Hyper-V team for clarification.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-06 Thread Jan Beulich
On 05.01.2020 17:47, Wei Liu wrote:
> +static inline uint64_t hv_do_fast_hypercall(uint16_t code,
> +uint64_t input1, uint64_t input2)
> +{
> +uint64_t status;
> +uint64_t control = (uint64_t)code | HV_HYPERCALL_FAST_BIT;

Unnecessary (afaict) cast.

> +asm volatile ("mov %[input2], %%r8\n"
> +  "call hv_hypercall_page"
> +  : "=a" (status), "+c" (control),
> +"+d" (input1) ASM_CALL_CONSTRAINT
> +  : [input2] "rm" (input2)
> +  : "cc", "r8", "r9", "r10", "r11");
> +
> +return status;
> +}
> +
> +static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count,
> +   uint16_t varhead_size,
> +   paddr_t input, paddr_t output)
> +{
> +uint64_t control = code;
> +uint64_t status;
> +uint16_t rep_comp;
> +
> +control |= (uint64_t)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET;
> +control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
> +
> +do {
> +status = hv_do_hypercall(control, input, output);
> +if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS )
> +break;
> +
> +rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >>
> +HV_HYPERCALL_REP_COMP_OFFSET;

MASK_EXTR()? (I then also wonder whether MASK_INSR() would better be
used with some of the other constructs here.)

What's worse though - looking at the definition of
HV_HYPERCALL_REP_COMP_MASK I notice that it and a few others use
GENMASK_ULL(), when it was clearly said during review (perhaps of
another but related patch) that this macro should not be used
outside of Arm-specific code until it gets put into better shape:
https://lists.xenproject.org/archives/html/xen-devel/2019-12/msg00705.html

> +control &= ~HV_HYPERCALL_REP_START_MASK;
> +control |= (uint64_t)rep_comp << HV_HYPERCALL_REP_COMP_OFFSET;
> +
> +} while ( rep_comp < rep_count );

We don't normally have a blank line ahead of the closing brace of a
block.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-05 Thread Andrew Cooper
On 05/01/2020 21:22, Wei Liu wrote:
> On Sun, Jan 05, 2020 at 07:08:28PM +, Andrew Cooper wrote:
>>> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
>>> paddr_t output)
>>> +{
>>> +uint64_t status;
>>> +
>>> +asm volatile ("mov %[output], %%r8\n"
>>> +  "call hv_hypercall_page"
>>> +  : "=a" (status), "+c" (control),
>>> +"+d" (input) ASM_CALL_CONSTRAINT
>>> +  : [output] "rm" (output)
>>> +  : "cc", "memory", "r8", "r9", "r10", "r11");
>> I think you want:
>>
>> register unsigned long r8 asm("r8") = output;
>>
>> and "+r" (r8) as an output constraint.
> Although it is named `output`, it is really just an input parameter from
> Hyper-V's PoV.

Yes, but it is also clobbered.

This is an awkward corner case of gnu inline assembly.

It is not permitted to have a clobber list overlap with any input/output
operations, and because r8 doesn't have a unique letter, you can't do
the usual trick of "=r8" (discard) : "r8" (input).

The only available option is to mark it as read and written (which is
"+r" in the output list), and not use the C variable r8 at any point later.


Having looked through the spec a bit more, is this a wise API to have in
the first place?  input and output (perhaps better named input_addr and
output_addr) are fixed per CPU, and control is semantically linked to
the hypercall and its particular ABI.

I suppose the answer ultimately depends on what the callers look like.

>
>> In particular, that doesn't force the compiler to put output into a
>> register other than r8 (or worse, spill it to the stack) to have the
>> opaque blob of asm move it back into r8.  What it will do in practice is
>> cause the compiler to construct output directly in r8.
>>
>> As for the other clobbers, I can't find anything at all in the spec
>> which even mentions those registers.  There will be a decent improvement
>> to code generation if we don't force them to be spilled around a hypercall.
>>
> Neither can I. But Linux's commit says that's needed, so I chose to err
> on the safe side.

That's dull.  Is there any qualifying information?

>> However, HyperV will(may?) modify %xmm{0..5} in some cases, and this
>> will corrupt the current vcpu's FPU context.  Care will need to be taken
>> to spill these if necessary.
>>
> The hypercalls we care about (TLB, APIC etc) don't use that ABI as far
> as I can tell. At the very least Linux, which uses the same set of
> hypercalls, doesn't need to save / restore XMM registers around the
> calls.

Ok - it looks to be fine until we need to use a hypercall which uses the
fast extended ABI, which is the first to introduce the use of the %xmm
registers.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-05 Thread Wei Liu
On Sun, Jan 05, 2020 at 07:08:28PM +, Andrew Cooper wrote:
> 
> > +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
> > paddr_t output)
> > +{
> > +uint64_t status;
> > +
> > +asm volatile ("mov %[output], %%r8\n"
> > +  "call hv_hypercall_page"
> > +  : "=a" (status), "+c" (control),
> > +"+d" (input) ASM_CALL_CONSTRAINT
> > +  : [output] "rm" (output)
> > +  : "cc", "memory", "r8", "r9", "r10", "r11");
> 
> I think you want:
> 
> register unsigned long r8 asm("r8") = output;
> 
> and "+r" (r8) as an output constraint.

Although it is named `output`, it is really just an input parameter from
Hyper-V's PoV. It contains the address of the page Hyper-V should write
its output to.

> 
> In particular, that doesn't force the compiler to put output into a
> register other than r8 (or worse, spill it to the stack) to have the
> opaque blob of asm move it back into r8.  What it will do in practice is
> cause the compiler to construct output directly in r8.
> 
> As for the other clobbers, I can't find anything at all in the spec
> which even mentions those registers.  There will be a decent improvement
> to code generation if we don't force them to be spilled around a hypercall.
> 

Neither can I. But Linux's commit says that's needed, so I chose to err
on the safe side.

I'm fine with dropping r9-r11.

> However, HyperV will(may?) modify %xmm{0..5} in some cases, and this
> will corrupt the current vcpu's FPU context.  Care will need to be taken
> to spill these if necessary.
> 

The hypercalls we care about (TLB, APIC etc) don't use that ABI as far
as I can tell. At the very least Linux, which uses the same set of
hypercalls, doesn't need to save / restore XMM registers around the
calls.

Wei.

> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-05 Thread Andrew Cooper

> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
> paddr_t output)
> +{
> +uint64_t status;
> +
> +asm volatile ("mov %[output], %%r8\n"
> +  "call hv_hypercall_page"
> +  : "=a" (status), "+c" (control),
> +"+d" (input) ASM_CALL_CONSTRAINT
> +  : [output] "rm" (output)
> +  : "cc", "memory", "r8", "r9", "r10", "r11");

I think you want:

register unsigned long r8 asm("r8") = output;

and "+r" (r8) as an output constraint.

In particular, that doesn't force the compiler to put output into a
register other than r8 (or worse, spill it to the stack) to have the
opaque blob of asm move it back into r8.  What it will do in practice is
cause the compiler to construct output directly in r8.

As for the other clobbers, I can't find anything at all in the spec
which even mentions those registers.  There will be a decent improvement
to code generation if we don't force them to be spilled around a hypercall.

However, HyperV will(may?) modify %xmm{0..5} in some cases, and this
will corrupt the current vcpu's FPU context.  Care will need to be taken
to spill these if necessary.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 2/5] x86/hyperv: provide Hyper-V hypercall functions

2020-01-05 Thread Wei Liu
These functions will be used later to make hypercalls to Hyper-V.

I couldn't find reference in TLFS that Hyper-V clobbers flags and
r9-r11, but Linux's commit message says it does. Err on the safe side.

Signed-off-by: Wei Liu 
---
v3:
1. Name the file hyperv-hcall.h

v2:
1. Use direct call
---
 xen/include/asm-x86/guest/hyperv-hcall.h | 95 
 1 file changed, 95 insertions(+)
 create mode 100644 xen/include/asm-x86/guest/hyperv-hcall.h

diff --git a/xen/include/asm-x86/guest/hyperv-hcall.h 
b/xen/include/asm-x86/guest/hyperv-hcall.h
new file mode 100644
index 00..4b87dcfe64
--- /dev/null
+++ b/xen/include/asm-x86/guest/hyperv-hcall.h
@@ -0,0 +1,95 @@
+/**
+ * asm-x86/guest/hyperv-hcall.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see .
+ *
+ * Copyright (c) 2019 Microsoft.
+ */
+
+#ifndef __X86_HYPERV_HCALL_H__
+#define __X86_HYPERV_HCALL_H__
+
+#include 
+
+#include 
+#include 
+#include 
+
+static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, 
paddr_t output)
+{
+uint64_t status;
+
+asm volatile ("mov %[output], %%r8\n"
+  "call hv_hypercall_page"
+  : "=a" (status), "+c" (control),
+"+d" (input) ASM_CALL_CONSTRAINT
+  : [output] "rm" (output)
+  : "cc", "memory", "r8", "r9", "r10", "r11");
+
+return status;
+}
+
+static inline uint64_t hv_do_fast_hypercall(uint16_t code,
+uint64_t input1, uint64_t input2)
+{
+uint64_t status;
+uint64_t control = (uint64_t)code | HV_HYPERCALL_FAST_BIT;
+
+asm volatile ("mov %[input2], %%r8\n"
+  "call hv_hypercall_page"
+  : "=a" (status), "+c" (control),
+"+d" (input1) ASM_CALL_CONSTRAINT
+  : [input2] "rm" (input2)
+  : "cc", "r8", "r9", "r10", "r11");
+
+return status;
+}
+
+static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count,
+   uint16_t varhead_size,
+   paddr_t input, paddr_t output)
+{
+uint64_t control = code;
+uint64_t status;
+uint16_t rep_comp;
+
+control |= (uint64_t)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET;
+control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
+
+do {
+status = hv_do_hypercall(control, input, output);
+if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS )
+break;
+
+rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >>
+HV_HYPERCALL_REP_COMP_OFFSET;
+
+control &= ~HV_HYPERCALL_REP_START_MASK;
+control |= (uint64_t)rep_comp << HV_HYPERCALL_REP_COMP_OFFSET;
+
+} while ( rep_comp < rep_count );
+
+return status;
+}
+
+#endif /* __X86_HYPERV_HCALL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel