Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Kuppuswamy, Sathyanarayanan




On 4/20/21 4:53 PM, Dan Williams wrote:

On Tue, Apr 20, 2021 at 4:12 PM Kuppuswamy, Sathyanarayanan
 wrote:
[..]

Also, do you *REALLY* need to do this from assembly?  Can't it be done
in the C wrapper?

Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
so added
it here.




Can I ask a favor?

Please put a line break between quoted lines and your reply.


will do




That's not a good reason.  You could just as easily have a C wrapper
which all uses of TDVMCALL go through.


...because this runs together when reading otherwise.


Any reason for not preferring it in assembly code?
Also, using wrapper will add more complication for in/out instruction
substitution use case. please check the use case in following patch.
https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543


This commit still has open coded assembly for the TDVMCALL? I thought
we talked about it being unified with the common definition, or has
this patch not been reworked with that feedback yet? I expect there is
no performance reason why in/out need to get their own custom coded
TDVMCALL implementation. It should also be the case the failure should
behave the same as native in/out failure i.e. all ones on read
failure, and silent drops on write failure.



That link is for older version. My next version addresses your review
comments (re-uses TDVMCALL() function). Although the patch is ready, I am
waiting to fix other review comments before sending the next version. I
have just shared that link to explain about the use case.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dan Williams
On Tue, Apr 20, 2021 at 4:12 PM Kuppuswamy, Sathyanarayanan
 wrote:
[..]
> >>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
> >>> in the C wrapper?
> >> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
> >> so added
> >> it here.
> >

Can I ask a favor?

Please put a line break between quoted lines and your reply.

> > That's not a good reason.  You could just as easily have a C wrapper
> > which all uses of TDVMCALL go through.

...because this runs together when reading otherwise.

> Any reason for not preferring it in assembly code?
> Also, using wrapper will add more complication for in/out instruction
> substitution use case. please check the use case in following patch.
> https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543

This commit still has open coded assembly for the TDVMCALL? I thought
we talked about it being unified with the common definition, or has
this patch not been reworked with that feedback yet? I expect there is
no performance reason why in/out need to get their own custom coded
TDVMCALL implementation. It should also be the case the failure should
behave the same as native in/out failure i.e. all ones on read
failure, and silent drops on write failure.


Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dave Hansen
On 4/20/21 4:12 PM, Kuppuswamy, Sathyanarayanan wrote:
> On 4/20/21 12:59 PM, Dave Hansen wrote:
>> On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
> approach is, it adds a few extra instructions for every
> TDCALL use case when compared to distributed checks. Although
> it's a bit less efficient, it's worth it to make the code more
> readable.

 What's a "distributed check"?
>>>
>>> It should be "distributed TDVMCALL/TDCALL inline assembly calls"
>>
>> It's still not clear to what that refers.
> 
> I am just comparing the performance cost of using generic 
> TDCALL()/TDVMCALL() function implementation with "usage specific"
> (GetQuote,MapGPA, HLT,etc) custom TDCALL()/TDVMCALL() inline assembly
> implementation.

So, I actually had an idea what you were talking about, but I have
*ZERO* idea what "distributed" means in this context.

I think you are trying to say something along the lines of:

Just like syscalls, not all TDVMCALL/TDCALLs use the same set
of argument registers.  The implementation here picks the
current worst-case scenario for TDCALL (4 registers).  For
TDCALLs with fewer than 4 arguments, there will end up being
a few superfluous (cheap) instructions.  But, this approach
maximizes code reuse.


 This also doesn't talk at all about why this approach was
 chosen versus inline assembly.  You're going to be asked "why
 not use inline asm?"
>>> "To make the core more readable and less error prone." I have
>>> added this info in above paragraph. Do you think we need more
>>> argument to justify our approach?
>> 
>> Yes, you need much more justification.  That's pretty generic and 
>> non-specific.
> readability is one of the main motivation for not choosing inline 

I'm curious.  Is there a reason you are not choosing to use
capitalization in your replies?  I personally use capitalization as a
visual clue for where a reply starts.

I'm not sure whether this indicates that your keyboard is not
functioning properly, or that these replies are simply not important
enough to warrant the use of the Shift key.  Or, is it simply an
oversight?  Or, maybe I'm just being overly picky because I've been
working on these exact things with my third-grader a bit too much lately.

Either way, I personally would appreciate your attention to detail in
crafting writing that is easy to parse, since I'm the one that's going
to have to parse it.  Details show that you care about the content you
produce.  Even if you don't mean it, a lack of attention to detail (even
capital letters) can be perceived to mean that you do not care about
what you write.  If you don't care about it, why should the reader?

> assembly. Since number of lines of instructions (with comments) are 
> over 70, using inline assembly made it hard to read. Another reason 
> is, since we
> are using many registers (R8-R15, R[A-D]X)) in TDVMCAL/TDCALL 
> operation, we are not sure whether some older compiler can follow
> our specified inline assembly constraints.

As for the justification, that's much improved.  Please include that,
along with some careful review of the grammar.

> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
> +
> +    tdcall
> +
> +    /* Panic if TDCALL reports failure. */
> +    test %rax, %rax
> +    jnz 2f

 Why panic?
>>> As per spec, TDCALL should never fail. Note that it has nothing to do
>>> with TDVMCALL function specific failure (which is reported via R10).
>>
>> You've introduced two concepts here, without differentiating them.  You
>> need to work to differentiate these two kinds of failure somewhere.  You
>> can't simply refer to both as "failure".
> will clarify it. I have assumed that once the user reads the spec, its
> easier
> to understand.

Your code should be 100% self-supporting without the spec.  The spec can
be there in a supportive role to help resolve ambiguity or add fine
detail.  But, I think this is a major, repeated problem with this patch
set: it relies too much on reviewers spending quality time with the spec.

 Also, do you *REALLY* need to do this from assembly?  Can't it be done
 in the C wrapper?
>>> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
>>> so added
>>> it here.
>>
>> That's not a good reason.  You could just as easily have a C wrapper
>> which all uses of TDVMCALL go through.
> Any reason for not preferring it in assembly code?

Assembly is a last resort.  It should only be used for things that
simply can't be written in C or are horrific to understand and manage
when written in C.  A single statement like:

BUG_ON(something);

does not qualify in my book as something that's horrific to write in C.

> Also, using wrapper will add more complication for in/out instruction
> substitution use case. please check the use case in following patch.
> https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543

I'm s

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Kuppuswamy, Sathyanarayanan




On 4/20/21 12:59 PM, Dave Hansen wrote:

On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:

approach is, it adds a few extra instructions for every
TDCALL use case when compared to distributed checks. Although
it's a bit less efficient, it's worth it to make the code more
readable.


What's a "distributed check"?


It should be "distributed TDVMCALL/TDCALL inline assembly calls"


It's still not clear to what that refers.


I am just comparing the performance cost of using generic TDCALL()/TDVMCALL()
function implementation with "usage specific" (GetQuote,MapGPA, HLT,etc) custom
TDCALL()/TDVMCALL() inline assembly implementation.



This also doesn't talk at all about why this approach was chosen versus
inline assembly.  You're going to be asked "why not use inline asm?"

"To make the core more readable and less error prone." I have added this
info in above paragraph. Do you think we need more argument to
justify our approach?


Yes, you need much more justification.  That's pretty generic and
non-specific.

readability is one of the main motivation for not choosing inline
assembly. Since number of lines of instructions (with comments) are over
70, using inline assembly made it hard to read. Another reason is, since we
are using many registers (R8-R15, R[A-D]X)) in TDVMCAL/TDCALL operation, we are
not sure whether some older compiler can follow our specified inline assembly
constraints.



+    /*
+ * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
+ * defined in the GHCI.  Note, RAX and RCX are consumed, but
only by
+ * TDX-Module and so don't need to be listed in the mask.
+ */


"GCHI" is out of the blue here.  So is "TDX-Module".  There needs to be
more context.

ok. will add it. Do you want GHCI spec reference with section id here?


Absolutely not.  I dislike all of the section references as-is.  Doesn't
a comment like this say what you said above and also add context?

Expose every register currently used in the
guest-to-host communication interface (GHCI).

ok.



+    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
+
+    tdcall
+
+    /* Panic if TDCALL reports failure. */
+    test %rax, %rax
+    jnz 2f


Why panic?

As per spec, TDCALL should never fail. Note that it has nothing to do
with TDVMCALL function specific failure (which is reported via R10).


You've introduced two concepts here, without differentiating them.  You
need to work to differentiate these two kinds of failure somewhere.  You
can't simply refer to both as "failure".

will clarify it. I have assumed that once the user reads the spec, its easier
to understand.



Also, do you *REALLY* need to do this from assembly?  Can't it be done
in the C wrapper?

Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
so added
it here.


That's not a good reason.  You could just as easily have a C wrapper
which all uses of TDVMCALL go through.

Any reason for not preferring it in assembly code?
Also, using wrapper will add more complication for in/out instruction
substitution use case. please check the use case in following patch.
https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543



+    /* Propagate TDVMCALL success/failure to return value. */
+    mov %r10, %rax


You just said it panic's on failure.  How can this propagate failure?

we use panic for TDCALL failure. But, R10 content used to identify
whether given
TDVMCALL function operation is successful or not.


As I said above, please endeavor to differentiate the two classes of
failures.

Also, if the spec is violated, do you *REALLY* want to blow up the whole
world with a panic?  I guess you can argue that it could have been
something security-related that failed.  But, either way, you owe a
description of why panic'ing is a good idea, not just blindly deferring
to "the spec says this can't happen".

ok. will add some comments justifying our case.



+    xor %r10d, %r10d
+    xor %r11d, %r11d
+    xor %r12d, %r12d
+    xor %r13d, %r13d
+    xor %r14d, %r14d
+    xor %r15d, %r15d
+
+    pop %r12
+    pop %r13
+    pop %r14
+    pop %r15
+
+    FRAME_END
+    ret
+2:
+    ud2
+.endm
+
+SYM_FUNC_START(__tdvmcall)
+    xor %r10, %r10
+    tdvmcall_core
+SYM_FUNC_END(__tdvmcall)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 0d00dd50a6ff..1147e7e765d6 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -3,6 +3,36 @@
     #include 
   +/*
+ * Wrapper for the common case with standard output value (R10).
+ */


... and oddly enough there is no explicit mention of R10 anywhere.  Why?

For Guest to Host call -> R10 holds TDCALL function id (which is 0 for
TDVMCALL). so
we don't need special argument.
After TDVMCALL execution, R10 value is returned via RAX.


OK... so this is how it works.  But why mention R10 in the comment?  Why
is *THAT* worth mentioning?

its not needed. will remove it.



+static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dave Hansen
On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>> approach is, it adds a few extra instructions for every
>>> TDCALL use case when compared to distributed checks. Although
>>> it's a bit less efficient, it's worth it to make the code more
>>> readable.
>>
>> What's a "distributed check"?
> 
> It should be "distributed TDVMCALL/TDCALL inline assembly calls"

It's still not clear to what that refers.

>> This also doesn't talk at all about why this approach was chosen versus
>> inline assembly.  You're going to be asked "why not use inline asm?"
> "To make the core more readable and less error prone." I have added this
> info in above paragraph. Do you think we need more argument to
> justify our approach?

Yes, you need much more justification.  That's pretty generic and
non-specific.

>>> +    /*
>>> + * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
>>> + * defined in the GHCI.  Note, RAX and RCX are consumed, but
>>> only by
>>> + * TDX-Module and so don't need to be listed in the mask.
>>> + */
>>
>> "GCHI" is out of the blue here.  So is "TDX-Module".  There needs to be
>> more context.
> ok. will add it. Do you want GHCI spec reference with section id here?

Absolutely not.  I dislike all of the section references as-is.  Doesn't
a comment like this say what you said above and also add context?

Expose every register currently used in the
guest-to-host communication interface (GHCI).

>>> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>> +
>>> +    tdcall
>>> +
>>> +    /* Panic if TDCALL reports failure. */
>>> +    test %rax, %rax
>>> +    jnz 2f
>>
>> Why panic?
> As per spec, TDCALL should never fail. Note that it has nothing to do
> with TDVMCALL function specific failure (which is reported via R10).

You've introduced two concepts here, without differentiating them.  You
need to work to differentiate these two kinds of failure somewhere.  You
can't simply refer to both as "failure".

>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
>> in the C wrapper?
> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
> so added
> it here.

That's not a good reason.  You could just as easily have a C wrapper
which all uses of TDVMCALL go through.

>>> +    /* Propagate TDVMCALL success/failure to return value. */
>>> +    mov %r10, %rax
>>
>> You just said it panic's on failure.  How can this propagate failure?
> we use panic for TDCALL failure. But, R10 content used to identify
> whether given
> TDVMCALL function operation is successful or not.

As I said above, please endeavor to differentiate the two classes of
failures.

Also, if the spec is violated, do you *REALLY* want to blow up the whole
world with a panic?  I guess you can argue that it could have been
something security-related that failed.  But, either way, you owe a
description of why panic'ing is a good idea, not just blindly deferring
to "the spec says this can't happen".

>>> +    xor %r10d, %r10d
>>> +    xor %r11d, %r11d
>>> +    xor %r12d, %r12d
>>> +    xor %r13d, %r13d
>>> +    xor %r14d, %r14d
>>> +    xor %r15d, %r15d
>>> +
>>> +    pop %r12
>>> +    pop %r13
>>> +    pop %r14
>>> +    pop %r15
>>> +
>>> +    FRAME_END
>>> +    ret
>>> +2:
>>> +    ud2
>>> +.endm
>>> +
>>> +SYM_FUNC_START(__tdvmcall)
>>> +    xor %r10, %r10
>>> +    tdvmcall_core
>>> +SYM_FUNC_END(__tdvmcall)
>>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>>> index 0d00dd50a6ff..1147e7e765d6 100644
>>> --- a/arch/x86/kernel/tdx.c
>>> +++ b/arch/x86/kernel/tdx.c
>>> @@ -3,6 +3,36 @@
>>>     #include 
>>>   +/*
>>> + * Wrapper for the common case with standard output value (R10).
>>> + */
>>
>> ... and oddly enough there is no explicit mention of R10 anywhere.  Why?
> For Guest to Host call -> R10 holds TDCALL function id (which is 0 for
> TDVMCALL). so
> we don't need special argument.
> After TDVMCALL execution, R10 value is returned via RAX.

OK... so this is how it works.  But why mention R10 in the comment?  Why
is *THAT* worth mentioning?

>>> +static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
>>> +{
>>> +    u64 err;
>>> +
>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
>>> +
>>> +    WARN_ON(err);
>>> +
>>> +    return err;
>>> +}
>>
>> Are there really *ZERO* reasons for a TDVMCALL to return an error?
> No. Its useful for debugging TDVMCALL failures.
>> Won't this let a malicious VMM spew endless warnings into the guest
>> console?
> As per GHCI spec, R10 will hold error code details which can be used to
> determine
> the type of TDVMCALL failure.

I would encourage you to stop citing the GCCI spec.  In all of these
conversations, you seem to rely on it without considering the underlying
reasons.  The fact that R10 is the error code is 100% irrelevant for
this conversation.

It's also entirely possible that the host would have bugs, or forget to
clear a bit somewhere, even if the spec says, "don't do it".

> More warni

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Kuppuswamy, Sathyanarayanan




On 4/20/21 10:36 AM, Dave Hansen wrote:

On 3/26/21 4:38 PM, Kuppuswamy Sathyanarayanan wrote:

Implement common helper functions to communicate with
the TDX Module and VMM (using TDCALL instruction).


This is missing any kind of background.  I'd say:

Guests communicate with VMMs with hypercalls. Historically, these are
implemented using instructions that are known to cause VMEXITs like
.  However, with TDX, VMEXITs no longer expose guest
state from the host.  This prevents the old hypercall mechanisms from
working

... and then go on to talk about what you are introducing, why there are
two of them and so forth.

Ok. I will add it.



__tdvmcall() function can be used to request services
from VMM.


^ "from a VMM" or "from the VMM", please


will use "from the VMM".

__tdcall() function can be used to communicate with the
TDX Module.

Using common helper functions makes the code more readable
and less error prone compared to distributed and use case
specific inline assembly code. Only downside in using this


 ^ "The only downside..."

will fix it.



approach is, it adds a few extra instructions for every
TDCALL use case when compared to distributed checks. Although
it's a bit less efficient, it's worth it to make the code more
readable.


What's a "distributed check"?


It should be "distributed TDVMCALL/TDCALL inline assembly calls"


This also doesn't talk at all about why this approach was chosen versus
inline assembly.  You're going to be asked "why not use inline asm?"

"To make the core more readable and less error prone." I have added this info
in above paragraph. Do you think we need more argument to justify our approach?



--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,12 +8,35 @@
  #ifdef CONFIG_INTEL_TDX_GUEST
  
  #include 

+#include 
+
+struct tdcall_output {
+   u64 rcx;
+   u64 rdx;
+   u64 r8;
+   u64 r9;
+   u64 r10;
+   u64 r11;
+};
+
+struct tdvmcall_output {
+   u64 r11;
+   u64 r12;
+   u64 r13;
+   u64 r14;
+   u64 r15;
+};
  
  /* Common API to check TDX support in decompression and common kernel code. */

  bool is_tdx_guest(void);
  
  void __init tdx_early_init(void);
  
+u64 __tdcall(u64 fn, u64 rcx, u64 rdx, struct tdcall_output *out);

+
+u64 __tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
+  struct tdvmcall_output *out);


Some one-liner comments about what these do would be nice.

will do.



  #else // !CONFIG_INTEL_TDX_GUEST
  
  static inline bool is_tdx_guest(void)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ea111bf50691..7966c10ea8d1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)+= pvclock.o
  obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
  
  obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o

-obj-$(CONFIG_INTEL_TDX_GUEST)  += tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST)  += tdcall.o tdx.o
  
  obj-$(CONFIG_EISA)		+= eisa.o

  obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 60b9f42ce3c1..72de0b49467e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
  #include 
  #endif
  
+#ifdef CONFIG_INTEL_TDX_GUEST

+#include 
+#endif
+
  #ifdef CONFIG_X86_32
  # include "asm-offsets_32.c"
  #else
@@ -75,6 +79,24 @@ static void __used common(void)
OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
  #endif
  
+#ifdef CONFIG_INTEL_TDX_GUEST

+   BLANK();
+   /* Offset for fields in tdcall_output */
+   OFFSET(TDCALL_rcx, tdcall_output, rcx);
+   OFFSET(TDCALL_rdx, tdcall_output, rdx);
+   OFFSET(TDCALL_r8, tdcall_output, r8);
+   OFFSET(TDCALL_r9, tdcall_output, r9);
+   OFFSET(TDCALL_r10, tdcall_output, r10);
+   OFFSET(TDCALL_r11, tdcall_output, r11);


 ^ vertically align this


will fix it.

+   /* Offset for fields in tdvmcall_output */
+   OFFSET(TDVMCALL_r11, tdvmcall_output, r11);
+   OFFSET(TDVMCALL_r12, tdvmcall_output, r12);
+   OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
+   OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
+   OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
+#endif
+
BLANK();
OFFSET(BP_scratch, boot_params, scratch);
OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index ..a73b67c0b407
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,163 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define TDVMCALL_EXPOSE_REGS_MASK  0xfc00


This looks like an undocumented magic number.


+/*
+ * TDCALL instruction is newly added in TDX architecture,
+ * used by TD for requesting the host VMM to provide
+ * (untrusted) services. Supported in Binu

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dave Hansen
On 3/26/21 4:38 PM, Kuppuswamy Sathyanarayanan wrote:
> Implement common helper functions to communicate with
> the TDX Module and VMM (using TDCALL instruction).

This is missing any kind of background.  I'd say:

Guests communicate with VMMs with hypercalls. Historically, these are
implemented using instructions that are known to cause VMEXITs like
.  However, with TDX, VMEXITs no longer expose guest
state from the host.  This prevents the old hypercall mechanisms from
working

... and then go on to talk about what you are introducing, why there are
two of them and so forth.

> __tdvmcall() function can be used to request services
> from VMM.

^ "from a VMM" or "from the VMM", please

> __tdcall() function can be used to communicate with the
> TDX Module.
> 
> Using common helper functions makes the code more readable
> and less error prone compared to distributed and use case
> specific inline assembly code. Only downside in using this

 ^ "The only downside..."

> approach is, it adds a few extra instructions for every
> TDCALL use case when compared to distributed checks. Although
> it's a bit less efficient, it's worth it to make the code more
> readable.

What's a "distributed check"?

This also doesn't talk at all about why this approach was chosen versus
inline assembly.  You're going to be asked "why not use inline asm?"

> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -8,12 +8,35 @@
>  #ifdef CONFIG_INTEL_TDX_GUEST
>  
>  #include 
> +#include 
> +
> +struct tdcall_output {
> + u64 rcx;
> + u64 rdx;
> + u64 r8;
> + u64 r9;
> + u64 r10;
> + u64 r11;
> +};
> +
> +struct tdvmcall_output {
> + u64 r11;
> + u64 r12;
> + u64 r13;
> + u64 r14;
> + u64 r15;
> +};
>  
>  /* Common API to check TDX support in decompression and common kernel code. 
> */
>  bool is_tdx_guest(void);
>  
>  void __init tdx_early_init(void);
>  
> +u64 __tdcall(u64 fn, u64 rcx, u64 rdx, struct tdcall_output *out);
> +
> +u64 __tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
> +struct tdvmcall_output *out);

Some one-liner comments about what these do would be nice.

>  #else // !CONFIG_INTEL_TDX_GUEST
>  
>  static inline bool is_tdx_guest(void)
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index ea111bf50691..7966c10ea8d1 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)  += pvclock.o
>  obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
>  
>  obj-$(CONFIG_JAILHOUSE_GUEST)+= jailhouse.o
> -obj-$(CONFIG_INTEL_TDX_GUEST)+= tdx.o
> +obj-$(CONFIG_INTEL_TDX_GUEST)+= tdcall.o tdx.o
>  
>  obj-$(CONFIG_EISA)   += eisa.o
>  obj-$(CONFIG_PCSPKR_PLATFORM)+= pcspeaker.o
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index 60b9f42ce3c1..72de0b49467e 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -23,6 +23,10 @@
>  #include 
>  #endif
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +#include 
> +#endif
> +
>  #ifdef CONFIG_X86_32
>  # include "asm-offsets_32.c"
>  #else
> @@ -75,6 +79,24 @@ static void __used common(void)
>   OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
>  #endif
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> + BLANK();
> + /* Offset for fields in tdcall_output */
> + OFFSET(TDCALL_rcx, tdcall_output, rcx);
> + OFFSET(TDCALL_rdx, tdcall_output, rdx);
> + OFFSET(TDCALL_r8, tdcall_output, r8);
> + OFFSET(TDCALL_r9, tdcall_output, r9);
> + OFFSET(TDCALL_r10, tdcall_output, r10);
> + OFFSET(TDCALL_r11, tdcall_output, r11);

 ^ vertically align this

> + /* Offset for fields in tdvmcall_output */
> + OFFSET(TDVMCALL_r11, tdvmcall_output, r11);
> + OFFSET(TDVMCALL_r12, tdvmcall_output, r12);
> + OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
> + OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
> + OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
> +#endif
> +
>   BLANK();
>   OFFSET(BP_scratch, boot_params, scratch);
>   OFFSET(BP_secure_boot, boot_params, secure_boot);
> diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
> new file mode 100644
> index ..a73b67c0b407
> --- /dev/null
> +++ b/arch/x86/kernel/tdcall.S
> @@ -0,0 +1,163 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define TDVMCALL_EXPOSE_REGS_MASK0xfc00

This looks like an undocumented magic number.

> +/*
> + * TDCALL instruction is newly added in TDX architecture,
> + * used by TD for requesting the host VMM to provide
> + * (untrusted) services. Supported in Binutils >= 2.36
> + */

Host VMM *AND* TD-module, right?

> +#define tdcall .byte 0x66,0x0f,0x01,0xcc

How well will the "newly added" comment age?

"host VMM" is redundant.

/*
 * TDX guests us

[PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-03-26 Thread Kuppuswamy Sathyanarayanan
Implement common helper functions to communicate with
the TDX Module and VMM (using TDCALL instruction).

__tdvmcall() function can be used to request services
from VMM.

__tdcall() function can be used to communicate with the
TDX Module.

Using common helper functions makes the code more readable
and less error prone compared to distributed and use case
specific inline assembly code. Only downside in using this
approach is, it adds a few extra instructions for every
TDCALL use case when compared to distributed checks. Although
it's a bit less efficient, it's worth it to make the code more
readable.

Originally-by: Sean Christopherson 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---

Hi All,

Please let me know your review comments. If you agree with this patch
and want to see the use of these APIs in rest of the patches, I will
re-send the patch series with updated code. Please let me know.

Changes since v1:
 * Implemented tdvmcall and tdcall helper functions as assembly code.
 * Followed suggestion provided by Sean & Dave.

 arch/x86/include/asm/tdx.h|  23 +
 arch/x86/kernel/Makefile  |   2 +-
 arch/x86/kernel/asm-offsets.c |  22 +
 arch/x86/kernel/tdcall.S  | 163 ++
 arch/x86/kernel/tdx.c |  30 +++
 5 files changed, 239 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/tdcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 69af72d08d3d..ce6212ce5f45 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,12 +8,35 @@
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 #include 
+#include 
+
+struct tdcall_output {
+   u64 rcx;
+   u64 rdx;
+   u64 r8;
+   u64 r9;
+   u64 r10;
+   u64 r11;
+};
+
+struct tdvmcall_output {
+   u64 r11;
+   u64 r12;
+   u64 r13;
+   u64 r14;
+   u64 r15;
+};
 
 /* Common API to check TDX support in decompression and common kernel code. */
 bool is_tdx_guest(void);
 
 void __init tdx_early_init(void);
 
+u64 __tdcall(u64 fn, u64 rcx, u64 rdx, struct tdcall_output *out);
+
+u64 __tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
+  struct tdvmcall_output *out);
+
 #else // !CONFIG_INTEL_TDX_GUEST
 
 static inline bool is_tdx_guest(void)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ea111bf50691..7966c10ea8d1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)  += jailhouse.o
-obj-$(CONFIG_INTEL_TDX_GUEST)  += tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST)  += tdcall.o tdx.o
 
 obj-$(CONFIG_EISA) += eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 60b9f42ce3c1..72de0b49467e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
 #include 
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#include 
+#endif
+
 #ifdef CONFIG_X86_32
 # include "asm-offsets_32.c"
 #else
@@ -75,6 +79,24 @@ static void __used common(void)
OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+   BLANK();
+   /* Offset for fields in tdcall_output */
+   OFFSET(TDCALL_rcx, tdcall_output, rcx);
+   OFFSET(TDCALL_rdx, tdcall_output, rdx);
+   OFFSET(TDCALL_r8, tdcall_output, r8);
+   OFFSET(TDCALL_r9, tdcall_output, r9);
+   OFFSET(TDCALL_r10, tdcall_output, r10);
+   OFFSET(TDCALL_r11, tdcall_output, r11);
+
+   /* Offset for fields in tdvmcall_output */
+   OFFSET(TDVMCALL_r11, tdvmcall_output, r11);
+   OFFSET(TDVMCALL_r12, tdvmcall_output, r12);
+   OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
+   OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
+   OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
+#endif
+
BLANK();
OFFSET(BP_scratch, boot_params, scratch);
OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index ..a73b67c0b407
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,163 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define TDVMCALL_EXPOSE_REGS_MASK  0xfc00
+
+/*
+ * TDCALL instruction is newly added in TDX architecture,
+ * used by TD for requesting the host VMM to provide
+ * (untrusted) services. Supported in Binutils >= 2.36
+ */
+#define tdcall .byte 0x66,0x0f,0x01,0xcc
+
+/* Only for non TDVMCALL use cases */
+SYM_FUNC_START(__tdcall)
+   FRAME_BEGIN
+
+   /* Save/restore non-volatile GPRs that are exposed to the VMM. */
+   push %r15
+   push %r14
+   push %r13
+   push %r12
+
+   /*
+* RDI  => RAX = TDCALL leaf
+* RSI  => RCX = input param 1
+* RDX  => RDX = input param 2
+