Re: RFC: userspace exception fixups

2018-11-26 Thread Jarkko Sakkinen
On Mon, Nov 26, 2018 at 06:35:34AM -0800, Sean Christopherson wrote:
> And how would you determine the #UD is related to SGX?  Hardware doesn't
> provide any indication that a #UD (or any other fault) is related to SGX
> or occurred in an enclave.  The only fault that is special-cased in a
> non-virtualized environment is #PF signaled by the EPCM, which gets the
> PF_SGX bit set in the error code.

Could you not detect the #UD from the address where it happened? The kernel
knows where enclaves are mapped. BTW, how does the Intel run-time emulate
opcodes currently?
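
A rough sketch of the address-based check suggested above, assuming the kernel keeps a table of mapped enclave ranges. The names `sketch_enclave_range` and `sketch_addr_in_enclave` are hypothetical and not taken from any posted patch. Whether the address reported for a #UD actually falls inside such a range after an asynchronous exit is not settled in this thread, so this only illustrates the lookup itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one mapped enclave: [base, base + size). */
struct sketch_enclave_range {
    uint64_t base;
    uint64_t size;
};

/* Returns true if addr lies inside any of the n recorded ranges. */
bool sketch_addr_in_enclave(const struct sketch_enclave_range *ranges,
                            size_t n, uint64_t addr)
{
    assert(ranges != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* The subtraction form avoids overflow in base + size. */
        if (addr >= ranges[i].base &&
            addr - ranges[i].base < ranges[i].size)
            return true;
    }
    return false;
}
```

A linear scan is enough for a sketch; a real lookup would more likely go through the process's existing mapping structures.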

Anyway, I've fully discarded the whole idea because implementing single
stepping without a well-defined AEP handler is nasty. I think vDSOs are the
only viable path, at least that I'm aware of...

/Jarkko


Re: RFC: userspace exception fixups

2018-11-26 Thread Sean Christopherson
On Wed, Nov 21, 2018 at 05:17:34PM +0200, Jarkko Sakkinen wrote:
> On Wed, Nov 21, 2018 at 05:17:32AM +, Jethro Beekman wrote:
> > Jarkko, can you please explain you solution in detail? The CPU receives an
> > exception. This will be handled by the kernel exception handler. What
> > information does the kernel exception handler use to determine whether to
> > deliver the exception as a regular signal to the process, or whether to set
> > the special registers values for userspace and just continue executing the
> > process manually?
> 
> Now we throw SIGSEGV when PF_SGX set, right? In my solution that would
> be turned just doing iret to AEP with the extra that three registers get
> exception data (type, reason, addr). No decoding or RIP adjusting
> involved.
> 
> That would mean that you would actually have to implement AEP handler
> than just have enclu there.
> 
> I've also proposed that perhaps for SGX also #UD should be propagated
> this way because for some instructions you need outside help to emulate
> "non-enclave" environment.

And how would you determine the #UD is related to SGX?  Hardware doesn't
provide any indication that a #UD (or any other fault) is related to SGX
or occurred in an enclave.  The only fault that is special-cased in a
non-virtualized environment is #PF signaled by the EPCM, which gets the
PF_SGX bit set in the error code.
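
As a minimal sketch of the point above: the only test the kernel can make from the exception alone is whether a #PF carries the SGX bit in its error code. The bit position (bit 15) follows the SDM's page-fault error code layout as I understand it; the macro and function names are made up for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bit 15: fault signaled by an SGX (EPCM) check.
 * The macro name is local to this sketch. */
#define SKETCH_PF_SGX (UINT64_C(1) << 15)

/* x86 exception vectors referred to in this thread. */
enum sketch_vector {
    SKETCH_VEC_UD = 6,
    SKETCH_VEC_GP = 13,
    SKETCH_VEC_PF = 14,
};

/* True only when the hardware itself marks the fault as SGX-related.
 * #UD and #GP carry no such marker, so they always yield false here. */
bool sketch_fault_is_marked_sgx(unsigned int vector, uint64_t error_code)
{
    assert(vector <= 0xff); /* x86 has 256 interrupt vectors */

    return vector == SKETCH_VEC_PF && (error_code & SKETCH_PF_SGX) != 0;
}
```

Any scheme that wants to treat #UD or #GP specially therefore needs some other source of information, which is what the rest of the thread argues about.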

> That is all I have drafted together so far. I'll try to finish v18 this
> week with other stuff and refine further next week (unless someone gives
> obvious reason why this doesn't work, which might well be because I
> haven't went too deep with my analysis yet because of lack of time).
> 
> /Jarkko


Re: RFC: userspace exception fixups

2018-11-24 Thread Jarkko Sakkinen
On Wed, Nov 21, 2018 at 05:17:34PM +0200, Jarkko Sakkinen wrote:
> On Wed, Nov 21, 2018 at 05:17:32AM +, Jethro Beekman wrote:
> > Jarkko, can you please explain you solution in detail? The CPU receives an
> > exception. This will be handled by the kernel exception handler. What
> > information does the kernel exception handler use to determine whether to
> > deliver the exception as a regular signal to the process, or whether to set
> > the special registers values for userspace and just continue executing the
> > process manually?
> 
> Now we throw SIGSEGV when PF_SGX set, right? In my solution that would
> be turned just doing iret to AEP with the extra that three registers get
> exception data (type, reason, addr). No decoding or RIP adjusting
> involved.
> 
> That would mean that you would actually have to implement AEP handler
> than just have enclu there.
> 
> I've also proposed that perhaps for SGX also #UD should be propagated
> this way because for some instructions you need outside help to emulate
> "non-enclave" environment.
> 
> That is all I have drafted together so far. I'll try to finish v18 this
> week with other stuff and refine further next week (unless someone gives
> obvious reason why this doesn't work, which might well be because I
> haven't went too deep with my analysis yet because of lack of time).

The obvious con of this approach is that if you single-step the code,
the whole AEP handler would also be single-stepped every time. That is
probably a big enough con that it is better to go with the vDSO approach
anyhow...

/Jarkko


Re: RFC: userspace exception fixups

2018-11-21 Thread Jarkko Sakkinen
On Wed, Nov 21, 2018 at 05:17:32AM +, Jethro Beekman wrote:
> Jarkko, can you please explain you solution in detail? The CPU receives an
> exception. This will be handled by the kernel exception handler. What
> information does the kernel exception handler use to determine whether to
> deliver the exception as a regular signal to the process, or whether to set
> the special registers values for userspace and just continue executing the
> process manually?

Now we throw SIGSEGV when PF_SGX is set, right? In my solution that would
turn into just doing an iret to the AEP, with the addition that three
registers get the exception data (type, reason, addr). No decoding or RIP
adjusting involved.
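
A minimal sketch of that idea, assuming the saved instruction pointer already equals the AEP after the asynchronous exit, so only three registers need to be filled in. The struct layouts and the choice of DI/SI/DX are hypothetical; the proposal does not name specific registers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the saved user register frame (not the kernel's pt_regs). */
struct sketch_regs {
    uint64_t ip;         /* where userspace resumes, i.e. the AEP */
    uint64_t di, si, dx; /* hypothetical choice of three registers */
};

/* The three pieces of exception data named in the proposal. */
struct sketch_exception {
    uint64_t type;   /* exception vector */
    uint64_t reason; /* error code */
    uint64_t addr;   /* faulting address, if any */
};

/* Instead of queueing a signal, publish the exception data in registers
 * and let the normal return path resume userspace at the AEP. */
void sketch_return_to_aep(struct sketch_regs *regs,
                          const struct sketch_exception *ex)
{
    assert(regs != NULL && ex != NULL);

    regs->di = ex->type;
    regs->si = ex->reason;
    regs->dx = ex->addr;
    /* regs->ip is deliberately untouched: no decoding, no RIP adjusting. */
}
```

The open question raised elsewhere in the thread is not this step but how the kernel decides when to take this path rather than deliver a signal.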

That would mean that you would actually have to implement an AEP handler
rather than just have ENCLU there.

I've also proposed that perhaps for SGX #UD should also be propagated
this way, because for some instructions you need outside help to emulate
a "non-enclave" environment.

That is all I have drafted so far. I'll try to finish v18 this week
along with other stuff and refine this further next week (unless someone
gives an obvious reason why this doesn't work, which might well happen
because I haven't gone too deep with my analysis yet due to lack of time).

/Jarkko


Re: RFC: userspace exception fixups

2018-11-20 Thread Jethro Beekman

On 2018-11-21 04:25, Jarkko Sakkinen wrote:
> On Tue, Nov 20, 2018 at 07:19:37AM -0800, Andy Lutomirski wrote:
> > general by mucking with some regs and retrying -- that will infinite
> > loop and confuse everyone.  I'm not even 100% convinced that decoding
> > the insn stream is useful -- AEP can point to something that isn't
> > ENCLU.
>
> In my return-to-AEP approach to whole point was not to do any decoding
> but instead have something else always in the AEP handler than just
> ENCLU.
>
> No instruction decoding. No RIP manipulation.
>
> > IOW the kernel needs to know *when* to apply this special behavior.
> > Sadly there is no bit in the exception frame that says "came from
> > SGX".


Jarkko, can you please explain your solution in detail? The CPU receives
an exception. This will be handled by the kernel exception handler. What
information does the kernel exception handler use to determine whether
to deliver the exception as a regular signal to the process, or whether
to set the special register values for userspace and just continue
executing the process manually?


--
Jethro Beekman | Fortanix



Re: RFC: userspace exception fixups

2018-11-20 Thread Jarkko Sakkinen
On Tue, Nov 20, 2018 at 07:19:37AM -0800, Andy Lutomirski wrote:
> What is "#GP with EPCM"?  We certainly don't want to react to #UD in

A typo. Meant #PF with PF_SGX set i.e. EPCM conflict.

> general by mucking with some regs and retrying -- that will infinite
> loop and confuse everyone.  I'm not even 100% convinced that decoding
> the insn stream is useful -- AEP can point to something that isn't
> ENCLU.

In my return-to-AEP approach the whole point was not to do any decoding
but instead to always have something more in the AEP handler than just
ENCLU.

No instruction decoding. No RIP manipulation.

> IOW the kernel needs to know *when* to apply this special behavior.
> Sadly there is no bit in the exception frame that says "came from
> SGX".

/Jarkko


Re: RFC: userspace exception fixups

2018-11-20 Thread Jarkko Sakkinen
On Tue, Nov 20, 2018 at 12:11:33PM +0200, Jarkko Sakkinen wrote:
> On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> > >
> > > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > > Decoding the instruction stream and doing it to all exceptions that
> > > > hit an ENCLU instruction seems like a poor design.
> > >
> > > I'm not sure why you would ever need to do any type of fixup as the idea
> > > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > > would work the same way as for exceptions that the kernel can deal with
> > > except filling the exception information to registers.
> > 
> > Sure, but how does the kernel know when to do that and when to send a
> > signal?  I don't really like decoding the instruction stream to figure
> > it out.
> 
> Hmm... why you have to decode instruction stream to find that out? Would
> just depend on exception type (#GP with EPCM, #UD). Or are you saying
> that kernel should need to SIGSEGV if there is in fact ENCLU so that
> there is no infinite trap loop? Sorry, I'm a bit lost here that where
> does this decoding requirement comes from in the first place. I
> understand how it is used in Sean's proposal...
> 
> Anyway, this option can be probably discarded without further
> consideration because apparently single stepping can cause #DB SS fault
> if AEP handler is anything else than a single instruction.
> 
> For me it seems that by ruling out options, vDSO option is what is
> left. I don't like it but at least it works...

The relevant section in the SDM is 43.2.6, but I started to wonder why
that would even be a problem in the dumbed-down return-to-AEP approach.
If you are single-step debugging, isn't that what you want? Continue
single stepping in the AEP handler...

I still don't understand where the need for decoding the instruction
stream comes from in this dumbed-down approach. There's no RIP
manipulation or anything like that involved at all.

With this reconsideration I would keep this as one option at least.

/Jarkko


Re: RFC: userspace exception fixups

2018-11-20 Thread Sean Christopherson
On Tue, Nov 20, 2018 at 12:11:33PM +0200, Jarkko Sakkinen wrote:
> On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> > >
> > > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > > Decoding the instruction stream and doing it to all exceptions that
> > > > hit an ENCLU instruction seems like a poor design.
> > >
> > > I'm not sure why you would ever need to do any type of fixup as the idea
> > > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > > would work the same way as for exceptions that the kernel can deal with
> > > except filling the exception information to registers.
> > 
> > Sure, but how does the kernel know when to do that and when to send a
> > signal?  I don't really like decoding the instruction stream to figure
> > it out.
> 
> Hmm... why you have to decode instruction stream to find that out? Would
> just depend on exception type (#GP with EPCM, #UD).

#PF w/ PFEC_SGX is the only exception that indicates a fault is related
to SGX.  Theoretically we could avoid decoding by using a magic value
for the AEP itself and doing even more magic fixup, but that wouldn't
help for faults that occur on EENTER, which can be generic #GPs due to
loss of EPC on SGX1 systems. 
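
For context, "decoding" in this discussion boils down to checking whether the instruction at the faulting RIP is ENCLU, which is encoded as the bytes 0F 01 D7. A minimal sketch of that check, assuming the kernel has already copied the instruction bytes into a buffer; the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the buffer starts with the ENCLU opcode (0F 01 D7).
 * insn/len describe bytes already fetched from the faulting RIP. */
bool sketch_is_enclu(const uint8_t *insn, size_t len)
{
    static const uint8_t enclu[] = { 0x0f, 0x01, 0xd7 };

    assert(insn != NULL);

    if (len < sizeof(enclu))
        return false;
    return memcmp(insn, enclu, sizeof(enclu)) == 0;
}
```

As noted elsewhere in the thread, a match only shows that the fault hit an ENCLU, not that an enclave was involved, and the AEP may point at something that is not ENCLU at all.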

> Or are you saying
> that kernel should need to SIGSEGV if there is in fact ENCLU so that
> there is no infinite trap loop? Sorry, I'm a bit lost here that where
> does this decoding requirement comes from in the first place. I
> understand how it is used in Sean's proposal...
> 
> Anyway, this option can be probably discarded without further
> consideration because apparently single stepping can cause #DB SS fault
> if AEP handler is anything else than a single instruction.

Not that it matters, but we could satisfy the "one instruction"
requirement if the fixup changed RIP to point at an ENCLU for #DBs.

> For me it seems that by ruling out options, vDSO option is what is
> left. I don't like it but at least it works...
> 
> /Jarkko


Re: RFC: userspace exception fixups

2018-11-20 Thread Andy Lutomirski
On Tue, Nov 20, 2018 at 2:11 AM Jarkko Sakkinen wrote:
>
> On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> > >
> > > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > > Decoding the instruction stream and doing it to all exceptions that
> > > > hit an ENCLU instruction seems like a poor design.
> > >
> > > I'm not sure why you would ever need to do any type of fixup as the idea
> > > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > > would work the same way as for exceptions that the kernel can deal with
> > > except filling the exception information to registers.
> >
> > Sure, but how does the kernel know when to do that and when to send a
> > signal?  I don't really like decoding the instruction stream to figure
> > it out.
>
> Hmm... why you have to decode instruction stream to find that out? Would
> just depend on exception type (#GP with EPCM, #UD).

What is "#GP with EPCM"?  We certainly don't want to react to #UD in
general by mucking with some regs and retrying -- that will infinite
loop and confuse everyone.  I'm not even 100% convinced that decoding
the insn stream is useful -- AEP can point to something that isn't
ENCLU.

IOW the kernel needs to know *when* to apply this special behavior.
Sadly there is no bit in the exception frame that says "came from
SGX".


Re: RFC: userspace exception fixups

2018-11-20 Thread Jarkko Sakkinen
On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> >
> > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > Decoding the instruction stream and doing it to all exceptions that
> > > hit an ENCLU instruction seems like a poor design.
> >
> > I'm not sure why you would ever need to do any type of fixup as the idea
> > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > would work the same way as for exceptions that the kernel can deal with
> > except filling the exception information to registers.
> 
> Sure, but how does the kernel know when to do that and when to send a
> signal?  I don't really like decoding the instruction stream to figure
> it out.

Hmm... why do you have to decode the instruction stream to find that out?
It would just depend on the exception type (#GP with EPCM, #UD). Or are you
saying that the kernel would need to SIGSEGV if there is in fact an ENCLU
so that there is no infinite trap loop? Sorry, I'm a bit lost here as to
where this decoding requirement comes from in the first place. I
understand how it is used in Sean's proposal...

Anyway, this option can probably be discarded without further
consideration because apparently single stepping can cause a #DB SS fault
if the AEP handler is anything other than a single instruction.

To me it seems that, by ruling out options, the vDSO option is what is
left. I don't like it but at least it works...

/Jarkko


Re: RFC: userspace exception fixups

2018-11-19 Thread Andy Lutomirski
On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
>
> On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > 1. The kernel needs some way to know *when* to apply this fixup.
> > Decoding the instruction stream and doing it to all exceptions that
> > hit an ENCLU instruction seems like a poor design.
>
> I'm not sure why you would ever need to do any type of fixup as the idea
> is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> would work the same way as for exceptions that the kernel can deal with
> except filling the exception information to registers.

Sure, but how does the kernel know when to do that and when to send a
signal?  I don't really like decoding the instruction stream to figure
it out.


Re: RFC: userspace exception fixups

2018-11-19 Thread Jarkko Sakkinen
On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> 1. The kernel needs some way to know *when* to apply this fixup.
> Decoding the instruction stream and doing it to all exceptions that
> hit an ENCLU instruction seems like a poor design.

I'm not sure why you would ever need to do any type of fixup, as the
idea is to just return to the AEP. That is, for chosen exceptions (EPCM,
#UD) the AEP would work the same way as for exceptions that the kernel
can deal with, except that the exception information is filled into
registers.

> 2. It starts exposing what looks like a more generic exception
> handling mechanism to userspace, except that it's nonsensical for
> anything other than ENCLU.

Well, I see user space, and namely the run-time, as the host for the
enclave, i.e. a middle-man that provides services such as emulating
instructions etc.

/Jarkko


Re: RFC: userspace exception fixups

2018-11-19 Thread Andy Lutomirski
On Sat, Nov 17, 2018 at 11:16 PM Jarkko Sakkinen wrote:
>
> On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> > Hi all-
> >
> > The people working on SGX enablement are grappling with a somewhat
> > annoying issue: the x86 EENTER instruction is used from user code and
> > can, as part of its normal-ish operation, raise an exception.  It is
> > also highly likely to be used from a library, and signal handling in
> > libraries is unpleasant at best.
> >
> > There's been some discussion of adding a vDSO entry point to wrap
> > EENTER and do something sensible with the exceptions, but I'm
> > wondering if a more general mechanism would be helpful.
>
> I haven't really followed all of this discussion because I've been busy
> working on the patch set but for me all of these approaches look awfully
> complicated.
>
> I'll throw my own suggestion and apologize if this has been already
> suggested and discarded: return-to-AEP.
>
> My idea is to do just a small extension to SGX AEX handling. At the
> moment hardware will RAX, RBX and RCX with ERESUME parameters. We can
> fill extend this by filling other three spare registers with exception
> information.

I have two issues with this approach:

1. The kernel needs some way to know *when* to apply this fixup.
Decoding the instruction stream and doing it to all exceptions that
hit an ENCLU instruction seems like a poor design.

2. It starts exposing what looks like a more generic exception
handling mechanism to userspace, except that it's nonsensical for
anything other than ENCLU.


Re: RFC: userspace exception fixups

2018-11-19 Thread Jarkko Sakkinen
On Mon, Nov 19, 2018 at 04:05:43PM +0200, Jarkko Sakkinen wrote:
> On Mon, Nov 19, 2018 at 05:17:26AM +, Jethro Beekman wrote:
> > On 2018-11-18 18:32, Jarkko Sakkinen wrote:
> > > On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote:
> > > > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> > > > > Hi all-
> > > > > 
> > > > > The people working on SGX enablement are grappling with a somewhat
> > > > > annoying issue: the x86 EENTER instruction is used from user code and
> > > > > can, as part of its normal-ish operation, raise an exception.  It is
> > > > > also highly likely to be used from a library, and signal handling in
> > > > > libraries is unpleasant at best.
> > > > > 
> > > > > There's been some discussion of adding a vDSO entry point to wrap
> > > > > EENTER and do something sensible with the exceptions, but I'm
> > > > > wondering if a more general mechanism would be helpful.
> > > > 
> > > > I haven't really followed all of this discussion because I've been busy
> > > > working on the patch set but for me all of these approaches look awfully
> > > > complicated.
> > > > 
> > > > I'll throw my own suggestion and apologize if this has been already
> > > > suggested and discarded: return-to-AEP.
> > > > 
> > > > My idea is to do just a small extension to SGX AEX handling. At the
> > > > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can
> > > > fill extend this by filling other three spare registers with exception
> > > > information.
> > > > 
> > > > AEP handler can then do whatever it wants to do with this information
> > > > or just do ERESUME.
> > > 
> > > A correction here. In practice this will add a requirement to have a bit
> > > more complicated AEP code (check the regs for exceptions) than before
> > > and not just bytes for ENCLU.
> > > 
> > > e.g. AEP handler should be along the lines
> > > 
> > > 1. #PF (or #UD or) happens. Kernel fills the registers when it cannot
> > > handle the exception and returns back to user space i.e. to the
> > > AEP handler.
> > > 2. Check the registers containing exception information. If they have
> > > been filled, take whatever actions user space wants to take.
> > > 3. Otherwise, just ERESUME.
> > > 
> > >  From my point of view this is making the AEP parameter useful. Its
> > > standard use is just weird (always point to a place just containing
> > > ENCLU bytes, why the heck it even exists).
> > 
> > I like this solution. Keeps things simple. One question: when an exception
> > occurs, how does the kernel know whether to set special registers or send a
> > signal?
> 
> Yes, and AFAIK people do in many cases people want to do something else
> than just direct ERESUME in AEP handler so would neither be a major
> bummer for user space. If I remember correctly you have such?
> 
> You can check the cases that we have for SIGSEGV (namely EPCM conflict)
> from Sean's patch 08/23.
> 
> I'm open for expanding the scope. It is the easy part after there is
> consensus for the handling mechanism :-)

Not sure if it is a good idea or not, but maybe we could even have a new
ioctl, in addition to the enclave construction ioctls, that you use to
specify per enclave which exceptions you want to get. SIGSEGV could be
the fallback behavior if you do not "register" for any exceptions.
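
To make the registration idea a bit more concrete, here is a rough
sketch of the bookkeeping only. Everything in it is made up for
illustration: struct enclave_cfg, the two helpers and the mask layout
are assumptions, and no such ioctl exists in the patch set. Only the
vector numbers (#UD = 6, #PF = 14) are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 exception vector numbers. */
#define VEC_UD 6u   /* invalid opcode */
#define VEC_PF 14u  /* page fault */

/* Hypothetical per-enclave state: bit n set means vector n is registered. */
struct enclave_cfg {
	uint32_t exception_mask;
};

/* What a hypothetical "register exception" ioctl would record. */
static void enclave_register_exception(struct enclave_cfg *e, unsigned int vector)
{
	assert(e != NULL && vector < 32);
	e->exception_mask |= 1u << vector;
}

/*
 * Decision at fault time: true means "report via registers at the AEP",
 * false means fall back to the default behavior (SIGSEGV in the text).
 */
static bool enclave_wants_exception(const struct enclave_cfg *e, unsigned int vector)
{
	assert(e != NULL);
	return vector < 32 && (e->exception_mask & (1u << vector)) != 0;
}
```

With an empty mask every exception takes the fallback path, which
matches the "SIGSEGV if you do not register" default.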

/Jarkko


Re: RFC: userspace exception fixups

2018-11-19 Thread Jarkko Sakkinen
On Mon, Nov 19, 2018 at 05:17:26AM +, Jethro Beekman wrote:
> On 2018-11-18 18:32, Jarkko Sakkinen wrote:
> > On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> > > > Hi all-
> > > > 
> > > > The people working on SGX enablement are grappling with a somewhat
> > > > annoying issue: the x86 EENTER instruction is used from user code and
> > > > can, as part of its normal-ish operation, raise an exception.  It is
> > > > also highly likely to be used from a library, and signal handling in
> > > > libraries is unpleasant at best.
> > > > 
> > > > There's been some discussion of adding a vDSO entry point to wrap
> > > > EENTER and do something sensible with the exceptions, but I'm
> > > > wondering if a more general mechanism would be helpful.
> > > 
> > > I haven't really followed all of this discussion because I've been busy
> > > working on the patch set but for me all of these approaches look awfully
> > > complicated.
> > > 
> > > I'll throw my own suggestion and apologize if this has been already
> > > suggested and discarded: return-to-AEP.
> > > 
> > > My idea is to do just a small extension to SGX AEX handling. At the
> > > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can
> > > fill extend this by filling other three spare registers with exception
> > > information.
> > > 
> > > AEP handler can then do whatever it wants to do with this information
> > > or just do ERESUME.
> > 
> > A correction here. In practice this will add a requirement to have a bit
> > more complicated AEP code (check the regs for exceptions) than before
> > and not just bytes for ENCLU.
> > 
> > e.g. AEP handler should be along the lines
> > 
> > 1. #PF (or #UD or) happens. Kernel fills the registers when it cannot
> > handle the exception and returns back to user space i.e. to the
> > AEP handler.
> > 2. Check the registers containing exception information. If they have
> > been filled, take whatever actions user space wants to take.
> > 3. Otherwise, just ERESUME.
> > 
> >  From my point of view this is making the AEP parameter useful. Its
> > standard use is just weird (always point to a place just containing
> > ENCLU bytes, why the heck it even exists).
> 
> I like this solution. Keeps things simple. One question: when an exception
> occurs, how does the kernel know whether to set special registers or send a
> signal?

Yes, and AFAIK in many cases people want to do something other than
just a direct ERESUME in the AEP handler, so it would not be a major
bummer for user space either. If I remember correctly you have such a
case?

You can check the cases that we have for SIGSEGV (namely EPCM conflict)
in Sean's patch 08/23.

I'm open to expanding the scope. That is the easy part once there is
consensus on the handling mechanism :-)

/Jarkko


Re: RFC: userspace exception fixups

2018-11-18 Thread Jethro Beekman

On 2018-11-18 18:32, Jarkko Sakkinen wrote:
> On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> > > Hi all-
> > >
> > > The people working on SGX enablement are grappling with a somewhat
> > > annoying issue: the x86 EENTER instruction is used from user code and
> > > can, as part of its normal-ish operation, raise an exception.  It is
> > > also highly likely to be used from a library, and signal handling in
> > > libraries is unpleasant at best.
> > >
> > > There's been some discussion of adding a vDSO entry point to wrap
> > > EENTER and do something sensible with the exceptions, but I'm
> > > wondering if a more general mechanism would be helpful.
> >
> > I haven't really followed all of this discussion because I've been busy
> > working on the patch set but for me all of these approaches look awfully
> > complicated.
> >
> > I'll throw my own suggestion and apologize if this has been already
> > suggested and discarded: return-to-AEP.
> >
> > My idea is to do just a small extension to SGX AEX handling. At the
> > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can
> > fill extend this by filling other three spare registers with exception
> > information.
> >
> > AEP handler can then do whatever it wants to do with this information
> > or just do ERESUME.
>
> A correction here. In practice this will add a requirement to have a bit
> more complicated AEP code (check the regs for exceptions) than before
> and not just bytes for ENCLU.
>
> e.g. AEP handler should be along the lines
>
> 1. #PF (or #UD or) happens. Kernel fills the registers when it cannot
>    handle the exception and returns back to user space i.e. to the
>    AEP handler.
> 2. Check the registers containing exception information. If they have
>    been filled, take whatever actions user space wants to take.
> 3. Otherwise, just ERESUME.
>
> From my point of view this is making the AEP parameter useful. Its
> standard use is just weird (always point to a place just containing
> ENCLU bytes, why the heck it even exists).

I like this solution. Keeps things simple. One question: when an
exception occurs, how does the kernel know whether to set special
registers or send a signal?


--
Jethro Beekman | Fortanix


Re: RFC: userspace exception fixups

2018-11-18 Thread Jarkko Sakkinen
On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote:
> On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> > Hi all-
> > 
> > The people working on SGX enablement are grappling with a somewhat
> > annoying issue: the x86 EENTER instruction is used from user code and
> > can, as part of its normal-ish operation, raise an exception.  It is
> > also highly likely to be used from a library, and signal handling in
> > libraries is unpleasant at best.
> > 
> > There's been some discussion of adding a vDSO entry point to wrap
> > EENTER and do something sensible with the exceptions, but I'm
> > wondering if a more general mechanism would be helpful.
> 
> I haven't really followed all of this discussion because I've been busy
> working on the patch set but for me all of these approaches look awfully
> complicated.
> 
> I'll throw my own suggestion and apologize if this has been already
> suggested and discarded: return-to-AEP.
> 
> My idea is to do just a small extension to SGX AEX handling. At the
> moment hardware will RAX, RBX and RCX with ERESUME parameters. We can
> fill extend this by filling other three spare registers with exception
> information.
> 
> AEP handler can then do whatever it wants to do with this information
> or just do ERESUME.

A correction here. In practice this will add a requirement to have a bit
more complicated AEP code (check the regs for exceptions) than before,
and not just the bytes for ENCLU.

E.g. the AEP handler flow should be along these lines:

1. #PF (or #UD, or ...) happens. Kernel fills the registers when it
   cannot handle the exception and returns back to user space, i.e. to
   the AEP handler.
2. Check the registers containing exception information. If they have
   been filled, take whatever actions user space wants to take.
3. Otherwise, just ERESUME.

From my point of view this is making the AEP parameter useful. Its
standard use is just weird (it always points to a place containing just
the ENCLU bytes, so why the heck does it even exist).

/Jarkko


Re: RFC: userspace exception fixups

2018-11-17 Thread Jarkko Sakkinen
On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote:
> On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> > Hi all-
> > 
> > The people working on SGX enablement are grappling with a somewhat
> > annoying issue: the x86 EENTER instruction is used from user code and
> > can, as part of its normal-ish operation, raise an exception.  It is
> > also highly likely to be used from a library, and signal handling in
> > libraries is unpleasant at best.
> > 
> > There's been some discussion of adding a vDSO entry point to wrap
> > EENTER and do something sensible with the exceptions, but I'm
> > wondering if a more general mechanism would be helpful.
> 
> I haven't really followed all of this discussion because I've been busy
> working on the patch set but for me all of these approaches look awfully
> complicated.
> 
> I'll throw my own suggestion and apologize if this has been already
> suggested and discarded: return-to-AEP.
> 
> My idea is to do just a small extension to SGX AEX handling. At the
> moment hardware will RAX, RBX and RCX with ERESUME parameters. We can
> fill extend this by filling other three spare registers with exception

s/fill extend/extend/

/Jarkko


Re: RFC: userspace exception fixups

2018-11-17 Thread Jarkko Sakkinen
On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:
> Hi all-
> 
> The people working on SGX enablement are grappling with a somewhat
> annoying issue: the x86 EENTER instruction is used from user code and
> can, as part of its normal-ish operation, raise an exception.  It is
> also highly likely to be used from a library, and signal handling in
> libraries is unpleasant at best.
> 
> There's been some discussion of adding a vDSO entry point to wrap
> EENTER and do something sensible with the exceptions, but I'm
> wondering if a more general mechanism would be helpful.

I haven't really followed all of this discussion because I've been busy
working on the patch set, but to me all of these approaches look awfully
complicated.

I'll throw in my own suggestion, and apologize if this has already been
suggested and discarded: return-to-AEP.

My idea is to do just a small extension to SGX AEX handling. At the
moment hardware will fill RAX, RBX and RCX with ERESUME parameters. We
can extend this by filling three other spare registers with exception
information.

The AEP handler can then do whatever it wants to do with this
information or just do ERESUME.
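
A minimal sketch of the kernel side of this, under stated assumptions:
struct aex_regs, fill_exception_info() and the AEX_INFO_VALID marker
are hypothetical names, and RDI/RSI/RDX are picked arbitrarily as the
three spare registers. The only hard facts used are that ERESUME is
ENCLU leaf 3 and that RAX/RBX/RCX already hold the ERESUME parameters
(leaf, TCS, AEP) after an AEX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENCLU_LEAF_ERESUME 3u            /* ENCLU leaf number of ERESUME */
#define AEX_INFO_VALID     (1ull << 63)  /* hypothetical "info present" bit */

/* Hypothetical view of the user registers the kernel returns with. */
struct aex_regs {
	uint64_t rax, rbx, rcx;  /* loaded by hardware: ERESUME leaf, TCS, AEP */
	uint64_t rdi, rsi, rdx;  /* assumed spare: vector, error code, address */
};

/* Fill the spare registers; the hardware-provided ones are left untouched. */
static void fill_exception_info(struct aex_regs *regs, uint8_t vector,
				uint32_t error_code, uint64_t fault_addr)
{
	assert(regs != NULL);
	regs->rdi = AEX_INFO_VALID | vector;
	regs->rsi = error_code;
	regs->rdx = fault_addr;
}
```

Since RAX, RBX and RCX are not modified, an AEP handler that ignores
the extra registers can still execute ENCLU directly to ERESUME.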

In some ways this is a dumbed-down version of Sean's suggestion.

I think whatever the solution is, it should be lightweight, and this is
such a solution. Why? Because exception handling could then be used to
implement other stuff than just error handling, like a syscall wrapper
for the enclaves, in a nice and lean way.

/Jarkko


Re: RFC: userspace exception fixups

2018-11-08 Thread Christoph Hellwig
On Thu, Nov 08, 2018 at 12:05:42PM -0800, Andy Lutomirski wrote:
> This whole thing is a mess.  I'm starting to think that the cleanest
> solution would be to provide a way to just tell the kernel that
> certain RIP values have exception fixups.

By far the cleanest solution would be to say that SGX is such a mess
that we are not going to support it at all.  It's not like it is a
must-have feature to start with.


Re: RFC: userspace exception fixups

2018-11-08 Thread Sean Christopherson
On Thu, Nov 08, 2018 at 01:50:31PM -0800, Dave Hansen wrote:
> On 11/8/18 1:16 PM, Sean Christopherson wrote:
> > On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote:
> >> On 11/8/18 12:05 PM, Andy Lutomirski wrote:
> >>> Hmm.  The idea being that the SDK preserves RBP but not RSP.  That's
> >>> not the most terrible thing in the world.  But could the SDK live with
> >>> something more like my suggestion where the vDSO supplies a normal
> >>> function that takes a struct containing registers that are visible to
> >>> the enclave?  This would make it extremely awkward for the enclave to
> >>> use the untrusted stack per se, but it would make it quite easy (I
> >>> think) for the untrusted part of the SDK to allocate some extra memory
> >>> and just tell the enclave that *that* memory is the stack.
> >>
> >> I really think the enclave should keep its grubby mitts off the
> >> untrusted stack.  There are lots of ways to get memory, even with
> >> stack-like semantics, that don't involve mucking with the stack itself.
> >>
> >> I have not heard a good, hard argument for why there is an absolute
> >> *need* to store things on the actual untrusted stack.
> > 
> > Convenience and performance are the only arguments I've heard, e.g. so
> > that allocating memory doesn't require an extra EEXIT->EENTER round trip.
> 
> Well, for the first access, it's going to cost a bunch asynchronous
> exits to fault in all the stack pages.  Instead of that, if you had a
> single area, or an explicit out-call to allocate and populate the area,
> you could do it in a single EEXIT and zero asynchronous exits for demand
> page faults.
> 
> So, it might be convenient, but I'm rather suspicious of any performance
> arguments.

Ya, I meant versus doing an EEXIT on every allocation, i.e. a very
naive allocation scheme.


Re: RFC: userspace exception fixups

2018-11-08 Thread Dave Hansen
On 11/8/18 1:16 PM, Sean Christopherson wrote:
> On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote:
>> On 11/8/18 12:05 PM, Andy Lutomirski wrote:
>>> Hmm.  The idea being that the SDK preserves RBP but not RSP.  That's
>>> not the most terrible thing in the world.  But could the SDK live with
>>> something more like my suggestion where the vDSO supplies a normal
>>> function that takes a struct containing registers that are visible to
>>> the enclave?  This would make it extremely awkward for the enclave to
>>> use the untrusted stack per se, but it would make it quite easy (I
>>> think) for the untrusted part of the SDK to allocate some extra memory
>>> and just tell the enclave that *that* memory is the stack.
>>
>> I really think the enclave should keep its grubby mitts off the
>> untrusted stack.  There are lots of ways to get memory, even with
>> stack-like semantics, that don't involve mucking with the stack itself.
>>
>> I have not heard a good, hard argument for why there is an absolute
>> *need* to store things on the actual untrusted stack.
> 
> Convenience and performance are the only arguments I've heard, e.g. so
> that allocating memory doesn't require an extra EEXIT->EENTER round trip.

Well, for the first access, it's going to cost a bunch of asynchronous
exits to fault in all the stack pages.  Instead of that, if you had a
single area, or an explicit out-call to allocate and populate the area,
you could do it in a single EEXIT and zero asynchronous exits for demand
page faults.

So, it might be convenient, but I'm rather suspicious of any performance
arguments.


Re: RFC: userspace exception fixups

2018-11-08 Thread Sean Christopherson
On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote:
> On 11/8/18 12:05 PM, Andy Lutomirski wrote:
> > Hmm.  The idea being that the SDK preserves RBP but not RSP.  That's
> > not the most terrible thing in the world.  But could the SDK live with
> > something more like my suggestion where the vDSO supplies a normal
> > function that takes a struct containing registers that are visible to
> > the enclave?  This would make it extremely awkward for the enclave to
> > use the untrusted stack per se, but it would make it quite easy (I
> > think) for the untrusted part of the SDK to allocate some extra memory
> > and just tell the enclave that *that* memory is the stack.
> 
> I really think the enclave should keep its grubby mitts off the
> untrusted stack.  There are lots of ways to get memory, even with
> stack-like semantics, that don't involve mucking with the stack itself.
> 
> I have not heard a good, hard argument for why there is an absolute
> *need* to store things on the actual untrusted stack.

Convenience and performance are the only arguments I've heard, e.g. so
that allocating memory doesn't require an extra EEXIT->EENTER round trip.

> We could quite easily have the untrusted code just promise to allocate a
> stack-sized virtual area (even derived from the stack rlimit size) and
> pass that into the enclave for parameter use.

I agree more and more the further I dig.  AFAIK there is no need for
the enclave to actually load %rsp.  The initial EENTER can pass in the
base/top of the pseudo-stack and from there the enclave can manage it
purely in software.
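
A minimal sketch of what "manage it purely in software" might mean,
assuming the enclave is handed a base/top pair at EENTER. The struct
and the pstack_* helpers are hypothetical names, not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a region handed to the enclave at EENTER. */
struct pseudo_stack {
	uint8_t *base;  /* lowest usable address */
	uint8_t *top;   /* one past the highest usable address */
	uint8_t *sp;    /* software stack pointer, grows down from top */
};

static void pstack_init(struct pseudo_stack *s, void *buf, size_t size)
{
	s->base = buf;
	s->top = s->base + size;
	s->sp = s->top;
}

/* Reserve `len` bytes, rounded up to a multiple of 16; NULL on overflow. */
static void *pstack_alloc(struct pseudo_stack *s, size_t len)
{
	size_t need = (len + 15) & ~(size_t)15;

	if (need < len || need > (size_t)(s->sp - s->base))
		return NULL;
	/* Move the pointer first, then let the caller write the data. */
	s->sp -= need;
	return s->sp;
}

/* Drop everything allocated since the stack pointer was at `mark`. */
static void pstack_release(struct pseudo_stack *s, void *mark)
{
	s->sp = mark;
}
```

The hardware %rsp is never touched; only the software pointer moves.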


Re: RFC: userspace exception fixups

2018-11-08 Thread Dave Hansen
On 11/8/18 12:05 PM, Andy Lutomirski wrote:
> Hmm.  The idea being that the SDK preserves RBP but not RSP.  That's
> not the most terrible thing in the world.  But could the SDK live with
> something more like my suggestion where the vDSO supplies a normal
> function that takes a struct containing registers that are visible to
> the enclave?  This would make it extremely awkward for the enclave to
> use the untrusted stack per se, but it would make it quite easy (I
> think) for the untrusted part of the SDK to allocate some extra memory
> and just tell the enclave that *that* memory is the stack.

I really think the enclave should keep its grubby mitts off the
untrusted stack.  There are lots of ways to get memory, even with
stack-like semantics, that don't involve mucking with the stack itself.

I have not heard a good, hard argument for why there is an absolute
*need* to store things on the actual untrusted stack.

We could quite easily have the untrusted code just promise to allocate a
stack-sized virtual area (even derived from the stack rlimit size) and
pass that into the enclave for parameter use.
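
Deriving the size from the stack rlimit could be as simple as the
sketch below. The helper name and the 8 MiB fallback for an unlimited
or unreadable limit are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

#define FALLBACK_AREA_SIZE ((size_t)8 << 20)  /* assumed default: 8 MiB */

/* Size for the parameter area, derived from the soft stack rlimit. */
static size_t param_area_size(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return FALLBACK_AREA_SIZE;
	/* "Unlimited" or zero gives no usable size, so use the fallback. */
	if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur == 0)
		return FALLBACK_AREA_SIZE;
	/* Guard against rlim_t being wider than size_t. */
	if (rl.rlim_cur > SIZE_MAX)
		return FALLBACK_AREA_SIZE;

	return (size_t)rl.rlim_cur;
}
```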


Re: RFC: userspace exception fixups

2018-11-08 Thread Andy Lutomirski
On Thu, Nov 8, 2018 at 11:54 AM Sean Christopherson wrote:
>
> On Tue, Nov 06, 2018 at 01:07:54PM -0800, Andy Lutomirski wrote:
> >
> >
> > > On Nov 6, 2018, at 1:00 PM, Dave Hansen  wrote:
> > >
> > >> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> > >> True, but what if we have a nasty enclave that writes to memory just
> > >> below SP *before* decrementing SP?
> > >
> > > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> > >
> > > >    1. EENTER
> > > >    2. Hardware sets eenter_hwframe->sp = %sp
> > > >    3. Enclave runs... wants to do out-call
> > > >    4. Enclave sets up parameters:
> > > >       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > > >       ...
> > > >    5. Enclave sets eenter_hwframe->sp -= offset
> > >
> > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > > was on the stack.  The enclave could easily fix this by moving ->sp first.
> > >
> > > But, this is one of those "fun" parts of the ABI that I think we need to
> > > talk about.  If we do this, we also basically require that the code
> > > which handles asynchronous exits must *not* write to the stack.  That's
> > > not hard because it's typically just a single ERESUME instruction, but
> > > it *is* a requirement.
> > >
> >
> > I was assuming that the async exit stuff was completely hidden by the
> > API.  The AEP code would decide whether the exit got fixed up by the
> > kernel (which may or may not be easy to tell — can the code even tell
> > without kernel help whether it was, say, an IRQ vs #UD?) and then either
> > do ERESUME or cause sgx_enter_enclave() to return with an appropriate
> > return value.
>
> Ok, SDK folks came up with an idea that would allow them to use vDSO,
> albeit with a bit of ugliness and potentially a ROP-attack issue.
> Definitely some weirdness, but the weirdness is well contained, unlike
> the magic prefix approach.
>
> Provide two enter_enclave() vDSO "functions".  The first is a normal
> function with a normal C interface.  The second is a blob of code that
> is "called" and "returns" via indirect jmp, and can be used by SGX
> runtimes that want to use the untrusted stack for out-calls from the
> enclave.
>
> For the indirect jmp "function", use %rbp to stash the return address
> of the caller (either in %rbp itself or in memory pointed to by %rbp).
> It works because hardware also saves/restores %rbp along with %rsp when
> doing enclave transitions, and the SDK can live with %rbp being
> off-limits.  Fault info is passed via registers.

Hmm.  The idea being that the SDK preserves RBP but not RSP.  That's
not the most terrible thing in the world.  But could the SDK live with
something more like my suggestion where the vDSO supplies a normal
function that takes a struct containing registers that are visible to
the enclave?  This would make it extremely awkward for the enclave to
use the untrusted stack per se, but it would make it quite easy (I
think) for the untrusted part of the SDK to allocate some extra memory
and just tell the enclave that *that* memory is the stack.

AFAICS we do have two registers that genuinely are preserved: FSBASE
and GSBASE.  Which is a good thing, because otherwise SGX enablement
would currently be a privilege escalation issue due to making GSBASE
writable when it should not be.

This whole thing is a mess.  I'm starting to think that the cleanest
solution would be to provide a way to just tell the kernel that
certain RIP values have exception fixups.
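
Purely to illustrate the "certain RIP values have exception fixups"
idea: the lookup side could resemble the kernel's own exception table,
i.e. a sorted table searched by faulting address. No such userspace
interface exists in this thread; struct user_fixup and find_fixup are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: "if a fault hits at fault_ip, resume at fixup_ip". */
struct user_fixup {
	uintptr_t fault_ip;
	uintptr_t fixup_ip;
};

/*
 * Binary search over a table sorted by fault_ip.
 * Returns the fixup address, or 0 if the faulting RIP has no entry.
 */
static uintptr_t find_fixup(const struct user_fixup *tbl, size_t n,
			    uintptr_t rip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].fault_ip == rip)
			return tbl[mid].fixup_ip;
		if (tbl[mid].fault_ip < rip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```

A fault at a registered address would be redirected to the fixup
address instead of raising a signal; any other fault would take the
normal path.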


Re: RFC: userspace exception fixups

2018-11-08 Thread Sean Christopherson
On Tue, Nov 06, 2018 at 01:07:54PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 6, 2018, at 1:00 PM, Dave Hansen  wrote:
> > 
> >> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> >> True, but what if we have a nasty enclave that writes to memory just
> >> below SP *before* decrementing SP?
> > 
> > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> > 
> > >    1. EENTER
> > >    2. Hardware sets eenter_hwframe->sp = %sp
> > >    3. Enclave runs... wants to do out-call
> > >    4. Enclave sets up parameters:
> > >       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > >       ...
> > >    5. Enclave sets eenter_hwframe->sp -= offset
> > 
> > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > was on the stack.  The enclave could easily fix this by moving ->sp first.
> > 
> > But, this is one of those "fun" parts of the ABI that I think we need to
> > talk about.  If we do this, we also basically require that the code
> > which handles asynchronous exits must *not* write to the stack.  That's
> > not hard because it's typically just a single ERESUME instruction, but
> > it *is* a requirement.
> > 
> 
> I was assuming that the async exit stuff was completely hidden by the
> API.  The AEP code would decide whether the exit got fixed up by the
> kernel (which may or may not be easy to tell — can the code even tell
> without kernel help whether it was, say, an IRQ vs #UD?) and then either
> do ERESUME or cause sgx_enter_enclave() to return with an appropriate
> return value.

Ok, SDK folks came up with an idea that would allow them to use vDSO,
albeit with a bit of ugliness and potentially a ROP-attack issue.
Definitely some weirdness, but the weirdness is well contained, unlike
the magic prefix approach.

Provide two enter_enclave() vDSO "functions".  The first is a normal
function with a normal C interface.  The second is a blob of code that
is "called" and "returns" via indirect jmp, and can be used by SGX
runtimes that want to use the untrusted stack for out-calls from the
enclave.

For the indirect jmp "function", use %rbp to stash the return address
of the caller (either in %rbp itself or in memory pointed to by %rbp).
It works because hardware also saves/restores %rbp along with %rsp when
doing enclave transitions, and the SDK can live with %rbp being
off-limits.  Fault info is passed via registers.

Basic idea for the "functions" below.  The fixup stuff is obviously not
wired up correctly, just trying to convey the concept.



struct enclu_fault_info {
	unsigned int	leaf;
	unsigned int	trapnr;
	unsigned int	error_code;
	unsigned long	address;
};

int __vdso_enter_enclave(void *tcs, struct enclu_fault_info *fault_info)
{
	unsigned int leaf, trapnr;

	asm volatile (
		"lea 2f(%%rip), %%rcx\n\t"
		"1: enclu\n\t"
		"jmp 3f\n\t"

		/* ERESUME trampoline */
		"2: enclu\n\t"
		"ud2\n\t"

		/* out: */
		"3:\n"

		/* EENTER fixup */
		".pushsection .fixup,\"ax\"\n\t"
		"4:\n\t"
		"mov %%eax, %%edi\n\t"
		"movl $"__stringify(SGX_EENTER)", %%eax\n\t"
		"jmp 3b\n\t"
		".popsection\n\t"
		_ASM_EXTABLE_FAULT(1b, 4b)

		/* ERESUME fixup */
		".pushsection .fixup,\"ax\"\n\t"
		"5:\n\t"
		"mov %%eax, %%edi\n\t"
		"movl $"__stringify(SGX_ERESUME)", %%eax\n\t"
		"jmp 3b\n\t"
		".popsection\n\t"
		_ASM_EXTABLE_FAULT(2b, 5b)

		: "=a"(leaf), "=D" (trapnr)
		: "a" (SGX_EENTER), "b" (tcs)
		: "cc", "memory", "rcx", "rdx", "rsi", "r8", "r9", "r10",
		  "r11", "r12", "r13", "r14", "r15"
	);

	if (leaf == SGX_EEXIT)
		return 0;

	if (fault_info) {
		fault_info->leaf = leaf;
		fault_info->trapnr = trapnr;
		fault_info->error_code = 0;
		fault_info->address = 0;
	}

	return -EFAULT;
}


GLOBAL(__vdso_enter_enclave_no_stack)
	endbr64

	/* %rbp = return target, %rbx = tcs */
	leaq	3f(%rip), %rcx
	movl	$2, %eax
1:	enclu

	/* "return" to "caller" */
2:	jmp	*%rbp

	/* ERESUME trampoline */
3:	enclu
	ud2

	/* EENTER fixup handler */
4:	movq	%rax, %rdi
	movl	$2, %eax
	/* %rsi = error code, %rdx = address */
	jmp	2b

	/* ERESUME fixup handler */
5:	movq	%rax, %rdi
	movl	$3, %eax
	/* %rsi = error code, %rdx = address */
	jmp	2b
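
For the C-style entry point, a caller might interpret the return value
and fault info along these lines. This is only a sketch of the
convention implied by the code above (0 on EEXIT, -EFAULT plus a leaf
number otherwise); enum exit_kind and classify_exit are hypothetical
names, and the leaf values 2 and 3 are taken from the assembly version.

```c
#include <errno.h>
#include <stddef.h>

/* ENCLU leaf numbers as used in the assembly above. */
#define SGX_EENTER  2
#define SGX_ERESUME 3

struct enclu_fault_info {
	unsigned int	leaf;
	unsigned int	trapnr;
	unsigned int	error_code;
	unsigned long	address;
};

enum exit_kind {
	EXIT_CLEAN,           /* enclave left via EEXIT */
	EXIT_FAULT_ON_ENTRY,  /* the EENTER instruction itself faulted */
	EXIT_FAULT_IN_RUN,    /* fault reported at the ERESUME trampoline */
	EXIT_UNKNOWN,
};

/* Map the (return value, fault info) pair onto a coarse exit reason. */
static enum exit_kind classify_exit(int ret,
				    const struct enclu_fault_info *info)
{
	if (ret == 0)
		return EXIT_CLEAN;
	if (ret != -EFAULT || info == NULL)
		return EXIT_UNKNOWN;
	if (info->leaf == SGX_EENTER)
		return EXIT_FAULT_ON_ENTRY;
	if (info->leaf == SGX_ERESUME)
		return EXIT_FAULT_IN_RUN;
	return EXIT_UNKNOWN;
}
```

The runtime would then look at trapnr to decide whether to re-enter the
enclave or give up.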




Re: RFC: userspace exception fixups

2018-11-08 Thread Jarkko Sakkinen
On Wed, Nov 07, 2018 at 01:40:59PM -0800, Sean Christopherson wrote:
> > In that case it seems like the only way to use SGX that's not a gaping
> > security hole is to run the SGX enclave in its own fully-seccomp (or
> > equivalent) process, with no host application in the same address
> > space. Since the host application can't see the contents of the
> > enclave to make any determination of whether it's safe to run, running
> > it in the same address space only makes sense if the cpu provides
> > protection against unwanted accesses to the host's memory from the
> > enclave -- and according to you, it doesn't.
> 
> The enclave's code (and any initial data) isn't encrypted until the
> pages are loaded into the Enclave Page Cache (EPC), which can only
> be done by the kernel (via ENCLS[EADD]).  In other words, both the
> kernel and userspace can vet the code/data before running an enclave.
> 
> Practically speaking, an enclave will be coupled with an untrusted
> userspace runtime, i.e. it's loader.  Enclaves are also measured
> as part of their build process, and so the enclave loader needs to
> know which pages to add to the measurement, and in what order.  I
> guess technically speaking an enclave could have zero pages added
> to its measurement, but that'd probably be a big red flag that said
> enclave is up to something fishy.

IMHO the whole idea adds too much policy to the kernel even if it were
doable. You can easily spawn the untrusted run-time and the enclave into
their own process.

Seccomp limits the syscall space, and enclaves cannot do syscalls in the
first place. It is the URT that will do them on behalf of the enclave.

/Jarkko


Re: RFC: userspace exception fixups

2018-11-08 Thread Jarkko Sakkinen
On Wed, Nov 07, 2018 at 12:56:58PM -0800, Dave Hansen wrote:
> On 11/7/18 11:01 AM, Sean Christopherson wrote:
> > Going off comments in similar code related to UMIP, we'd need to figure
> > out how to handle protection keys.
> 
> There are two options:
> 1. Don't depend on the userspace mapping.  Do get_user_pages() to find
>    the instruction in the kernel direct map, and use that.
> 2. Do a WRPKRU that allows read access, do the read, then put PKRU back.
>    This is a pain because of preemption and all that jazz.
> 
> Right now, we just let the prefetch instruction detection fail if you
> mark it unreadable with pkeys.  Tough cookies, basically.  But, that's
> just the kernel being nice, but you need it for functionality, so it's
> tougher.

I would go with option 1 because it is the stable way to do it and we are
100% sure not to conflict with pkeys.

/Jarkko


Re: RFC: userspace exception fixups

2018-11-07 Thread Sean Christopherson
On Wed, Nov 07, 2018 at 04:27:58PM -0500, Rich Felker wrote:
> On Tue, Nov 06, 2018 at 03:26:16PM -0800, Sean Christopherson wrote:
> > On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> > > On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > > > > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > > > > >
> > > > > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > > > > I almost feel like the right solution is to call into SGX on its own
> > > > > > > private stack or maybe even its own private address space.
> > > > > >
> > > > > > Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
> > > > > > enclave like its own "thread" with its own stack and its own set of
> > > > > > registers and context?  That seems like a much more workable model than
> > > > > > trying to weave it together with the EENTER context.
> > > > 
> > > > So maybe the API should be, roughly
> > > > 
> > > > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > > > host_state *state);
> > > > sgx_exit_reason_t sgx_resume_enclave(same args);
> > > > 
> > > > where host_state is something like:
> > > > 
> > > > struct host_state {
> > > >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > > > };
> > > > 
> > > > and the values in host_state explicitly have nothing to do with the
> > > > actual host registers.  So, if you want to use the outcall mechanism,
> > > > you'd allocate some memory, point sp to that memory, call
> > > > sgx_enter_enclave(), and then read that memory to do the outcall.
> > > > 
> > > > Actually implementing this would be distinctly nontrivial, and would
> > > > almost certainly need some degree of kernel help to avoid an explosion
> > > > when a signal gets delivered while we have host_state.sp loaded into
> > > > the actual SP register.  Maybe rseq could help with this?
> > > > 
> > > > The ISA here is IMO not well thought through.
> > > 
> > > Maybe I'm mistaken about some fundamentals here, but my understanding
> > > of SGX is that the whole point is that the host application and the
> > > code running in the enclave are mutually adversarial towards one
> > > another. Do any or all of the proposed protocols here account for this
> > > and fully protect the host application from malicious code in the
> > > enclave? It seems that having control over the register file on exit
> > > from the enclave is fundamentally problematic but I assume there must
> > > be some way I'm missing that this is fixed up.
> > 
> > SGX provides protections for the enclave but not the other way around.
> > The kernel has all of its normal non-SGX protections in place, but the
> > enclave can certainly wreak havoc on its userspace process.  The basic
> > design idea is that the enclave is a specialized .so that gets extra
> > security protections but is still effectively part of the overall
> > application, e.g. it has full access to its host userspace process'
> > virtual memory.
> 
> In that case it seems like the only way to use SGX that's not a gaping
> security hole is to run the SGX enclave in its own fully-seccomp (or
> equivalent) process, with no host application in the same address
> space. Since the host application can't see the contents of the
> enclave to make any determination of whether it's safe to run, running
> it in the same address space only makes sense if the cpu provides
> protection against unwanted accesses to the host's memory from the
> enclave -- and according to you, it doesn't.

The enclave's code (and any initial data) isn't encrypted until the
pages are loaded into the Enclave Page Cache (EPC), which can only
be done by the kernel (via ENCLS[EADD]).  In other words, both the
kernel and userspace can vet the code/data before running an enclave.

Practically speaking, an enclave will be coupled with an untrusted
userspace runtime, i.e. its loader.  Enclaves are also measured
as part of their build process, and so the enclave loader needs to
know which pages to add to the measurement, and in what order.  I
guess technically speaking an enclave could have zero pages added
to its measurement, but that'd probably be a big red flag that said
enclave is up to something fishy.
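
To illustrate why the loader has to know which pages are measured and in
what order, here is a toy sketch of an order-sensitive digest. It is not the
SGX algorithm: the real measurement is a SHA-256 based value built up by the
enclave build instructions, whereas this uses a plain FNV-1a fold so that the
example stays short. The names toy_page, toy_mix and toy_measure are made up
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one measured page: enclave offset plus contents. */
struct toy_page {
	uint64_t offset;
	const unsigned char *data;
	size_t len;
};

/* One FNV-1a pass over a byte range (assumption: NOT what SGX uses). */
static uint64_t toy_mix(uint64_t h, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;	/* 64-bit FNV prime */
	}
	return h;
}

/* Fold every page (offset first, then contents) into a running digest. */
static uint64_t toy_measure(const struct toy_page *pages, size_t count)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* 64-bit FNV offset basis */

	for (size_t i = 0; i < count; i++) {
		h = toy_mix(h, (const unsigned char *)&pages[i].offset,
			    sizeof(pages[i].offset));
		h = toy_mix(h, pages[i].data, pages[i].len);
	}
	return h;
}
```

Feeding the same two pages to toy_measure() in a different order gives a
different digest, which is the property the loader has to respect. With zero
pages the digest is just the initial constant, i.e. it says nothing about the
enclave's contents.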


Re: RFC: userspace exception fixups

2018-11-07 Thread Andy Lutomirski
On Wed, Nov 7, 2018 at 1:28 PM Rich Felker wrote:
>
> On Tue, Nov 06, 2018 at 03:26:16PM -0800, Sean Christopherson wrote:
> > On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> > > On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > > > > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > > > > >
> > > > > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > > > > I almost feel like the right solution is to call into SGX on its own
> > > > > > > private stack or maybe even its own private address space.
> > > > > >
> > > > > > Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
> > > > > > enclave like its own "thread" with its own stack and its own set of
> > > > > > registers and context?  That seems like a much more workable model than
> > > > > > trying to weave it together with the EENTER context.
> > > >
> > > > So maybe the API should be, roughly
> > > >
> > > > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > > > host_state *state);
> > > > sgx_exit_reason_t sgx_resume_enclave(same args);
> > > >
> > > > where host_state is something like:
> > > >
> > > > struct host_state {
> > > >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > > > };
> > > >
> > > > and the values in host_state explicitly have nothing to do with the
> > > > actual host registers.  So, if you want to use the outcall mechanism,
> > > > you'd allocate some memory, point sp to that memory, call
> > > > sgx_enter_enclave(), and then read that memory to do the outcall.
> > > >
> > > > Actually implementing this would be distinctly nontrivial, and would
> > > > almost certainly need some degree of kernel help to avoid an explosion
> > > > when a signal gets delivered while we have host_state.sp loaded into
> > > > the actual SP register.  Maybe rseq could help with this?
> > > >
> > > > The ISA here is IMO not well thought through.
> > >
> > > Maybe I'm mistaken about some fundamentals here, but my understanding
> > > of SGX is that the whole point is that the host application and the
> > > code running in the enclave are mutually adversarial towards one
> > > another. Do any or all of the proposed protocols here account for this
> > > and fully protect the host application from malicious code in the
> > > enclave? It seems that having control over the register file on exit
> > > from the enclave is fundamentally problematic but I assume there must
> > > be some way I'm missing that this is fixed up.
> >
> > SGX provides protections for the enclave but not the other way around.
> > The kernel has all of its normal non-SGX protections in place, but the
> > enclave can certainly wreak havoc on its userspace process.  The basic
> > design idea is that the enclave is a specialized .so that gets extra
> > security protections but is still effectively part of the overall
> > application, e.g. it has full access to its host userspace process'
> > virtual memory.
>
> In that case it seems like the only way to use SGX that's not a gaping
> security hole is to run the SGX enclave in its own fully-seccomp (or
> equivalent) process, with no host application in the same address
> space. Since the host application can't see the contents of the
> enclave to make any determination of whether it's safe to run, running
> it in the same address space only makes sense if the cpu provides
> protection against unwanted accesses to the host's memory from the
> enclave -- and according to you, it doesn't.
>

I think the theory is that the enclave is shipped with the host application.

That being said, a way to run the enclave in an address space that has
basically nothing else (except an ENCLU instruction as a trampoline)
would be quite nice.


Re: RFC: userspace exception fixups

2018-11-07 Thread Rich Felker
On Tue, Nov 06, 2018 at 03:26:16PM -0800, Sean Christopherson wrote:
> On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> > On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > > >
> > > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > > I almost feel like the right solution is to call into SGX on its own
> > > > > private stack or maybe even its own private address space.
> > > >
> > > > Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
> > > > enclave like its own "thread" with its own stack and its own set of
> > > > registers and context?  That seems like a much more workable model than
> > > > trying to weave it together with the EENTER context.
> > > 
> > > So maybe the API should be, roughly
> > > 
> > > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > > host_state *state);
> > > sgx_exit_reason_t sgx_resume_enclave(same args);
> > > 
> > > where host_state is something like:
> > > 
> > > struct host_state {
> > >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > > };
> > > 
> > > and the values in host_state explicitly have nothing to do with the
> > > actual host registers.  So, if you want to use the outcall mechanism,
> > > you'd allocate some memory, point sp to that memory, call
> > > sgx_enter_enclave(), and then read that memory to do the outcall.
> > > 
> > > Actually implementing this would be distinctly nontrivial, and would
> > > almost certainly need some degree of kernel help to avoid an explosion
> > > when a signal gets delivered while we have host_state.sp loaded into
> > > the actual SP register.  Maybe rseq could help with this?
> > > 
> > > The ISA here is IMO not well thought through.
> > 
> > Maybe I'm mistaken about some fundamentals here, but my understanding
> > of SGX is that the whole point is that the host application and the
> > code running in the enclave are mutually adversarial towards one
> > another. Do any or all of the proposed protocols here account for this
> > and fully protect the host application from malicious code in the
> > enclave? It seems that having control over the register file on exit
> > from the enclave is fundamentally problematic but I assume there must
> > be some way I'm missing that this is fixed up.
> 
> SGX provides protections for the enclave but not the other way around.
> The kernel has all of its normal non-SGX protections in place, but the
> enclave can certainly wreak havoc on its userspace process.  The basic
> design idea is that the enclave is a specialized .so that gets extra
> security protections but is still effectively part of the overall
> application, e.g. it has full access to its host userspace process'
> virtual memory.

In that case it seems like the only way to use SGX that's not a gaping
security hole is to run the SGX enclave in its own fully-seccomp (or
equivalent) process, with no host application in the same address
space. Since the host application can't see the contents of the
enclave to make any determination of whether it's safe to run, running
it in the same address space only makes sense if the cpu provides
protection against unwanted accesses to the host's memory from the
enclave -- and according to you, it doesn't.

Rich


Re: RFC: userspace exception fixups

2018-11-07 Thread Dave Hansen
On 11/7/18 11:01 AM, Sean Christopherson wrote:
> Going off comments in similar code related to UMIP, we'd need to figure
> out how to handle protection keys.

There are two options:
1. Don't depend on the userspace mapping.  Do get_user_pages() to find
   the instruction in the kernel direct map, and use that.
2. Do a WRPKRU that allows read access, do the read, then put PKRU back.
   This is a pain because of preemption and all that jazz.

Right now, we just let the prefetch instruction detection fail if you
mark it unreadable with pkeys.  Tough cookies, basically.  But that's
just the kernel being nice; here you need it for functionality, so it's
tougher.


Re: RFC: userspace exception fixups

2018-11-07 Thread Sean Christopherson
On Wed, Nov 07, 2018 at 07:34:52AM -0800, Sean Christopherson wrote:
> On Tue, Nov 06, 2018 at 05:17:14PM -0800, Andy Lutomirski wrote:
> > > On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson wrote:
> > > >
> > > > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > > > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson wrote:
> > > > > >
> > > > > > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > > > > > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > > > > > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > > > > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > > > > > that the enclave can EEXIT to immediately after the EENTER location.
> > > > >
> > > >
> > > > How does that even work, though?  On an AEX, RIP points to the ERESUME
> > > > instruction, not the EENTER instruction, so if we skip it we just end
> > > > up in lala land.
> > >
> > > Userspace would obviously need to be aware of the fixup behavior, but
> > > it actually works out fairly nicely to have a separate path for ERESUME
> > > > fixup since a fault on EENTER is generally fatal, whereas a fault on
> > > ERESUME might be recoverable.
> > >
> > 
> > Hmm.
> > 
> > >
> > > > do_eenter:
> > > >         mov     tcs, %rbx
> > > >         lea     async_exit, %rcx
> > > >         mov     $EENTER, %rax
> > > >         ENCLU
> > 
> > Or SOME_SILLY_PREFIX ENCLU?
> 
> Yeah, forgot to include that.
> 
> > >
> > > /*
> > >  * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
> > >  * fault indicator, e.g. -EFAULT.
> > >  */
> > > > eexit_or_eenter_fault:
> > > >         ret
> > 
> > But userspace wants to know whether it was a fault or not.  So I think
> > we either need two landing pads or we need to hijack a flag bit (are
> > there any known-zeroed flag bits after EEXIT?) to say whether it was a
> > fault.  And, if it was a fault, we should give the vector, the
> > sanitized error code, and possibly CR2.
> 
> As Jethro mentioned, RAX will always be 4 on a successful EEXIT, so we
> can use RAX to indicate a fault.  That's what I was trying to imply with
> EFAULT.  Here's the reg stuffing I use for the POC:
> 
>   regs->ax = EFAULT;
>   regs->di = trapnr;
>   regs->si = error_code;
>   regs->dx = address;
> 
> 
> Well-known RAX values also means the kernel fault handlers only need to
> look for SOME_SILLY_PREFIX ENCLU if RAX==2 || RAX==3, i.e. the fault
> occurred on EENTER or in an enclave (RAX is set to ERESUME's leaf as
> part of the asynchronous enclave exit flow).

POC kernel code, 64-bit only.

Limiting this to 64-bit isn't necessary, but it makes the code prettier
and allows using REX as the magic prefix.  I like the idea of using REX
because it seems least likely to be repurposed for yet another new
feature.  I have no idea if 64-bit only will fly with the SDK folks.

Going off comments in similar code related to UMIP, we'd need to figure
out how to handle protection keys.


/* REX with all bits set, ignored by ENCLU. */
#define SGX_DO_ENCLU_FIXUP        0x4F

#define SGX_ENCLU_OPCODE0         0x0F
#define SGX_ENCLU_OPCODE1         0x01
#define SGX_ENCLU_OPCODE2         0xD7

/* ENCLU is a three-byte opcode, plus one byte for the magic prefix. */
#define SGX_ENCLU_FIXUP_INSN_LEN  4

static int sgx_detect_enclu(struct pt_regs *regs)
{
	unsigned char buf[SGX_ENCLU_FIXUP_INSN_LEN];

	/* Look for EENTER or ERESUME in RAX, 64-bit mode only. */
	if (!regs || (regs->ax != 2 && regs->ax != 3) || !user_64bit_mode(regs))
		return 0;

	if (copy_from_user(buf, (void __user *)(regs->ip), sizeof(buf)))
		return 0;

	if (buf[0] == SGX_DO_ENCLU_FIXUP &&
	    buf[1] == SGX_ENCLU_OPCODE0 &&
	    buf[2] == SGX_ENCLU_OPCODE1 &&
	    buf[3] == SGX_ENCLU_OPCODE2)
		return SGX_ENCLU_FIXUP_INSN_LEN;

	return 0;
}

bool sgx_fixup_enclu_fault(struct pt_regs *regs, int trapnr,
			   unsigned long error_code, unsigned long address)
{
	int insn_len;

	insn_len = sgx_detect_enclu(regs);
	if (!insn_len)
		return false;

	regs->ip += insn_len;
	regs->ax = EFAULT;
	regs->di = trapnr;
	regs->si = error_code;
	regs->dx = address;
	return true;
}
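
For readers who want to poke at the logic outside the kernel, the following
is a rough userspace sketch of the same two steps: matching the prefixed
ENCLU byte pattern, and telling a fault apart from a normal EEXIT by looking
at RAX after the instruction. It assumes the register convention of the POC
above (RAX = EFAULT, RDI = trap number, RSI = error code, RDX = address) and
the leaf numbers mentioned in the thread (EENTER = 2, ERESUME = 3, EEXIT = 4).
The helper and type names are hypothetical and not part of any SDK or kernel
interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* ENCLU leaf numbers as used in the thread. */
enum { LEAF_EENTER = 2, LEAF_ERESUME = 3, LEAF_EEXIT = 4 };

/* REX prefix with all bits set (0x4F) followed by ENCLU (0F 01 D7). */
static const unsigned char fixup_enclu[] = { 0x4F, 0x0F, 0x01, 0xD7 };

/*
 * Hypothetical userspace mirror of sgx_detect_enclu(): returns the number
 * of bytes to skip, or 0 when no fixup applies.
 */
static size_t detect_fixup_enclu(unsigned long ax, const unsigned char *insn,
				 size_t avail)
{
	/* Only EENTER/ERESUME leaves in RAX are candidates for fixup. */
	if (ax != LEAF_EENTER && ax != LEAF_ERESUME)
		return 0;
	if (avail < sizeof(fixup_enclu))
		return 0;
	if (memcmp(insn, fixup_enclu, sizeof(fixup_enclu)) != 0)
		return 0;
	return sizeof(fixup_enclu);
}

enum enclu_outcome { ENCLU_EXITED, ENCLU_FAULTED, ENCLU_UNKNOWN };

/* Hypothetical container for the values the POC stuffs into RDI/RSI/RDX. */
struct fault_info {
	unsigned long trapnr;
	unsigned long error_code;
	unsigned long address;
};

/* Classify what happened based on the registers seen after ENCLU. */
static enum enclu_outcome classify_enclu(unsigned long ax, unsigned long di,
					 unsigned long si, unsigned long dx,
					 struct fault_info *info)
{
	if (ax == LEAF_EEXIT)		/* enclave left via EEXIT */
		return ENCLU_EXITED;
	if (ax == EFAULT) {		/* kernel fixup path was taken */
		info->trapnr = di;
		info->error_code = si;
		info->address = dx;
		return ENCLU_FAULTED;
	}
	return ENCLU_UNKNOWN;
}
```

Since EFAULT is 14 on Linux and cannot collide with the ENCLU leaf numbers,
a single landing pad after the prefixed ENCLU can distinguish the two cases
by RAX alone. That only holds if the enclave follows the EEXIT convention
described above.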


Re: RFC: userspace exception fixups

2018-11-07 Thread Sean Christopherson
On Tue, Nov 06, 2018 at 05:17:14PM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson
>  wrote:
> >
> > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
> > >  wrote:
> > > >
> > > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > > > >
> > > > >
> > > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
> > > > > >>  wrote:
> > > > > >>
> > > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > > > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It
> > > > > >> seems like the *CPU* could give a big hint, but I don't see where
> > > > > >> there is any architectural indication of why the AEX code got
> > > > > >> called or any obvious way for the user code to know whether the
> > > > > >> exit was fixed up by the kernel?
> > > > > >
> > > > > > The SDK "unconditionally" does ERESUME at the AEP location, but
> > > > > > that's a bit misleading because its signal handler may muck with
> > > > > > the context's RIP, e.g. to abort the enclave on a fatal fault.
> > > > > >
> > > > > > On an event/exception from within an enclave, the event is
> > > > > > immediately delivered after loading synthetic state and changing
> > > > > > RIP to the AEP.  In other words, jamming CPU state is essentially
> > > > > > a bunch of vectoring ucode preamble, but from software's
> > > > > > perspective it's a normal event that happens to point at the AEP
> > > > > > instead of somewhere in the enclave.  And because the signals the
> > > > > > SDK cares about are all synchronous, the SDK can simply hardcode
> > > > > > ERESUME at the AEP since all of the fault logic resides in its
> > > > > > signal handler.  IRQs and whatnot simply trampoline back into the
> > > > > > enclave.
> > > > > >
> > > > > > Userspace can do something funky instead of ERESUME, but only
> > > > > > *after* IRET/RSM/VMRESUME has returned to the AEP location, and in
> > > > > > Linux's case, after the trap handler has run.
> > > > > >
> > > > > > Jumping back a bit, how much do we care about preventing userspace
> > > > > > from doing stupid things?
> > > > >
> > > > > My general feeling is that userspace should be allowed to do
> > > > > apparently stupid things. For example, as far as the kernel is
> > > > > concerned, Wine and DOSEMU are just user programs that do stupid
> > > > > things. Linux generally tries to provide a reasonably complete view
> > > > > of architectural behavior. This is in contrast to, say, Windows,
> > > > > where IIUC doing an unapproved WRFSBASE may cause very odd behavior
> > > > > indeed. So magic fixups that do non-architectural things are not so
> > > > > great.
> > > >
> > > > Sorry if I'm beating a dead horse, but what if we only did fixup on
> > > > ENCLU with a specific (ignored) prefix pattern?  I.e. effectively make
> > > > the magic fixup opt-in, falling back to signals.  Jamming RIP to skip
> > > > ENCLU isn't that far off the architecture, e.g. EENTER stuffs RCX with
> > > > the next RIP so that the enclave can EEXIT to immediately after the
> > > > EENTER location.
> > > >
> > >
> > > How does that even work, though?  On an AEX, RIP points to the ERESUME
> > > instruction, not the EENTER instruction, so if we skip it we just end
> > > up in lala land.
> >
> > Userspace would obviously need to be aware of the fixup behavior, but
> > it actually works out fairly nicely to have a separate path for ERESUME
> > fixup since a fault on EENTER is generally fatal, whereas a fault on
> > ERESUME might be recoverable.
> >
> 
> Hmm.
> 
> >
> > do_eenter:
> >     mov tcs, %rbx
> >     lea async_exit, %rcx
> >     mov $EENTER, %rax
> >     ENCLU
> 
> Or SOME_SILLY_PREFIX ENCLU?

Yeah, forgot to include that.

> >
> > /*
> >  * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
> >  * fault indicator, e.g. -EFAULT.
> >  */
> > eexit_or_eenter_fault:
> >     ret
> 
> But userspace wants to know whether it was a fault or not.  So I think
> we either need two landing pads or we need to hijack a flag bit (are
> there any known-zeroed flag bits after EEXIT?) to say whether it was a
> fault.  And, if it was a fault, we should give the vector, the
> sanitized error code, and possibly CR2.

As Jethro mentioned, RAX will always be 4 on a successful EEXIT, so we
can use RAX to indicate a fault.  That's what I was trying to imply with
EFAULT.  Here's the reg stuffing I use for the POC:

  regs->ax = EFAULT;
  regs->di = trapnr;
  regs->si = error_code;
  regs->dx = address;


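Well-known RAX values also mean the kernel fault handlers only need to
look for SOME_SILLY_PREFIX ENCLU if RAX==2 || RAX==3, i.e. the fault
occurred on EENTER or in an enclave (RAX is set to ERESUME's leaf as
part of the asynchronous enclave exit flow).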
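
On the userspace side, the RAX convention described above could be consumed roughly as follows. This is only a sketch under the thread's assumptions: RAX is 4 (the EEXIT leaf) after a normal exit and EFAULT after a kernel fixup, with the vector, error code and address in RDI, RSI and RDX as in the POC reg stuffing. The struct and function names are invented for the example.

```c
#include <errno.h>
#include <stdint.h>

#define ENCLU_LEAF_EEXIT 4

enum enclu_result { ENCLU_EXITED, ENCLU_FAULTED, ENCLU_UNKNOWN };

/* Hypothetical snapshot of the registers seen right after ENCLU. */
struct enclu_regs {
    uint64_t rax, rdi, rsi, rdx;
};

struct enclu_fault {
    uint64_t trapnr, error_code, address;
};

/* Classify the return from ENCLU; fills *fault only on ENCLU_FAULTED. */
static enum enclu_result classify_enclu_return(const struct enclu_regs *regs,
                                               struct enclu_fault *fault)
{
    if (regs->rax == ENCLU_LEAF_EEXIT)
        return ENCLU_EXITED;

    if (regs->rax == EFAULT) {
        fault->trapnr = regs->rdi;
        fault->error_code = regs->rsi;
        fault->address = regs->rdx;
        return ENCLU_FAULTED;
    }

    return ENCLU_UNKNOWN;
}
```

The sketch treats any other RAX value as unknown rather than guessing, since the thread only pins down the values 2, 3, 4 and the fault indicator.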


Re: RFC: userspace exception fixups

2018-11-06 Thread Jethro Beekman

On 2018-11-07 02:17, Andy Lutomirski wrote:

> On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson
>  wrote:
> >
> > /*
> >  * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
> >  * fault indicator, e.g. -EFAULT.
> >  */
> > eexit_or_eenter_fault:
> >     ret
>
> But userspace wants to know whether it was a fault or not.  So I think
> we either need two landing pads or we need to hijack a flag bit (are
> there any known-zeroed flag bits after EEXIT?) to say whether it was a
> fault.  And, if it was a fault, we should give the vector, the
> sanitized error code, and possibly CR2.


On AEX, %rax will contain ENCLU_LEAF_ERESUME (0x3). On EEXIT, %rax will 
contain ENCLU_LEAF_EEXIT (0x4).


--
Jethro Beekman | Fortanix


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski
On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson
 wrote:
>
> On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
> >  wrote:
> > >
> > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > > >
> > > >
> > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
> > > > >>  wrote:
> > > > >>
> > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It
> > > > >> seems like the *CPU* could give a big hint, but I don't see where
> > > > >> there is any architectural indication of why the AEX code got
> > > > >> called or any obvious way for the user code to know whether the
> > > > >> exit was fixed up by the kernel?
> > > > >
> > > > > The SDK "unconditionally" does ERESUME at the AEP location, but
> > > > > that's a bit misleading because its signal handler may muck with
> > > > > the context's RIP, e.g. to abort the enclave on a fatal fault.
> > > > >
> > > > > On an event/exception from within an enclave, the event is immediately
> > > > > delivered after loading synthetic state and changing RIP to the AEP.
> > > > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > > > ucode preamble, but from software's perspective it's a normal event
> > > > > that happens to point at the AEP instead of somewhere in the enclave.
> > > > > And because the signals the SDK cares about are all synchronous, the
> > > > > SDK can simply hardcode ERESUME at the AEP since all of the fault
> > > > > logic resides in its signal handler.  IRQs and whatnot simply
> > > > > trampoline back into the enclave.
> > > > >
> > > > > Userspace can do something funky instead of ERESUME, but only *after*
> > > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > > > case, after the trap handler has run.
> > > > >
> > > > > Jumping back a bit, how much do we care about preventing userspace
> > > > > from doing stupid things?
> > > >
> > > > My general feeling is that userspace should be allowed to do apparently
> > > > stupid things. For example, as far as the kernel is concerned, Wine and
> > > > DOSEMU are just user programs that do stupid things. Linux generally
> > > > tries to provide a reasonably complete view of architectural behavior.
> > > > This is in contrast to, say, Windows, where IIUC doing an unapproved
> > > > WRFSBASE may cause very odd behavior indeed. So magic fixups that do
> > > > non-architectural things are not so great.
> > >
> > > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > > that the enclave can EEXIT to immediately after the EENTER location.
> > >
> >
> > How does that even work, though?  On an AEX, RIP points to the ERESUME
> > instruction, not the EENTER instruction, so if we skip it we just end
> > up in lala land.
>
> Userspace would obviously need to be aware of the fixup behavior, but
> it actually works out fairly nicely to have a separate path for ERESUME
> fixup since a fault on EENTER is generally fatal, whereas a fault on
> ERESUME might be recoverable.
>

Hmm.

>
> do_eenter:
>     mov tcs, %rbx
>     lea async_exit, %rcx
>     mov $EENTER, %rax
>     ENCLU

Or SOME_SILLY_PREFIX ENCLU?

>
> /*
>  * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
>  * fault indicator, e.g. -EFAULT.
>  */
> eexit_or_eenter_fault:
>     ret

But userspace wants to know whether it was a fault or not.  So I think
we either need two landing pads or we need to hijack a flag bit (are
there any known-zeroed flag bits after EEXIT?) to say whether it was a
fault.  And, if it was a fault, we should give the vector, the
sanitized error code, and possibly CR2.

>
> async_exit:
>     ENCLU

Same prefix here, right?

>
> fixup_handler:
> 

This whole thing is a bit odd, but not necessarily a terrible idea.

>
> > How averse would everyone be to making enclave entry be a syscall?
> > The user code would do sys_sgx_enter_enclave(), and the kernel would
> > stash away the register state (vm86()-style), point RIP to the vDSO's
> > ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> > SYSRET.  The trap handlers would understand what's going on and
> > restore register state accordingly.
>
> Wouldn't that blast away any stack changes made by the enclave?

Yes, but I was imagining that it would stash the registers into the
struct host_state thing I made up :)
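
To make the syscall idea a little more concrete, here is a toy model in plain C of the stash-and-restore step. Everything in it is an assumption made for illustration: the thread does not spell out the fields of struct host_state at this point, sys_sgx_enter_enclave() does not exist, and the register set is cut down to what the example needs. It only shows the bookkeeping, not real kernel entry code.

```c
#include <stdint.h>

/* Hypothetical contents; the real layout is not given in this message. */
struct host_state {
    uint64_t rip, rsp, rbp;
};

/* Cut-down user register file for the model. */
struct user_regs {
    uint64_t rip, rsp, rbp, rax, rcx;
};

/*
 * Model of the proposed entry path: stash the caller's state, then point
 * RIP at the vDSO's ENCLU and RCX (the AEP) at a second vDSO ENCLU.
 */
static void model_enter_enclave(struct user_regs *regs, struct host_state *hs,
                                uint64_t vdso_enclu, uint64_t vdso_aep_enclu)
{
    hs->rip = regs->rip;
    hs->rsp = regs->rsp;
    hs->rbp = regs->rbp;
    regs->rip = vdso_enclu;
    regs->rcx = vdso_aep_enclu;
}

/*
 * Model of the trap handler's side: restore the stashed state and report
 * the fault in RAX.  Any RSP change made while in the enclave is lost.
 */
static void model_fault_return(struct user_regs *regs,
                               const struct host_state *hs, uint64_t fault_code)
{
    regs->rip = hs->rip;
    regs->rsp = hs->rsp;
    regs->rbp = hs->rbp;
    regs->rax = fault_code;
}
```

The restore step is where the question about blasting away stack changes shows up: whatever the enclave did to RSP is overwritten by the stashed value.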


Re: RFC: userspace exception fixups

2018-11-06 Thread Sean Christopherson
On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
>  wrote:
> >
> > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > >
> > >
> > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
> > > >>  wrote:
> > > >>
> > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> > > >> like the *CPU* could give a big hint, but I don't see where there is
> > > >> any architectural indication of why the AEX code got called or any
> > > >> obvious way for the user code to know whether the exit was fixed up by
> > > >> the kernel?
> > > >
> > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > > a bit misleading because its signal handler may muck with the context's
> > > > RIP, e.g. to abort the enclave on a fatal fault.
> > > >
> > > > On an event/exception from within an enclave, the event is immediately
> > > > delivered after loading synthetic state and changing RIP to the AEP.
> > > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > > ucode preamble, but from software's perspective it's a normal event
> > > > that happens to point at the AEP instead of somewhere in the enclave.
> > > > And because the signals the SDK cares about are all synchronous, the
> > > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > > into the enclave.
> > > >
> > > > Userspace can do something funky instead of ERESUME, but only *after*
> > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > > case, after the trap handler has run.
> > > >
> > > > Jumping back a bit, how much do we care about preventing userspace
> > > > from doing stupid things?
> > >
> > > My general feeling is that userspace should be allowed to do apparently
> > > stupid things. For example, as far as the kernel is concerned, Wine and
> > > DOSEMU are just user programs that do stupid things. Linux generally tries
> > > to provide a reasonably complete view of architectural behavior. This is
> > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > > cause very odd behavior indeed. So magic fixups that do non-architectural
> > > things are not so great.
> >
> > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > that the enclave can EEXIT to immediately after the EENTER location.
> >
> 
> How does that even work, though?  On an AEX, RIP points to the ERESUME
> instruction, not the EENTER instruction, so if we skip it we just end
> up in lala land.

Userspace would obviously need to be aware of the fixup behavior, but
it actually works out fairly nicely to have a separate path for ERESUME
fixup since a fault on EENTER is generally fatal, whereas a fault on
ERESUME might be recoverable.


do_eenter:
    mov tcs, %rbx
    lea async_exit, %rcx
    mov $EENTER, %rax
    ENCLU

/*
 * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
 * fault indicator, e.g. -EFAULT.
 */
eexit_or_eenter_fault:
    ret

async_exit:
    ENCLU

fixup_handler:
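
The "lala land" concern and the two-path answer can be checked with a small model of where RIP lands once the kernel skips a faulting ENCLU. The sketch assumes the labels are laid out back to back exactly as in the listing above, that both ENCLUs carry the one-byte magic prefix (4 bytes each), and that ret is a single byte. The enum and function names are made up for the example.

```c
#include <stdint.h>

#define FIXUP_INSN_LEN 4    /* magic prefix + three-byte ENCLU */

/* Byte offsets from the ENCLU in do_eenter, under the layout assumption. */
enum {
    OFF_EENTER_ENCLU          = 0,
    OFF_EEXIT_OR_EENTER_FAULT = OFF_EENTER_ENCLU + FIXUP_INSN_LEN,
    OFF_ASYNC_EXIT            = OFF_EEXIT_OR_EENTER_FAULT + 1, /* 1-byte ret */
    OFF_FIXUP_HANDLER         = OFF_ASYNC_EXIT + FIXUP_INSN_LEN,
};

enum landing {
    LAND_EEXIT_OR_EENTER_FAULT,
    LAND_FIXUP_HANDLER,
    LAND_LALA_LAND,
};

/* Where does execution resume if the kernel skips the ENCLU at fault_rip? */
static enum landing landing_after_fixup(uint64_t base, uint64_t fault_rip)
{
    uint64_t next = fault_rip + FIXUP_INSN_LEN - base;

    switch (next) {
    case OFF_EEXIT_OR_EENTER_FAULT:
        return LAND_EEXIT_OR_EENTER_FAULT;
    case OFF_FIXUP_HANDLER:
        return LAND_FIXUP_HANDLER;
    default:
        return LAND_LALA_LAND;
    }
}
```

A fault on EENTER leaves RIP at the first ENCLU, so the skip lands on eexit_or_eenter_fault. A fault inside the enclave causes an AEX, RIP points at async_exit, and the skip lands on fixup_handler, which is the separate ERESUME path described above.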

 
> How averse would everyone be to making enclave entry be a syscall?
> The user code would do sys_sgx_enter_enclave(), and the kernel would
> stash away the register state (vm86()-style), point RIP to the vDSO's
> ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> SYSRET.  The trap handlers would understand what's going on and
> restore register state accordingly.

Wouldn't that blast away any stack changes made by the enclave?


Re: RFC: userspace exception fixups

2018-11-06 Thread Sean Christopherson
On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
>  wrote:
> >
> > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > >
> > >
> > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
> > > >>  wrote:
> > > >>
> > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> > > >> like the *CPU* could give a big hint, but I don't see where there is
> > > >> any architectural indication of why the AEX code got called or any
> > > >> obvious way for the user code to know whether the exit was fixed up by
> > > >> the kernel?
> > > >
> > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > > bit misleading because its signal handler may muck with the context's
> > > > RIP, e.g. to abort the enclave on a fatal fault.
> > > >
> > > > On an event/exception from within an enclave, the event is immediately
> > > > delivered after loading synthetic state and changing RIP to the AEP.
> > > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > > ucode preamble, but from software's perspective it's a normal event
> > > > that happens to point at the AEP instead of somewhere in the enclave.
> > > > And because the signals the SDK cares about are all synchronous, the
> > > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > > into the enclave.
> > > >
> > > > Userspace can do something funky instead of ERESUME, but only *after*
> > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > > case, after the trap handler has run.
> > > >
> > > > Jumping back a bit, how much do we care about preventing userspace
> > > > from doing stupid things?
> > >
> > > My general feeling is that userspace should be allowed to do apparently
> > > stupid things. For example, as far as the kernel is concerned, Wine and
> > > DOSEMU are just user programs that do stupid things. Linux generally tries
> > > to provide a reasonably complete view of architectural behavior. This is
> > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE May
> > > cause very odd behavior indeed. So magic fixups that do non-architectural
> > > things are not so great.
> >
> > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > that the enclave can EEXIT to immediately after the EENTER location.
> >
> 
> How does that even work, though?  On an AEX, RIP points to the ERESUME
> instruction, not the EENTER instruction, so if we skip it we just end
> up in lala land.

Userspace would obviously need to be aware of the fixup behavior, but
it actually works out fairly nicely to have a separate path for ERESUME
fixup since a fault on EENTER is generally fatal, whereas a fault on
ERESUME might be recoverable.


do_eenter:
        mov     tcs, %rbx
        lea     async_exit, %rcx
        mov     $EENTER, %rax
        ENCLU

        /*
         * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
         * fault indicator, e.g. -EFAULT.
         */
eexit_or_eenter_fault:
        ret

async_exit:
        ENCLU

fixup_handler:

 
> How averse would everyone be to making enclave entry be a syscall?
> The user code would do sys_sgx_enter_enclave(), and the kernel would
> stash away the register state (vm86()-style), point RIP to the vDSO's
> ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> SYSRET.  The trap handlers would understand what's going on and
> restore register state accordingly.

Wouldn't that blast away any stack changes made by the enclave?


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski
On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
 wrote:
>
> On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> >
> >
> > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
> > >>  wrote:
> > >>
> > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > >> Sean, how does the current SDK AEX handler decide whether to do
> > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> > >> like the *CPU* could give a big hint, but I don't see where there is
> > >> any architectural indication of why the AEX code got called or any
> > >> obvious way for the user code to know whether the exit was fixed up by
> > >> the kernel?
> > >
> > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > a bit misleading because its signal handler may muck with the context's
> > > RIP, e.g. to abort the enclave on a fatal fault.
> > >
> > > On an event/exception from within an enclave, the event is immediately
> > > delivered after loading synthetic state and changing RIP to the AEP.
> > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > ucode preamble, but from software's perspective it's a normal event
> > > that happens to point at the AEP instead of somewhere in the enclave.
> > > And because the signals the SDK cares about are all synchronous, the
> > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > into the enclave.
> > >
> > > Userspace can do something funky instead of ERESUME, but only *after*
> > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > case, after the trap handler has run.
> > >
> > > Jumping back a bit, how much do we care about preventing userspace
> > > from doing stupid things?
> >
> > My general feeling is that userspace should be allowed to do apparently
> > stupid things. For example, as far as the kernel is concerned, Wine and
> > DOSEMU are just user programs that do stupid things. Linux generally tries
> > to provide a reasonably complete view of architectural behavior. This is
> > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > cause very odd behavior indeed. So magic fixups that do non-architectural
> > things are not so great.
>
> Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> that the enclave can EEXIT to immediately after the EENTER location.
>

How does that even work, though?  On an AEX, RIP points to the ERESUME
instruction, not the EENTER instruction, so if we skip it we just end
up in lala land.

How averse would everyone be to making enclave entry be a syscall?
The user code would do sys_sgx_enter_enclave(), and the kernel would
stash away the register state (vm86()-style), point RIP to the vDSO's
ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
SYSRET.  The trap handlers would understand what's going on and
restore register state accordingly.

On non-Meltdown hardware (hah!) this would even be fairly fast.

--Andy


Re: RFC: userspace exception fixups

2018-11-06 Thread Sean Christopherson
On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> 
> 
> >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
> >>  wrote:
> >> 
> >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> >> Sean, how does the current SDK AEX handler decide whether to do
> >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> >> like the *CPU* could give a big hint, but I don't see where there is
> >> any architectural indication of why the AEX code got called or any
> >> obvious way for the user code to know whether the exit was fixed up by
> >> the kernel?
> > 
> > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > a bit misleading because its signal handler may muck with the context's
> > RIP, e.g. to abort the enclave on a fatal fault.
> > 
> > On an event/exception from within an enclave, the event is immediately
> > delivered after loading synthetic state and changing RIP to the AEP.
> > In other words, jamming CPU state is essentially a bunch of vectoring
> > ucode preamble, but from software's perspective it's a normal event
> > that happens to point at the AEP instead of somewhere in the enclave.
> > And because the signals the SDK cares about are all synchronous, the
> > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > into the enclave.
> > 
> > Userspace can do something funky instead of ERESUME, but only *after*
> > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > case, after the trap handler has run.
> > 
> > Jumping back a bit, how much do we care about preventing userspace
> > from doing stupid things? 
> 
> My general feeling is that userspace should be allowed to do apparently
> stupid things. For example, as far as the kernel is concerned, Wine and
> DOSEMU are just user programs that do stupid things. Linux generally tries
> to provide a reasonably complete view of architectural behavior. This is
> in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> cause very odd behavior indeed. So magic fixups that do non-architectural
> things are not so great.

Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
with a specific (ignored) prefix pattern?  I.e. effectively make the magic
fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
that the enclave can EEXIT to immediately after the EENTER location.

> (How does the Windows case work?  If there’s an exception after the untrusted
> stack allocation and before EEXIT and SEH tries to handle it, how does the
> unwinder figure out where to start?)

No clue, I'll ask and report back.
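
As a rough illustration of the opt-in check proposed above: the fault path
could look at the bytes at the faulting RIP and apply the fixup only when it
finds ENCLU (encoded as 0F 01 D7) behind an agreed prefix, otherwise falling
back to a signal.  This is only a sketch, not the POC mentioned in this
thread.  The prefix value 0x3E and the helper name enclu_fixup_len() are
assumptions made purely for illustration; the thread does not say which
prefix pattern would be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ENCLU is encoded as 0F 01 D7. */
static const uint8_t ENCLU_OPCODE[3] = { 0x0f, 0x01, 0xd7 };

/*
 * ASSUMPTION: the opt-in marker is a single DS segment-override prefix
 * (0x3E).  The actual prefix pattern is not specified in the thread.
 */
#define HYPOTHETICAL_FIXUP_PREFIX 0x3e

/*
 * Return the number of bytes to add to RIP so that execution resumes just
 * after the prefixed ENCLU, or 0 if the bytes at the faulting RIP are not
 * an opted-in ENCLU (the fault would then be delivered as a signal).
 */
size_t enclu_fixup_len(const uint8_t *insn, size_t avail)
{
    const size_t need = 1 + sizeof(ENCLU_OPCODE);

    assert(insn != NULL);

    if (avail < need)
        return 0;
    if (insn[0] != HYPOTHETICAL_FIXUP_PREFIX)
        return 0;
    for (size_t i = 0; i < sizeof(ENCLU_OPCODE); i++) {
        if (insn[1 + i] != ENCLU_OPCODE[i])
            return 0;
    }
    return need;
}
```

A return value of 0 means "not opted in", so an unprefixed ENCLU keeps the
plain architectural behavior and only code that asked for the fixup gets it.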


Re: RFC: userspace exception fixups

2018-11-06 Thread Sean Christopherson
On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen  wrote:
> > >
> > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > I almost feel like the right solution is to call into SGX on its own
> > > > private stack or maybe even its own private address space.
> > >
> > > Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
> > > enclave like its own "thread" with its own stack and its own set of
> > > registers and context?  That seems like a much more workable model than
> > > trying to weave it together with the EENTER context.
> > 
> > So maybe the API should be, roughly
> > 
> > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > host_state *state);
> > sgx_exit_reason_t sgx_resume_enclave(same args);
> > 
> > where host_state is something like:
> > 
> > struct host_state {
> >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > };
> > 
> > and the values in host_state explicitly have nothing to do with the
> > actual host registers.  So, if you want to use the outcall mechanism,
> > you'd allocate some memory, point sp to that memory, call
> > sgx_enter_enclave(), and then read that memory to do the outcall.
> > 
> > Actually implementing this would be distinctly nontrivial, and would
> > almost certainly need some degree of kernel help to avoid an explosion
> > when a signal gets delivered while we have host_state.sp loaded into
> > the actual SP register.  Maybe rseq could help with this?
> > 
> > The ISA here is IMO not well thought through.
> 
> Maybe I'm mistaken about some fundamentals here, but my understanding
> of SGX is that the whole point is that the host application and the
> code running in the enclave are mutually adversarial towards one
> another. Do any or all of the proposed protocols here account for this
> and fully protect the host application from malicious code in the
> enclave? It seems that having control over the register file on exit
> from the enclave is fundamentally problematic but I assume there must
> be some way I'm missing that this is fixed up.

SGX provides protections for the enclave but not the other way around.
The kernel has all of its normal non-SGX protections in place, but the
enclave can certainly wreak havoc on its userspace process.  The basic
design idea is that the enclave is a specialized .so that gets extra
security protections but is still effectively part of the overall
application, e.g. it has full access to its host userspace process'
virtual memory.
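
For reference, the straw-man API quoted above could be driven roughly as
follows.  This is a mock, not a real interface: sgx_enter_enclave() here is
a stand-in that never executes ENCLU, and the exit-reason values, the
outcall_req record and run_once() are all hypothetical names invented for
the sketch.  Only the host_state layout comes from the proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical exit reasons; the proposal names the type, not its values. */
typedef enum { SGX_EXIT_DONE, SGX_EXIT_OUTCALL, SGX_EXIT_FAULT } sgx_exit_reason_t;

/* Layout as proposed in the thread. */
struct host_state {
    unsigned long bp, sp, ax, bx, cx, dx, si, di;
};

/* Hypothetical record the enclave leaves at state->sp to request an outcall. */
struct outcall_req {
    uint32_t nr;
    uint32_t arg;
};

/*
 * Stand-in for the real entry path: pretends the enclave ran and asked for
 * outcall number 7 with argument 42.  No ENCLU is executed here.
 */
sgx_exit_reason_t sgx_enter_enclave(void *enclave, struct host_state *state)
{
    struct outcall_req req = { .nr = 7, .arg = 42 };

    (void)enclave;
    assert(state != NULL && state->sp != 0);
    memcpy((void *)(uintptr_t)state->sp, &req, sizeof(req));
    return SGX_EXIT_OUTCALL;
}

/*
 * Host side: hand the enclave a scratch area instead of the real stack,
 * enter, then read the outcall request back from that area.
 */
int run_once(struct outcall_req *out)
{
    static unsigned char scratch[256];
    struct host_state st = { 0 };

    st.sp = (unsigned long)(uintptr_t)scratch;
    if (sgx_enter_enclave(NULL, &st) != SGX_EXIT_OUTCALL)
        return -1;
    memcpy(out, (void *)(uintptr_t)st.sp, sizeof(*out));
    return 0;
}
```

Note that the scratch area is a convention, not a protection boundary: as
said above, the enclave can still write anywhere in the host process.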


Re: RFC: userspace exception fixups

2018-11-06 Thread Rich Felker
On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen  wrote:
> >
> > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > I almost feel like the right solution is to call into SGX on its own
> > > private stack or maybe even its own private address space.
> >
> > Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
> > enclave like its own "thread" with its own stack and its own set of
> > registers and context?  That seems like a much more workable model than
> > trying to weave it together with the EENTER context.
> 
> So maybe the API should be, roughly
> 
> sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> host_state *state);
> sgx_exit_reason_t sgx_resume_enclave(same args);
> 
> where host_state is something like:
> 
> struct host_state {
>   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> };
> 
> and the values in host_state explicitly have nothing to do with the
> actual host registers.  So, if you want to use the outcall mechanism,
> you'd allocate some memory, point sp to that memory, call
> sgx_enter_enclave(), and then read that memory to do the outcall.
> 
> Actually implementing this would be distinctly nontrivial, and would
> almost certainly need some degree of kernel help to avoid an explosion
> when a signal gets delivered while we have host_state.sp loaded into
> the actual SP register.  Maybe rseq could help with this?
> 
> The ISA here is IMO not well thought through.

Maybe I'm mistaken about some fundamentals here, but my understanding
of SGX is that the whole point is that the host application and the
code running in the enclave are mutually adversarial towards one
another. Do any or all of the proposed protocols here account for this
and fully protect the host application from malicious code in the
enclave? It seems that having control over the register file on exit
from the enclave is fundamentally problematic but I assume there must
be some way I'm missing that this is fixed up.

Rich


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski



>> On Nov 6, 2018, at 1:59 PM, Sean Christopherson 
>>  wrote:
>> 
>>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
>>>> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski wrote:
>>>>
>>>>
>>>>> On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
>>>>>
>>>>>
>>>>> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
>>>>> True, but what if we have a nasty enclave that writes to memory just
>>>>> below SP *before* decrementing SP?
>>>> Yeah, that would be unfortunate.  If an enclave did this (roughly):
>>>>
>>>>    1. EENTER
>>>>    2. Hardware sets eenter_hwframe->sp = %sp
>>>>    3. Enclave runs... wants to do out-call
>>>>    4. Enclave sets up parameters:
>>>>       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
>>>>       ...
>>>>    5. Enclave sets eenter_hwframe->sp -= offset
>>>>
>>>> If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
>>>> was on the stack.  The enclave could easily fix this by moving ->sp first.
>>>>
>>>> But, this is one of those "fun" parts of the ABI that I think we need to
>>>> talk about.  If we do this, we also basically require that the code
>>>> which handles asynchronous exits must *not* write to the stack.  That's
>>>> not hard because it's typically just a single ERESUME instruction, but
>>>> it *is* a requirement.
>>> I was assuming that the async exit stuff was completely hidden by the API. 
>>> The AEP code would decide whether the exit got fixed up by the kernel 
>>> (which may or may not be easy to tell — can the
>>> code even tell without kernel help whether it was, say, an IRQ vs #UD?) and 
>>> then either do ERESUME or cause sgx_enter_enclave() to return with an 
>>> appropriate return value.
>> Sean, how does the current SDK AEX handler decide whether to do
>> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
>> like the *CPU* could give a big hint, but I don't see where there is
>> any architectural indication of why the AEX code got called or any
>> obvious way for the user code to know whether the exit was fixed up by
>> the kernel?
> 
> The SDK "unconditionally" does ERESUME at the AEP location, but that's
> a bit misleading because its signal handler may muck with the context's
> RIP, e.g. to abort the enclave on a fatal fault.
> 
> On an event/exception from within an enclave, the event is immediately
> delivered after loading synthetic state and changing RIP to the AEP.
> In other words, jamming CPU state is essentially a bunch of vectoring
> ucode preamble, but from software's perspective it's a normal event
> that happens to point at the AEP instead of somewhere in the enclave.
> And because the signals the SDK cares about are all synchronous, the
> SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> resides in its signal handler.  IRQs and whatnot simply trampoline back
> into the enclave.
> 
> Userspace can do something funky instead of ERESUME, but only *after*
> IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> case, after the trap handler has run.
> 
> Jumping back a bit, how much do we care about preventing userspace
> from doing stupid things? 

My general feeling is that userspace should be allowed to do apparently stupid 
things. For example, as far as the kernel is concerned, Wine and DOSEMU are 
just user programs that do stupid things. Linux generally tries to provide a 
reasonably complete view of architectural behavior. This is in contrast to, 
say, Windows, where IIUC doing an unapproved WRFSBASE may cause very odd
behavior indeed. So magic fixups that do non-architectural things are not so 
great.

The flip side, of course, is that the architecture is arguably inherently 
erratic here, and it’s apparently impossible to have an SGX library with sane 
semantics without some kernel assistance.

So if we can make my straw man API work, perhaps with vDSO or rseq-like help, 
then the official SDK can use it, but less well behaved programs can still 
mostly work.  (Modulo Linux’s non-support for EINITTOKEN, of course.)

Thinking about it some more, the major sticking point may be finding the RIP 
and stack frame of EENTER in the AEP code or in its fixup. The vDSO can’t use 
TLS without serious hackery.  We could massively abuse WRFSBASE, but that’s 
really ugly.

(How does the Windows case work?  If there’s an exception after the untrusted 
stack allocation and before EEXIT and SEH tries to handle it, how does the 
unwinder figure out where to start?)

>  I did a quick POC on the idea of hardcoding
> fixup for the ENCLU opcode, and the basic idea checks out.  The code
> is fairly minimal and doesn't impact the core functionality of the SDK.
> They'd need to redo their trap handling to move it from the signal
> handler to inline, but their stack shenanigans won't be any more broken
> than they already are.
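
The ordering hazard in steps 4 and 5 quoted earlier in this message can be
shown with a toy model.  The sketch below simulates a downward-growing stack
and a "signal" that writes a frame directly below SP; it is not SGX or
kernel code, and the frame size, struct sim and outcall_arg_survives() are
made up for the illustration.  It also ignores the x86-64 red zone for
simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 256
#define FRAME_SIZE 64   /* hypothetical signal-frame size */

struct sim {
    unsigned char stack[STACK_SIZE];
    size_t sp;          /* index of the top of stack; the stack grows down */
};

/* Model signal delivery: a frame is written directly below sp. */
static void deliver_signal(struct sim *s)
{
    assert(s->sp >= FRAME_SIZE);
    memset(&s->stack[s->sp - FRAME_SIZE], 0xAA, FRAME_SIZE);
}

/*
 * Returns 1 if the outcall argument is intact after a signal arrives
 * between the two steps, 0 if it was clobbered.  move_sp_first selects
 * the ordering: 0 = copy then move SP (steps 4, 5), 1 = move SP then copy.
 */
int outcall_arg_survives(int move_sp_first)
{
    struct sim s;
    const unsigned char arg1[16] = "outcall-arg";
    const size_t offset = sizeof(arg1);

    memset(s.stack, 0, sizeof(s.stack));
    s.sp = STACK_SIZE;

    if (move_sp_first) {
        s.sp -= offset;                            /* reserve space first */
        memcpy(&s.stack[s.sp], arg1, offset);      /* then copy the argument */
        deliver_signal(&s);                        /* frame lands below it */
        return memcmp(&s.stack[s.sp], arg1, offset) == 0;
    }

    memcpy(&s.stack[s.sp - offset], arg1, offset); /* step 4: copy below SP */
    deliver_signal(&s);                            /* signal between 4 and 5 */
    s.sp -= offset;                                /* step 5: too late */
    return memcmp(&s.stack[s.sp], arg1, offset) == 0;
}
```

In this model outcall_arg_survives(0) reports the argument as clobbered and
outcall_arg_survives(1) reports it intact, which matches the "move ->sp
first" fix suggested in the quoted text.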


Re: RFC: userspace exception fixups

2018-11-06 Thread Sean Christopherson
On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski  wrote:
> > 
> > > 
> > > On Nov 6, 2018, at 1:00 PM, Dave Hansen  wrote:
> > > 
> > > > 
> > > > On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> > > > True, but what if we have a nasty enclave that writes to memory just
> > > > below SP *before* decrementing SP?
> > > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> > > 
> > >    1. EENTER
> > >    2. Hardware sets eenter_hwframe->sp = %sp
> > >    3. Enclave runs... wants to do out-call
> > >    4. Enclave sets up parameters:
> > >       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > >    ...
> > >    5. Enclave sets eenter_hwframe->sp -= offset
> > > 
> > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > > was on the stack.  The enclave could easily fix this by moving ->sp first.
> > > 
> > > But, this is one of those "fun" parts of the ABI that I think we need to
> > > talk about.  If we do this, we also basically require that the code
> > > which handles asynchronous exits must *not* write to the stack.  That's
> > > not hard because it's typically just a single ERESUME instruction, but
> > > it *is* a requirement.
> > > 
> > I was assuming that the async exit stuff was completely hidden by the API. 
> > The AEP code would decide whether the exit got fixed up by the kernel 
> > (which may or may not be easy to tell — can the
> > code even tell without kernel help whether it was, say, an IRQ vs #UD?) and 
> > then either do ERESUME or cause sgx_enter_enclave() to return with an 
> > appropriate return value.
> > 
> > 
> Sean, how does the current SDK AEX handler decide whether to do
> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> like the *CPU* could give a big hint, but I don't see where there is
> any architectural indication of why the AEX code got called or any
> obvious way for the user code to know whether the exit was fixed up by
> the kernel?

The SDK "unconditionally" does ERESUME at the AEP location, but that's
a bit misleading because its signal handler may muck with the context's
RIP, e.g. to abort the enclave on a fatal fault.

On an event/exception from within an enclave, the event is immediately
delivered after loading synthetic state and changing RIP to the AEP.
In other words, jamming CPU state is essentially a bunch of vectoring
ucode preamble, but from software's perspective it's a normal event
that happens to point at the AEP instead of somewhere in the enclave.
And because the signals the SDK cares about are all synchronous, the
SDK can simply hardcode ERESUME at the AEP since all of the fault logic
resides in its signal handler.  IRQs and whatnot simply trampoline back
into the enclave.

Userspace can do something funky instead of ERESUME, but only *after*
IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
case, after the trap handler has run.
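
As a rough illustration of the flow described above, here is a toy model
in C.  It is only a sketch: the event names, the addresses, and the
struct are invented for the example, and nothing here touches real SGX
state or real signal delivery.  It just shows that the RIP software sees
after an AEX is always the AEP, and that only the signal handler ever
changes where userspace ends up.

```c
/* Toy model only: every name and address below is hypothetical. */
enum event { EV_IRQ, EV_FIXABLE_FAULT, EV_FATAL_FAULT };

enum { AEP_RIP = 0x1000, ABORT_RIP = 0x2000 };

struct sim_context {
	unsigned long rip;	/* where userspace resumes after the event */
};

/* After an AEX, the interrupted RIP that software sees is the AEP. */
static void async_exit(struct sim_context *ctx)
{
	ctx->rip = AEP_RIP;
}

/* Runs for synchronous faults only, before userspace resumes at ctx->rip. */
static void sdk_signal_handler(struct sim_context *ctx, enum event ev)
{
	if (ev == EV_FATAL_FAULT)
		ctx->rip = ABORT_RIP;	/* skip ERESUME and abort the enclave */
	/* Otherwise leave RIP alone so the hardcoded ERESUME runs. */
}

/* Returns 1 if userspace ends up executing ERESUME at the AEP. */
static int resumes_enclave(enum event ev)
{
	struct sim_context ctx;

	async_exit(&ctx);
	if (ev != EV_IRQ)	/* IRQs never reach the signal handler */
		sdk_signal_handler(&ctx, ev);
	return ctx.rip == AEP_RIP;
}
```

In this model an IRQ and a fixable fault both fall through to ERESUME,
and only the fatal case diverts RIP away from the AEP.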

Jumping back a bit, how much do we care about preventing userspace
from doing stupid things?  I did a quick POC on the idea of hardcoding
fixup for the ENCLU opcode, and the basic idea checks out.  The code
is fairly minimal and doesn't impact the core functionality of the SDK.
They'd need to redo their trap handling to move it from the signal
handler to inline, but their stack shenanigans won't be any more broken
than they already are.


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski
On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski  wrote:
>
>
>
> > On Nov 6, 2018, at 1:00 PM, Dave Hansen  wrote:
> >
> >> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> >> True, but what if we have a nasty enclave that writes to memory just
> >> below SP *before* decrementing SP?
> >
> > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> >
> >    1. EENTER
> >    2. Hardware sets eenter_hwframe->sp = %sp
> >    3. Enclave runs... wants to do out-call
> >    4. Enclave sets up parameters:
> >       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> >       ...
> >    5. Enclave sets eenter_hwframe->sp -= offset
> >
> > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > was on the stack.  The enclave could easily fix this by moving ->sp first.
> >
> > But, this is one of those "fun" parts of the ABI that I think we need to
> > talk about.  If we do this, we also basically require that the code
> > which handles asynchronous exits must *not* write to the stack.  That's
> > not hard because it's typically just a single ERESUME instruction, but
> > it *is* a requirement.
> >
>
> I was assuming that the async exit stuff was completely hidden by the API. 
> The AEP code would decide whether the exit got fixed up by the kernel (which 
> may or may not be easy to tell — can the code even tell without kernel help 
> whether it was, say, an IRQ vs #UD?) and then either do ERESUME or cause 
> sgx_enter_enclave() to return with an appropriate return value.
>
>

Sean, how does the current SDK AEX handler decide whether to do
EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
like the *CPU* could give a big hint, but I don't see where there is
any architectural indication of why the AEX code got called or any
obvious way for the user code to know whether the exit was fixed up by
the kernel?


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski



> On Nov 6, 2018, at 1:00 PM, Dave Hansen  wrote:
> 
>> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
>> True, but what if we have a nasty enclave that writes to memory just
>> below SP *before* decrementing SP?
> 
> Yeah, that would be unfortunate.  If an enclave did this (roughly):
> 
>    1. EENTER
>    2. Hardware sets eenter_hwframe->sp = %sp
>    3. Enclave runs... wants to do out-call
>    4. Enclave sets up parameters:
>       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
>       ...
>    5. Enclave sets eenter_hwframe->sp -= offset
> 
> If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> was on the stack.  The enclave could easily fix this by moving ->sp first.
> 
> But, this is one of those "fun" parts of the ABI that I think we need to
> talk about.  If we do this, we also basically require that the code
> which handles asynchronous exits must *not* write to the stack.  That's
> not hard because it's typically just a single ERESUME instruction, but
> it *is* a requirement.
> 

I was assuming that the async exit stuff was completely hidden by the API. The 
AEP code would decide whether the exit got fixed up by the kernel (which may or 
may not be easy to tell — can the code even tell without kernel help whether it 
was, say, an IRQ vs #UD?) and then either do ERESUME or cause 
sgx_enter_enclave() to return with an appropriate return value.




Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> True, but what if we have a nasty enclave that writes to memory just
> below SP *before* decrementing SP?

Yeah, that would be unfortunate.  If an enclave did this (roughly):

    1. EENTER
    2. Hardware sets eenter_hwframe->sp = %sp
    3. Enclave runs... wants to do out-call
    4. Enclave sets up parameters:
       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
       ...
    5. Enclave sets eenter_hwframe->sp -= offset

If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
was on the stack.  The enclave could easily fix this by moving ->sp first.
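
A toy model of that window, in plain C, may make the ordering issue
easier to see.  It is only a sketch: the byte-array "stack", the sizes,
and the helper names are all made up, and the "signal" is just a memset
of a frame directly below the recorded sp, which is an assumed
simplification of what the kernel would do.

```c
#include <string.h>

enum { STACK_SIZE = 256, FRAME_SIZE = 64, ARG_SIZE = 16 };

struct sim {
	unsigned char stack[STACK_SIZE];
	size_t sp;	/* index of the stack top; the stack grows down */
};

/* Model signal delivery: a frame is written directly below s->sp. */
static void deliver_signal(struct sim *s)
{
	memset(&s->stack[s->sp - FRAME_SIZE], 0xAA, FRAME_SIZE);
}

/* Returns 1 if the out-call argument survives a signal, 0 if clobbered. */
static int outcall_arg_survives(int move_sp_first)
{
	struct sim s;
	unsigned char arg[ARG_SIZE];

	memset(&s, 0, sizeof(s));
	s.sp = STACK_SIZE;
	memset(arg, 0x11, sizeof(arg));

	if (move_sp_first) {
		s.sp -= ARG_SIZE;			/* step 5 first */
		memcpy(&s.stack[s.sp], arg, ARG_SIZE);	/* then step 4 */
		deliver_signal(&s);			/* frame lands below arg */
	} else {
		memcpy(&s.stack[s.sp - ARG_SIZE], arg, ARG_SIZE); /* step 4 */
		deliver_signal(&s);			/* signal between 4 and 5 */
		s.sp -= ARG_SIZE;			/* step 5, too late */
	}
	return memcmp(&s.stack[s.sp], arg, ARG_SIZE) == 0;
}
```

With the write-then-decrement order the frame overlaps the argument;
with ->sp moved first the frame lands below it and the copy is intact.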

But, this is one of those "fun" parts of the ABI that I think we need to
talk about.  If we do this, we also basically require that the code
which handles asynchronous exits must *not* write to the stack.  That's
not hard because it's typically just a single ERESUME instruction, but
it *is* a requirement.

It means fun stuff like that you absolutely can't just async-exit to C code.


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski



> On Nov 6, 2018, at 11:22 AM, Dave Hansen  wrote:
> 
>> On 11/6/18 11:02 AM, Andy Lutomirski wrote:
>>> On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen  wrote:
>>> 
>>>> On 11/6/18 10:20 AM, Andy Lutomirski wrote:
>>>> I almost feel like the right solution is to call into SGX on its own
>>>> private stack or maybe even its own private address space.
>>> 
>>> Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
>>> enclave like its own "thread" with its own stack and its own set of
>>> registers and context?  That seems like a much more workable model than
>>> trying to weave it together with the EENTER context.
>> 
>> So maybe the API should be, roughly
>> 
>> sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
>> host_state *state);
>> sgx_exit_reason_t sgx_resume_enclave(same args);
>> 
>> where host_state is something like:
>> 
>> struct host_state {
>>  unsigned long bp, sp, ax, bx, cx, dx, si, di;
>> };
>> 
>> and the values in host_state explicitly have nothing to do with the
>> actual host registers.  So, if you want to use the outcall mechanism,
>> you'd allocate some memory, point sp to that memory, call
>> sgx_enter_enclave(), and then read that memory to do the outcall.
> 
> Ah, so instead of the enclave rudely "hijacking" the EENTER context, we
> have it nicely return and nicely _hint_ to the calling context what it
> would like to do.  Then, the EENTER context can make a controlled
> transition over to the requested context.

Exactly. And existing enclaves keep working — their rudeness is just magically 
translated into a hint!

> 
>> Actually implementing this would be distinctly nontrivial, and would
>> almost certainly need some degree of kernel help to avoid an explosion
>> when a signal gets delivered while we have host_state.sp loaded into
>> the actual SP register.  Maybe rseq could help with this?
> 
> As long as the memory pointed to by host_state.sp is valid and can hold
> the signal frame (grows down without clobbering anything), what goes
> boom?  The signal handling would push a signal frame and call the
> handler.  It would have a shallow-looking stack, but the handler could
> just do its normal business and return from the signal where the frame
> would get popped and continue with %rsp=host_state.sp, blissfully
> unaware of the signal ever having happened.

True, but what if we have a nasty enclave that writes to memory just below SP 
*before* decrementing SP?

I suspect that rseq really can be used for this with only minimal-ish 
modifications.  Or we could stick this in the vDSO with some appropriate fixups 
in the kernel.

Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 11:02 AM, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen  wrote:
>>
>> On 11/6/18 10:20 AM, Andy Lutomirski wrote:
>>> I almost feel like the right solution is to call into SGX on its own
>>> private stack or maybe even its own private address space.
>>
>> Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
>> enclave like its own "thread" with its own stack and its own set of
>> registers and context?  That seems like a much more workable model than
>> trying to weave it together with the EENTER context.
> 
> So maybe the API should be, roughly
> 
> sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> host_state *state);
> sgx_exit_reason_t sgx_resume_enclave(same args);
> 
> where host_state is something like:
> 
> struct host_state {
>   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> };
> 
> and the values in host_state explicitly have nothing to do with the
> actual host registers.  So, if you want to use the outcall mechanism,
> you'd allocate some memory, point sp to that memory, call
> sgx_enter_enclave(), and then read that memory to do the outcall.

Ah, so instead of the enclave rudely "hijacking" the EENTER context, we
have it nicely return and nicely _hint_ to the calling context what it
would like to do.  Then, the EENTER context can make a controlled
transition over to the requested context.

> Actually implementing this would be distinctly nontrivial, and would
> almost certainly need some degree of kernel help to avoid an explosion
> when a signal gets delivered while we have host_state.sp loaded into
> the actual SP register.  Maybe rseq could help with this?

As long as the memory pointed to by host_state.sp is valid and can hold
the signal frame (grows down without clobbering anything), what goes
boom?  The signal handling would push a signal frame and call the
handler.  It would have a shallow-looking stack, but the handler could
just do its normal business and return from the signal where the frame
would get popped and continue with %rsp=host_state.sp, blissfully
unaware of the signal ever having happened.


Re: RFC: userspace exception fixups

2018-11-06 Thread Andy Lutomirski
On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen  wrote:
>
> On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > I almost feel like the right solution is to call into SGX on its own
> > private stack or maybe even its own private address space.
>
> Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
> enclave like its own "thread" with its own stack and its own set of
> registers and context?  That seems like a much more workable model than
> trying to weave it together with the EENTER context.

So maybe the API should be, roughly

sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
host_state *state);
sgx_exit_reason_t sgx_resume_enclave(same args);

where host_state is something like:

struct host_state {
  unsigned long bp, sp, ax, bx, cx, dx, si, di;
};

and the values in host_state explicitly have nothing to do with the
actual host registers.  So, if you want to use the outcall mechanism,
you'd allocate some memory, point sp to that memory, call
sgx_enter_enclave(), and then read that memory to do the outcall.
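
For illustration only, a minimal sketch of what a caller of that
straw-man API could look like.  Everything beyond the prototype shape
and struct host_state is an assumption: the exit-reason names, the
scratch size, the outcall number, and above all the body of
sgx_enter_enclave(), which is a stub that merely simulates an enclave
pushing one outcall argument below host_state.sp.  It is not a real
enclave entry path.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical exit reasons; the proposal does not define them. */
typedef enum {
	SGX_EXIT_DONE,
	SGX_EXIT_OUTCALL,
	SGX_EXIT_FAULT,
} sgx_exit_reason_t;

struct host_state {
	unsigned long bp, sp, ax, bx, cx, dx, si, di;
};

/*
 * Stub standing in for the real entry point.  It pretends the enclave
 * pushed one argument onto the memory at state->sp and asked for an
 * outcall.  A real version would involve EENTER and kernel/vDSO help.
 */
static sgx_exit_reason_t sgx_enter_enclave(void *enclave,
					   struct host_state *state)
{
	unsigned long arg = 42;

	(void)enclave;
	state->sp -= sizeof(arg);
	memcpy((void *)(uintptr_t)state->sp, &arg, sizeof(arg));
	state->ax = 1;	/* made-up outcall number */
	return SGX_EXIT_OUTCALL;
}

/* Host side: allocate scratch memory, enter, then read the outcall arg. */
static unsigned long run_outcall_once(void)
{
	enum { SCRATCH_SIZE = 4096 };
	unsigned char *scratch = malloc(SCRATCH_SIZE);
	struct host_state state = {0};
	unsigned long arg = 0;

	if (!scratch)
		return 0;

	/* sp points at the top of the scratch area, not the real stack. */
	state.sp = (unsigned long)(uintptr_t)(scratch + SCRATCH_SIZE);

	if (sgx_enter_enclave(NULL, &state) == SGX_EXIT_OUTCALL)
		memcpy(&arg, (void *)(uintptr_t)state.sp, sizeof(arg));

	free(scratch);
	return arg;
}
```

The host code only inspects the memory at host_state.sp after the call
returns, so the outcall data never lands on the caller's own stack.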

Actually implementing this would be distinctly nontrivial, and would
almost certainly need some degree of kernel help to avoid an explosion
when a signal gets delivered while we have host_state.sp loaded into
the actual SP register.  Maybe rseq could help with this?

The ISA here is IMO not well thought through.


Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> I almost feel like the right solution is to call into SGX on its own
> private stack or maybe even its own private address space.

Yeah, I had the same gut feeling.  Couldn't the debugger even treat the
enclave like its own "thread" with its own stack and its own set of
registers and context?  That seems like a much more workable model than
trying to weave it together with the EENTER context.

