Re: RFC: userspace exception fixups
On Mon, Nov 26, 2018 at 06:35:34AM -0800, Sean Christopherson wrote:
> And how would you determine the #UD is related to SGX? Hardware doesn't
> provide any indication that a #UD (or any other fault) is related to SGX
> or occurred in an enclave. The only fault that is special-cased in a
> non-virtualized environment is #PF signaled by the EPCM, which gets the
> PF_SGX bit set in the error code.

Could you not detect #UD from the address where it happened? The kernel
knows where enclaves are mapped.

BTW, how does the Intel run-time emulate opcodes currently?

Anyway, I've fully discarded the whole idea because implementing single
stepping w/o a well-defined AEP handler is nasty. I think vDSOs are the
only viable path that at least I'm aware of...

/Jarkko
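For illustration only, the "kernel knows where enclaves are mapped" idea
boils down to a range lookup. The sketch below is plain C, not kernel code.
The struct and function names are made up for this example. It only answers
whether an address falls inside a recorded enclave range; which address the
kernel actually observes for a fault taken inside an enclave is a separate
question that the thread keeps coming back to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one enclave mapping; not a real kernel structure. */
struct enclave_range {
    uint64_t base;  /* first byte of the enclave's linear address range */
    uint64_t size;  /* length of the range in bytes */
};

/* Return true if addr lies inside any of the n recorded ranges. */
static bool addr_in_enclave(const struct enclave_range *r, size_t n,
                            uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        /* The subtraction form avoids overflow in base + size. */
        if (addr >= r[i].base && addr - r[i].base < r[i].size)
            return true;
    }
    return false;
}
```

A linear scan keeps the sketch short; a real lookup would more likely go
through whatever per-process mapping structure already exists.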
Re: RFC: userspace exception fixups
On Wed, Nov 21, 2018 at 05:17:34PM +0200, Jarkko Sakkinen wrote:
> On Wed, Nov 21, 2018 at 05:17:32AM +, Jethro Beekman wrote:
> > Jarkko, can you please explain you solution in detail? The CPU receives an
> > exception. This will be handled by the kernel exception handler. What
> > information does the kernel exception handler use to determine whether to
> > deliver the exception as a regular signal to the process, or whether to set
> > the special registers values for userspace and just continue executing the
> > process manually?
>
> Now we throw SIGSEGV when PF_SGX set, right? In my solution that would
> be turned just doing iret to AEP with the extra that three registers get
> exception data (type, reason, addr). No decoding or RIP adjusting
> involved.
>
> That would mean that you would actually have to implement AEP handler
> than just have enclu there.
>
> I've also proposed that perhaps for SGX also #UD should be propagated
> this way because for some instructions you need outside help to emulate
> "non-enclave" environment.

And how would you determine the #UD is related to SGX? Hardware doesn't
provide any indication that a #UD (or any other fault) is related to SGX
or occurred in an enclave. The only fault that is special-cased in a
non-virtualized environment is #PF signaled by the EPCM, which gets the
PF_SGX bit set in the error code.

> That is all I have drafted together so far. I'll try to finish v18 this
> week with other stuff and refine further next week (unless someone gives
> obvious reason why this doesn't work, which might well be because I
> haven't went too deep with my analysis yet because of lack of time).
>
> /Jarkko
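To make the point about PF_SGX concrete, here is a minimal sketch of the
only check the exception frame supports: vector 14 (#PF) with bit 15 of the
page-fault error code set. The macro and function names are local to this
example. Everything else, including #UD (vector 6) and #GP (vector 13),
carries no SGX marker, so the function cannot say anything about them.

```c
#include <stdbool.h>
#include <stdint.h>

#define VEC_UD 6            /* invalid opcode */
#define VEC_GP 13           /* general protection */
#define VEC_PF 14           /* page fault */
#define PF_SGX (1u << 15)   /* SGX bit in the page-fault error code */

/*
 * Return true only for the one case the hardware flags explicitly:
 * a page fault whose error code has the SGX bit set.
 */
static bool fault_flagged_as_sgx(unsigned vector, uint32_t error_code)
{
    return vector == VEC_PF && (error_code & PF_SGX) != 0;
}
```

Note that a false return does not mean "not SGX related"; it only means the
frame does not say so.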
Re: RFC: userspace exception fixups
On Wed, Nov 21, 2018 at 05:17:34PM +0200, Jarkko Sakkinen wrote:
> On Wed, Nov 21, 2018 at 05:17:32AM +, Jethro Beekman wrote:
> > Jarkko, can you please explain you solution in detail? The CPU receives an
> > exception. This will be handled by the kernel exception handler. What
> > information does the kernel exception handler use to determine whether to
> > deliver the exception as a regular signal to the process, or whether to set
> > the special registers values for userspace and just continue executing the
> > process manually?
>
> Now we throw SIGSEGV when PF_SGX set, right? In my solution that would
> be turned just doing iret to AEP with the extra that three registers get
> exception data (type, reason, addr). No decoding or RIP adjusting
> involved.
>
> That would mean that you would actually have to implement AEP handler
> than just have enclu there.
>
> I've also proposed that perhaps for SGX also #UD should be propagated
> this way because for some instructions you need outside help to emulate
> "non-enclave" environment.
>
> That is all I have drafted together so far. I'll try to finish v18 this
> week with other stuff and refine further next week (unless someone gives
> obvious reason why this doesn't work, which might well be because I
> haven't went too deep with my analysis yet because of lack of time).

The obvious con in this approach is that if you single step the code,
the whole AEP handler would also be single stepped every time. Probably
a big enough con that it is better to go with the vDSO approach anyhow...

/Jarkko
Re: RFC: userspace exception fixups
On Wed, Nov 21, 2018 at 05:17:32AM +, Jethro Beekman wrote:
> Jarkko, can you please explain you solution in detail? The CPU receives an
> exception. This will be handled by the kernel exception handler. What
> information does the kernel exception handler use to determine whether to
> deliver the exception as a regular signal to the process, or whether to set
> the special registers values for userspace and just continue executing the
> process manually?

Now we throw SIGSEGV when PF_SGX is set, right? In my solution that would
be turned into just doing an iret to the AEP, with the extra that three
registers get exception data (type, reason, addr). No decoding or RIP
adjusting involved.

That would mean that you would actually have to implement an AEP handler
rather than just have ENCLU there.

I've also proposed that perhaps for SGX #UD should also be propagated
this way, because for some instructions you need outside help to emulate
a "non-enclave" environment.

That is all I have drafted together so far. I'll try to finish v18 this
week with other stuff and refine further next week (unless someone gives
an obvious reason why this doesn't work, which might well happen because
I haven't gone too deep with my analysis yet for lack of time).

/Jarkko
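A rough model of the return-to-AEP proposal as described above, written as
ordinary C rather than kernel code. The register file, the enum and the
function name are invented for the example, and the choice of RDI, RSI and
RDX as carriers for (type, reason, addr) is purely an assumption, since the
mail does not name the registers. The sketch only covers the PF_SGX case,
because that is the one case the thread agrees the kernel can recognise.

```c
#include <stdint.h>

#define VEC_PF 14           /* page fault vector */
#define PF_SGX (1u << 15)   /* SGX bit in the page-fault error code */

/* Toy register file; which registers carry the data is a guess. */
struct user_regs {
    uint64_t rip;            /* already points at the AEP after an AEX */
    uint64_t rdi, rsi, rdx;  /* hypothetical carriers: type, reason, addr */
};

enum fault_action { DELIVER_SIGNAL, RESUME_AT_AEP };

/*
 * For a page fault with PF_SGX set, stash the exception data in three
 * registers and return to user space without touching RIP. Anything
 * else falls back to normal signal delivery.
 */
static enum fault_action handle_fault(struct user_regs *regs, unsigned vector,
                                      uint32_t error_code, uint64_t fault_addr)
{
    if (vector != VEC_PF || !(error_code & PF_SGX))
        return DELIVER_SIGNAL;

    regs->rdi = vector;      /* type */
    regs->rsi = error_code;  /* reason */
    regs->rdx = fault_addr;  /* addr */
    return RESUME_AT_AEP;    /* no decoding, no RIP adjustment */
}
```

The AEP handler would then have to inspect those registers before deciding
whether to ERESUME, which is why it can no longer be a bare ENCLU.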
Re: RFC: userspace exception fixups
On 2018-11-21 04:25, Jarkko Sakkinen wrote:
> On Tue, Nov 20, 2018 at 07:19:37AM -0800, Andy Lutomirski wrote:
> > general by mucking with some regs and retrying -- that will infinite
> > loop and confuse everyone. I'm not even 100% convinced that decoding
> > the insn stream is useful -- AEP can point to something that isn't
> > ENCLU.
>
> In my return-to-AEP approach to whole point was not to do any decoding
> but instead have something else always in the AEP handler than just
> ENCLU. No instruction decoding. No RIP manipulation.
>
> > IOW the kernel needs to know *when* to apply this special behavior.
> > Sadly there is no bit in the exception frame that says "came from
> > SGX".

Jarkko, can you please explain your solution in detail? The CPU receives
an exception. This will be handled by the kernel exception handler. What
information does the kernel exception handler use to determine whether to
deliver the exception as a regular signal to the process, or whether to
set the special register values for userspace and just continue executing
the process manually?

--
Jethro Beekman | Fortanix
Re: RFC: userspace exception fixups
On Tue, Nov 20, 2018 at 07:19:37AM -0800, Andy Lutomirski wrote:
> What is "#GP with EPCM"? We certainly don't want to react to #UD in

A typo. Meant #PF with PF_SGX set, i.e. an EPCM conflict.

> general by mucking with some regs and retrying -- that will infinite
> loop and confuse everyone. I'm not even 100% convinced that decoding
> the insn stream is useful -- AEP can point to something that isn't
> ENCLU.

In my return-to-AEP approach the whole point was not to do any decoding
but instead to always have something other than just ENCLU in the AEP
handler. No instruction decoding. No RIP manipulation.

> IOW the kernel needs to know *when* to apply this special behavior.
> Sadly there is no bit in the exception frame that says "came from
> SGX".

/Jarkko
Re: RFC: userspace exception fixups
On Tue, Nov 20, 2018 at 12:11:33PM +0200, Jarkko Sakkinen wrote:
> On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> > >
> > > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > > Decoding the instruction stream and doing it to all exceptions that
> > > > hit an ENCLU instruction seems like a poor design.
> > >
> > > I'm not sure why you would ever need to do any type of fixup as the idea
> > > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > > would work the same way as for exceptions that the kernel can deal with
> > > except filling the exception information to registers.
> >
> > Sure, but how does the kernel know when to do that and when to send a
> > signal? I don't really like decoding the instruction stream to figure
> > it out.
>
> Hmm... why you have to decode instruction stream to find that out? Would
> just depend on exception type (#GP with EPCM, #UD). Or are you saying
> that kernel should need to SIGSEGV if there is in fact ENCLU so that
> there is no infinite trap loop? Sorry, I'm a bit lost here that where
> does this decoding requirement comes from in the first place. I
> understand how it is used in Sean's proposal...
>
> Anyway, this option can be probably discarded without further
> consideration because apparently single stepping can cause #DB SS fault
> if AEP handler is anything else than a single instruction.
>
> For me it seems that by ruling out options, vDSO option is what is
> left. I don't like it but at least it works...

The relevant section in the SDM is 43.2.6, but I started to think: why
would that even be a problem in the dumbed-down return-to-AEP? If you are
single-step debugging, isn't that what you want? Continue single stepping
in the AEP handler...

I still don't understand where the need for decoding the instruction
stream comes in with this dumbed-down approach. There's no RIP
manipulation or anything involved at all. With this reconsideration I
would keep this as one option at least.

/Jarkko
Re: RFC: userspace exception fixups
On Tue, Nov 20, 2018 at 12:11:33PM +0200, Jarkko Sakkinen wrote:
> On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> > >
> > > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > > Decoding the instruction stream and doing it to all exceptions that
> > > > hit an ENCLU instruction seems like a poor design.
> > >
> > > I'm not sure why you would ever need to do any type of fixup as the idea
> > > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > > would work the same way as for exceptions that the kernel can deal with
> > > except filling the exception information to registers.
> >
> > Sure, but how does the kernel know when to do that and when to send a
> > signal? I don't really like decoding the instruction stream to figure
> > it out.
>
> Hmm... why you have to decode instruction stream to find that out? Would
> just depend on exception type (#GP with EPCM, #UD).

#PF w/ PFEC_SGX is the only exception that indicates a fault is related
to SGX. Theoretically we could avoid decoding by using a magic value for
the AEP itself and doing even more magic fixup, but that wouldn't help
for faults that occur on EENTER, which can be generic #GPs due to loss
of EPC on SGX1 systems.

> Or are you saying
> that kernel should need to SIGSEGV if there is in fact ENCLU so that
> there is no infinite trap loop? Sorry, I'm a bit lost here that where
> does this decoding requirement comes from in the first place. I
> understand how it is used in Sean's proposal...
>
> Anyway, this option can be probably discarded without further
> consideration because apparently single stepping can cause #DB SS fault
> if AEP handler is anything else than a single instruction.

Not that it matters, but we could satisfy the "one instruction"
requirement if the fixup changed RIP to point at an ENCLU for #DBs.

> For me it seems that by ruling out options, vDSO option is what is
> left. I don't like it but at least it works...
>
> /Jarkko
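For readers wondering what "decoding the instruction stream" amounts to
here: ENCLU is encoded as the three bytes 0F 01 D7, so the check itself is
small. The sketch below is a user-space illustration, not the proposed
kernel code. Both function names and the idea of passing in a known ENCLU
address are assumptions made for the example. It ignores prefixes and the
question of how the bytes at RIP are fetched safely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_DB 1   /* debug exception vector */

/* Return true if the buffer starts with the bare ENCLU encoding 0F 01 D7. */
static bool is_enclu(const uint8_t *insn, size_t len)
{
    return len >= 3 && insn[0] == 0x0f && insn[1] == 0x01 && insn[2] == 0xd7;
}

/*
 * Toy version of the #DB idea above: for a debug exception, resume at an
 * address known to hold an ENCLU; for anything else leave RIP at the AEP.
 */
static uint64_t rip_after_fixup(unsigned vector, uint64_t aep_rip,
                                uint64_t enclu_rip)
{
    return vector == VEC_DB ? enclu_rip : aep_rip;
}
```

As noted elsewhere in the thread, finding an ENCLU at RIP still does not
prove the fault came from an enclave, since the AEP can point at anything.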
Re: RFC: userspace exception fixups
On Tue, Nov 20, 2018 at 2:11 AM Jarkko Sakkinen wrote:
>
> On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> > >
> > > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > > Decoding the instruction stream and doing it to all exceptions that
> > > > hit an ENCLU instruction seems like a poor design.
> > >
> > > I'm not sure why you would ever need to do any type of fixup as the idea
> > > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > > would work the same way as for exceptions that the kernel can deal with
> > > except filling the exception information to registers.
> >
> > Sure, but how does the kernel know when to do that and when to send a
> > signal? I don't really like decoding the instruction stream to figure
> > it out.
>
> Hmm... why you have to decode instruction stream to find that out? Would
> just depend on exception type (#GP with EPCM, #UD).

What is "#GP with EPCM"? We certainly don't want to react to #UD in
general by mucking with some regs and retrying -- that will infinite
loop and confuse everyone. I'm not even 100% convinced that decoding
the insn stream is useful -- AEP can point to something that isn't
ENCLU.

IOW the kernel needs to know *when* to apply this special behavior.
Sadly there is no bit in the exception frame that says "came from
SGX".
Re: RFC: userspace exception fixups
On Mon, Nov 19, 2018 at 09:00:08AM -0800, Andy Lutomirski wrote:
> On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
> >
> > On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > > 1. The kernel needs some way to know *when* to apply this fixup.
> > > Decoding the instruction stream and doing it to all exceptions that
> > > hit an ENCLU instruction seems like a poor design.
> >
> > I'm not sure why you would ever need to do any type of fixup as the idea
> > is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> > would work the same way as for exceptions that the kernel can deal with
> > except filling the exception information to registers.
>
> Sure, but how does the kernel know when to do that and when to send a
> signal? I don't really like decoding the instruction stream to figure
> it out.

Hmm... why do you have to decode the instruction stream to find that out?
It would just depend on the exception type (#GP with EPCM, #UD). Or are
you saying that the kernel would need to SIGSEGV if there is in fact an
ENCLU so that there is no infinite trap loop? Sorry, I'm a bit lost here
as to where this decoding requirement comes from in the first place. I
understand how it is used in Sean's proposal...

Anyway, this option can probably be discarded without further
consideration because apparently single stepping can cause a #DB SS fault
if the AEP handler is anything other than a single instruction.

To me it seems that by ruling out options, the vDSO option is what is
left. I don't like it but at least it works...

/Jarkko
Re: RFC: userspace exception fixups
On Mon, Nov 19, 2018 at 8:02 AM Jarkko Sakkinen wrote:
>
> On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> > 1. The kernel needs some way to know *when* to apply this fixup.
> > Decoding the instruction stream and doing it to all exceptions that
> > hit an ENCLU instruction seems like a poor design.
>
> I'm not sure why you would ever need to do any type of fixup as the idea
> is to just return to AEP i.e. from chosen exceptions (EPCM, #UD) the AEP
> would work the same way as for exceptions that the kernel can deal with
> except filling the exception information to registers.

Sure, but how does the kernel know when to do that and when to send a
signal? I don't really like decoding the instruction stream to figure
it out.
Re: RFC: userspace exception fixups
On Mon, Nov 19, 2018 at 07:29:36AM -0800, Andy Lutomirski wrote:
> 1. The kernel needs some way to know *when* to apply this fixup.
> Decoding the instruction stream and doing it to all exceptions that
> hit an ENCLU instruction seems like a poor design.

I'm not sure why you would ever need to do any type of fixup, as the idea
is to just return to the AEP, i.e. for chosen exceptions (EPCM, #UD) the
AEP would work the same way as for exceptions that the kernel can deal
with, except for filling the exception information into registers.

> 2. It starts exposing what looks like a more generic exception
> handling mechanism to userspace, except that it's nonsensical for
> anything other than ENCLU.

Well, I see user space, and namely the run-time, as the host for the
enclave, i.e. a middle-man that provides services for emulating
instructions etc.

/Jarkko
Re: RFC: userspace exception fixups
On Sat, Nov 17, 2018 at 11:16 PM Jarkko Sakkinen wrote: > > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > > Hi all- > > > > The people working on SGX enablement are grappling with a somewhat > > annoying issue: the x86 EENTER instruction is used from user code and > > can, as part of its normal-ish operation, raise an exception. It is > > also highly likely to be used from a library, and signal handling in > > libraries is unpleasant at best. > > > > There's been some discussion of adding a vDSO entry point to wrap > > EENTER and do something sensible with the exceptions, but I'm > > wondering if a more general mechanism would be helpful. > > I haven't really followed all of this discussion because I've been busy > working on the patch set but for me all of these approaches look awfully > complicated. > > I'll throw my own suggestion and apologize if this has been already > suggested and discarded: return-to-AEP. > > My idea is to do just a small extension to SGX AEX handling. At the > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can > fill extend this by filling other three spare registers with exception > information. I have two issues with this approach: 1. The kernel needs some way to know *when* to apply this fixup. Decoding the instruction stream and doing it to all exceptions that hit an ENCLU instruction seems like a poor design. 2. It starts exposing what looks like a more generic exception handling mechanism to userspace, except that it's nonsensical for anything other than ENCLU.
Re: RFC: userspace exception fixups
On Mon, Nov 19, 2018 at 04:05:43PM +0200, Jarkko Sakkinen wrote: > On Mon, Nov 19, 2018 at 05:17:26AM +, Jethro Beekman wrote: > > On 2018-11-18 18:32, Jarkko Sakkinen wrote: > > > On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote: > > > > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > > > > > Hi all- > > > > > > > > > > The people working on SGX enablement are grappling with a somewhat > > > > > annoying issue: the x86 EENTER instruction is used from user code and > > > > > can, as part of its normal-ish operation, raise an exception. It is > > > > > also highly likely to be used from a library, and signal handling in > > > > > libraries is unpleasant at best. > > > > > > > > > > There's been some discussion of adding a vDSO entry point to wrap > > > > > EENTER and do something sensible with the exceptions, but I'm > > > > > wondering if a more general mechanism would be helpful. > > > > > > > > I haven't really followed all of this discussion because I've been busy > > > > working on the patch set but for me all of these approaches look awfully > > > > complicated. > > > > > > > > I'll throw my own suggestion and apologize if this has been already > > > > suggested and discarded: return-to-AEP. > > > > > > > > My idea is to do just a small extension to SGX AEX handling. At the > > > > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can > > > > fill extend this by filling other three spare registers with exception > > > > information. > > > > > > > > AEP handler can then do whatever it wants to do with this information > > > > or just do ERESUME. > > > > > > A correction here. In practice this will add a requirement to have a bit > > > more complicated AEP code (check the regs for exceptions) than before > > > and not just bytes for ENCLU. > > > > > > e.g. AEP handler should be along the lines > > > > > > 1. #PF (or #UD or) happens. 
Kernel fills the registers when it cannot > > > handle the exception and returns back to user space i.e. to the > > > AEP handler. > > > 2. Check the registers containing exception information. If they have > > > been filled, take whatever actions user space wants to take. > > > 3. Otherwise, just ERESUME. > > > > > > From my point of view this is making the AEP parameter useful. Its > > > standard use is just weird (always point to a place just containing > > > ENCLU bytes, why the heck it even exists). > > > > I like this solution. Keeps things simple. One question: when an exception > > occurs, how does the kernel know whether to set special registers or send a > > signal? > > Yes, and AFAIK people do in many cases people want to do something else > than just direct ERESUME in AEP handler so would neither be a major > bummer for user space. If I remember correctly you have such? > > You can check the cases that we have for SIGSEGV (namely EPCM conflict) > from Sean's patch 08/23. > > I'm open for expanding the scope. It is the easy part after there is > consensus for the handling mechanism :-) Not sure if it is a good idea or not, but maybe we could even have a new ioctl, in addition to the enclave construction ioctls, that you use to specify per enclave which exceptions you want to get. SIGSEGV could be the fallback behavior if you do not "register" for any exceptions. /Jarkko
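The per-enclave registration idea in the message above can be sketched in C as a bitmask lookup. This is only an illustration under stated assumptions, not an existing kernel interface: `struct enclave_exc_cfg`, `enclave_register_exc()` and `enclave_pick_delivery()` are hypothetical names, and the sketch assumes exceptions are identified by their x86 vector number (#UD is 6, #PF is 14). It deliberately does not answer the open question from the thread of how the kernel decides that a fault belongs to an enclave in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_VECTOR_UD 6u   /* #UD vector number */
#define X86_VECTOR_PF 14u  /* #PF vector number */

/* Hypothetical per-enclave state that the proposed ioctl would fill in. */
struct enclave_exc_cfg {
    uint32_t registered;   /* bit N set => user space asked for vector N */
};

enum exc_delivery {
    DELIVER_SIGNAL,        /* fallback: the usual signal (SIGSEGV in the mail) */
    DELIVER_AEP_REGS,      /* return to the AEP with exception info in registers */
};

/* What the hypothetical ioctl handler would do for one requested vector. */
void enclave_register_exc(struct enclave_exc_cfg *cfg, unsigned int vector)
{
    assert(cfg != NULL);
    if (vector < 32)
        cfg->registered |= UINT32_C(1) << vector;
}

/* Decision made at exception time for a fault attributed to this enclave. */
enum exc_delivery enclave_pick_delivery(const struct enclave_exc_cfg *cfg,
                                        unsigned int vector)
{
    assert(cfg != NULL);
    if (vector < 32 && (cfg->registered & (UINT32_C(1) << vector)))
        return DELIVER_AEP_REGS;
    return DELIVER_SIGNAL;
}
```

With nothing registered every exception takes the signal path, which matches the fallback behavior suggested in the message.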
Re: RFC: userspace exception fixups
On Mon, Nov 19, 2018 at 05:17:26AM +, Jethro Beekman wrote: > On 2018-11-18 18:32, Jarkko Sakkinen wrote: > > On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote: > > > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > > > > Hi all- > > > > > > > > The people working on SGX enablement are grappling with a somewhat > > > > annoying issue: the x86 EENTER instruction is used from user code and > > > > can, as part of its normal-ish operation, raise an exception. It is > > > > also highly likely to be used from a library, and signal handling in > > > > libraries is unpleasant at best. > > > > > > > > There's been some discussion of adding a vDSO entry point to wrap > > > > EENTER and do something sensible with the exceptions, but I'm > > > > wondering if a more general mechanism would be helpful. > > > > > > I haven't really followed all of this discussion because I've been busy > > > working on the patch set but for me all of these approaches look awfully > > > complicated. > > > > > > I'll throw my own suggestion and apologize if this has been already > > > suggested and discarded: return-to-AEP. > > > > > > My idea is to do just a small extension to SGX AEX handling. At the > > > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can > > > fill extend this by filling other three spare registers with exception > > > information. > > > > > > AEP handler can then do whatever it wants to do with this information > > > or just do ERESUME. > > > > A correction here. In practice this will add a requirement to have a bit > > more complicated AEP code (check the regs for exceptions) than before > > and not just bytes for ENCLU. > > > > e.g. AEP handler should be along the lines > > > > 1. #PF (or #UD or) happens. Kernel fills the registers when it cannot > > handle the exception and returns back to user space i.e. to the > > AEP handler. > > 2. Check the registers containing exception information. 
If they have > > been filled, take whatever actions user space wants to take. > > 3. Otherwise, just ERESUME. > > > > From my point of view this is making the AEP parameter useful. Its > > standard use is just weird (always point to a place just containing > > ENCLU bytes, why the heck it even exists). > > I like this solution. Keeps things simple. One question: when an exception > occurs, how does the kernel know whether to set special registers or send a > signal? Yes, and AFAIK in many cases people want to do something other than just a direct ERESUME in the AEP handler, so this would not be a major bummer for user space either. If I remember correctly, you have such a case? You can check the cases that we have for SIGSEGV (namely EPCM conflict) from Sean's patch 08/23. I'm open to expanding the scope. It is the easy part once there is consensus on the handling mechanism :-) /Jarkko
Re: RFC: userspace exception fixups
On 2018-11-18 18:32, Jarkko Sakkinen wrote: > On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote: > > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > > > Hi all- > > > > > > The people working on SGX enablement are grappling with a somewhat > > > annoying issue: the x86 EENTER instruction is used from user code and > > > can, as part of its normal-ish operation, raise an exception. It is > > > also highly likely to be used from a library, and signal handling in > > > libraries is unpleasant at best. > > > > > > There's been some discussion of adding a vDSO entry point to wrap > > > EENTER and do something sensible with the exceptions, but I'm > > > wondering if a more general mechanism would be helpful. > > > > I haven't really followed all of this discussion because I've been busy > > working on the patch set but for me all of these approaches look awfully > > complicated. > > > > I'll throw my own suggestion and apologize if this has been already > > suggested and discarded: return-to-AEP. > > > > My idea is to do just a small extension to SGX AEX handling. At the > > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can > > fill extend this by filling other three spare registers with exception > > information. > > > > AEP handler can then do whatever it wants to do with this information > > or just do ERESUME. > > A correction here. In practice this will add a requirement to have a bit > more complicated AEP code (check the regs for exceptions) than before > and not just bytes for ENCLU. > > e.g. AEP handler should be along the lines > > 1. #PF (or #UD or) happens. Kernel fills the registers when it cannot > handle the exception and returns back to user space i.e. to the > AEP handler. > 2. Check the registers containing exception information. If they have > been filled, take whatever actions user space wants to take. > 3. Otherwise, just ERESUME. > > From my point of view this is making the AEP parameter useful. Its > standard use is just weird (always point to a place just containing > ENCLU bytes, why the heck it even exists). I like this solution. Keeps things simple. One question: when an exception occurs, how does the kernel know whether to set special registers or send a signal? -- Jethro Beekman | Fortanix
Re: RFC: userspace exception fixups
On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote: > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > > Hi all- > > > > The people working on SGX enablement are grappling with a somewhat > > annoying issue: the x86 EENTER instruction is used from user code and > > can, as part of its normal-ish operation, raise an exception. It is > > also highly likely to be used from a library, and signal handling in > > libraries is unpleasant at best. > > > > There's been some discussion of adding a vDSO entry point to wrap > > EENTER and do something sensible with the exceptions, but I'm > > wondering if a more general mechanism would be helpful. > > I haven't really followed all of this discussion because I've been busy > working on the patch set but for me all of these approaches look awfully > complicated. > > I'll throw my own suggestion and apologize if this has been already > suggested and discarded: return-to-AEP. > > My idea is to do just a small extension to SGX AEX handling. At the > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can > fill extend this by filling other three spare registers with exception > information. > > AEP handler can then do whatever it wants to do with this information > or just do ERESUME. A correction here. In practice this will add a requirement to have a bit more complicated AEP code (check the regs for exceptions) than before and not just bytes for ENCLU. e.g. AEP handler should be along the lines 1. #PF (or #UD or) happens. Kernel fills the registers when it cannot handle the exception and returns back to user space i.e. to the AEP handler. 2. Check the registers containing exception information. If they have been filled, take whatever actions user space wants to take. 3. Otherwise, just ERESUME. From my point of view this is making the AEP parameter useful. Its standard use is just weird (always point to a place just containing ENCLU bytes, why the heck it even exists). /Jarkko
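Steps 2 and 3 of the list above can be sketched as a plain C decision function. This is a minimal illustration, not code from the thread: the proposal never fixes which spare registers are used or how they are encoded, so `struct aep_exc_info` (type, reason, addr, as named elsewhere in the thread), the `AEP_EXC_*` values and `aep_decide()` are all hypothetical. A real AEP handler would be a few instructions of assembly around ENCLU; only the decision is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the "type" register; 0 means "nothing reported". */
enum aep_exc_type {
    AEP_EXC_NONE = 0,   /* ordinary AEX, e.g. an interrupt */
    AEP_EXC_UD   = 6,   /* #UD vector number */
    AEP_EXC_PF   = 14,  /* #PF vector number */
};

/* Hypothetical snapshot of the three spare registers on arrival at the AEP. */
struct aep_exc_info {
    uint64_t type;      /* one of enum aep_exc_type */
    uint64_t reason;    /* e.g. the page-fault error code */
    uint64_t addr;      /* faulting address, if any */
};

enum aep_action {
    AEP_ACTION_ERESUME, /* step 3: nothing reported, resume the enclave */
    AEP_ACTION_HANDLE,  /* step 2: hand the exception to the run-time */
};

/* Step 1 (the kernel filling the registers) has already happened here. */
enum aep_action aep_decide(const struct aep_exc_info *info)
{
    assert(info != NULL);
    if (info->type == AEP_EXC_NONE)
        return AEP_ACTION_ERESUME;
    return AEP_ACTION_HANDLE;
}
```

The sketch relies on the kernel clearing the registers on every AEX that carries no exception, otherwise stale values would be misread as a new exception.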
Re: RFC: userspace exception fixups
On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote: > On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > > Hi all- > > > > The people working on SGX enablement are grappling with a somewhat > > annoying issue: the x86 EENTER instruction is used from user code and > > can, as part of its normal-ish operation, raise an exception. It is > > also highly likely to be used from a library, and signal handling in > > libraries is unpleasant at best. > > > > There's been some discussion of adding a vDSO entry point to wrap > > EENTER and do something sensible with the exceptions, but I'm > > wondering if a more general mechanism would be helpful. > > I haven't really followed all of this discussion because I've been busy > working on the patch set but for me all of these approaches look awfully > complicated. > > I'll throw my own suggestion and apologize if this has been already > suggested and discarded: return-to-AEP. > > My idea is to do just a small extension to SGX AEX handling. At the > moment hardware will RAX, RBX and RCX with ERESUME parameters. We can > fill extend this by filling other three spare registers with exception s/fill extend/extend/ /Jarkko
Re: RFC: userspace exception fixups
On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote: > Hi all- > > The people working on SGX enablement are grappling with a somewhat > annoying issue: the x86 EENTER instruction is used from user code and > can, as part of its normal-ish operation, raise an exception. It is > also highly likely to be used from a library, and signal handling in > libraries is unpleasant at best. > > There's been some discussion of adding a vDSO entry point to wrap > EENTER and do something sensible with the exceptions, but I'm > wondering if a more general mechanism would be helpful. I haven't really followed all of this discussion because I've been busy working on the patch set but for me all of these approaches look awfully complicated. I'll throw my own suggestion and apologize if this has been already suggested and discarded: return-to-AEP. My idea is to do just a small extension to SGX AEX handling. At the moment hardware will fill RAX, RBX and RCX with ERESUME parameters. We can fill extend this by filling other three spare registers with exception information. AEP handler can then do whatever it wants to do with this information or just do ERESUME. In some ways this is a dumbed-down version of Sean's suggestion. I think whatever the solution is, it should be lightweight, and this is such a solution. Why? Because exception handling could then be used to implement things other than just error handling, like a syscall wrapper for the enclaves, in a nice and lean way. /Jarkko
Re: RFC: userspace exception fixups
On Thu, Nov 08, 2018 at 12:05:42PM -0800, Andy Lutomirski wrote: > This whole thing is a mess. I'm starting to think that the cleanest > solution would be to provide a way to just tell the kernel that > certain RIP values have exception fixups. The by far cleanest solution would be to say that SGX is such a mess that we are not going to support it at all. It's not like it is a must-have feature to start with.
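For reference, the quoted idea of telling the kernel that certain RIP values have exception fixups could look roughly like a sorted lookup table. This is a sketch under assumptions, not a proposed ABI: `struct rip_fixup` and `rip_fixup_lookup()` are hypothetical, the table is assumed to be sorted by `rip`, and a return value of 0 is assumed to mean "no fixup, deliver the signal as usual".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry registered by user space. */
struct rip_fixup {
    uint64_t rip;    /* address of an instruction that may fault */
    uint64_t fixup;  /* address to resume at instead of raising a signal */
};

/* bsearch() comparator: key is a faulting RIP, elem is a table entry. */
static int rip_fixup_cmp(const void *key, const void *elem)
{
    uint64_t rip = *(const uint64_t *)key;
    const struct rip_fixup *e = elem;

    if (rip < e->rip)
        return -1;
    if (rip > e->rip)
        return 1;
    return 0;
}

/* Returns the fixup address for rip, or 0 if no entry matches exactly. */
uint64_t rip_fixup_lookup(const struct rip_fixup *table, size_t n, uint64_t rip)
{
    const struct rip_fixup *hit;

    assert(table != NULL || n == 0);
    if (n == 0)
        return 0;
    hit = bsearch(&rip, table, n, sizeof(*table), rip_fixup_cmp);
    return hit ? hit->fixup : 0;
}
```

An exact-match table like this avoids decoding the instruction stream, since only addresses that user space registered in advance are ever treated specially.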
Re: RFC: userspace exception fixups
On Thu, Nov 08, 2018 at 01:50:31PM -0800, Dave Hansen wrote: > On 11/8/18 1:16 PM, Sean Christopherson wrote: > > On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote: > >> On 11/8/18 12:05 PM, Andy Lutomirski wrote: > >>> Hmm. The idea being that the SDK preserves RBP but not RSP. That's > >>> not the most terrible thing in the world. But could the SDK live with > >>> something more like my suggestion where the vDSO supplies a normal > >>> function that takes a struct containing registers that are visible to > >>> the enclave? This would make it extremely awkward for the enclave to > >>> use the untrusted stack per se, but it would make it quite easy (I > >>> think) for the untrusted part of the SDK to allocate some extra memory > >>> and just tell the enclave that *that* memory is the stack. > >> > >> I really think the enclave should keep its grubby mitts off the > >> untrusted stack. There are lots of ways to get memory, even with > >> stack-like semantics, that don't involve mucking with the stack itself. > >> > >> I have not heard a good, hard argument for why there is an absolute > >> *need* to store things on the actual untrusted stack. > > > > Convenience and performance are the only arguments I've heard, e.g. so > > that allocating memory doesn't require an extra EEXIT->EENTER round trip. > > Well, for the first access, it's going to cost a bunch asynchronous > exits to fault in all the stack pages. Instead of that, if you had a > single area, or an explicit out-call to allocate and populate the area, > you could do it in a single EEXIT and zero asynchronous exits for demand > page faults. > > So, it might be convenient, but I'm rather suspicious of any performance > arguments. Ya, I meant versus doing an EEXIT on every allocation, i.e. a very naive allocation scheme.
Re: RFC: userspace exception fixups
On 11/8/18 1:16 PM, Sean Christopherson wrote: > On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote: >> On 11/8/18 12:05 PM, Andy Lutomirski wrote: >>> Hmm. The idea being that the SDK preserves RBP but not RSP. That's >>> not the most terrible thing in the world. But could the SDK live with >>> something more like my suggestion where the vDSO supplies a normal >>> function that takes a struct containing registers that are visible to >>> the enclave? This would make it extremely awkward for the enclave to >>> use the untrusted stack per se, but it would make it quite easy (I >>> think) for the untrusted part of the SDK to allocate some extra memory >>> and just tell the enclave that *that* memory is the stack. >> >> I really think the enclave should keep its grubby mitts off the >> untrusted stack. There are lots of ways to get memory, even with >> stack-like semantics, that don't involve mucking with the stack itself. >> >> I have not heard a good, hard argument for why there is an absolute >> *need* to store things on the actual untrusted stack. > > Convenience and performance are the only arguments I've heard, e.g. so > that allocating memory doesn't require an extra EEXIT->EENTER round trip.

Well, for the first access, it's going to cost a bunch of asynchronous exits to fault in all the stack pages. Instead of that, if you had a single area, or an explicit out-call to allocate and populate the area, you could do it in a single EEXIT and zero asynchronous exits for demand page faults.

So, it might be convenient, but I'm rather suspicious of any performance arguments.
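A minimal sketch of the "allocate and populate the area" idea, assuming the untrusted runtime uses Linux mmap() with MAP_POPULATE so that the pages are faulted in before the enclave ever touches them. The helper names alloc_populated_area() and free_populated_area() are hypothetical and not part of any SDK or kernel interface; MAP_POPULATE is a best-effort request, so this only illustrates the approach.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the thread): map `len` bytes of anonymous
 * memory and ask the kernel to pre-fault the pages, so that later accesses
 * are less likely to take demand page faults. Returns NULL on failure.
 */
void *alloc_populated_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Release an area obtained from alloc_populated_area(). */
int free_populated_area(void *p, size_t len)
{
	return munmap(p, len);
}
```

The untrusted side would call this once, outside the enclave, and hand the resulting address range to the enclave as its parameter area.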
Re: RFC: userspace exception fixups
On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote: > On 11/8/18 12:05 PM, Andy Lutomirski wrote: > > Hmm. The idea being that the SDK preserves RBP but not RSP. That's > > not the most terrible thing in the world. But could the SDK live with > > something more like my suggestion where the vDSO supplies a normal > > function that takes a struct containing registers that are visible to > > the enclave? This would make it extremely awkward for the enclave to > > use the untrusted stack per se, but it would make it quite easy (I > > think) for the untrusted part of the SDK to allocate some extra memory > > and just tell the enclave that *that* memory is the stack. > > I really think the enclave should keep its grubby mitts off the > untrusted stack. There are lots of ways to get memory, even with > stack-like semantics, that don't involve mucking with the stack itself. > > I have not heard a good, hard argument for why there is an absolute > *need* to store things on the actual untrusted stack.

Convenience and performance are the only arguments I've heard, e.g. so that allocating memory doesn't require an extra EEXIT->EENTER round trip.

> We could quite easily have the untrusted code just promise to allocate a > stack-sized virtual area (even derived from the stack rlimit size) and > pass that into the enclave for parameter use.

I agree more and more the further I dig. AFAIK there is no need for the enclave to actually load %rsp. The initial EENTER can pass in the base/top of the pseudo-stack and from there the enclave can manage it purely in software.
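A minimal sketch of what "manage it purely in software" could look like, assuming the enclave is handed the base and top of a plain memory area and keeps its own stack pointer in a variable instead of in %rsp. The struct and function names are hypothetical, and alignment handling is left out for brevity. Note that pstack_push() moves the software pointer before copying, which is the ordering the thread says avoids clobbering data written below the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a software-managed, downward-growing area. */
struct pseudo_stack {
	uint8_t *base;	/* lowest usable address */
	uint8_t *top;	/* one past the highest usable address */
	uint8_t *sp;	/* current software "stack pointer" */
};

/* Start with an empty stack: sp sits at the top of the area. */
void pstack_init(struct pseudo_stack *ps, void *mem, size_t len)
{
	ps->base = mem;
	ps->top = ps->base + len;
	ps->sp = ps->top;
}

/* Reserve space first, then copy. Returns the copy's address, or NULL if full. */
void *pstack_push(struct pseudo_stack *ps, const void *src, size_t len)
{
	if (len > (size_t)(ps->sp - ps->base))
		return NULL;
	ps->sp -= len;
	memcpy(ps->sp, src, len);
	return ps->sp;
}

/* Release the most recent `len` bytes. Returns 0, or -1 on underflow. */
int pstack_pop(struct pseudo_stack *ps, size_t len)
{
	if (len > (size_t)(ps->top - ps->sp))
		return -1;
	ps->sp += len;
	return 0;
}
```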
Re: RFC: userspace exception fixups
On 11/8/18 12:05 PM, Andy Lutomirski wrote: > Hmm. The idea being that the SDK preserves RBP but not RSP. That's > not the most terrible thing in the world. But could the SDK live with > something more like my suggestion where the vDSO supplies a normal > function that takes a struct containing registers that are visible to > the enclave? This would make it extremely awkward for the enclave to > use the untrusted stack per se, but it would make it quite easy (I > think) for the untrusted part of the SDK to allocate some extra memory > and just tell the enclave that *that* memory is the stack. I really think the enclave should keep its grubby mitts off the untrusted stack. There are lots of ways to get memory, even with stack-like semantics, that don't involve mucking with the stack itself. I have not heard a good, hard argument for why there is an absolute *need* to store things on the actual untrusted stack. We could quite easily have the untrusted code just promise to allocate a stack-sized virtual area (even derived from the stack rlimit size) and pass that into the enclave for parameter use.
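A rough sketch of sizing such an area from the stack rlimit, assuming the untrusted runtime uses getrlimit(RLIMIT_STACK) and an anonymous mmap(). The names param_area_size() and alloc_param_area() and the 8 MiB cap (used when the limit is unlimited or larger) are assumptions made for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Assumed cap for unlimited or very large limits; the value is arbitrary. */
#define PARAM_AREA_MAX ((size_t)8 << 20)

/* Size of the parameter area: the soft stack rlimit, capped and page-aligned. */
size_t param_area_size(void)
{
	struct rlimit rl;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = PARAM_AREA_MAX;

	if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
	    rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < PARAM_AREA_MAX)
		len = (size_t)rl.rlim_cur;

	/* Round up to whole pages; never return less than one page. */
	len = (len + page - 1) / page * page;
	return len ? len : page;
}

/* Map the area without touching the real stack. Returns NULL on failure. */
void *alloc_param_area(size_t *len_out)
{
	size_t len = param_area_size();
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	*len_out = len;
	return p;
}
```

The returned range is what the untrusted code would pass into the enclave for parameter use, in place of the real stack.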
Re: RFC: userspace exception fixups
On Thu, Nov 8, 2018 at 11:54 AM Sean Christopherson wrote: > > On Tue, Nov 06, 2018 at 01:07:54PM -0800, Andy Lutomirski wrote: > > > > > > > On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote: > > > > > >> On 11/6/18 12:12 PM, Andy Lutomirski wrote: > > >> True, but what if we have a nasty enclave that writes to memory just > > >> below SP *before* decrementing SP? > > > > > > Yeah, that would be unfortunate. If an enclave did this (roughly): > > > > > >1. EENTER > > >2. Hardware sets eenter_hwframe->sp = %sp > > >3. Enclave runs... wants to do out-call > > >4. Enclave sets up parameters: > > >memcpy(_hwframe->sp[-offset], arg1, size); > > >... > > >5. Enclave sets eenter_hwframe->sp -= offset > > > > > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that > > > was on the stack. The enclave could easily fix this by moving ->sp first. > > > > > > But, this is one of those "fun" parts of the ABI that I think we need to > > > talk about. If we do this, we also basically require that the code > > > which handles asynchronous exits must *not* write to the stack. That's > > > not hard because it's typically just a single ERESUME instruction, but > > > it *is* a requirement. > > > > > > > I was assuming that the async exit stuff was completely hidden by the > > API. The AEP code would decide whether the exit got fixed up by the > > kernel (which may or may not be easy to tell — can the code even tell > > without kernel help whether it was, say, an IRQ vs #UD?) and then either > > do ERESUME or cause sgx_enter_enclave() to return with an appropriate > > return value. > > Ok, SDK folks came up with an idea that would allow them to use vDSO, > albeit with a bit of ugliness and potentially a ROP-attack issue. > Definitely some weirdness, but the weirdness is well contained, unlike > the magic prefix approach. > > Provide two enter_enclave() vDSO "functions". The first is a normal > function with a normal C interface. 
The second is a blob of code that > is "called" and "returns" via indirect jmp, and can be used by SGX > runtimes that want to use the untrusted stack for out-calls from the > enclave. > > For the indirect jmp "function", use %rbp to stash the return address > of the caller (either in %rbp itself or in memory pointed to by %rbp). > It works because hardware also saves/restores %rbp along with %rsp when > doing enclave transitions, and the SDK can live with %rbp being > off-limits. Fault info is passed via registers.

Hmm. The idea being that the SDK preserves RBP but not RSP. That's not the most terrible thing in the world. But could the SDK live with something more like my suggestion where the vDSO supplies a normal function that takes a struct containing registers that are visible to the enclave? This would make it extremely awkward for the enclave to use the untrusted stack per se, but it would make it quite easy (I think) for the untrusted part of the SDK to allocate some extra memory and just tell the enclave that *that* memory is the stack.

AFAICS we do have two registers that genuinely are preserved: FSBASE and GSBASE. Which is a good thing, because otherwise SGX enablement would currently be a privilege escalation issue due to making GSBASE writable when it should not be.

This whole thing is a mess. I'm starting to think that the cleanest solution would be to provide a way to just tell the kernel that certain RIP values have exception fixups.
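To make the "certain RIP values have exception fixups" idea concrete, here is a userspace-only sketch of the lookup such a mechanism implies: a table sorted by faulting address that maps each registered RIP to a fixup address, searched with a binary search. struct rip_fixup and find_fixup() are hypothetical names; no such registration interface is proposed in this form in the thread, and the real design questions (how userspace registers the table, what register state the handler sees) are not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a fault at `fault_ip` resumes execution at `fixup_ip`. */
struct rip_fixup {
	uintptr_t fault_ip;
	uintptr_t fixup_ip;
};

/*
 * Binary search over a table sorted by fault_ip. Returns the fixup address,
 * or 0 if the faulting address has no registered fixup.
 */
uintptr_t find_fixup(const struct rip_fixup *tbl, size_t n, uintptr_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].fault_ip == ip)
			return tbl[mid].fixup_ip;
		if (tbl[mid].fault_ip < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```

On a fault, a handler would look up the faulting RIP; a hit means "redirect to the fixup", a miss means "deliver the signal as usual".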
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 01:07:54PM -0800, Andy Lutomirski wrote: > > > > On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote: > > > >> On 11/6/18 12:12 PM, Andy Lutomirski wrote: > >> True, but what if we have a nasty enclave that writes to memory just > >> below SP *before* decrementing SP? > > > > Yeah, that would be unfortunate. If an enclave did this (roughly): > > > >1. EENTER > >2. Hardware sets eenter_hwframe->sp = %sp > >3. Enclave runs... wants to do out-call > >4. Enclave sets up parameters: > >memcpy(_hwframe->sp[-offset], arg1, size); > >... > >5. Enclave sets eenter_hwframe->sp -= offset > > > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that > > was on the stack. The enclave could easily fix this by moving ->sp first. > > > > But, this is one of those "fun" parts of the ABI that I think we need to > > talk about. If we do this, we also basically require that the code > > which handles asynchronous exits must *not* write to the stack. That's > > not hard because it's typically just a single ERESUME instruction, but > > it *is* a requirement. > > > > I was assuming that the async exit stuff was completely hidden by the > API. The AEP code would decide whether the exit got fixed up by the > kernel (which may or may not be easy to tell — can the code even tell > without kernel help whether it was, say, an IRQ vs #UD?) and then either > do ERESUME or cause sgx_enter_enclave() to return with an appropriate > return value. Ok, SDK folks came up with an idea that would allow them to use vDSO, albeit with a bit of ugliness and potentially a ROP-attack issue. Definitely some weirdness, but the weirdness is well contained, unlike the magic prefix approach. Provide two enter_enclave() vDSO "functions". The first is a normal function with a normal C interface. The second is a blob of code that is "called" and "returns" via indirect jmp, and can be used by SGX runtimes that want to use the untrusted stack for out-calls from the enclave. 
For the indirect jmp "function", use %rbp to stash the return address of the caller (either in %rbp itself or in memory pointed to by %rbp). It works because hardware also saves/restores %rbp along with %rsp when doing enclave transitions, and the SDK can live with %rbp being off-limits. Fault info is passed via registers.

Basic idea for the "functions" below. The fixup stuff is obviously not wired up correctly, just trying to convey the concept.

struct enclu_fault_info {
	unsigned int leaf;
	unsigned int trapnr;
	unsigned int error_code;
	unsigned long address;
};

int __vdso_enter_enclave(void *tcs, struct enclu_fault_info *fault_info)
{
	unsigned int leaf, trapnr;

	asm volatile (
		"lea 2f(%%rip), %%rcx\n\t"
		"1: enclu\n\t"
		"jmp 3f\n\t"

		/* ERESUME trampoline */
		"2: enclu\n\t"
		"ud2\n\t"

		/* out: */
		"3:\n"

		/* EENTER fixup */
		".pushsection .fixup,\"ax\"\n\t"
		"4:\n\t"
		"mov %%eax, %%edi\n\t"
		"movl $"__stringify(SGX_EENTER)", %%eax\n\t"
		"jmp 3b\n\t"
		".popsection\n\t"
		_ASM_EXTABLE_FAULT(1b, 4b)

		/* ERESUME FIXUP */
		".pushsection .fixup,\"ax\"\n\t"
		"5:\n\t"
		"mov %%eax, %%edi\n\t"
		"movl $"__stringify(SGX_ERESUME)", %%eax\n\t"
		"jmp 3b\n\t"
		".popsection\n\t"
		_ASM_EXTABLE_FAULT(2b, 5b)

		: "=a"(leaf), "=D" (trapnr)
		: "a" (SGX_EENTER), "b" (tcs)
		: "cc", "memory", "rcx", "rdx", "rsi", "r8", "r9",
		  "r10", "r11", "r12", "r13", "r14", "r15"
	);

	if (leaf == SGX_EEXIT)
		return 0;

	if (fault_info) {
		fault_info->leaf = leaf;
		fault_info->trapnr = trapnr;
		fault_info->error_code = 0;
		fault_info->address = 0;
	}
	return -EFAULT;
}

GLOBAL(__vdso_enter_enclave_no_stack)
	endbr64

	/* %rbp = return target, %rbx = tcs */
	leaq	3f(%rip), %rcx
	movl	$2, %eax
1:	enclu

	/* "return" to "caller" */
2:	jmp	*%rbp

	/* ERESUME trampoline */
3:	enclu
	ud2

	/* EENTER fixup handler */
4:	movq	%rax, %rdi
	movl	$2, %eax
	/* %rsi = error code, %rdx = address */
	jmp	2b

	/* ERESUME fixup handler */
5:	movq	%rax, %rdi
	movl	$3, %eax
	/* %rsi = error code, %rdx = address */
	jmp	2b
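As an illustration of how an untrusted runtime might consume the fault information returned by a function shaped like __vdso_enter_enclave(), here is a small sketch that turns a struct enclu_fault_info into a readable message. The struct layout and the leaf values 2 (EENTER) and 3 (ERESUME) follow the code in this message; the helpers trap_name() and describe_fault() are hypothetical, and the vector numbers are the architectural x86 ones (#UD = 6, #GP = 13, #PF = 14).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ENCLU leaf numbers, matching the $2/$3 constants in the assembly above. */
enum { SGX_EENTER = 2, SGX_ERESUME = 3, SGX_EEXIT = 4 };

/* Same layout as the struct proposed in the thread. */
struct enclu_fault_info {
	unsigned int leaf;
	unsigned int trapnr;
	unsigned int error_code;
	unsigned long address;
};

/* x86 exception vector to mnemonic, for the few vectors of interest here. */
const char *trap_name(unsigned int trapnr)
{
	switch (trapnr) {
	case 6:  return "#UD";
	case 13: return "#GP";
	case 14: return "#PF";
	default: return "other";
	}
}

/* Hypothetical helper: format a one-line description of the fault into buf. */
int describe_fault(const struct enclu_fault_info *fi, char *buf, size_t len)
{
	const char *leaf = fi->leaf == SGX_EENTER ? "EENTER" :
			   fi->leaf == SGX_ERESUME ? "ERESUME" : "unknown leaf";

	return snprintf(buf, len,
			"%s faulted with %s (error_code=%#x, address=%#lx)",
			leaf, trap_name(fi->trapnr), fi->error_code,
			fi->address);
}
```

A caller would check the return value of the enter function first and only inspect the fault info on failure.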
Re: RFC: userspace exception fixups
On Wed, Nov 07, 2018 at 01:40:59PM -0800, Sean Christopherson wrote: > > In that case it seems like the only way to use SGX that's not a gaping > > security hole is to run the SGX enclave in its own fully-seccomp (or > > equivalent) process, with no host application in the same address > > space. Since the host application can't see the contents of the > > enclave to make any determination of whether it's safe to run, running > > it in the same address space only makes sense if the cpu provides > > protection against unwanted accesses to the host's memory from the > > enclave -- and according to you, it doesn't. > > The enclave's code (and any initial data) isn't encrypted until the > pages are loaded into the Enclave Page Cache (EPC), which can only > be done by the kernel (via ENCLS[EADD]). In other words, both the > kernel and userspace can vet the code/data before running an enclave. > > Practically speaking, an enclave will be coupled with an untrusted > userspace runtime, i.e. it's loader. Enclaves are also measured > as part of their build process, and so the enclave loader needs to > know which pages to add to the measurement, and in what order. I > guess technically speaking an enclave could have zero pages added > to its measurement, but that'd probably be a big red flag that said > enclave is up to something fishy.

IMHO the whole idea adds too much policy to the kernel, even if it were doable. You can easily spawn the untrusted run-time (URT) and the enclave in their own process. Seccomp limits the syscall space, and enclaves cannot do syscalls in the first place. It is the URT that does them on behalf of the enclave.

/Jarkko
Re: RFC: userspace exception fixups
On Wed, Nov 07, 2018 at 12:56:58PM -0800, Dave Hansen wrote: > On 11/7/18 11:01 AM, Sean Christopherson wrote: > > Going off comments in similar code related to UMIP, we'd need to figure > > out how to handle protection keys. > > There are two options: > 1. Don't depend on the userspace mapping. Do get_user_pages() to find >the instruction in the kernel direct map, and use that. > 2. Do a WRPKRU that allows read access, do the read, then put PKRU back. >This is a pain because of preemption and all that jazz. > > Right now, we just let the prefetch instruction detection fail if you > mark it unreadable with pkeys. Tough cookies, basically. But, that's > just the kernel being nice, but you need it for functionality, so it's > tougher.

I would go with option 1 because it is the stable way to do it and we are 100% sure not to conflict with pkeys.

/Jarkko
Re: RFC: userspace exception fixups
On Wed, Nov 07, 2018 at 04:27:58PM -0500, Rich Felker wrote:
> On Tue, Nov 06, 2018 at 03:26:16PM -0800, Sean Christopherson wrote:
> > On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> > > On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > > > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > > > >
> > > > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > > > I almost feel like the right solution is to call into SGX on its own
> > > > > > private stack or maybe even its own private address space.
> > > > >
> > > > > Yeah, I had the same gut feeling. Couldn't the debugger even treat the
> > > > > enclave like its own "thread" with its own stack and its own set of
> > > > > registers and context? That seems like a much more workable model than
> > > > > trying to weave it together with the EENTER context.
> > > >
> > > > So maybe the API should be, roughly
> > > >
> > > > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > > > host_state *state);
> > > > sgx_exit_reason_t sgx_resume_enclave(same args);
> > > >
> > > > where host_state is something like:
> > > >
> > > > struct host_state {
> > > >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > > > };
> > > >
> > > > and the values in host_state explicitly have nothing to do with the
> > > > actual host registers. So, if you want to use the outcall mechanism,
> > > > you'd allocate some memory, point sp to that memory, call
> > > > sgx_enter_enclave(), and then read that memory to do the outcall.
> > > >
> > > > Actually implementing this would be distinctly nontrivial, and would
> > > > almost certainly need some degree of kernel help to avoid an explosion
> > > > when a signal gets delivered while we have host_state.sp loaded into
> > > > the actual SP register. Maybe rseq could help with this?
> > > >
> > > > The ISA here is IMO not well thought through.
> > >
> > > Maybe I'm mistaken about some fundamentals here, but my understanding
> > > of SGX is that the whole point is that the host application and the
> > > code running in the enclave are mutually adversarial towards one
> > > another. Do any or all of the proposed protocols here account for this
> > > and fully protect the host application from malicious code in the
> > > enclave? It seems that having control over the register file on exit
> > > from the enclave is fundamentally problematic but I assume there must
> > > be some way I'm missing that this is fixed up.
> >
> > SGX provides protections for the enclave but not the other way around.
> > The kernel has all of its normal non-SGX protections in place, but the
> > enclave can certainly wreak havoc on its userspace process. The basic
> > design idea is that the enclave is a specialized .so that gets extra
> > security protections but is still effectively part of the overall
> > application, e.g. it has full access to its host userspace process'
> > virtual memory.
>
> In that case it seems like the only way to use SGX that's not a gaping
> security hole is to run the SGX enclave in its own fully-seccomp (or
> equivalent) process, with no host application in the same address
> space. Since the host application can't see the contents of the
> enclave to make any determination of whether it's safe to run, running
> it in the same address space only makes sense if the cpu provides
> protection against unwanted accesses to the host's memory from the
> enclave -- and according to you, it doesn't.

The enclave's code (and any initial data) isn't encrypted until the pages
are loaded into the Enclave Page Cache (EPC), which can only be done by the
kernel (via ENCLS[EADD]). In other words, both the kernel and userspace
can vet the code/data before running an enclave.

Practically speaking, an enclave will be coupled with an untrusted userspace
runtime, i.e. its loader. Enclaves are also measured as part of their build
process, and so the enclave loader needs to know which pages to add to the
measurement, and in what order. I guess technically speaking an enclave
could have zero pages added to its measurement, but that'd probably be a big
red flag that said enclave is up to something fishy.
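To illustrate why the loader has to know both which pages are measured and in what order, here is a toy sketch in C. It is not the real measurement: actual hardware builds MRENCLAVE with SHA-256 over ECREATE/EADD/EEXTEND records, whereas this stand-in uses a 64-bit FNV-1a hash, and `toy_page`, `toy_extend` and `toy_measure` are hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the enclave measurement; NOT the real SHA-256 MRENCLAVE. */
#define TOY_DIGEST_INIT  0xcbf29ce484222325ULL	/* FNV-1a 64-bit offset basis */
#define TOY_DIGEST_PRIME 0x100000001b3ULL	/* FNV-1a 64-bit prime */

/* Hypothetical description of one page handed to the loader. */
struct toy_page {
	uint64_t offset;		/* offset of the page inside the enclave */
	const unsigned char *data;	/* initial contents */
	size_t len;
	int measured;			/* nonzero if the page is part of the measurement */
};

/* Fold 'len' bytes into the running digest, one byte at a time. */
static uint64_t toy_extend(uint64_t digest, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		digest ^= p[i];
		digest *= TOY_DIGEST_PRIME;
	}
	return digest;
}

/* Measure pages in the order given, skipping pages not marked as measured. */
static uint64_t toy_measure(const struct toy_page *pages, size_t n)
{
	uint64_t digest = TOY_DIGEST_INIT;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].measured)
			continue;
		digest = toy_extend(digest, &pages[i].offset,
				    sizeof(pages[i].offset));
		digest = toy_extend(digest, pages[i].data, pages[i].len);
	}
	return digest;
}
```

Because the digest is a running value, adding the same two pages in the opposite order yields a different result, and an enclave with no measured pages ends up with just the initial value, which is the kind of thing a loader or verifier could flag.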
Re: RFC: userspace exception fixups
On Wed, Nov 7, 2018 at 1:28 PM Rich Felker wrote:
>
> On Tue, Nov 06, 2018 at 03:26:16PM -0800, Sean Christopherson wrote:
> > On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> > > On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > > > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > > > >
> > > > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > > > I almost feel like the right solution is to call into SGX on its own
> > > > > > private stack or maybe even its own private address space.
> > > > >
> > > > > Yeah, I had the same gut feeling. Couldn't the debugger even treat the
> > > > > enclave like its own "thread" with its own stack and its own set of
> > > > > registers and context? That seems like a much more workable model than
> > > > > trying to weave it together with the EENTER context.
> > > >
> > > > So maybe the API should be, roughly
> > > >
> > > > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > > > host_state *state);
> > > > sgx_exit_reason_t sgx_resume_enclave(same args);
> > > >
> > > > where host_state is something like:
> > > >
> > > > struct host_state {
> > > >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > > > };
> > > >
> > > > and the values in host_state explicitly have nothing to do with the
> > > > actual host registers. So, if you want to use the outcall mechanism,
> > > > you'd allocate some memory, point sp to that memory, call
> > > > sgx_enter_enclave(), and then read that memory to do the outcall.
> > > >
> > > > Actually implementing this would be distinctly nontrivial, and would
> > > > almost certainly need some degree of kernel help to avoid an explosion
> > > > when a signal gets delivered while we have host_state.sp loaded into
> > > > the actual SP register. Maybe rseq could help with this?
> > > >
> > > > The ISA here is IMO not well thought through.
> > >
> > > Maybe I'm mistaken about some fundamentals here, but my understanding
> > > of SGX is that the whole point is that the host application and the
> > > code running in the enclave are mutually adversarial towards one
> > > another. Do any or all of the proposed protocols here account for this
> > > and fully protect the host application from malicious code in the
> > > enclave? It seems that having control over the register file on exit
> > > from the enclave is fundamentally problematic but I assume there must
> > > be some way I'm missing that this is fixed up.
> >
> > SGX provides protections for the enclave but not the other way around.
> > The kernel has all of its normal non-SGX protections in place, but the
> > enclave can certainly wreak havoc on its userspace process. The basic
> > design idea is that the enclave is a specialized .so that gets extra
> > security protections but is still effectively part of the overall
> > application, e.g. it has full access to its host userspace process'
> > virtual memory.
>
> In that case it seems like the only way to use SGX that's not a gaping
> security hole is to run the SGX enclave in its own fully-seccomp (or
> equivalent) process, with no host application in the same address
> space. Since the host application can't see the contents of the
> enclave to make any determination of whether it's safe to run, running
> it in the same address space only makes sense if the cpu provides
> protection against unwanted accesses to the host's memory from the
> enclave -- and according to you, it doesn't.
>

I think the theory is that the enclave is shipped with the host application.

That being said, a way to run the enclave in an address space that has
basically nothing else (except an ENCLU instruction as a trampoline)
would be quite nice.
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 03:26:16PM -0800, Sean Christopherson wrote:
> On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> > On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > > >
> > > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > > I almost feel like the right solution is to call into SGX on its own
> > > > > private stack or maybe even its own private address space.
> > > >
> > > > Yeah, I had the same gut feeling. Couldn't the debugger even treat the
> > > > enclave like its own "thread" with its own stack and its own set of
> > > > registers and context? That seems like a much more workable model than
> > > > trying to weave it together with the EENTER context.
> > >
> > > So maybe the API should be, roughly
> > >
> > > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > > host_state *state);
> > > sgx_exit_reason_t sgx_resume_enclave(same args);
> > >
> > > where host_state is something like:
> > >
> > > struct host_state {
> > >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > > };
> > >
> > > and the values in host_state explicitly have nothing to do with the
> > > actual host registers. So, if you want to use the outcall mechanism,
> > > you'd allocate some memory, point sp to that memory, call
> > > sgx_enter_enclave(), and then read that memory to do the outcall.
> > >
> > > Actually implementing this would be distinctly nontrivial, and would
> > > almost certainly need some degree of kernel help to avoid an explosion
> > > when a signal gets delivered while we have host_state.sp loaded into
> > > the actual SP register. Maybe rseq could help with this?
> > >
> > > The ISA here is IMO not well thought through.
> >
> > Maybe I'm mistaken about some fundamentals here, but my understanding
> > of SGX is that the whole point is that the host application and the
> > code running in the enclave are mutually adversarial towards one
> > another. Do any or all of the proposed protocols here account for this
> > and fully protect the host application from malicious code in the
> > enclave? It seems that having control over the register file on exit
> > from the enclave is fundamentally problematic but I assume there must
> > be some way I'm missing that this is fixed up.
>
> SGX provides protections for the enclave but not the other way around.
> The kernel has all of its normal non-SGX protections in place, but the
> enclave can certainly wreak havoc on its userspace process. The basic
> design idea is that the enclave is a specialized .so that gets extra
> security protections but is still effectively part of the overall
> application, e.g. it has full access to its host userspace process'
> virtual memory.

In that case it seems like the only way to use SGX that's not a gaping
security hole is to run the SGX enclave in its own fully-seccomp (or
equivalent) process, with no host application in the same address
space. Since the host application can't see the contents of the
enclave to make any determination of whether it's safe to run, running
it in the same address space only makes sense if the cpu provides
protection against unwanted accesses to the host's memory from the
enclave -- and according to you, it doesn't.

Rich
Re: RFC: userspace exception fixups
On 11/7/18 11:01 AM, Sean Christopherson wrote:
> Going off comments in similar code related to UMIP, we'd need to figure
> out how to handle protection keys.

There are two options:
1. Don't depend on the userspace mapping. Do get_user_pages() to find
   the instruction in the kernel direct map, and use that.
2. Do a WRPKRU that allows read access, do the read, then put PKRU back.
   This is a pain because of preemption and all that jazz.

Right now, we just let the prefetch instruction detection fail if you
mark it unreadable with pkeys. Tough cookies, basically. But there the
kernel is just being nice, whereas here you need it for functionality,
so it's tougher.
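For option 2, the PKRU arithmetic itself is small; the hard part is doing it safely against preemption, which the sketch below does not attempt. It assumes the architectural PKRU layout (two bits per key, access-disable at bit 2*key and write-disable at bit 2*key+1) and works on plain integers instead of executing RDPKRU/WRPKRU. `pkru_allow_read` and `pkru_can_read` are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_COUNT		16
#define PKRU_AD_BIT(key)	(1u << (2 * (key)))	/* access disable */
#define PKRU_WD_BIT(key)	(1u << (2 * (key) + 1))	/* write disable */

/* Nonzero if data reads through 'key' are permitted by this PKRU value. */
static int pkru_can_read(uint32_t pkru, unsigned int key)
{
	assert(key < PKEY_COUNT);
	return !(pkru & PKRU_AD_BIT(key));
}

/*
 * PKRU value to load temporarily: reads through 'key' become legal,
 * write-disable and every other key are left untouched.
 */
static uint32_t pkru_allow_read(uint32_t pkru, unsigned int key)
{
	assert(key < PKEY_COUNT);
	return pkru & ~PKRU_AD_BIT(key);
}
```

The intended sequence would be: save the current value, load `pkru_allow_read(saved, key)`, read the instruction bytes, then load the saved value again. Anything that can run between the two writes sees the relaxed value, which is the preemption problem mentioned above.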
Re: RFC: userspace exception fixups
On Wed, Nov 07, 2018 at 07:34:52AM -0800, Sean Christopherson wrote:
> On Tue, Nov 06, 2018 at 05:17:14PM -0800, Andy Lutomirski wrote:
> > On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson wrote:
> > >
> > > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson wrote:
> > > > >
> > > > > Sorry if I'm beating a dead horse, but what if we only did fixup on
> > > > > ENCLU with a specific (ignored) prefix pattern? I.e. effectively make
> > > > > the magic fixup opt-in, falling back to signals. Jamming RIP to skip
> > > > > ENCLU isn't that far off the architecture, e.g. EENTER stuffs RCX with
> > > > > the next RIP so that the enclave can EEXIT to immediately after the
> > > > > EENTER location.
> > > > >
> > > >
> > > > How does that even work, though? On an AEX, RIP points to the ERESUME
> > > > instruction, not the EENTER instruction, so if we skip it we just end
> > > > up in lala land.
> > >
> > > Userspace would obviously need to be aware of the fixup behavior, but
> > > it actually works out fairly nicely to have a separate path for ERESUME
> > > fixup since a fault on EENTER is generally fatal, whereas a fault on
> > > ERESUME might be recoverable.
> > >
> >
> > Hmm.
> >
> > >
> > > do_eenter:
> > >     mov     tcs, %rbx
> > >     lea     async_exit, %rcx
> > >     mov     $EENTER, %rax
> > >     ENCLU
> >
> > Or SOME_SILLY_PREFIX ENCLU?
>
> Yeah, forgot to include that.
>
> > >
> > > /*
> > >  * EEXIT or EENTER faulted. In the latter case, %RAX already holds some
> > >  * fault indicator, e.g. -EFAULT.
> > >  */
> > > eexit_or_eenter_fault:
> > >     ret
> >
> > But userspace wants to know whether it was a fault or not. So I think
> > we either need two landing pads or we need to hijack a flag bit (are
> > there any known-zeroed flag bits after EEXIT?) to say whether it was a
> > fault. And, if it was a fault, we should give the vector, the
> > sanitized error code, and possibly CR2.
>
> As Jethro mentioned, RAX will always be 4 on a successful EEXIT, so we
> can use RAX to indicate a fault. That's what I was trying to imply with
> EFAULT. Here's the reg stuffing I use for the POC:
>
>	regs->ax = EFAULT;
>	regs->di = trapnr;
>	regs->si = error_code;
>	regs->dx = address;
>
>
> Well-known RAX values also mean the kernel fault handlers only need to
> look for SOME_SILLY_PREFIX ENCLU if RAX==2 || RAX==3, i.e. the fault
> occurred on EENTER or in an enclave (RAX is set to ERESUME's leaf as
> part of the asynchronous enclave exit flow).

POC kernel code, 64-bit only.

Limiting this to 64-bit isn't necessary, but it makes the code prettier and
allows using REX as the magic prefix. I like the idea of using REX because
it seems least likely to be repurposed for yet another new feature. I have
no idea if 64-bit only will fly with the SDK folks.

Going off comments in similar code related to UMIP, we'd need to figure out
how to handle protection keys.

/* REX with all bits set, ignored by ENCLU. */
#define SGX_DO_ENCLU_FIXUP		0x4F

#define SGX_ENCLU_OPCODE0		0x0F
#define SGX_ENCLU_OPCODE1		0x01
#define SGX_ENCLU_OPCODE2		0xD7

/* ENCLU is a three-byte opcode, plus one byte for the magic prefix. */
#define SGX_ENCLU_FIXUP_INSN_LEN	4

static int sgx_detect_enclu(struct pt_regs *regs)
{
	unsigned char buf[SGX_ENCLU_FIXUP_INSN_LEN];

	/* Look for EENTER or ERESUME in RAX, 64-bit mode only. */
	if (!regs || (regs->ax != 2 && regs->ax != 3) ||
	    !user_64bit_mode(regs))
		return 0;

	if (copy_from_user(buf, (void __user *)(regs->ip), sizeof(buf)))
		return 0;

	if (buf[0] == SGX_DO_ENCLU_FIXUP &&
	    buf[1] == SGX_ENCLU_OPCODE0 &&
	    buf[2] == SGX_ENCLU_OPCODE1 &&
	    buf[3] == SGX_ENCLU_OPCODE2)
		return SGX_ENCLU_FIXUP_INSN_LEN;

	return 0;
}

bool sgx_fixup_enclu_fault(struct pt_regs *regs, int trapnr,
			   unsigned long error_code, unsigned long address)
{
	int insn_len;

	insn_len = sgx_detect_enclu(regs);
	if (!insn_len)
		return false;

	regs->ip += insn_len;
	regs->ax = EFAULT;
	regs->di = trapnr;
	regs->si = error_code;
	regs->dx = address;
	return true;
}
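For the userspace side, the code that runs right after the prefixed ENCLU could tell a normal EEXIT from a fixed-up fault roughly as sketched below. This is only an illustration of the register convention in the POC above (RAX holds positive EFAULT, RDI the trap number, RSI the error code, RDX the address), not an agreed ABI. `enclu_regs`, `enclu_fault` and `enclu_classify` are hypothetical names, and the register snapshot is assumed to have been captured by whatever assembly stub wraps ENCLU.

```c
#include <errno.h>
#include <stdint.h>

/* ENCLU leaf numbers as used in the thread: EENTER=2, ERESUME=3, EEXIT=4. */
#define ENCLU_LEAF_EEXIT	4

/* Hypothetical snapshot of the registers right after the prefixed ENCLU. */
struct enclu_regs {
	uint64_t rax, rdi, rsi, rdx;
};

/* Fault details as stuffed into the registers by the POC fixup. */
struct enclu_fault {
	unsigned int trapnr;	/* exception vector, e.g. 14 for #PF */
	uint64_t error_code;
	uint64_t address;
};

enum enclu_outcome { ENCLU_EXITED, ENCLU_FAULTED, ENCLU_UNEXPECTED };

static enum enclu_outcome enclu_classify(const struct enclu_regs *regs,
					 struct enclu_fault *fault)
{
	/* A successful EEXIT always leaves its own leaf number in RAX. */
	if (regs->rax == ENCLU_LEAF_EEXIT)
		return ENCLU_EXITED;

	/* The POC fixup stores (positive) EFAULT in RAX on a fault. */
	if (regs->rax == EFAULT) {
		fault->trapnr = (unsigned int)regs->rdi;
		fault->error_code = regs->rsi;
		fault->address = regs->rdx;
		return ENCLU_FAULTED;
	}

	return ENCLU_UNEXPECTED;
}
```

Since 4 is reserved for EEXIT and the kernel only stuffs registers when RAX was 2 or 3 at fault time, a single landing pad is enough under this convention; no flag bit has to be hijacked.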
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 05:17:14PM -0800, Andy Lutomirski wrote: > On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson > wrote: > > > > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote: > > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson > > > wrote: > > > > > > > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote: > > > > > > > > > > > > > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson > > > > > >> wrote: > > > > > >> > > > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote: > > > > > >> Sean, how does the current SDK AEX handler decide whether to do > > > > > >> EENTER, ERESUME, or just bail and consider the enclave dead? It > > > > > >> seems > > > > > >> like the *CPU* could give a big hint, but I don't see where there > > > > > >> is > > > > > >> any architectural indication of why the AEX code got called or any > > > > > >> obvious way for the user code to know whether the exit was fixed > > > > > >> up by > > > > > >> the kernel? > > > > > > > > > > > > The SDK "unconditionally" does ERESUME at the AEP location, but > > > > > > that's > > > > > > bit misleading because its signal handler may muck with the > > > > > > context's > > > > > > RIP, e.g. to abort the enclave on a fatal fault. > > > > > > > > > > > > On an event/exception from within an enclave, the event is > > > > > > immediately > > > > > > delivered after loading synthetic state and changing RIP to the AEP. > > > > > > In other words, jamming CPU state is essentially a bunch of > > > > > > vectoring > > > > > > ucode preamble, but from software's perspective it's a normal event > > > > > > that happens to point at the AEP instead of somewhere in the > > > > > > enclave. > > > > > > And because the signals the SDK cares about are all synchronous, the > > > > > > SDK can simply hardcode ERESUME at the AEP since all of the fault > > > > > > logic > > > > > > resides in its signal handler. 
IRQs and whatnot simply trampoline > > > > > > back > > > > > > into the enclave. > > > > > > > > > > > > Userspace can do something funky instead of ERESUME, but only > > > > > > *after* > > > > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's > > > > > > case, after the trap handler has run. > > > > > > > > > > > > Jumping back a bit, how much do we care about preventing userspace > > > > > > from doing stupid things? > > > > > > > > > > My general feeling is that userspace should be allowed to do > > > > > apparently > > > > > stupid things. For example, as far as the kernel is concerned, Wine > > > > > and > > > > > DOSEMU are just user programs that do stupid things. Linux generally > > > > > tries > > > > > to provide a reasonably complete view of architectural behavior. This > > > > > is > > > > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE > > > > > May > > > > > cause very odd behavior indeed. So magic fixups that do > > > > > non-architectural > > > > > things are not so great. > > > > > > > > Sorry if I'm beating a dead horse, but what if we only did fixup on > > > > ENCLU > > > > with a specific (ignored) prefix pattern? I.e. effectively make the > > > > magic > > > > fixup opt-in, falling back to signals. Jamming RIP to skip ENCLU isn't > > > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP > > > > so > > > > that the enclave can EEXIT to immediately after the EENTER location. > > > > > > > > > > How does that even work, though? On an AEX, RIP points to the ERESUME > > > instruction, not the EENTER instruction, so if we skip it we just end > > > up in lala land. > > > > Userspace would obviously need to be aware of the fixup behavior, but > > it actually works out fairly nicely to have a separate path for ERESUME > > fixup since a fault on EENTER is generally fatal, whereas as a fault on > > ERESUME might be recoverable. > > > > Hmm. 
>
> >
> > do_eenter:
> >         mov     tcs, %rbx
> >         lea     async_exit, %rcx
> >         mov     $EENTER, %rax
> >         ENCLU
>
> Or SOME_SILLY_PREFIX ENCLU?

Yeah, forgot to include that.

> >
> >         /*
> >          * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
> >          * fault indicator, e.g. -EFAULT.
> >          */
> > eexit_or_eenter_fault:
> >         ret
>
> But userspace wants to know whether it was a fault or not.  So I think
> we either need two landing pads or we need to hijack a flag bit (are
> there any known-zeroed flag bits after EEXIT?) to say whether it was a
> fault.  And, if it was a fault, we should give the vector, the
> sanitized error code, and possibly CR2.

As Jethro mentioned, RAX will always be 4 on a successful EEXIT, so we
can use RAX to indicate a fault.  That's what I was trying to imply with
EFAULT.  Here's the reg stuffing I use for the POC:

	regs->ax = EFAULT;
	regs->di = trapnr;
	regs->si = error_code;
	regs->dx = address;

Well-known RAX values also means the kernel fault handlers only need
Re: RFC: userspace exception fixups
On 2018-11-07 02:17, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson wrote:
> >         /*
> >          * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
> >          * fault indicator, e.g. -EFAULT.
> >          */
> > eexit_or_eenter_fault:
> >         ret
>
> But userspace wants to know whether it was a fault or not.  So I think
> we either need two landing pads or we need to hijack a flag bit (are
> there any known-zeroed flag bits after EEXIT?) to say whether it was a
> fault.  And, if it was a fault, we should give the vector, the
> sanitized error code, and possibly CR2.

On AEX, %rax will contain ENCLU_LEAF_ERESUME (0x3). On EEXIT, %rax will
contain ENCLU_LEAF_EEXIT (0x4).

--
Jethro Beekman | Fortanix
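To make the RAX convention concrete, here is a rough userspace sketch of how a runtime might tell the three outcomes apart once control is back outside the enclave. It assumes the POC register stuffing quoted elsewhere in this thread (RAX = EFAULT, RDI = trapnr, RSI = error_code, RDX = address) together with the leaf values above; the `demo_*` names and the struct layout are invented for illustration and are not part of any proposed ABI. Note that the quoted assembly comment says -EFAULT while the POC stores positive EFAULT; the sketch follows the POC.

```c
#include <errno.h>
#include <stdint.h>

/* Architectural values left in RAX, as described in the message above. */
#define DEMO_RAX_ERESUME	3	/* AEX: RAX is primed for ERESUME */
#define DEMO_RAX_EEXIT		4	/* enclave left voluntarily via EEXIT */

enum demo_exit_kind {
	DEMO_EXIT_AEX,		/* asynchronous exit, landed at the AEP */
	DEMO_EXIT_EEXIT,	/* normal exit from the enclave */
	DEMO_EXIT_FAULT,	/* kernel fixed up a faulting ENCLU */
	DEMO_EXIT_UNKNOWN,
};

struct demo_exit_info {
	enum demo_exit_kind kind;
	uint64_t trapnr;	/* valid only for DEMO_EXIT_FAULT */
	uint64_t error_code;	/* valid only for DEMO_EXIT_FAULT */
	uint64_t address;	/* valid only for DEMO_EXIT_FAULT */
};

/* Classify the register state seen right after ENCLU (or at the AEP). */
static struct demo_exit_info demo_classify_exit(uint64_t rax, uint64_t rdi,
						uint64_t rsi, uint64_t rdx)
{
	struct demo_exit_info info = { DEMO_EXIT_UNKNOWN, 0, 0, 0 };

	switch (rax) {
	case DEMO_RAX_ERESUME:
		info.kind = DEMO_EXIT_AEX;
		break;
	case DEMO_RAX_EEXIT:
		info.kind = DEMO_EXIT_EEXIT;
		break;
	case EFAULT:
		/* Fault details are only meaningful on the fixup path. */
		info.kind = DEMO_EXIT_FAULT;
		info.trapnr = rdi;
		info.error_code = rsi;
		info.address = rdx;
		break;
	default:
		break;
	}
	return info;
}
```

This only works because the fault indicator cannot collide with 3 or 4; on Linux EFAULT is 14, so a single compare on RAX is enough and no second landing pad or flag bit is needed.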
Re: RFC: userspace exception fixups
On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson wrote: > > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote: > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson > > wrote: > > > > > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote: > > > > > > > > > > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson > > > > >> wrote: > > > > >> > > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote: > > > > >> Sean, how does the current SDK AEX handler decide whether to do > > > > >> EENTER, ERESUME, or just bail and consider the enclave dead? It > > > > >> seems > > > > >> like the *CPU* could give a big hint, but I don't see where there is > > > > >> any architectural indication of why the AEX code got called or any > > > > >> obvious way for the user code to know whether the exit was fixed up > > > > >> by > > > > >> the kernel? > > > > > > > > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's > > > > > bit misleading because its signal handler may muck with the context's > > > > > RIP, e.g. to abort the enclave on a fatal fault. > > > > > > > > > > On an event/exception from within an enclave, the event is immediately > > > > > delivered after loading synthetic state and changing RIP to the AEP. > > > > > In other words, jamming CPU state is essentially a bunch of vectoring > > > > > ucode preamble, but from software's perspective it's a normal event > > > > > that happens to point at the AEP instead of somewhere in the enclave. > > > > > And because the signals the SDK cares about are all synchronous, the > > > > > SDK can simply hardcode ERESUME at the AEP since all of the fault > > > > > logic > > > > > resides in its signal handler. IRQs and whatnot simply trampoline > > > > > back > > > > > into the enclave. 
> > > > > > > > > > Userspace can do something funky instead of ERESUME, but only *after* > > > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's > > > > > case, after the trap handler has run. > > > > > > > > > > Jumping back a bit, how much do we care about preventing userspace > > > > > from doing stupid things? > > > > > > > > My general feeling is that userspace should be allowed to do apparently > > > > stupid things. For example, as far as the kernel is concerned, Wine and > > > > DOSEMU are just user programs that do stupid things. Linux generally > > > > tries > > > > to provide a reasonably complete view of architectural behavior. This is > > > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE > > > > May > > > > cause very odd behavior indeed. So magic fixups that do > > > > non-architectural > > > > things are not so great. > > > > > > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU > > > with a specific (ignored) prefix pattern? I.e. effectively make the magic > > > fixup opt-in, falling back to signals. Jamming RIP to skip ENCLU isn't > > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so > > > that the enclave can EEXIT to immediately after the EENTER location. > > > > > > > How does that even work, though? On an AEX, RIP points to the ERESUME > > instruction, not the EENTER instruction, so if we skip it we just end > > up in lala land. > > Userspace would obviously need to be aware of the fixup behavior, but > it actually works out fairly nicely to have a separate path for ERESUME > fixup since a fault on EENTER is generally fatal, whereas as a fault on > ERESUME might be recoverable. > Hmm. > > do_eenter: > mov tcs, %rbx > lea async_exit, %rcx > mov $EENTER, %rax > ENCLU Or SOME_SILLY_PREFIX ENCLU? > > /* > * EEXIT or EENTER faulted. In the latter case, %RAX already holds some > * fault indicator, e.g. -EFAULT. 
>          */
> eexit_or_eenter_fault:
>         ret

But userspace wants to know whether it was a fault or not.  So I think
we either need two landing pads or we need to hijack a flag bit (are
there any known-zeroed flag bits after EEXIT?) to say whether it was a
fault.  And, if it was a fault, we should give the vector, the
sanitized error code, and possibly CR2.

>
> async_exit:
>         ENCLU

Same prefix here, right?

>
> fixup_handler:
>

This whole thing is a bit odd, but not necessarily a terrible idea.

>
> > How averse would everyone be to making enclave entry be a syscall?
> > The user code would do sys_sgx_enter_enclave(), and the kernel would
> > stash away the register state (vm86()-style), point RIP to the vDSO's
> > ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> > SYSRET.  The trap handlers would understand what's going on and
> > restore register state accordingly.
>
> Wouldn't that blast away any stack changes made by the enclave?

Yes, but I was imagining that it would stash the registers into the
struct host_state thing I made up :)
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote: > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson > wrote: > > > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote: > > > > > > > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson > > > >> wrote: > > > >> > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote: > > > >> Sean, how does the current SDK AEX handler decide whether to do > > > >> EENTER, ERESUME, or just bail and consider the enclave dead? It seems > > > >> like the *CPU* could give a big hint, but I don't see where there is > > > >> any architectural indication of why the AEX code got called or any > > > >> obvious way for the user code to know whether the exit was fixed up by > > > >> the kernel? > > > > > > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's > > > > bit misleading because its signal handler may muck with the context's > > > > RIP, e.g. to abort the enclave on a fatal fault. > > > > > > > > On an event/exception from within an enclave, the event is immediately > > > > delivered after loading synthetic state and changing RIP to the AEP. > > > > In other words, jamming CPU state is essentially a bunch of vectoring > > > > ucode preamble, but from software's perspective it's a normal event > > > > that happens to point at the AEP instead of somewhere in the enclave. > > > > And because the signals the SDK cares about are all synchronous, the > > > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic > > > > resides in its signal handler. IRQs and whatnot simply trampoline back > > > > into the enclave. > > > > > > > > Userspace can do something funky instead of ERESUME, but only *after* > > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's > > > > case, after the trap handler has run. > > > > > > > > Jumping back a bit, how much do we care about preventing userspace > > > > from doing stupid things? 
> > >
> > > My general feeling is that userspace should be allowed to do apparently
> > > stupid things.  For example, as far as the kernel is concerned, Wine and
> > > DOSEMU are just user programs that do stupid things.  Linux generally tries
> > > to provide a reasonably complete view of architectural behavior.  This is
> > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > > cause very odd behavior indeed.  So magic fixups that do non-architectural
> > > things are not so great.
> >
> > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > that the enclave can EEXIT to immediately after the EENTER location.
> >
>
> How does that even work, though?  On an AEX, RIP points to the ERESUME
> instruction, not the EENTER instruction, so if we skip it we just end
> up in lala land.

Userspace would obviously need to be aware of the fixup behavior, but
it actually works out fairly nicely to have a separate path for ERESUME
fixup since a fault on EENTER is generally fatal, whereas a fault on
ERESUME might be recoverable.

do_eenter:
	mov	tcs, %rbx
	lea	async_exit, %rcx
	mov	$EENTER, %rax
	ENCLU

	/*
	 * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
	 * fault indicator, e.g. -EFAULT.
	 */
eexit_or_eenter_fault:
	ret

async_exit:
	ENCLU

fixup_handler:

> How averse would everyone be to making enclave entry be a syscall?
> The user code would do sys_sgx_enter_enclave(), and the kernel would
> stash away the register state (vm86()-style), point RIP to the vDSO's
> ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> SYSRET.  The trap handlers would understand what's going on and
> restore register state accordingly.

Wouldn't that blast away any stack changes made by the enclave?
Re: RFC: userspace exception fixups
On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson wrote:
>
> On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> >
> >
> > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson wrote:
> > >>
> > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > >> Sean, how does the current SDK AEX handler decide whether to do
> > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> > >> like the *CPU* could give a big hint, but I don't see where there is
> > >> any architectural indication of why the AEX code got called or any
> > >> obvious way for the user code to know whether the exit was fixed up by
> > >> the kernel?
> > >
> > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > a bit misleading because its signal handler may muck with the context's
> > > RIP, e.g. to abort the enclave on a fatal fault.
> > >
> > > On an event/exception from within an enclave, the event is immediately
> > > delivered after loading synthetic state and changing RIP to the AEP.
> > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > ucode preamble, but from software's perspective it's a normal event
> > > that happens to point at the AEP instead of somewhere in the enclave.
> > > And because the signals the SDK cares about are all synchronous, the
> > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > into the enclave.
> > >
> > > Userspace can do something funky instead of ERESUME, but only *after*
> > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > case, after the trap handler has run.
> > >
> > > Jumping back a bit, how much do we care about preventing userspace
> > > from doing stupid things?
> >
> > My general feeling is that userspace should be allowed to do apparently
> > stupid things.  For example, as far as the kernel is concerned, Wine and
> > DOSEMU are just user programs that do stupid things.  Linux generally tries
> > to provide a reasonably complete view of architectural behavior.  This is
> > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > cause very odd behavior indeed.  So magic fixups that do non-architectural
> > things are not so great.
>
> Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> that the enclave can EEXIT to immediately after the EENTER location.
>

How does that even work, though?  On an AEX, RIP points to the ERESUME
instruction, not the EENTER instruction, so if we skip it we just end
up in lala land.

How averse would everyone be to making enclave entry be a syscall?
The user code would do sys_sgx_enter_enclave(), and the kernel would
stash away the register state (vm86()-style), point RIP to the vDSO's
ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
SYSRET.  The trap handlers would understand what's going on and
restore register state accordingly.  On non-Meltdown hardware (hah!)
this would even be fairly fast.

--Andy
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
>
> >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson wrote:
> >>
> >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> >> Sean, how does the current SDK AEX handler decide whether to do
> >> EENTER, ERESUME, or just bail and consider the enclave dead? It seems
> >> like the *CPU* could give a big hint, but I don't see where there is
> >> any architectural indication of why the AEX code got called or any
> >> obvious way for the user code to know whether the exit was fixed up by
> >> the kernel?
> >
> > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > a bit misleading because its signal handler may muck with the context's
> > RIP, e.g. to abort the enclave on a fatal fault.
> >
> > On an event/exception from within an enclave, the event is immediately
> > delivered after loading synthetic state and changing RIP to the AEP.
> > In other words, jamming CPU state is essentially a bunch of vectoring
> > ucode preamble, but from software's perspective it's a normal event
> > that happens to point at the AEP instead of somewhere in the enclave.
> > And because the signals the SDK cares about are all synchronous, the
> > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > resides in its signal handler. IRQs and whatnot simply trampoline back
> > into the enclave.
> >
> > Userspace can do something funky instead of ERESUME, but only *after*
> > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > case, after the trap handler has run.
> >
> > Jumping back a bit, how much do we care about preventing userspace
> > from doing stupid things?
>
> My general feeling is that userspace should be allowed to do apparently
> stupid things. For example, as far as the kernel is concerned, Wine and
> DOSEMU are just user programs that do stupid things. Linux generally tries
> to provide a reasonably complete view of architectural behavior. This is
> in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> cause very odd behavior indeed. So magic fixups that do non-architectural
> things are not so great.

Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
with a specific (ignored) prefix pattern? I.e. effectively make the magic
fixup opt-in, falling back to signals. Jamming RIP to skip ENCLU isn't
that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
that the enclave can EEXIT to immediately after the EENTER location.

> (How does the Windows case work? If there’s an exception after the untrusted
> stack allocation and before EEXIT and SEH tries to handle it, how does the
> unwinder figure out where to start?)

No clue, I'll ask and report back.
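For illustration, the opt-in check could look roughly like the following. This is a user-space toy, not kernel code. ENCLU is encoded as 0F 01 D7; everything else is an assumption made for this sketch: the choice of 0x3e (the DS segment-override byte) as the marker prefix, the toy_regs struct, and modeling RIP as an offset into a byte buffer. The thread does not say which prefix would be used, and this sketch does not verify that any particular prefix is really ignored by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ENCLU is encoded as 0F 01 D7. */
static const uint8_t enclu_opcode[3] = { 0x0f, 0x01, 0xd7 };

/* Hypothetical opt-in marker; chosen only for illustration. */
#define TOY_FIXUP_PREFIX 0x3e

/* Toy register state: RIP is an offset into the code buffer. */
struct toy_regs {
    uint64_t rip;
};

/* Length of an opted-in ENCLU (prefix + opcode) at ip, or 0 if none. */
static size_t toy_optin_enclu_len(const uint8_t *ip, size_t avail)
{
    if (avail < 4 || ip[0] != TOY_FIXUP_PREFIX)
        return 0;
    for (size_t i = 0; i < 3; i++) {
        if (ip[1 + i] != enclu_opcode[i])
            return 0;
    }
    return 4;
}

/*
 * Fixup decision: skip the instruction only if it carries the marker.
 * Returning false stands for "fall back to delivering a signal".
 */
static bool toy_try_fixup(struct toy_regs *regs,
                          const uint8_t *code, size_t code_len)
{
    if (regs->rip >= code_len)
        return false;

    size_t len = toy_optin_enclu_len(code + regs->rip,
                                     code_len - (size_t)regs->rip);
    if (len == 0)
        return false;

    regs->rip += len;   /* "jam RIP" past the prefixed ENCLU */
    return true;
}
```

A plain, unprefixed ENCLU is left alone, so code that never opted in keeps today's signal-based behavior.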
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 06:17:30PM -0500, Rich Felker wrote:
> On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> > >
> > > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > > I almost feel like the right solution is to call into SGX on its own
> > > > private stack or maybe even its own private address space.
> > >
> > > Yeah, I had the same gut feeling. Couldn't the debugger even treat the
> > > enclave like its own "thread" with its own stack and its own set of
> > > registers and context? That seems like a much more workable model than
> > > trying to weave it together with the EENTER context.
> >
> > So maybe the API should be, roughly
> >
> > sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> > host_state *state);
> > sgx_exit_reason_t sgx_resume_enclave(same args);
> >
> > where host_state is something like:
> >
> > struct host_state {
> >   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> > };
> >
> > and the values in host_state explicitly have nothing to do with the
> > actual host registers. So, if you want to use the outcall mechanism,
> > you'd allocate some memory, point sp to that memory, call
> > sgx_enter_enclave(), and then read that memory to do the outcall.
> >
> > Actually implementing this would be distinctly nontrivial, and would
> > almost certainly need some degree of kernel help to avoid an explosion
> > when a signal gets delivered while we have host_state.sp loaded into
> > the actual SP register. Maybe rseq could help with this?
> >
> > The ISA here is IMO not well thought through.
>
> Maybe I'm mistaken about some fundamentals here, but my understanding
> of SGX is that the whole point is that the host application and the
> code running in the enclave are mutually adversarial towards one
> another. Do any or all of the proposed protocols here account for this
> and fully protect the host application from malicious code in the
> enclave? It seems that having control over the register file on exit
> from the enclave is fundamentally problematic but I assume there must
> be some way I'm missing that this is fixed up.

SGX provides protections for the enclave but not the other way around.
The kernel has all of its normal non-SGX protections in place, but the
enclave can certainly wreak havoc on its userspace process. The basic
design idea is that the enclave is a specialized .so that gets extra
security protections but is still effectively part of the overall
application, e.g. it has full access to its host userspace process'
virtual memory.
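To make the quoted straw man API a bit more concrete, here is a toy model of the outcall flow it describes: allocate memory, point sp at it, enter, then read the memory. It is a sketch under stated assumptions and contains no SGX instructions. struct host_state, sgx_exit_reason_t and sgx_enter_enclave() are names from the thread; the exit-reason values, the function-pointer "enclave" and the request string are invented for illustration. The toy enclave writing straight into host memory mirrors the point that an enclave has full access to its host process' address space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* From the thread: these values are NOT the real host registers. */
struct host_state {
    unsigned long bp, sp, ax, bx, cx, dx, si, di;
};

/* Hypothetical exit reasons; the thread does not define them. */
typedef enum {
    SGX_EXIT_DONE,
    SGX_EXIT_OUTCALL,
    SGX_EXIT_FAULT
} sgx_exit_reason_t;

/* Stand-in for "pointer_to_enclave": an ordinary function pointer. */
typedef sgx_exit_reason_t (*toy_enclave_fn)(struct host_state *state);

/* Toy entry point: a real one would EENTER; this one just calls. */
static sgx_exit_reason_t sgx_enter_enclave(toy_enclave_fn enclave,
                                           struct host_state *state)
{
    return enclave(state);
}

/*
 * Toy enclave requesting an outcall. It moves sp down first and only
 * then writes the request, so nothing ever lives below sp.
 * Assumes unsigned long can hold a pointer (true on LP64 Linux).
 */
static sgx_exit_reason_t toy_enclave(struct host_state *state)
{
    static const char request[] = "write-log";

    state->sp -= sizeof request;
    memcpy((void *)(uintptr_t)state->sp, request, sizeof request);
    state->ax = sizeof request;   /* length of the request */
    return SGX_EXIT_OUTCALL;
}
```

A caller would point state.sp at the end of a scratch buffer, call sgx_enter_enclave(toy_enclave, &state), and on SGX_EXIT_OUTCALL read state.ax bytes starting at state.sp.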
Re: RFC: userspace exception fixups
On Tue, Nov 06, 2018 at 11:02:11AM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
> >
> > On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > > I almost feel like the right solution is to call into SGX on its own
> > > private stack or maybe even its own private address space.
> >
> > Yeah, I had the same gut feeling. Couldn't the debugger even treat the
> > enclave like its own "thread" with its own stack and its own set of
> > registers and context? That seems like a much more workable model than
> > trying to weave it together with the EENTER context.
>
> So maybe the API should be, roughly
>
> sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> host_state *state);
> sgx_exit_reason_t sgx_resume_enclave(same args);
>
> where host_state is something like:
>
> struct host_state {
>   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> };
>
> and the values in host_state explicitly have nothing to do with the
> actual host registers. So, if you want to use the outcall mechanism,
> you'd allocate some memory, point sp to that memory, call
> sgx_enter_enclave(), and then read that memory to do the outcall.
>
> Actually implementing this would be distinctly nontrivial, and would
> almost certainly need some degree of kernel help to avoid an explosion
> when a signal gets delivered while we have host_state.sp loaded into
> the actual SP register. Maybe rseq could help with this?
>
> The ISA here is IMO not well thought through.

Maybe I'm mistaken about some fundamentals here, but my understanding
of SGX is that the whole point is that the host application and the
code running in the enclave are mutually adversarial towards one
another. Do any or all of the proposed protocols here account for this
and fully protect the host application from malicious code in the
enclave? It seems that having control over the register file on exit
from the enclave is fundamentally problematic but I assume there must
be some way I'm missing that this is fixed up.

Rich
Re: RFC: userspace exception fixups
On Nov 6, 2018, at 1:59 PM, Sean Christopherson wrote:
>
> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
>> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski wrote:
>>> On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
>>>
>>>> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
>>>>> True, but what if we have a nasty enclave that writes to memory just
>>>>> below SP *before* decrementing SP?
>>>> Yeah, that would be unfortunate. If an enclave did this (roughly):
>>>>
>>>>    1. EENTER
>>>>    2. Hardware sets eenter_hwframe->sp = %sp
>>>>    3. Enclave runs... wants to do out-call
>>>>    4. Enclave sets up parameters:
>>>>       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
>>>>       ...
>>>>    5. Enclave sets eenter_hwframe->sp -= offset
>>>>
>>>> If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
>>>> was on the stack. The enclave could easily fix this by moving ->sp first.
>>>>
>>>> But, this is one of those "fun" parts of the ABI that I think we need to
>>>> talk about. If we do this, we also basically require that the code
>>>> which handles asynchronous exits must *not* write to the stack. That's
>>>> not hard because it's typically just a single ERESUME instruction, but
>>>> it *is* a requirement.
>>>
>>> I was assuming that the async exit stuff was completely hidden by the API.
>>> The AEP code would decide whether the exit got fixed up by the kernel
>>> (which may or may not be easy to tell -- can the
>>> code even tell without kernel help whether it was, say, an IRQ vs #UD?) and
>>> then either do ERESUME or cause sgx_enter_enclave() to return with an
>>> appropriate return value.
>> Sean, how does the current SDK AEX handler decide whether to do
>> EENTER, ERESUME, or just bail and consider the enclave dead? It seems
>> like the *CPU* could give a big hint, but I don't see where there is
>> any architectural indication of why the AEX code got called or any
>> obvious way for the user code to know whether the exit was fixed up by
>> the kernel?
>
> The SDK "unconditionally" does ERESUME at the AEP location, but that's
> a bit misleading because its signal handler may muck with the context's
> RIP, e.g. to abort the enclave on a fatal fault.
>
> On an event/exception from within an enclave, the event is immediately
> delivered after loading synthetic state and changing RIP to the AEP.
> In other words, jamming CPU state is essentially a bunch of vectoring
> ucode preamble, but from software's perspective it's a normal event
> that happens to point at the AEP instead of somewhere in the enclave.
> And because the signals the SDK cares about are all synchronous, the
> SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> resides in its signal handler. IRQs and whatnot simply trampoline back
> into the enclave.
>
> Userspace can do something funky instead of ERESUME, but only *after*
> IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> case, after the trap handler has run.
>
> Jumping back a bit, how much do we care about preventing userspace
> from doing stupid things?

My general feeling is that userspace should be allowed to do apparently
stupid things. For example, as far as the kernel is concerned, Wine and
DOSEMU are just user programs that do stupid things. Linux generally
tries to provide a reasonably complete view of architectural behavior.
This is in contrast to, say, Windows, where IIUC doing an unapproved
WRFSBASE may cause very odd behavior indeed. So magic fixups that do
non-architectural things are not so great.

The flip side, of course, is that the architecture is arguably
inherently erratic here, and it’s apparently impossible to have an SGX
library with sane semantics without some kernel assistance. So if we
can make my straw man API work, perhaps with vDSO or rseq-like help,
then the official SDK can use it, but less well behaved programs can
still mostly work. (Modulo Linux’s non-support for EINITTOKEN, of
course.)

Thinking about it some more, the major sticking point may be finding
the RIP and stack frame of EENTER in the AEP code or in its fixup. The
vDSO can’t use TLS without serious hackery. We could massively abuse
WRFSBASE, but that’s really ugly.

(How does the Windows case work? If there’s an exception after the
untrusted stack allocation and before EEXIT and SEH tries to handle it,
how does the unwinder figure out where to start?)

> I did a quick POC on the idea of hardcoding
> fixup for the ENCLU opcode, and the basic idea checks out. The code
> is fairly minimal and doesn't impact the core functionality of the SDK.
> They'd need to redo their trap handling to move it from the signal
> handler to inline, but their stack shenanigans won't be any more broken
> than they already are.
Re: RFC: userspace exception fixups
On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski wrote:
> >
> > > On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
> > >
> > > > On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> > > > True, but what if we have a nasty enclave that writes to memory just
> > > > below SP *before* decrementing SP?
> > > Yeah, that would be unfortunate. If an enclave did this (roughly):
> > >
> > >    1. EENTER
> > >    2. Hardware sets eenter_hwframe->sp = %sp
> > >    3. Enclave runs... wants to do out-call
> > >    4. Enclave sets up parameters:
> > >       memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > >       ...
> > >    5. Enclave sets eenter_hwframe->sp -= offset
> > >
> > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > > was on the stack. The enclave could easily fix this by moving ->sp first.
> > >
> > > But, this is one of those "fun" parts of the ABI that I think we need to
> > > talk about. If we do this, we also basically require that the code
> > > which handles asynchronous exits must *not* write to the stack. That's
> > > not hard because it's typically just a single ERESUME instruction, but
> > > it *is* a requirement.
> > >
> > I was assuming that the async exit stuff was completely hidden by the API.
> > The AEP code would decide whether the exit got fixed up by the kernel
> > (which may or may not be easy to tell -- can the
> > code even tell without kernel help whether it was, say, an IRQ vs #UD?) and
> > then either do ERESUME or cause sgx_enter_enclave() to return with an
> > appropriate return value.
> >
> Sean, how does the current SDK AEX handler decide whether to do
> EENTER, ERESUME, or just bail and consider the enclave dead? It seems
> like the *CPU* could give a big hint, but I don't see where there is
> any architectural indication of why the AEX code got called or any
> obvious way for the user code to know whether the exit was fixed up by
> the kernel?

The SDK "unconditionally" does ERESUME at the AEP location, but that's
a bit misleading because its signal handler may muck with the context's
RIP, e.g. to abort the enclave on a fatal fault.

On an event/exception from within an enclave, the event is immediately
delivered after loading synthetic state and changing RIP to the AEP.
In other words, jamming CPU state is essentially a bunch of vectoring
ucode preamble, but from software's perspective it's a normal event
that happens to point at the AEP instead of somewhere in the enclave.
And because the signals the SDK cares about are all synchronous, the
SDK can simply hardcode ERESUME at the AEP since all of the fault logic
resides in its signal handler. IRQs and whatnot simply trampoline back
into the enclave.

Userspace can do something funky instead of ERESUME, but only *after*
IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
case, after the trap handler has run.

Jumping back a bit, how much do we care about preventing userspace
from doing stupid things? I did a quick POC on the idea of hardcoding
fixup for the ENCLU opcode, and the basic idea checks out. The code
is fairly minimal and doesn't impact the core functionality of the SDK.
They'd need to redo their trap handling to move it from the signal
handler to inline, but their stack shenanigans won't be any more broken
than they already are.
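The control flow described above (ERESUME hardcoded at the AEP, with all fault policy living in the signal handler) can be modeled in a few lines. This is a toy sketch, not SDK code: the event enum, the context struct and the two addresses are assumptions invented for illustration, and a real handler would edit the RIP saved in the signal context rather than a plain struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of what interrupted the enclave. */
enum toy_event {
    TOY_EV_IRQ,            /* asynchronous: no signal is raised   */
    TOY_EV_FIXABLE_FAULT,  /* synchronous fault the handler fixes */
    TOY_EV_FATAL_FAULT     /* synchronous fault that is fatal     */
};

/* Toy stand-in for the saved user context. */
struct toy_ctx {
    uintptr_t rip;
};

/* Made-up addresses: the ERESUME at the AEP, and an abort path. */
#define TOY_AEP_ADDR   ((uintptr_t)0x1000)
#define TOY_ABORT_ADDR ((uintptr_t)0x2000)

/* Model of the signal handler: only a fatal fault redirects RIP. */
static void toy_signal_handler(enum toy_event ev, struct toy_ctx *ctx)
{
    if (ev == TOY_EV_FATAL_FAULT)
        ctx->rip = TOY_ABORT_ADDR;
}

/* Where userspace continues after an exit caused by ev. */
static uintptr_t toy_resume_address(enum toy_event ev)
{
    /* On an AEX the saved RIP always points at the AEP. */
    struct toy_ctx ctx = { .rip = TOY_AEP_ADDR };

    /* Signals are synchronous only; IRQs never reach the handler. */
    if (ev != TOY_EV_IRQ)
        toy_signal_handler(ev, &ctx);

    return ctx.rip;
}
```

The AEP code itself never needs to know why the exit happened, which is why a single hardcoded ERESUME is enough in this model.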
Re: RFC: userspace exception fixups
On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski wrote:
>
> > On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
> >
> >> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> >> True, but what if we have a nasty enclave that writes to memory just
> >> below SP *before* decrementing SP?
> >
> > Yeah, that would be unfortunate. If an enclave did this (roughly):
> >
> > 1. EENTER
> > 2. Hardware sets eenter_hwframe->sp = %sp
> > 3. Enclave runs... wants to do out-call
> > 4. Enclave sets up parameters:
> >    memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> >    ...
> > 5. Enclave sets eenter_hwframe->sp -= offset
> >
> > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > was on the stack. The enclave could easily fix this by moving ->sp first.
> >
> > But, this is one of those "fun" parts of the ABI that I think we need to
> > talk about. If we do this, we also basically require that the code
> > which handles asynchronous exits must *not* write to the stack. That's
> > not hard because it's typically just a single ERESUME instruction, but
> > it *is* a requirement.
> >
>
> I was assuming that the async exit stuff was completely hidden by the API.
> The AEP code would decide whether the exit got fixed up by the kernel (which
> may or may not be easy to tell -- can the code even tell without kernel help
> whether it was, say, an IRQ vs #UD?) and then either do ERESUME or cause
> sgx_enter_enclave() to return with an appropriate return value.
>

Sean, how does the current SDK AEX handler decide whether to do EENTER, ERESUME, or just bail and consider the enclave dead? It seems like the *CPU* could give a big hint, but I don't see where there is any architectural indication of why the AEX code got called or any obvious way for the user code to know whether the exit was fixed up by the kernel?
Re: RFC: userspace exception fixups
> On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
>
>> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
>> True, but what if we have a nasty enclave that writes to memory just
>> below SP *before* decrementing SP?
>
> Yeah, that would be unfortunate. If an enclave did this (roughly):
>
> 1. EENTER
> 2. Hardware sets eenter_hwframe->sp = %sp
> 3. Enclave runs... wants to do out-call
> 4. Enclave sets up parameters:
>    memcpy(&eenter_hwframe->sp[-offset], arg1, size);
>    ...
> 5. Enclave sets eenter_hwframe->sp -= offset
>
> If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> was on the stack. The enclave could easily fix this by moving ->sp first.
>
> But, this is one of those "fun" parts of the ABI that I think we need to
> talk about. If we do this, we also basically require that the code
> which handles asynchronous exits must *not* write to the stack. That's
> not hard because it's typically just a single ERESUME instruction, but
> it *is* a requirement.
>

I was assuming that the async exit stuff was completely hidden by the API. The AEP code would decide whether the exit got fixed up by the kernel (which may or may not be easy to tell -- can the code even tell without kernel help whether it was, say, an IRQ vs #UD?) and then either do ERESUME or cause sgx_enter_enclave() to return with an appropriate return value.
Re: RFC: userspace exception fixups
On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> True, but what if we have a nasty enclave that writes to memory just
> below SP *before* decrementing SP?

Yeah, that would be unfortunate. If an enclave did this (roughly):

1. EENTER
2. Hardware sets eenter_hwframe->sp = %sp
3. Enclave runs... wants to do out-call
4. Enclave sets up parameters:
   memcpy(&eenter_hwframe->sp[-offset], arg1, size);
   ...
5. Enclave sets eenter_hwframe->sp -= offset

If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that was on the stack. The enclave could easily fix this by moving ->sp first.

But, this is one of those "fun" parts of the ABI that I think we need to talk about. If we do this, we also basically require that the code which handles asynchronous exits must *not* write to the stack. That's not hard because it's typically just a single ERESUME instruction, but it *is* a requirement.

It means fun stuff like that you absolutely can't just async-exit to C code.
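
[Editorial sketch] The ordering hazard in steps 4 and 5 can be reproduced with a toy stack in plain C. Everything here is an assumption made for illustration: struct toy_stack, deliver_signal(), the 64-byte frame size, and outcall_arg_survives() are invented names, and no real signal or enclave is involved. The sketch only shows why moving ->sp before the copy keeps the argument out of the region a signal frame would overwrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 256
#define FRAME_SIZE 64   /* pretend signal frame; real frames are larger */

/* Toy untrusted stack. It grows down, so a smaller index is "deeper". */
struct toy_stack {
    unsigned char mem[STACK_SIZE];
    size_t sp;          /* index of the current top of stack */
};

/* Model of signal delivery: the frame is written just below sp. */
static void deliver_signal(struct toy_stack *s)
{
    assert(s->sp >= FRAME_SIZE);
    memset(&s->mem[s->sp - FRAME_SIZE], 0xAA, FRAME_SIZE);
}

/* Stage one out-call argument with a signal arriving between the copy
 * and the sp update. sp_first != 0 moves sp before copying (the fix);
 * sp_first == 0 copies first (steps 4 then 5 above).
 * Returns 1 if the argument is intact afterwards, 0 if clobbered. */
int outcall_arg_survives(int sp_first)
{
    const unsigned char arg[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    struct toy_stack s;
    size_t arg_at;

    memset(s.mem, 0, sizeof(s.mem));
    s.sp = STACK_SIZE;
    arg_at = s.sp - sizeof(arg);

    if (sp_first) {
        s.sp -= sizeof(arg);                        /* step 5 first */
        deliver_signal(&s);                         /* frame lands below arg */
        memcpy(&s.mem[arg_at], arg, sizeof(arg));   /* step 4 */
        deliver_signal(&s);                         /* still harmless */
    } else {
        memcpy(&s.mem[arg_at], arg, sizeof(arg));   /* step 4 */
        deliver_signal(&s);                         /* frame overwrites arg */
        s.sp -= sizeof(arg);                        /* step 5, too late */
    }
    return memcmp(&s.mem[arg_at], arg, sizeof(arg)) == 0;
}
```

Calling outcall_arg_survives(0) models the race between steps 4 and 5, and outcall_arg_survives(1) models the reordered enclave.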
Re: RFC: userspace exception fixups
> On Nov 6, 2018, at 11:22 AM, Dave Hansen wrote:
>
>> On 11/6/18 11:02 AM, Andy Lutomirski wrote:
>>> On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
>>> On 11/6/18 10:20 AM, Andy Lutomirski wrote:
>>>> I almost feel like the right solution is to call into SGX on its own
>>>> private stack or maybe even its own private address space.
>>>
>>> Yeah, I had the same gut feeling. Couldn't the debugger even treat the
>>> enclave like its own "thread" with its own stack and its own set of
>>> registers and context? That seems like a much more workable model than
>>> trying to weave it together with the EENTER context.
>>
>> So maybe the API should be, roughly
>>
>> sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
>> host_state *state);
>> sgx_exit_reason_t sgx_resume_enclave(same args);
>>
>> where host_state is something like:
>>
>> struct host_state {
>>   unsigned long bp, sp, ax, bx, cx, dx, si, di;
>> };
>>
>> and the values in host_state explicitly have nothing to do with the
>> actual host registers. So, if you want to use the outcall mechanism,
>> you'd allocate some memory, point sp to that memory, call
>> sgx_enter_enclave(), and then read that memory to do the outcall.
>
> Ah, so instead of the enclave rudely "hijacking" the EENTER context, we
> have it nicely return and nicely _hint_ to the calling context what it
> would like to do. Then, the EENTER context can make a controlled
> transition over to the requested context.

Exactly. And existing enclaves keep working -- their rudeness is just magically translated into a hint!

>
>> Actually implementing this would be distinctly nontrivial, and would
>> almost certainly need some degree of kernel help to avoid an explosion
>> when a signal gets delivered while we have host_state.sp loaded into
>> the actual SP register. Maybe rseq could help with this?
>
> As long as the memory pointed to by host_state.sp is valid and can hold
> the signal frame (grows down without clobbering anything), what goes
> boom? The signal handling would push a signal frame and call the
> handler. It would have a shallow-looking stack, but the handler could
> just do its normal business and return from the signal where the frame
> would get popped and continue with %rsp=host_state.sp, blissfully
> unaware of the signal ever having happened.

True, but what if we have a nasty enclave that writes to memory just below SP *before* decrementing SP?

I suspect that rseq really can be used for this with only minimal-ish modifications. Or we could stick this in the vDSO with some appropriate fixups in the kernel.
Re: RFC: userspace exception fixups
On 11/6/18 11:02 AM, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
>>
>> On 11/6/18 10:20 AM, Andy Lutomirski wrote:
>>> I almost feel like the right solution is to call into SGX on its own
>>> private stack or maybe even its own private address space.
>>
>> Yeah, I had the same gut feeling. Couldn't the debugger even treat the
>> enclave like its own "thread" with its own stack and its own set of
>> registers and context? That seems like a much more workable model than
>> trying to weave it together with the EENTER context.
>
> So maybe the API should be, roughly
>
> sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct
> host_state *state);
> sgx_exit_reason_t sgx_resume_enclave(same args);
>
> where host_state is something like:
>
> struct host_state {
>   unsigned long bp, sp, ax, bx, cx, dx, si, di;
> };
>
> and the values in host_state explicitly have nothing to do with the
> actual host registers. So, if you want to use the outcall mechanism,
> you'd allocate some memory, point sp to that memory, call
> sgx_enter_enclave(), and then read that memory to do the outcall.

Ah, so instead of the enclave rudely "hijacking" the EENTER context, we have it nicely return and nicely _hint_ to the calling context what it would like to do. Then, the EENTER context can make a controlled transition over to the requested context.

> Actually implementing this would be distinctly nontrivial, and would
> almost certainly need some degree of kernel help to avoid an explosion
> when a signal gets delivered while we have host_state.sp loaded into
> the actual SP register. Maybe rseq could help with this?

As long as the memory pointed to by host_state.sp is valid and can hold the signal frame (grows down without clobbering anything), what goes boom? The signal handling would push a signal frame and call the handler. It would have a shallow-looking stack, but the handler could just do its normal business and return from the signal where the frame would get popped and continue with %rsp=host_state.sp, blissfully unaware of the signal ever having happened.
Re: RFC: userspace exception fixups
On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote:
>
> On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> > I almost feel like the right solution is to call into SGX on its own
> > private stack or maybe even its own private address space.
>
> Yeah, I had the same gut feeling. Couldn't the debugger even treat the
> enclave like its own "thread" with its own stack and its own set of
> registers and context? That seems like a much more workable model than
> trying to weave it together with the EENTER context.

So maybe the API should be, roughly

  sgx_exit_reason_t sgx_enter_enclave(pointer_to_enclave, struct host_state *state);
  sgx_exit_reason_t sgx_resume_enclave(same args);

where host_state is something like:

  struct host_state {
    unsigned long bp, sp, ax, bx, cx, dx, si, di;
  };

and the values in host_state explicitly have nothing to do with the actual host registers. So, if you want to use the outcall mechanism, you'd allocate some memory, point sp to that memory, call sgx_enter_enclave(), and then read that memory to do the outcall.

Actually implementing this would be distinctly nontrivial, and would almost certainly need some degree of kernel help to avoid an explosion when a signal gets delivered while we have host_state.sp loaded into the actual SP register. Maybe rseq could help with this?

The ISA here is IMO not well thought through.
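
[Editorial sketch] To make the calling convention proposed above concrete, here is how a caller might use it, assuming a stub in place of the real entry path. The enum values, the out-call number 42, run_once(), and the body of sgx_enter_enclave() are all made up for illustration; only the struct layout and the two-argument signature come from the message. The sketch assumes an LP64 target where unsigned long can hold a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical exit reasons; the thread does not define this enum. */
typedef enum { SGX_EXIT_DONE, SGX_EXIT_OUTCALL } sgx_exit_reason_t;

/* Register values as the enclave sees them, not the real host registers. */
struct host_state {
    unsigned long bp, sp, ax, bx, cx, dx, si, di;
};

/* Stub standing in for the real entry point. It pretends the enclave
 * pushed one 64-bit out-call number below state->sp and then exited.
 * No SGX instruction is executed here. */
sgx_exit_reason_t sgx_enter_enclave(void *enclave, struct host_state *state)
{
    uint64_t request = 42;  /* made-up out-call number */

    (void)enclave;
    state->sp -= sizeof(request);
    memcpy((void *)(uintptr_t)state->sp, &request, sizeof(request));
    return SGX_EXIT_OUTCALL;
}

/* Caller side: allocate memory, point sp at its top, enter, then read
 * the out-call request back from that memory. Returns the request
 * number, or 0 if the enclave finished without an out-call. */
uint64_t run_once(void)
{
    static unsigned char scratch[4096];   /* private out-call area */
    struct host_state state;
    uint64_t request = 0;

    memset(&state, 0, sizeof(state));
    state.sp = (unsigned long)(uintptr_t)(scratch + sizeof(scratch));

    if (sgx_enter_enclave(NULL, &state) == SGX_EXIT_OUTCALL) {
        /* The enclave controls sp, so check it before trusting it. */
        assert(state.sp >= (unsigned long)(uintptr_t)scratch);
        assert(state.sp + sizeof(request) <=
               (unsigned long)(uintptr_t)(scratch + sizeof(scratch)));
        memcpy(&request, (const void *)(uintptr_t)state.sp, sizeof(request));
    }
    return request;
}
```

The point of the design is that the real stack pointer of the caller never changes; the enclave's writes land in the scratch area and are read back as a hint. The signal-delivery problem raised in the message is not modeled here.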
Re: RFC: userspace exception fixups
On 11/6/18 10:20 AM, Andy Lutomirski wrote:
> I almost feel like the right solution is to call into SGX on its own
> private stack or maybe even its own private address space.

Yeah, I had the same gut feeling. Couldn't the debugger even treat the enclave like its own "thread" with its own stack and its own set of registers and context? That seems like a much more workable model than trying to weave it together with the EENTER context.