Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Fri, Dec 21, 2018 at 10:28:09AM -0800, Sean Christopherson wrote: > > Why would you want to pass EPC through user space to KVM rather than > > KVM allocating it through kernel interfaces? > > Delegating EPC management to userspace fits better with KVM's existing > memory ABI. KVM provides a single ioctl(), KVM_SET_USER_MEMORY_REGION[1], > that allows userspace to create, move, modify and delete memory regions. > > Skipping over a lot of details, there are essentially three options for > exposing EPC to a KVM guest: > > 1) Provide a dedicated KVM ioctl() to manage EPC without routing it > through KVM_SET_USER_MEMORY_REGION. > > 2) Add a flag to 'struct kvm_userspace_memory_region' that denotes an > EPC memory region and mmap() / allocate EPC in KVM. > > 3) Provide an ABI to allocate raw EPC and let userspace manage it like > any other memory region. > > Option (1) requires duplicating all of KVM_SET_USER_MEMORY_REGION's > functionality unless the ioctl() is severly restricted. > > Option (2) is an ugly abuse of KVM_SET_USER_MEMORY_REGION since the EPC > flag would have completely different semantics than all other usage of > KVM_SET_USER_MEMORY_REGION. > > Thus, option (3). OK, thank you for patience explaining this. > Probably a better question to answer is why provide the ABI through > /dev/sgx and not /dev/kvm. IMO /dev/sgx is a more logical way to > advertise support to userspace, e.g. userspace can simply check if > /dev/sgx (or /dev/sgx/epc) exists vs. probing a KVM capability. You have to understand that for a KVM non-expert like me it was really important to get the context, which you kindly gave. I have never used KVM's memory management API but now that I know how it works all of this makes perfect sense. This is not a better question but it is definitely a good follow up question :-) I don't really understand you deduction here, however. If SGX was not supported, why couldn't the hypothetical /dev/kvm functionality just return an error? For me it sounds a bit messy that KVM functionality, which is a client to the SGX functionality, places some of its functionality to the SGX core. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Dec 19, 2018 at 07:00:47AM +0200, Jarkko Sakkinen wrote: > On Tue, Dec 18, 2018 at 10:53:49AM -0800, Sean Christopherson wrote: > > What if we re-organize the ioctls in such a way that we leave open the > > possibility of allocating raw EPC for KVM via /dev/sgx? I'm not 100% > > positive this approach will work[1], but conceptually it fits well with > > KVM's memory model, e.g. KVM is aware of the GPA<->HVA association but > > generally speaking doesn't know what's physically backing each memory > > region. > > Why would you want to pass EPC through user space to KVM rather than > KVM allocating it through kernel interfaces? Delegating EPC management to userspace fits better with KVM's existing memory ABI. KVM provides a single ioctl(), KVM_SET_USER_MEMORY_REGION[1], that allows userspace to create, move, modify and delete memory regions. Skipping over a lot of details, there are essentially three options for exposing EPC to a KVM guest: 1) Provide a dedicated KVM ioctl() to manage EPC without routing it through KVM_SET_USER_MEMORY_REGION. 2) Add a flag to 'struct kvm_userspace_memory_region' that denotes an EPC memory region and mmap() / allocate EPC in KVM. 3) Provide an ABI to allocate raw EPC and let userspace manage it like any other memory region. Option (1) requires duplicating all of KVM_SET_USER_MEMORY_REGION's functionality unless the ioctl() is severly restricted. Option (2) is an ugly abuse of KVM_SET_USER_MEMORY_REGION since the EPC flag would have completely different semantics than all other usage of KVM_SET_USER_MEMORY_REGION. Thus, option (3). Probably a better question to answer is why provide the ABI through /dev/sgx and not /dev/kvm. IMO /dev/sgx is a more logical way to advertise support to userspace, e.g. userspace can simply check if /dev/sgx (or /dev/sgx/epc) exists vs. probing a KVM capability. Without EPC oversubscription in KVM, /dev/sgx is easily the best fit since the EPC management would reside completely in x86/sgx, i.e. KVM would essentially have zero code related to EPC management. EPC oversubscription complicates things because the architecture forces aspects of VMM oversubscription into the KVM domain, e.g. requires a post-VMXON instruction (ENCLV) and a VM-Exit handler. I still think /dev/sgx is a better fit, my only concern is that the oversubscription code would be even more heinous due to splitting responsibilities. But, Andy's idea of having /dev/sgx/enclave vs. /dev/sgx/epc might help avoid that entirely.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Dec 19, 2018 at 06:47:32AM +0200, Jarkko Sakkinen wrote: > On Tue, Dec 18, 2018 at 07:44:18AM -0800, Sean Christopherson wrote: > > My fd/inode knowledge is lacking, to say the least. Whatever works, so > > long as we have a way to uniquely identify enclaves. > > I will simply trial and error :-) I think it should work since it does > own an address space, but yeah, testing will tell. We can go also with > anon inode if required. I think this can be concluded with the fact that it is nice to be able to multiplex the dev fd. That is the key reason for using anon inode. You KVM comment locks the decision here. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 09:55:08PM -0800, Andy Lutomirski wrote: > On Thu, Nov 15, 2018 at 5:08 PM Jarkko Sakkinen > wrote: > > > > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that > > can be used by applications to set aside private regions of code and > > data. The code outside the enclave is disallowed to access the memory > > inside the enclave by the CPU access control. > > This is a very partial review. Thank you, appreciate it. > > +int sgx_encl_find(struct mm_struct *mm, unsigned long addr, > > + struct vm_area_struct **vma) > > +{ > > + struct vm_area_struct *result; > > + struct sgx_encl *encl; > > + > > + result = find_vma(mm, addr); > > + if (!result || result->vm_ops != _vm_ops || addr < > > result->vm_start) > > + return -EINVAL; > > + > > + encl = result->vm_private_data; > > + *vma = result; > > + > > + return encl ? 0 : -ENOENT; > > +} > > I realize that this function may go away entirely but, if you keep it: > what are the locking rules? What, if anything, prevents another > thread from destroying the enclave after sgx_encl_find() returns? The kref inside the enclave is used to manage this but this function directly does not prevent it (see for example sgx_encl_get). But yes, this function does not give any guarantees (should probably have a documentation stating this). > > +static int sgx_validate_secs(const struct sgx_secs *secs, > > +unsigned long ssaframesize) > > +{ > > ... > > > + if (secs->attributes & SGX_ATTR_MODE64BIT) { > > + if (secs->size > sgx_encl_size_max_64) > > + return -EINVAL; > > + } else { > > + /* On 64-bit architecture allow 32-bit encls only in > > +* the compatibility mode. > > +*/ > > + if (!test_thread_flag(TIF_ADDR32)) > > + return -EINVAL; > > + if (secs->size > sgx_encl_size_max_32) > > + return -EINVAL; > > + } > > Why do we need the 32-bit-on-64-bit check? In general, anything that > checks per-task or per-mm flags like TIF_ADDR32 is IMO likely to be > problematic. You're allowing 64-bit enclaves in 32-bit tasks, so I'm > guessing you could just delete the check. I guess you are right. I can remove this. > > > + > > + if (!(secs->xfrm & XFEATURE_MASK_FP) || > > + !(secs->xfrm & XFEATURE_MASK_SSE) || > > + (((secs->xfrm >> XFEATURE_BNDREGS) & 1) != > > +((secs->xfrm >> XFEATURE_BNDCSR) & 1)) || > > + (secs->xfrm & ~sgx_xfrm_mask)) > > + return -EINVAL; > > Do we need to check that the enclave doesn't use xfeatures that the > kernel doesn't know about? Or are they all safe by design in enclave > mode? Really good catch BTW. We don't check what kernel doesn't know about. I'm not sure what harm this would have as the enclave cannot have much effect to the outside world. Is there easy way to get similar mask of kernel supported features as sgx_xfrm_mask? The safe play would be to use such here as I don't have definitive answer to your second question. > > > +static int sgx_encl_pm_notifier(struct notifier_block *nb, > > + unsigned long action, void *data) > > +{ > > + struct sgx_encl *encl = container_of(nb, struct sgx_encl, > > pm_notifier); > > + > > + if (action != PM_SUSPEND_PREPARE && action != > > PM_HIBERNATION_PREPARE) > > + return NOTIFY_DONE; > > Hmm. There's an argument to made that omitting this would better > exercise the code that handles fully asynchronous loss of an enclave. > Also, I think you're unnecessarily killing enclaves when suspend is > attempted but fails. Are you proposing to not do anything at all to the enclaves? There was is a problem that it could lead to infinite #PF loop if we don't do it. > > > + > > +static int sgx_get_key_hash(const void *modulus, void *hash) > > +{ > > + struct crypto_shash *tfm; > > + int ret; > > + > > + tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC); > > + if (IS_ERR(tfm)) > > + return PTR_ERR(tfm); > > + > > + ret = __sgx_get_key_hash(tfm, modulus, hash); > > + > > + crypto_free_shash(tfm); > > + return ret; > > +} > > + > > I'm so sorry you had to deal with this API. Once Zinc lands, you > could clean this up :) > > > > +static int sgx_encl_get(unsigned long addr, struct sgx_encl **encl) > > +{ > > + struct mm_struct *mm = current->mm; > > + struct vm_area_struct *vma; > > + int ret; > > + > > + if (addr & (PAGE_SIZE - 1)) > > + return -EINVAL; > > + > > + down_read(>mmap_sem); > > + > > + ret = sgx_encl_find(mm, addr, ); > > + if (!ret) { > > + *encl = vma->vm_private_data; > > + > > + if ((*encl)->flags &
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Dec 19, 2018 at 07:00:47AM +0200, Jarkko Sakkinen wrote: > On Tue, Dec 18, 2018 at 10:53:49AM -0800, Sean Christopherson wrote: > > What if we re-organize the ioctls in such a way that we leave open the > > possibility of allocating raw EPC for KVM via /dev/sgx? I'm not 100% > > positive this approach will work[1], but conceptually it fits well with > > KVM's memory model, e.g. KVM is aware of the GPA<->HVA association but > > generally speaking doesn't know what's physically backing each memory > > region. > > Why would you want to pass EPC through user space to KVM rather than > KVM allocating it through kernel interfaces? > > > Tangentially related, I think we should support allocating multiple > > enclaves from a single /dev/sgx fd, i.e. a process shouldn't have to > > open /dev/sgx every time it wants to create a new enclave. > > I'm fine with this. It just requires to create anon inode. I'll just > add a new field called 'enclave_fd' to struct sgx_enclave_create and > that's all. > > I think I have otherwise ingredients for v19 ready except where to swap. If I follow your proposal here and allow to create multiple enclaves (i.e. with anon inodes for each) with one descriptor, is that sufficient API to later add what you want to KVM? /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Dec 18, 2018 at 10:53:49AM -0800, Sean Christopherson wrote: > What if we re-organize the ioctls in such a way that we leave open the > possibility of allocating raw EPC for KVM via /dev/sgx? I'm not 100% > positive this approach will work[1], but conceptually it fits well with > KVM's memory model, e.g. KVM is aware of the GPA<->HVA association but > generally speaking doesn't know what's physically backing each memory > region. Why would you want to pass EPC through user space to KVM rather than KVM allocating it through kernel interfaces? > Tangentially related, I think we should support allocating multiple > enclaves from a single /dev/sgx fd, i.e. a process shouldn't have to > open /dev/sgx every time it wants to create a new enclave. I'm fine with this. It just requires to create anon inode. I'll just add a new field called 'enclave_fd' to struct sgx_enclave_create and that's all. I think I have otherwise ingredients for v19 ready except where to swap. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Dec 18, 2018 at 07:44:18AM -0800, Sean Christopherson wrote: > My fd/inode knowledge is lacking, to say the least. Whatever works, so > long as we have a way to uniquely identify enclaves. I will simply trial and error :-) I think it should work since it does own an address space, but yeah, testing will tell. We can go also with anon inode if required. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Dec 18, 2018 at 07:44:18AM -0800, Sean Christopherson wrote: > On Mon, Dec 17, 2018 at 08:59:54PM -0800, Andy Lutomirski wrote: > > On Mon, Dec 17, 2018 at 2:20 PM Sean Christopherson > > wrote: > > > > > > > > My brain is still sorting out the details, but I generally like the idea > > > of allocating an anon inode when creating an enclave, and exposing the > > > other ioctls() via the returned fd. This is essentially the approach > > > used by KVM to manage multiple "layers" of ioctls across KVM itself, VMs > > > and vCPUS. There are even similarities to accessing physical memory via > > > multiple disparate domains, e.g. host kernel, host userspace and guest. > > > > > > > In my mind, opening /dev/sgx would give you the requisite inode. I'm > > not 100% sure that the chardev infrastructure allows this, but I think > > it does. > > My fd/inode knowledge is lacking, to say the least. Whatever works, so > long as we have a way to uniquely identify enclaves. Actually, while we're dissecting the interface... What if we re-organize the ioctls in such a way that we leave open the possibility of allocating raw EPC for KVM via /dev/sgx? I'm not 100% positive this approach will work[1], but conceptually it fits well with KVM's memory model, e.g. KVM is aware of the GPA<->HVA association but generally speaking doesn't know what's physically backing each memory region. Tangentially related, I think we should support allocating multiple enclaves from a single /dev/sgx fd, i.e. a process shouldn't have to open /dev/sgx every time it wants to create a new enclave. Something like this: /dev/sgx | -> mmap() { return -EINVAL; } | -> unlocked_ioctl() | -> SGX_CREATE_ENCLAVE: { return alloc_enclave_fd(); } | | | -> mmap() { ... } | | | -> get_unmapped_area() { | | if (enclave->size) { | | if (!addr) | | addr = enclave->base; | | if (addr + len + pgoff > enclave->base + enclave->size) | | return -EINVAL; | | } else { | | if (!validate_size(len)) | | return -EINVAL; | | addr = naturally_align(len); | | } | | } | | | -> unlocked_ioctl() { | SGX_ENCLAVE_ADD_PAGE: { ... } | SGX_ENCLAVE_INIT: { ... } | SGX_ENCLAVE_REMOVE_PAGES: { ... } | SGX_ENCLAVE_MODIFY_PAGES: { ... } | } | -> SGX_CREATE_VIRTUAL_EPC: {return alloc_epc_fd(); } | -> mmap() { ... } | -> get_unmapped_area() { } | -> unlocked_ioctl() { SGX_VIRTUAL_EPC_???: SGX_VIRTUAL_EPC_???: } [1] Delegating EPC management to /dev/sgx is viable for virtualizing SGX without oversubscribing EPC to guests, but oversubscribing EPC in a VMM requires handling EPC-related VM-Exits and using instructions that will #UD if the CPU is not post-VMXON. I *think* having KVM forward VM-Exits to x86/sgx would work, but it's entirely possible it'd be a complete cluster.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Dec 18, 2018 at 03:13:11PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 10:21:49PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 09:33:22PM +0200, Jarkko Sakkinen wrote: > > > On Mon, Dec 17, 2018 at 10:48:58AM -0800, Sean Christopherson wrote: > > > > On Mon, Dec 17, 2018 at 08:43:33PM +0200, Jarkko Sakkinen wrote: > > > > > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > > > > > > I'm pretty sure doing mmget() would result in circular dependencies > > > > > > and > > > > > > a zombie enclave. In the do_exit() case where a task is abruptly > > > > > > killed: > > > > > > > > > > > > - __mmput() is never called because the enclave holds a ref > > > > > > - sgx_encl_release() is never be called because its VMAs hold refs > > > > > > - sgx_vma_close() is never called because __mmput()->exit_mmap() > > > > > > is > > > > > > blocked and the process itself is dead, i.e. won't unmap > > > > > > anything. > > > > > > > > > > Right, it does, you are absolutely right. Tried it and removed the > > > > > commit already. > > > > > > > > > > Well, what we came up from your suggestion i.e. setting mm to NULL > > > > > and checking that is very subtle change and does not have any such > > > > > circular dependencies. We'll go with that. > > > > > > > > We can't set mm to NULL as we need it to unregister the notifier, and > > > > I'm fairly certain attempting to unregister in the release callback > > > > will deadlock. > > > > > > Noticed that too. mmu_notifier_unregister() requires a valid mm. > > > > Both branches updated... > > I'm not still seeing why you would want to call sgx_free_page() from > sgx_invalidate(). Kind of resistant to adding extra logging just for > checking for programming errors. What I would do if I had to debug > there a leak would be simply put kretprobe on __sgx_free_page(). The WARN is needed to detect the leak in the first place. And leaking pages because EREMOVE fails usually means there's a serious bug.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 08:59:54PM -0800, Andy Lutomirski wrote: > On Mon, Dec 17, 2018 at 2:20 PM Sean Christopherson > wrote: > > > > > My brain is still sorting out the details, but I generally like the idea > > of allocating an anon inode when creating an enclave, and exposing the > > other ioctls() via the returned fd. This is essentially the approach > > used by KVM to manage multiple "layers" of ioctls across KVM itself, VMs > > and vCPUS. There are even similarities to accessing physical memory via > > multiple disparate domains, e.g. host kernel, host userspace and guest. > > > > In my mind, opening /dev/sgx would give you the requisite inode. I'm > not 100% sure that the chardev infrastructure allows this, but I think > it does. My fd/inode knowledge is lacking, to say the least. Whatever works, so long as we have a way to uniquely identify enclaves. > > The only potential hiccup I can see is the build flow. Currently, > > EADD+EEXTEND is done via a work queue to avoid major performance issues > > (10x regression) when userspace is building multiple enclaves in parallel > > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > > but I've only confirmed the Golang case). The issue is that allocating > > an EPC page acts like a blocking syscall when the EPC is under pressure, > > i.e. an EPC page isn't immediately available. This causes Go's scheduler > > to thrash and tank performance[1]. > > What's the issue, and how does a workqueue help? I'm wondering if a > nicer solution would be an ioctl to add lots of pages in a single > call. Adding pages via workqueue makes the ioctl itself fast enough to avoid triggering Go's rescheduling. A batched EADD flow would likely help, I just haven't had the time to rework the userspace side to be able to test the performance. > > > > Alternatively, we could change the EADD+EEXTEND flow to not insert the > > added page's PFN into the owner's process space, i.e. force userspace to > > fault when it runs the enclave. But that only delays the issue because > > eventually we'll want to account EPC pages, i.e. add a cgroup, at which > > point we'll likely need current->mm anyways. > > You should be able to account the backing pages to a cgroup without > actually sticking them into the EPC, no? Or am I misunderstanding? I > guess we'll eventually want a cgroup to limit use of the limited EPC > resources. It's the latter, a cgroup to limit EPC. The mm is used to retrieve the cgroup without having track e.g. the task_struct.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 09:02:03PM -0800, Andy Lutomirski wrote: > That's what unmap_mapping_range(), etc do for you, no? IOW make a > struct address_space that represents the logical enclave address > space, i.e. address 0 is the start and the pages count up from there. > You can unmap pages whenever you want, and the core mm code will take > care of zapping the pages from all vmas referencing that > address_space. OK, so it does. Did not have time to look at it last night (about 3AM) :-) Yes, we could use that to do the N process zapping. Based on this discussion I can take the first steps with the swapping code. And yeah, I don't think we need anon inode for this one. Can just use the dev inode (did not check in detail but on the surface looks like it). /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 08:55:02PM -0800, Andy Lutomirski wrote: > On Mon, Dec 17, 2018 at 5:39 PM Jarkko Sakkinen > wrote: > > > > On Mon, Dec 17, 2018 at 02:20:48PM -0800, Sean Christopherson wrote: > > > The only potential hiccup I can see is the build flow. Currently, > > > EADD+EEXTEND is done via a work queue to avoid major performance issues > > > (10x regression) when userspace is building multiple enclaves in parallel > > > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > > > but I've only confirmed the Golang case). The issue is that allocating > > > an EPC page acts like a blocking syscall when the EPC is under pressure, > > > i.e. an EPC page isn't immediately available. This causes Go's scheduler > > > to thrash and tank performance[1]. > > > > I don't see any major issues having that kthread. All the code that > > maps the enclave would be removed. > > > > I would only allow to map enclave to process address space after the > > enclave has been initialized i.e. SGX_IOC_ENCLAVE_ATTACH. > > > > What's SGX_IOC_ENCLAVE_ATTACH? Why would it be needed at all? I > would imagine that all pages would be faulted in as needed (or > prefaulted as an optimization) and the enclave would just work in any > process. The way I see it the efficient way to implement this is to have the enclave attached to a single process address space at a time. #PF handler is trivial with multiple address spaces but swapping is a bit tedious as you would need to zap N processes. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:21:49PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 09:33:22PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 10:48:58AM -0800, Sean Christopherson wrote: > > > On Mon, Dec 17, 2018 at 08:43:33PM +0200, Jarkko Sakkinen wrote: > > > > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > > > > > I'm pretty sure doing mmget() would result in circular dependencies > > > > > and > > > > > a zombie enclave. In the do_exit() case where a task is abruptly > > > > > killed: > > > > > > > > > > - __mmput() is never called because the enclave holds a ref > > > > > - sgx_encl_release() is never be called because its VMAs hold refs > > > > > - sgx_vma_close() is never called because __mmput()->exit_mmap() is > > > > > blocked and the process itself is dead, i.e. won't unmap anything. > > > > > > > > Right, it does, you are absolutely right. Tried it and removed the > > > > commit already. > > > > > > > > Well, what we came up from your suggestion i.e. setting mm to NULL > > > > and checking that is very subtle change and does not have any such > > > > circular dependencies. We'll go with that. > > > > > > We can't set mm to NULL as we need it to unregister the notifier, and > > > I'm fairly certain attempting to unregister in the release callback > > > will deadlock. > > > > Noticed that too. mmu_notifier_unregister() requires a valid mm. > > Both branches updated... I'm not still seeing why you would want to call sgx_free_page() from sgx_invalidate(). Kind of resistant to adding extra logging just for checking for programming errors. What I would do if I had to debug there a leak would be simply put kretprobe on __sgx_free_page(). /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 08:59:54PM -0800, Andy Lutomirski wrote: > On Mon, Dec 17, 2018 at 2:20 PM Sean Christopherson > wrote: > > > > > My brain is still sorting out the details, but I generally like the idea > > of allocating an anon inode when creating an enclave, and exposing the > > other ioctls() via the returned fd. This is essentially the approach > > used by KVM to manage multiple "layers" of ioctls across KVM itself, VMs > > and vCPUS. There are even similarities to accessing physical memory via > > multiple disparate domains, e.g. host kernel, host userspace and guest. > > > > In my mind, opening /dev/sgx would give you the requisite inode. I'm > not 100% sure that the chardev infrastructure allows this, but I think > it does. Yes, this is what I was thinking too i.e. enclave_fd = open("/dev/sgx/", O_RDWR); After this enclave_fd "is" the enclave up until the file is closed. > > The only potential hiccup I can see is the build flow. Currently, > > EADD+EEXTEND is done via a work queue to avoid major performance issues > > (10x regression) when userspace is building multiple enclaves in parallel > > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > > but I've only confirmed the Golang case). The issue is that allocating > > an EPC page acts like a blocking syscall when the EPC is under pressure, > > i.e. an EPC page isn't immediately available. This causes Go's scheduler > > to thrash and tank performance[1]. > > What's the issue, and how does a workqueue help? I'm wondering if a > nicer solution would be an ioctl to add lots of pages in a single > call. I don't think this really is an issue as long as the thread does not depend on any VMAs. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Thu, Nov 15, 2018 at 5:08 PM Jarkko Sakkinen wrote: > > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that > can be used by applications to set aside private regions of code and > data. The code outside the enclave is disallowed to access the memory > inside the enclave by the CPU access control. This is a very partial review. > +int sgx_encl_find(struct mm_struct *mm, unsigned long addr, > + struct vm_area_struct **vma) > +{ > + struct vm_area_struct *result; > + struct sgx_encl *encl; > + > + result = find_vma(mm, addr); > + if (!result || result->vm_ops != _vm_ops || addr < > result->vm_start) > + return -EINVAL; > + > + encl = result->vm_private_data; > + *vma = result; > + > + return encl ? 0 : -ENOENT; > +} I realize that this function may go away entirely but, if you keep it: what are the locking rules? What, if anything, prevents another thread from destroying the enclave after sgx_encl_find() returns? > +static int sgx_validate_secs(const struct sgx_secs *secs, > +unsigned long ssaframesize) > +{ ... > + if (secs->attributes & SGX_ATTR_MODE64BIT) { > + if (secs->size > sgx_encl_size_max_64) > + return -EINVAL; > + } else { > + /* On 64-bit architecture allow 32-bit encls only in > +* the compatibility mode. > +*/ > + if (!test_thread_flag(TIF_ADDR32)) > + return -EINVAL; > + if (secs->size > sgx_encl_size_max_32) > + return -EINVAL; > + } Why do we need the 32-bit-on-64-bit check? In general, anything that checks per-task or per-mm flags like TIF_ADDR32 is IMO likely to be problematic. You're allowing 64-bit enclaves in 32-bit tasks, so I'm guessing you could just delete the check. > + > + if (!(secs->xfrm & XFEATURE_MASK_FP) || > + !(secs->xfrm & XFEATURE_MASK_SSE) || > + (((secs->xfrm >> XFEATURE_BNDREGS) & 1) != > +((secs->xfrm >> XFEATURE_BNDCSR) & 1)) || > + (secs->xfrm & ~sgx_xfrm_mask)) > + return -EINVAL; Do we need to check that the enclave doesn't use xfeatures that the kernel doesn't know about? Or are they all safe by design in enclave mode? > +static int sgx_encl_pm_notifier(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + struct sgx_encl *encl = container_of(nb, struct sgx_encl, > pm_notifier); > + > + if (action != PM_SUSPEND_PREPARE && action != PM_HIBERNATION_PREPARE) > + return NOTIFY_DONE; Hmm. There's an argument to made that omitting this would better exercise the code that handles fully asynchronous loss of an enclave. Also, I think you're unnecessarily killing enclaves when suspend is attempted but fails. > + > +static int sgx_get_key_hash(const void *modulus, void *hash) > +{ > + struct crypto_shash *tfm; > + int ret; > + > + tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC); > + if (IS_ERR(tfm)) > + return PTR_ERR(tfm); > + > + ret = __sgx_get_key_hash(tfm, modulus, hash); > + > + crypto_free_shash(tfm); > + return ret; > +} > + I'm so sorry you had to deal with this API. Once Zinc lands, you could clean this up :) > +static int sgx_encl_get(unsigned long addr, struct sgx_encl **encl) > +{ > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *vma; > + int ret; > + > + if (addr & (PAGE_SIZE - 1)) > + return -EINVAL; > + > + down_read(>mmap_sem); > + > + ret = sgx_encl_find(mm, addr, ); > + if (!ret) { > + *encl = vma->vm_private_data; > + > + if ((*encl)->flags & SGX_ENCL_SUSPEND) > + ret = SGX_POWER_LOST_ENCLAVE; > + else > + kref_get(&(*encl)->refcount); > + } Hmm. This version has explicit refcounting. > +static int sgx_mmap(struct file *file, struct vm_area_struct *vma) > +{ > + vma->vm_ops = _vm_ops; > + vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO | > +VM_DONTCOPY; > + > + return 0; > +} > + > +static unsigned long sgx_get_unmapped_area(struct file *file, > + unsigned long addr, > + unsigned long len, > + unsigned long pgoff, > + unsigned long flags) > +{ > + if (len < 2 * PAGE_SIZE || (len & (len - 1))) > + return -EINVAL; > + > + if (len > sgx_encl_size_max_64) > + return -EINVAL; > + > + if (len > sgx_encl_size_max_32 && test_thread_flag(TIF_ADDR32)) > + return -EINVAL; Generally speaking, this type of
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 7:27 PM Jarkko Sakkinen wrote: > > On Tue, Dec 18, 2018 at 03:39:18AM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 02:20:48PM -0800, Sean Christopherson wrote: > > > The only potential hiccup I can see is the build flow. Currently, > > > EADD+EEXTEND is done via a work queue to avoid major performance issues > > > (10x regression) when userspace is building multiple enclaves in parallel > > > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > > > but I've only confirmed the Golang case). The issue is that allocating > > > an EPC page acts like a blocking syscall when the EPC is under pressure, > > > i.e. an EPC page isn't immediately available. This causes Go's scheduler > > > to thrash and tank performance[1]. > > > > I don't see any major issues having that kthread. All the code that > > maps the enclave would be removed. > > > > I would only allow to map enclave to process address space after the > > enclave has been initialized i.e. SGX_IOC_ENCLAVE_ATTACH. > > Some refined thoughts. > > PTE insertion can done in the #PF handler. In fact, we can PoC this > already with the current architecture (and I will right after sending > v18). > > The backing space is a bit more nasty issue in the add pager thread. > The previous shmem swapping would have been a better fit. Maybe that > should be reconsidered? > > If shmem was used, all the commits up to "SGX Enclave Driver" could > be reworked to the new model. > > When we think about the swapping code, there uprises some difficulties. > Namely, when a page is swapped, the enclave must unmap the PTE from all > processes that have it mapped. That's what unmap_mapping_range(), etc do for you, no? IOW make a struct address_space that represents the logical enclave address space, i.e. address 0 is the start and the pages count up from there. You can unmap pages whenever you want, and the core mm code will take care of zapping the pages from all vmas referencing that address_space.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 2:20 PM Sean Christopherson wrote: > > My brain is still sorting out the details, but I generally like the idea > of allocating an anon inode when creating an enclave, and exposing the > other ioctls() via the returned fd. This is essentially the approach > used by KVM to manage multiple "layers" of ioctls across KVM itself, VMs > and vCPUS. There are even similarities to accessing physical memory via > multiple disparate domains, e.g. host kernel, host userspace and guest. > In my mind, opening /dev/sgx would give you the requisite inode. I'm not 100% sure that the chardev infrastructure allows this, but I think it does. > The only potential hiccup I can see is the build flow. Currently, > EADD+EEXTEND is done via a work queue to avoid major performance issues > (10x regression) when userspace is building multiple enclaves in parallel > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > but I've only confirmed the Golang case). The issue is that allocating > an EPC page acts like a blocking syscall when the EPC is under pressure, > i.e. an EPC page isn't immediately available. This causes Go's scheduler > to thrash and tank performance[1]. What's the issue, and how does a workqueue help? I'm wondering if a nicer solution would be an ioctl to add lots of pages in a single call. > > Alternatively, we could change the EADD+EEXTEND flow to not insert the > added page's PFN into the owner's process space, i.e. force userspace to > fault when it runs the enclave. But that only delays the issue because > eventually we'll want to account EPC pages, i.e. add a cgroup, at which > point we'll likely need current->mm anyways. You should be able to account the backing pages to a cgroup without actually sticking them into the EPC, no? Or am I misunderstanding? I guess we'll eventually want a cgroup to limit use of the limited EPC resources.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 5:39 PM Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 02:20:48PM -0800, Sean Christopherson wrote: > > The only potential hiccup I can see is the build flow. Currently, > > EADD+EEXTEND is done via a work queue to avoid major performance issues > > (10x regression) when userspace is building multiple enclaves in parallel > > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > > but I've only confirmed the Golang case). The issue is that allocating > > an EPC page acts like a blocking syscall when the EPC is under pressure, > > i.e. an EPC page isn't immediately available. This causes Go's scheduler > > to thrash and tank performance[1]. > > I don't see any major issues having that kthread. All the code that > maps the enclave would be removed. > > I would only allow to map enclave to process address space after the > enclave has been initialized i.e. SGX_IOC_ENCLAVE_ATTACH. > What's SGX_IOC_ENCLAVE_ATTACH? Why would it be needed at all? I would imagine that all pages would be faulted in as needed (or prefaulted as an optimization) and the enclave would just work in any process.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Dec 18, 2018 at 03:39:18AM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 02:20:48PM -0800, Sean Christopherson wrote: > > The only potential hiccup I can see is the build flow. Currently, > > EADD+EEXTEND is done via a work queue to avoid major performance issues > > (10x regression) when userspace is building multiple enclaves in parallel > > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > > but I've only confirmed the Golang case). The issue is that allocating > > an EPC page acts like a blocking syscall when the EPC is under pressure, > > i.e. an EPC page isn't immediately available. This causes Go's scheduler > > to thrash and tank performance[1]. > > I don't see any major issues having that kthread. All the code that > maps the enclave would be removed. > > I would only allow to map enclave to process address space after the > enclave has been initialized i.e. SGX_IOC_ENCLAVE_ATTACH. Some refined thoughts. PTE insertion can done in the #PF handler. In fact, we can PoC this already with the current architecture (and I will right after sending v18). The backing space is a bit more nasty issue in the add pager thread. The previous shmem swapping would have been a better fit. Maybe that should be reconsidered? If shmem was used, all the commits up to "SGX Enclave Driver" could be reworked to the new model. When we think about the swapping code, there uprises some difficulties. Namely, when a page is swapped, the enclave must unmap the PTE from all processes that have it mapped. I have a one compromise solution for the problem above: make enclaves shared BUT mutually exclusive. When you attach an enclave it gets detached from the previous process that had it. This would still fully implement the daemon example that Andy gave in earlier response. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 12:10:17PM -0800, Andy Lutomirski wrote: > On Mon, Dec 17, 2018 at 12:03 PM Dave Hansen wrote: > > > > On 12/17/18 11:55 AM, Andy Lutomirski wrote: > > >> You're effectively rebuilding reverse-mapping infrastructure here. It's > > >> a frequent thing for the core VM to need to go from 'struct page' back > > >> to the page tables mapping it. For that we go (logically) > > >> page->{anon_vma,mapping}->vma->vm_mm->pagetable. > > > This is a bit outside my expertise here, but doesn't > > > unmap_mapping_range() do exactly what SGX wants? > > > > There's no 'struct page' for enclave memory as it stands. That means no > > page cache, and that means there's no 'struct address_space *mapping' in > > the first place. > > > > Basically, the choice was made a long time ago to have SGX's memory > > management live outside the core VM. I've waffled back and forth on it, > > but I do still think this is the right way to do it. > > AFAICS a lack of struct page isn't a problem. The core code seems to > understand that address_space objects might cover non-struct-page > memory. Morally, enclave memory is a lot like hot-unpluggable PCI > space. I'm fine using it if it works. Will try it for v19. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 02:20:48PM -0800, Sean Christopherson wrote: > The only potential hiccup I can see is the build flow. Currently, > EADD+EEXTEND is done via a work queue to avoid major performance issues > (10x regression) when userspace is building multiple enclaves in parallel > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler, > but I've only confirmed the Golang case). The issue is that allocating > an EPC page acts like a blocking syscall when the EPC is under pressure, > i.e. an EPC page isn't immediately available. This causes Go's scheduler > to thrash and tank performance[1]. I don't see any major issues having that kthread. All the code that maps the enclave would be removed. I would only allow to map enclave to process address space after the enclave has been initialized i.e. SGX_IOC_ENCLAVE_ATTACH. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Dec 18, 2018 at 03:17:25AM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 11:12:21AM -0800, Andy Lutomirski wrote: > > I'm going to ask an obnoxious high-level question: why does an enclave > > even refer to a specific mm? > > The reason is that it has not been yet in focus in the review process > and there has been other concerns. > > At least the code is fairly stable i.e. working code is usually good > starting point for making something different (ignoring the recent > regression caused by the shmem to VMA migration). > > > If I were designing this thing, and if I hadn't started trying to > > implement it, my first thought would be that an enclave tracks its > > linear address range, which is just a pair of numbers, and also keeps > > track of a whole bunch of physical EPC pages, data structures, etc. > > And that mmap() gets rejected unless the requested virtual address > > matches the linear address range that the enclave wants and, aside > > from that, just creates a VMA that keeps a reference to the enclave. > > (And, for convenience, I suppose that the first mmap() call done > > before any actual enclave setup happens could choose any address and > > then cause the enclave to lock itself to that address, although a > > regular anonymous PROT_NONE MAP_NORESERVE mapping would do just fine, > > too.) And the driver would explicitly allow multiple different mms to > > have the same enclave mapped. More importantly, a daemon could set up > > an enclave without necessarily mapping it at all and then SCM_RIGHTS > > the enclave over to the process that plans to run it. > > The current SGX_IOC_ENCLAVE_CREATE ioctl would be trivial to change to > use this approach. Instead looking up VMA with an enclave instance it > would create a new enclave instance. > > Then we could have SGX_IOC_ENCLAVE_ATTACH to attach an enclave to a VMA. > > This does not sound too complicated. > > > Now I'm sure this has all kinds of problems, such as the ISA possibly > > making it rather obnoxious to add pages to the enclave without having > > it mapped. But these operations could, in principle, be done by > > We do EADD in a kthread. What this would require to put current->mm > into a request that it is processed by that thread. This would be > doable with mmget(). Correction here. We need mm just for vm_insert_pfn(), which would be removed, no need to pass mm. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:12:21AM -0800, Andy Lutomirski wrote: > I'm going to ask an obnoxious high-level question: why does an enclave > even refer to a specific mm? The reason is that it has not been yet in focus in the review process and there has been other concerns. At least the code is fairly stable i.e. working code is usually good starting point for making something different (ignoring the recent regression caused by the shmem to VMA migration). > If I were designing this thing, and if I hadn't started trying to > implement it, my first thought would be that an enclave tracks its > linear address range, which is just a pair of numbers, and also keeps > track of a whole bunch of physical EPC pages, data structures, etc. > And that mmap() gets rejected unless the requested virtual address > matches the linear address range that the enclave wants and, aside > from that, just creates a VMA that keeps a reference to the enclave. > (And, for convenience, I suppose that the first mmap() call done > before any actual enclave setup happens could choose any address and > then cause the enclave to lock itself to that address, although a > regular anonymous PROT_NONE MAP_NORESERVE mapping would do just fine, > too.) And the driver would explicitly allow multiple different mms to > have the same enclave mapped. More importantly, a daemon could set up > an enclave without necessarily mapping it at all and then SCM_RIGHTS > the enclave over to the process that plans to run it. The current SGX_IOC_ENCLAVE_CREATE ioctl would be trivial to change to use this approach. Instead looking up VMA with an enclave instance it would create a new enclave instance. Then we could have SGX_IOC_ENCLAVE_ATTACH to attach an enclave to a VMA. This does not sound too complicated. > Now I'm sure this has all kinds of problems, such as the ISA possibly > making it rather obnoxious to add pages to the enclave without having > it mapped. But these operations could, in principle, be done by We do EADD in a kthread. What this would require to put current->mm into a request that it is processed by that thread. This would be doable with mmget(). The deadlock that Sean mentioned would not exist since closing VMAs is not bounded to the enclave life-cycle anymore. So at least non-swapping ISA is easy to fit to this framework. I can rework this for v19. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 12:15:47PM -0800, Dave Hansen wrote: > On 12/17/18 12:10 PM, Andy Lutomirski wrote: > >> There's no 'struct page' for enclave memory as it stands. That means no > >> page cache, and that means there's no 'struct address_space *mapping' in > >> the first place. > >> > >> Basically, the choice was made a long time ago to have SGX's memory > >> management live outside the core VM. I've waffled back and forth on it, > >> but I do still think this is the right way to do it. > > AFAICS a lack of struct page isn't a problem. The core code seems to > > understand that address_space objects might cover non-struct-page > > memory. Morally, enclave memory is a lot like hot-unpluggable PCI > > space. > > Yeah, this is true. The existing code seems to make it all the way from > unmap_mapping_range() down to zap_page_range() without 'struct page'. > > Overall, I think what Andy is saying here is that an open(/dev/sgx) > should give you a "unique" enclave fd. That fd can end up mapped into > one or more processes either via fork() or the other ways fds end up > getting handed around. mmap() of this fd would be *required* to be > MAP_SHARED. That means you don't need to support COW, and the semantics > are the same as any other MAP_SHARED mapping: children and parents and > anybody mmap()'ing it must all coordinate. > > This sounds interesting at least. It might lead to an unholy mess in > the driver, or it might be a great cleanup. But, it does sound like > something that would both potentially simplify the semantics and the > implementation. It's very similar to KVM's model, which has proven to be fairly robust, so I don't think it'll be an unholy mess (famous last words). It probably won't be a "great" cleanup per se, but it definitely should make the code more maintainable in the long run. The other interesting aspect of the enclave fd approach is that it would allow userspace to *execute* an enclave from multiple processes, so long as it did the necessary multiplexing of pthreads to enclave threads. I think SGX2 instructions (dynamic EPC management) would even allow adding new enclave threads on-demand.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:12:21AM -0800, Andy Lutomirski wrote: > On Mon, Dec 17, 2018 at 10:47 AM Dave Hansen wrote: > > > > On 12/17/18 10:43 AM, Jarkko Sakkinen wrote: > > > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > > >> I'm pretty sure doing mmget() would result in circular dependencies and > > >> a zombie enclave. In the do_exit() case where a task is abruptly killed: > > >> > > >> - __mmput() is never called because the enclave holds a ref > > >> - sgx_encl_release() is never be called because its VMAs hold refs > > >> - sgx_vma_close() is never called because __mmput()->exit_mmap() is > > >> blocked and the process itself is dead, i.e. won't unmap anything. > > > Right, it does, you are absolutely right. Tried it and removed the > > > commit already. > > > > > > Well, what we came up from your suggestion i.e. setting mm to NULL > > > and checking that is very subtle change and does not have any such > > > circular dependencies. We'll go with that. > > > > This all screams that you need to break out this code from the massive > > "18" patch and get the mm interactions reviewed more thoroughly. > > > > Also, no matter what method you go with, you have a bunch of commenting > > and changelogging to do here. > > I'm going to ask an obnoxious high-level question: why does an enclave > even refer to a specific mm? Primarily because that's what the code has "always" done. I can't speak for Jarkko, but I got involved with this joyful project long after the code was originally written. > If I were designing this thing, and if I hadn't started trying to > implement it, my first thought would be that an enclave tracks its > linear address range, which is just a pair of numbers, and also keeps > track of a whole bunch of physical EPC pages, data structures, etc. > And that mmap() gets rejected unless the requested virtual address > matches the linear address range that the enclave wants and, aside > from that, just creates a VMA that keeps a reference to the enclave. > (And, for convenience, I suppose that the first mmap() call done > before any actual enclave setup happens could choose any address and > then cause the enclave to lock itself to that address, although a > regular anonymous PROT_NONE MAP_NORESERVE mapping would do just fine, > too.) And the driver would explicitly allow multiple different mms to > have the same enclave mapped. More importantly, a daemon could set up > an enclave without necessarily mapping it at all and then SCM_RIGHTS > the enclave over to the process that plans to run it. Hmm, this could work, the obvious weirdness would be ensuring the linear range is available in the destination mm, but that'd be userspace's problem. I don't think we'd need to keep a reference to the enclave in the VMA. The enclave's ref could be held by the fd. Assuming the kernel is using its private mapping to access the enclave, that's all we'd need to be able to manipulate the enclave, e.g. reclaim EPC pages. Userspace would need to keep the fd alive in order to use the VMA, but that sort of goes without saying. The mm/VMA juggling today is for zapping/testing the correct PTEs, but as you pointed out in a different email we can use unmap_mapping_range(), with the enclave's fd being the source of the address space passed to unmap_mapping_range(). Removing a VMA simply means we don't need to zap it or test its age. > Now I'm sure this has all kinds of problems, such as the ISA possibly > making it rather obnoxious to add pages to the enclave without having > it mapped. But these operations could, in principle, be done by > having the enclave own a private mm that's used just for setup. While > this would be vaguely annoying, Nadav's still-pending-but-nearly-done > text_poke series adds all the infrastructure that's needed for the > kernel to manage little private mms. But some things get simpler -- > invalidating the enclave can presumably use the regular rmap APIs to > zap all the PTEs in all VMAs pointing into the enclave. We don't even need a private mm, we can (and already do) use the kernel's translations for ENCLS instructions. Hardware only enforces the linear address stuff when it's actually in enclave mode, i.e. executing the enclave. ENCLS instructions aren't subject to the ELRANGE checks and can use any VA->PA combination. > So I'm not saying that you shouldn't do it the way you are now, but I > do think that the changelog or at least some emails should explain > *why* the enclave needs to keep a pointer to the creating process's > mm. And, if you do keep the current model, it would be nice to > understand what happens if you do something awful like mremap()ing an > enclave, or calling madvise on it, or otherwise abusing the vma. Or > doing fork(), for that matter. > > I also find it suspicious that the various ioctl handlers > systematically ignore their "filep" parameters and instead use > find_vma() to find the relevant mm
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 09:33:22PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 10:48:58AM -0800, Sean Christopherson wrote: > > On Mon, Dec 17, 2018 at 08:43:33PM +0200, Jarkko Sakkinen wrote: > > > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > > > > I'm pretty sure doing mmget() would result in circular dependencies and > > > > a zombie enclave. In the do_exit() case where a task is abruptly > > > > killed: > > > > > > > > - __mmput() is never called because the enclave holds a ref > > > > - sgx_encl_release() is never be called because its VMAs hold refs > > > > - sgx_vma_close() is never called because __mmput()->exit_mmap() is > > > > blocked and the process itself is dead, i.e. won't unmap anything. > > > > > > Right, it does, you are absolutely right. Tried it and removed the > > > commit already. > > > > > > Well, what we came up from your suggestion i.e. setting mm to NULL > > > and checking that is very subtle change and does not have any such > > > circular dependencies. We'll go with that. > > > > We can't set mm to NULL as we need it to unregister the notifier, and > > I'm fairly certain attempting to unregister in the release callback > > will deadlock. > > Noticed that too. mmu_notifier_unregister() requires a valid mm. Both branches updated... /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 12:10 PM, Andy Lutomirski wrote: >> There's no 'struct page' for enclave memory as it stands. That means no >> page cache, and that means there's no 'struct address_space *mapping' in >> the first place. >> >> Basically, the choice was made a long time ago to have SGX's memory >> management live outside the core VM. I've waffled back and forth on it, >> but I do still think this is the right way to do it. > AFAICS a lack of struct page isn't a problem. The core code seems to > understand that address_space objects might cover non-struct-page > memory. Morally, enclave memory is a lot like hot-unpluggable PCI > space. Yeah, this is true. The existing code seems to make it all the way from unmap_mapping_range() down to zap_page_range() without 'struct page'. Overall, I think what Andy is saying here is that an open(/dev/sgx) should give you a "unique" enclave fd. That fd can end up mapped into one or more processes either via fork() or the other ways fds end up getting handed around. mmap() of this fd would be *required* to be MAP_SHARED. That means you don't need to support COW, and the semantics are the same as any other MAP_SHARED mapping: children and parents and anybody mmap()'ing it must all coordinate. This sounds interesting at least. It might lead to an unholy mess in the driver, or it might be a great cleanup. But, it does sound like something that would both potentially simplify the semantics and the implementation.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 12:03 PM Dave Hansen wrote: > > On 12/17/18 11:55 AM, Andy Lutomirski wrote: > >> You're effectively rebuilding reverse-mapping infrastructure here. It's > >> a frequent thing for the core VM to need to go from 'struct page' back > >> to the page tables mapping it. For that we go (logically) > >> page->{anon_vma,mapping}->vma->vm_mm->pagetable. > > This is a bit outside my expertise here, but doesn't > > unmap_mapping_range() do exactly what SGX wants? > > There's no 'struct page' for enclave memory as it stands. That means no > page cache, and that means there's no 'struct address_space *mapping' in > the first place. > > Basically, the choice was made a long time ago to have SGX's memory > management live outside the core VM. I've waffled back and forth on it, > but I do still think this is the right way to do it. AFAICS a lack of struct page isn't a problem. The core code seems to understand that address_space objects might cover non-struct-page memory. Morally, enclave memory is a lot like hot-unpluggable PCI space.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 11:55 AM, Andy Lutomirski wrote: >> You're effectively rebuilding reverse-mapping infrastructure here. It's >> a frequent thing for the core VM to need to go from 'struct page' back >> to the page tables mapping it. For that we go (logically) >> page->{anon_vma,mapping}->vma->vm_mm->pagetable. > This is a bit outside my expertise here, but doesn't > unmap_mapping_range() do exactly what SGX wants? There's no 'struct page' for enclave memory as it stands. That means no page cache, and that means there's no 'struct address_space *mapping' in the first place. Basically, the choice was made a long time ago to have SGX's memory management live outside the core VM. I've waffled back and forth on it, but I do still think this is the right way to do it.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:53 AM Dave Hansen wrote: > > On 12/17/18 11:49 AM, Jarkko Sakkinen wrote: > >> Yeah, the code is built to have one VMA and only one VMA per enclave. > >> You need to go over the origin of this restriction and what enforces this. > > It is before ECREATE but after that you can split it with mprotect(). > > > > Lets take an example. I'm not sure how we would acquire mm efficiently > > in sgx_encl_page_reclaim() otherwise than having it as a field in encl. > > You're effectively rebuilding reverse-mapping infrastructure here. It's > a frequent thing for the core VM to need to go from 'struct page' back > to the page tables mapping it. For that we go (logically) > page->{anon_vma,mapping}->vma->vm_mm->pagetable. This is a bit outside my expertise here, but doesn't unmap_mapping_range() do exactly what SGX wants?
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:25:47AM -0800, Andy Lutomirski wrote: > On Mon, Dec 17, 2018 at 11:17 AM Dave Hansen wrote: > > > > On 12/17/18 11:12 AM, Andy Lutomirski wrote: > > > So I'm not saying that you shouldn't do it the way you are now, but I > > > do think that the changelog or at least some emails should explain > > > *why* the enclave needs to keep a pointer to the creating process's > > > mm. And, if you do keep the current model, it would be nice to > > > understand what happens if you do something awful like mremap()ing an > > > enclave, or calling madvise on it, or otherwise abusing the vma. Or > > > doing fork(), for that matter. > > > > Yeah, the code is built to have one VMA and only one VMA per enclave. > > You need to go over the origin of this restriction and what enforces this. > > There is a sad historical reason that you may regret keeping this > restriction. There are plenty of pieces of code out there that think > it's reasonable to spawn a subprocess by calling fork() and then > execve(). (This is *not* a sensible thing to do. One should use > posix_spawn() or some CLONE_VM variant. But even fairly recent > posix_spawn() implementations will fork(). So the driver has to do > *something* sensible on fork() or a bunch of things that use SGX > unsuspectingly via, for example, PKCS #11, are going to be very sad. > I suppose you could make enclaves just not show up in the fork()ed > children, but then you have a different problem: creating an enclave > and then doing daemon() won't work. > > Yes, POSIX traditions are rather silly. ATM enclave VMAs are not copied on fork. Not sure how you would implement COW semantics with enclaves. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 11:49 AM, Jarkko Sakkinen wrote: >> Yeah, the code is built to have one VMA and only one VMA per enclave. >> You need to go over the origin of this restriction and what enforces this. > It is before ECREATE but after that you can split it with mprotect(). > > Lets take an example. I'm not sure how we would acquire mm efficiently > in sgx_encl_page_reclaim() otherwise than having it as a field in encl. You're effectively rebuilding reverse-mapping infrastructure here. It's a frequent thing for the core VM to need to go from 'struct page' back to the page tables mapping it. For that we go (logically) page->{anon_vma,mapping}->vma->vm_mm->pagetable. This, on the other hand, is trying to do page->encl->mm->pagetable. You could very easily have a VMA analog in there instead of jumping straight to the mm.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:17:49AM -0800, Dave Hansen wrote: > On 12/17/18 11:12 AM, Andy Lutomirski wrote: > > So I'm not saying that you shouldn't do it the way you are now, but I > > do think that the changelog or at least some emails should explain > > *why* the enclave needs to keep a pointer to the creating process's > > mm. And, if you do keep the current model, it would be nice to > > understand what happens if you do something awful like mremap()ing an > > enclave, or calling madvise on it, or otherwise abusing the vma. Or > > doing fork(), for that matter. > > Yeah, the code is built to have one VMA and only one VMA per enclave. > You need to go over the origin of this restriction and what enforces this. It is before ECREATE but after that you can split it with mprotect(). Lets take an example. I'm not sure how we would acquire mm efficiently in sgx_encl_page_reclaim() otherwise than having it as a field in encl. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 11:37 AM, Jarkko Sakkinen wrote: >> Suggestion: >> >> It looks like you only expect one VMA per enclave. Things go bonkers if >> this is not true. So, instead of storing encl->mm, don't. You can get >> the mm from vma->vm_mm and you could just store encl->vma instead. > The code actually supports having multiple VMAs per enclave. That seems at least somewhat at odds with this comment: > static void sgx_vma_open(struct vm_area_struct *vma) > { > struct sgx_encl *encl = vma->vm_private_data; > > if (!encl) > return; > > /* kref cannot underflow because ECREATE ioctl checks that there is > only > * one single VMA for the enclave before proceeding. > */ > kref_get(>refcount); > }
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:09:33AM -0800, Dave Hansen wrote: > On 12/17/18 10:48 AM, Sean Christopherson wrote: > > We can't set mm to NULL as we need it to unregister the notifier, and > > I'm fairly certain attempting to unregister in the release callback > > will deadlock. > > Suggestion: > > It looks like you only expect one VMA per enclave. Things go bonkers if > this is not true. So, instead of storing encl->mm, don't. You can get > the mm from vma->vm_mm and you could just store encl->vma instead. The code actually supports having multiple VMAs per enclave. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:46:25AM -0800, Sean Christopherson wrote: > On Mon, Dec 17, 2018 at 08:23:19PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 10:09:57AM -0800, Sean Christopherson wrote: > > > No, EREMOVE should never fail if the enclave is being released, i.e. all > > > references to the enclave are gone. And failure during sgx_encl_release() > > > means we leaked an EPC page, which warrants a WARN. > > > > Right that what I was suspecting as swapper should hold a ref to the > > enclave while it is working on it. It is a programming error when this > > happens. > > > > Maybe change the boolean parameter to flags parameter have a flag to > > use sgx_free_page()? > > I tried that approach when I first split it to __sgx_free_page() and > sgx_free_page(), but IMO the code is more difficult to read and harder > to maintain since sgx_free_page() should be used except under special > circumstances, e.g. race with reclaim or the freeing is "untrusted", > i.e. requested by userspace via sgx_ioc_enclave_remove_pages(). I mean inside sgx_invalidate() call either __sgx_free_page() or sgx_free_page() depending on a flag. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:48:58AM -0800, Sean Christopherson wrote: > On Mon, Dec 17, 2018 at 08:43:33PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > > > I'm pretty sure doing mmget() would result in circular dependencies and > > > a zombie enclave. In the do_exit() case where a task is abruptly killed: > > > > > > - __mmput() is never called because the enclave holds a ref > > > - sgx_encl_release() is never be called because its VMAs hold refs > > > - sgx_vma_close() is never called because __mmput()->exit_mmap() is > > > blocked and the process itself is dead, i.e. won't unmap anything. > > > > Right, it does, you are absolutely right. Tried it and removed the > > commit already. > > > > Well, what we came up from your suggestion i.e. setting mm to NULL > > and checking that is very subtle change and does not have any such > > circular dependencies. We'll go with that. > > We can't set mm to NULL as we need it to unregister the notifier, and > I'm fairly certain attempting to unregister in the release callback > will deadlock. Noticed that too. mmu_notifier_unregister() requires a valid mm. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 11:17 AM Dave Hansen wrote: > > On 12/17/18 11:12 AM, Andy Lutomirski wrote: > > So I'm not saying that you shouldn't do it the way you are now, but I > > do think that the changelog or at least some emails should explain > > *why* the enclave needs to keep a pointer to the creating process's > > mm. And, if you do keep the current model, it would be nice to > > understand what happens if you do something awful like mremap()ing an > > enclave, or calling madvise on it, or otherwise abusing the vma. Or > > doing fork(), for that matter. > > Yeah, the code is built to have one VMA and only one VMA per enclave. > You need to go over the origin of this restriction and what enforces this. There is a sad historical reason that you may regret keeping this restriction. There are plenty of pieces of code out there that think it's reasonable to spawn a subprocess by calling fork() and then execve(). (This is *not* a sensible thing to do. One should use posix_spawn() or some CLONE_VM variant. But even fairly recent posix_spawn() implementations will fork(). So the driver has to do *something* sensible on fork() or a bunch of things that use SGX unsuspectingly via, for example, PKCS #11, are going to be very sad. I suppose you could make enclaves just not show up in the fork()ed children, but then you have a different problem: creating an enclave and then doing daemon() won't work. Yes, POSIX traditions are rather silly.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 11:12 AM, Andy Lutomirski wrote: > So I'm not saying that you shouldn't do it the way you are now, but I > do think that the changelog or at least some emails should explain > *why* the enclave needs to keep a pointer to the creating process's > mm. And, if you do keep the current model, it would be nice to > understand what happens if you do something awful like mremap()ing an > enclave, or calling madvise on it, or otherwise abusing the vma. Or > doing fork(), for that matter. Yeah, the code is built to have one VMA and only one VMA per enclave. You need to go over the origin of this restriction and what enforces this.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:47 AM Dave Hansen wrote: > > On 12/17/18 10:43 AM, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > >> I'm pretty sure doing mmget() would result in circular dependencies and > >> a zombie enclave. In the do_exit() case where a task is abruptly killed: > >> > >> - __mmput() is never called because the enclave holds a ref > >> - sgx_encl_release() is never be called because its VMAs hold refs > >> - sgx_vma_close() is never called because __mmput()->exit_mmap() is > >> blocked and the process itself is dead, i.e. won't unmap anything. > > Right, it does, you are absolutely right. Tried it and removed the > > commit already. > > > > Well, what we came up from your suggestion i.e. setting mm to NULL > > and checking that is very subtle change and does not have any such > > circular dependencies. We'll go with that. > > This all screams that you need to break out this code from the massive > "18" patch and get the mm interactions reviewed more thoroughly. > > Also, no matter what method you go with, you have a bunch of commenting > and changelogging to do here. I'm going to ask an obnoxious high-level question: why does an enclave even refer to a specific mm? If I were designing this thing, and if I hadn't started trying to implement it, my first thought would be that an enclave tracks its linear address range, which is just a pair of numbers, and also keeps track of a whole bunch of physical EPC pages, data structures, etc. And that mmap() gets rejected unless the requested virtual address matches the linear address range that the enclave wants and, aside from that, just creates a VMA that keeps a reference to the enclave. (And, for convenience, I suppose that the first mmap() call done before any actual enclave setup happens could choose any address and then cause the enclave to lock itself to that address, although a regular anonymous PROT_NONE MAP_NORESERVE mapping would do just fine, too.) And the driver would explicitly allow multiple different mms to have the same enclave mapped. More importantly, a daemon could set up an enclave without necessarily mapping it at all and then SCM_RIGHTS the enclave over to the process that plans to run it. Now I'm sure this has all kinds of problems, such as the ISA possibly making it rather obnoxious to add pages to the enclave without having it mapped. But these operations could, in principle, be done by having the enclave own a private mm that's used just for setup. While this would be vaguely annoying, Nadav's still-pending-but-nearly-done text_poke series adds all the infrastructure that's needed for the kernel to manage little private mms. But some things get simpler -- invalidating the enclave can presumably use the regular rmap APIs to zap all the PTEs in all VMAs pointing into the enclave. So I'm not saying that you shouldn't do it the way you are now, but I do think that the changelog or at least some emails should explain *why* the enclave needs to keep a pointer to the creating process's mm. And, if you do keep the current model, it would be nice to understand what happens if you do something awful like mremap()ing an enclave, or calling madvise on it, or otherwise abusing the vma. Or doing fork(), for that matter. I also find it suspicious that the various ioctl handlers systematically ignore their "filep" parameters and instead use find_vma() to find the relevant mm data structures. That seems backwards.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 10:48 AM, Sean Christopherson wrote: > We can't set mm to NULL as we need it to unregister the notifier, and > I'm fairly certain attempting to unregister in the release callback > will deadlock. Suggestion: It looks like you only expect one VMA per enclave. Things go bonkers if this is not true. So, instead of storing encl->mm, don't. You can get the mm from vma->vm_mm and you could just store encl->vma instead. Doing that, you could even axe encl->base and encl->size, I think because you just get those from the VMA itself. That makes the relationship clearer: 1 VMA per enclave. We also implicitly understand that if you have a VMA, you implicitly have a ref to the mm *and* the VMA is immutable. If there were ever a path where encl->vma wasn't immutable, we'd have a bug (or load of bugs) somewhere, right?
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 08:43:33PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > > I'm pretty sure doing mmget() would result in circular dependencies and > > a zombie enclave. In the do_exit() case where a task is abruptly killed: > > > > - __mmput() is never called because the enclave holds a ref > > - sgx_encl_release() is never be called because its VMAs hold refs > > - sgx_vma_close() is never called because __mmput()->exit_mmap() is > > blocked and the process itself is dead, i.e. won't unmap anything. > > Right, it does, you are absolutely right. Tried it and removed the > commit already. > > Well, what we came up from your suggestion i.e. setting mm to NULL > and checking that is very subtle change and does not have any such > circular dependencies. We'll go with that. We can't set mm to NULL as we need it to unregister the notifier, and I'm fairly certain attempting to unregister in the release callback will deadlock.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 10:43 AM, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: >> I'm pretty sure doing mmget() would result in circular dependencies and >> a zombie enclave. In the do_exit() case where a task is abruptly killed: >> >> - __mmput() is never called because the enclave holds a ref >> - sgx_encl_release() is never be called because its VMAs hold refs >> - sgx_vma_close() is never called because __mmput()->exit_mmap() is >> blocked and the process itself is dead, i.e. won't unmap anything. > Right, it does, you are absolutely right. Tried it and removed the > commit already. > > Well, what we came up from your suggestion i.e. setting mm to NULL > and checking that is very subtle change and does not have any such > circular dependencies. We'll go with that. This all screams that you need to break out this code from the massive "18" patch and get the mm interactions reviewed more thoroughly. Also, no matter what method you go with, you have a bunch of commenting and changelogging to do here.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 08:23:19PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 10:09:57AM -0800, Sean Christopherson wrote: > > No, EREMOVE should never fail if the enclave is being released, i.e. all > > references to the enclave are gone. And failure during sgx_encl_release() > > means we leaked an EPC page, which warrants a WARN. > > Right that what I was suspecting as swapper should hold a ref to the > enclave while it is working on it. It is a programming error when this > happens. > > Maybe change the boolean parameter to flags parameter have a flag to > use sgx_free_page()? I tried that approach when I first split it to __sgx_free_page() and sgx_free_page(), but IMO the code is more difficult to read and harder to maintain since sgx_free_page() should be used except under special circumstances, e.g. race with reclaim or the freeing is "untrusted", i.e. requested by userspace via sgx_ioc_enclave_remove_pages(). > > > That makes sense. > > What do you think of Dave's proposal? > > /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:36:13AM -0800, Sean Christopherson wrote: > I'm pretty sure doing mmget() would result in circular dependencies and > a zombie enclave. In the do_exit() case where a task is abruptly killed: > > - __mmput() is never called because the enclave holds a ref > - sgx_encl_release() is never be called because its VMAs hold refs > - sgx_vma_close() is never called because __mmput()->exit_mmap() is > blocked and the process itself is dead, i.e. won't unmap anything. Right, it does, you are absolutely right. Tried it and removed the commit already. Well, what we came up from your suggestion i.e. setting mm to NULL and checking that is very subtle change and does not have any such circular dependencies. We'll go with that. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 08:01:02PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 09:45:40AM -0800, Dave Hansen wrote: > > > +struct sgx_encl *sgx_encl_alloc(struct sgx_secs *secs) > > > +{ > > ... > > > + kref_init(>refcount); > > > + INIT_LIST_HEAD(>add_page_reqs); > > > + INIT_RADIX_TREE(>page_tree, GFP_KERNEL); > > > + mutex_init(>lock); > > > + INIT_WORK(>add_page_work, sgx_add_page_worker); > > > + > > > + encl->mm = current->mm; <-> + > > > encl->base = secs->base; > > > + encl->size = secs->size; > > > + encl->ssaframesize = secs->ssa_frame_size; > > > + encl->backing = backing; > > > + > > > + return encl; > > > +} > > > > How is this OK without taking a reference on the mm? It's subtle and the ordering is all kinds of weird, but technically we are taking a reference on mm when the mmu_notifier is registered in sgx_encl_create(). sgx_encl_alloc() and sgx_encl_create() are always called in tandem and with mm->mm_users > 0, so we'll never use encl->mm without holding a reference to mm. We need to comment the weirdness or maybe register the notifier before > > I have a feeling a bunch of your bugs with the mmu notifiers and so > > forth are because the refcounting is wrong here. Eh, not really. Maybe the mmu_notifier is more subtle, e.g. calling do_unmap() after mmput() would be quite obvious, but there's no fundamental bug, we just haven't needed to touch VMAs during release prior to moving away from shmem. > > Sean's SGX_ENCL_MM_RELEASED would, I think be unnecessary if you just > > take a refcount here and release it when the enclave is destroyed. > > Right, atomic_inc(encl->mm->count) here and once when releasing. > > The we would not even need the whole mmu notifier in the first place. I'm pretty sure doing mmget() would result in circular dependencies and a zombie enclave. In the do_exit() case where a task is abruptly killed: - __mmput() is never called because the enclave holds a ref - sgx_encl_release() is never be called because its VMAs hold refs - sgx_vma_close() is never called because __mmput()->exit_mmap() is blocked and the process itself is dead, i.e. won't unmap anything.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:07:08AM -0800, Dave Hansen wrote: > On 12/17/18 10:01 AM, Jarkko Sakkinen wrote: > >>> + encl->mm = current->mm; <-> + > >>> encl->base = secs->base; > >>> + encl->size = secs->size; > >>> + encl->ssaframesize = secs->ssa_frame_size; > >>> + encl->backing = backing; > >>> + > >>> + return encl; > >>> +} > >> How is this OK without taking a reference on the mm? > >> > >> I have a feeling a bunch of your bugs with the mmu notifiers and so > >> forth are because the refcounting is wrong here. > >> > >> Sean's SGX_ENCL_MM_RELEASED would, I think be unnecessary if you just > >> take a refcount here and release it when the enclave is destroyed. > > Right, atomic_inc(encl->mm->count) here and once when releasing. > > > > The we would not even need the whole mmu notifier in the first place. > > Please use mmget()/mmput(). There's now a patch to test on top of the master. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 10:09:57AM -0800, Sean Christopherson wrote: > No, EREMOVE should never fail if the enclave is being released, i.e. all > references to the enclave are gone. And failure during sgx_encl_release() > means we leaked an EPC page, which warrants a WARN. Right that what I was suspecting as swapper should hold a ref to the enclave while it is working on it. It is a programming error when this happens. Maybe change the boolean parameter to flags parameter have a flag to use sgx_free_page()? > That makes sense. What do you think of Dave's proposal? /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 07:49:35PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 09:31:06AM -0800, Sean Christopherson wrote: > > This doesn't work as-is. sgx_encl_release() needs to use sgx_free_page() > > and not __sgx_free_page() so that we get a WARN() if the page can't be > > freed. sgx_invalidate() needs to use __sgx_free_page() as freeing a page > > can fail due to running concurrently with reclaim. I'll play around with > > the code a bit, there's probably a fairly clean way to share code between > > the two flows. > > Hmm... but why issue a warning in that case? It should be legit > behaviour. No, EREMOVE should never fail if the enclave is being released, i.e. all references to the enclave are gone. And failure during sgx_encl_release() means we leaked an EPC page, which warrants a WARN. The only legitimate reason __sgx_free_page() can fail in sgx_invalidate() is because a page might be in the process of being reclaimed. We could theoretically WARN on EREMOVE failure in that case, but it'd make the code a little fragile and it's not "fatal" in the sense that we get a second chance to free the page during sgx_encl_release(). And unless I missed something, using sgx_invalidate() means were' leaking all sgx_encl_page structs as well as the radix tree entries. > > sgx_encl_release_worker() calls do_unmap() without checking the validity > > of the page tables[1]. As is, the code doesn't even guarantee mm_struct > > itself is valid. > > > > The easiest fix I can think of is to add a SGX_ENCL_MM_RELEASED flag > > that is set along with SGX_ENCL_DEAD in sgx_mmu_notifier_release(), and > > only call do_unmap() if SGX_ENCL_MM_RELEASED is false. Note that this > > means we cant unregister the mmu_notifier until after do_unmap(), but > > that's true no matter what since we're relying on the mmu_notifier to > > hold a reference to mm_struct. Patch attached. > > OK, the fix change makes sense but I'm thinking that would it be a > better idea just to set mm NULL and check that instead? That makes sense.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On 12/17/18 10:01 AM, Jarkko Sakkinen wrote: >>> + encl->mm = current->mm; <-> + >>> encl->base = secs->base; >>> + encl->size = secs->size; >>> + encl->ssaframesize = secs->ssa_frame_size; >>> + encl->backing = backing; >>> + >>> + return encl; >>> +} >> How is this OK without taking a reference on the mm? >> >> I have a feeling a bunch of your bugs with the mmu notifiers and so >> forth are because the refcounting is wrong here. >> >> Sean's SGX_ENCL_MM_RELEASED would, I think be unnecessary if you just >> take a refcount here and release it when the enclave is destroyed. > Right, atomic_inc(encl->mm->count) here and once when releasing. > > The we would not even need the whole mmu notifier in the first place. Please use mmget()/mmput().
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 09:45:40AM -0800, Dave Hansen wrote: > > +struct sgx_encl *sgx_encl_alloc(struct sgx_secs *secs) > > +{ > ... > > + kref_init(>refcount); > > + INIT_LIST_HEAD(>add_page_reqs); > > + INIT_RADIX_TREE(>page_tree, GFP_KERNEL); > > + mutex_init(>lock); > > + INIT_WORK(>add_page_work, sgx_add_page_worker); > > + > > + encl->mm = current->mm; <-> + > > encl->base = secs->base; > > + encl->size = secs->size; > > + encl->ssaframesize = secs->ssa_frame_size; > > + encl->backing = backing; > > + > > + return encl; > > +} > > How is this OK without taking a reference on the mm? > > I have a feeling a bunch of your bugs with the mmu notifiers and so > forth are because the refcounting is wrong here. > > Sean's SGX_ENCL_MM_RELEASED would, I think be unnecessary if you just > take a refcount here and release it when the enclave is destroyed. Right, atomic_inc(encl->mm->count) here and once when releasing. The we would not even need the whole mmu notifier in the first place. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 09:31:06AM -0800, Sean Christopherson wrote: > This doesn't work as-is. sgx_encl_release() needs to use sgx_free_page() > and not __sgx_free_page() so that we get a WARN() if the page can't be > freed. sgx_invalidate() needs to use __sgx_free_page() as freeing a page > can fail due to running concurrently with reclaim. I'll play around with > the code a bit, there's probably a fairly clean way to share code between > the two flows. Hmm... but why issue a warning in that case? It should be legit behaviour. > sgx_encl_release_worker() calls do_unmap() without checking the validity > of the page tables[1]. As is, the code doesn't even guarantee mm_struct > itself is valid. > > The easiest fix I can think of is to add a SGX_ENCL_MM_RELEASED flag > that is set along with SGX_ENCL_DEAD in sgx_mmu_notifier_release(), and > only call do_unmap() if SGX_ENCL_MM_RELEASED is false. Note that this > means we cant unregister the mmu_notifier until after do_unmap(), but > that's true no matter what since we're relying on the mmu_notifier to > hold a reference to mm_struct. Patch attached. OK, the fix change makes sense but I'm thinking that would it be a better idea just to set mm NULL and check that instead? /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
> +struct sgx_encl *sgx_encl_alloc(struct sgx_secs *secs) > +{ ... > + kref_init(>refcount); > + INIT_LIST_HEAD(>add_page_reqs); > + INIT_RADIX_TREE(>page_tree, GFP_KERNEL); > + mutex_init(>lock); > + INIT_WORK(>add_page_work, sgx_add_page_worker); > + > + encl->mm = current->mm; <-> + > encl->base = secs->base; > + encl->size = secs->size; > + encl->ssaframesize = secs->ssa_frame_size; > + encl->backing = backing; > + > + return encl; > +} How is this OK without taking a reference on the mm? I have a feeling a bunch of your bugs with the mmu notifiers and so forth are because the refcounting is wrong here. Sean's SGX_ENCL_MM_RELEASED would, I think be unnecessary if you just take a refcount here and release it when the enclave is destroyed.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 04:08:11PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 03:39:28PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 03:28:59PM +0200, Jarkko Sakkinen wrote: > > > On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > > > > [ 504.149548] [ cut here ] > > > > [ 504.149550] kernel BUG at > > > > /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > > > > [ 504.150288] invalid opcode: [#1] SMP > > > > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted > > > > 4.20.0-rc2+ #267 > > > > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > > 0.0.0 02/06/2015 > > > > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker > > > > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 > > > > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 > > > > 45 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f > > > > c3 <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee > > > > [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 > > > > [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: > > > > 888467af16e0 > > > > [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: > > > > 88846bf9e798 > > > > [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: > > > > > > > > [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: > > > > > > > > [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: > > > > 888467af1b60 > > > > [ 504.157378] FS: () GS:88846f70() > > > > knlGS: > > > > [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 > > > > [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: > > > > 00360ee0 > > > > [ 504.159054] DR0: DR1: DR2: > > > > > > > > [ 504.159623] DR3: DR6: fffe0ff0 DR7: > > > > 0400 > > > > [ 504.160193] Call Trace: > > > > [ 504.160406] __split_vma+0x16f/0x180 > > > > [ 504.160706] ? __switch_to_asm+0x40/0x70 > > > > [ 504.161024] __do_munmap+0xfb/0x450 > > > > [ 504.161308] sgx_encl_release_worker+0x44/0x70 > > > > [ 504.161675] process_one_work+0x200/0x3f0 > > > > [ 504.162004] worker_thread+0x2d/0x3d0 > > > > [ 504.162301] ? process_one_work+0x3f0/0x3f0 > > > > [ 504.162645] kthread+0x113/0x130 > > > > [ 504.162912] ? kthread_park+0x90/0x90 > > > > [ 504.163209] ret_from_fork+0x35/0x40 > > > > [ 504.163503] Modules linked in: bridge stp llc > > > > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]--- > > > > > > There was a race with release and swapping code that I thought I fixed, > > > and this is looks like a race there. Have to recheck what I did not > > > consider. Anyway, though to share this if you have time to look at it. > > > That is the part where something is now unsync most probably. > > > > I think I found it. I was careless to make sgx_encl_release() to use > > sgx_invalidate(), which does not delete pages in the case when enclave > > is already marked as dead. This was after I had fixed the race that I > > had there in the first place. That is why I was puzzled why it suddenly > > reappeared. > > > > Would be nice to use sgx_invalidate() also in release for consistency in > > semantics sake so maybe just delete this: > > > > if (encl->flags & SGX_ENCL_DEAD) > > return; This doesn't work as-is. sgx_encl_release() needs to use sgx_free_page() and not __sgx_free_page() so that we get a WARN() if the page can't be freed. sgx_invalidate() needs to use __sgx_free_page() as freeing a page can fail due to running concurrently with reclaim. I'll play around with the code a bit, there's probably a fairly clean way to share code between the two flows. > > Updated master, not at this point next. Still broken (as Greg's parallel email points out). sgx_encl_release_worker() calls do_unmap() without checking the validity of the page tables[1]. As is, the code doesn't even guarantee mm_struct itself is valid. The easiest fix I can think of is to add a SGX_ENCL_MM_RELEASED flag that is set along with SGX_ENCL_DEAD in sgx_mmu_notifier_release(), and only call do_unmap() if SGX_ENCL_MM_RELEASED is false. Note that this means we cant unregister the mmu_notifier until after do_unmap(), but that's true no matter what since we're relying on the mmu_notifier to hold a reference to mm_struct. Patch attached. [1] https://www.spinics.net/lists/dri-devel/msg186827.html >From 7cfdf34ec5b70392216b24853d6b8cc5e3192a92 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 17 Dec 2018 09:21:14 -0800 Subject: [PATCH] x86/sgx: Do not attempt to unmap enclave VMAs if mm_struct is defunct Add a flag, SGX_ENCL_MM_RELEASED, to explicitly track the lifecycle of the enclave's associated
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 04:13:15PM +0200, Jarkko Sakkinen wrote: Good morning to everyone. > On Mon, Dec 17, 2018 at 04:08:11PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 03:39:28PM +0200, Jarkko Sakkinen wrote: > > > On Mon, Dec 17, 2018 at 03:28:59PM +0200, Jarkko Sakkinen wrote: > > > > On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > > > > > [ 504.149548] [ cut here ] > > > > > [ 504.149550] kernel BUG at > > > > > /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > > > > > [ 504.150288] invalid opcode: [#1] SMP > > > > > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted > > > > > 4.20.0-rc2+ #267 > > > > > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), > > > > > BIOS 0.0.0 02/06/2015 > > > > > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker > > > > > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 > > > > > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 > > > > > 00 45 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e > > > > > 41 5f c3 <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 > > > > > e8 fe ee > > > > > [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 > > > > > [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: > > > > > 888467af16e0 > > > > > [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: > > > > > 88846bf9e798 > > > > > [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: > > > > > > > > > > [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: > > > > > > > > > > [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: > > > > > 888467af1b60 > > > > > [ 504.157378] FS: () GS:88846f70() > > > > > knlGS: > > > > > [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 > > > > > [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: > > > > > 00360ee0 > > > > > [ 504.159054] DR0: DR1: DR2: > > > > > > > > > > [ 504.159623] DR3: DR6: fffe0ff0 DR7: > > > > > 0400 > > > > > [ 504.160193] Call Trace: > > > > > [ 504.160406] __split_vma+0x16f/0x180 > > > > > [ 504.160706] ? __switch_to_asm+0x40/0x70 > > > > > [ 504.161024] __do_munmap+0xfb/0x450 > > > > > [ 504.161308] sgx_encl_release_worker+0x44/0x70 > > > > > [ 504.161675] process_one_work+0x200/0x3f0 > > > > > [ 504.162004] worker_thread+0x2d/0x3d0 > > > > > [ 504.162301] ? process_one_work+0x3f0/0x3f0 > > > > > [ 504.162645] kthread+0x113/0x130 > > > > > [ 504.162912] ? kthread_park+0x90/0x90 > > > > > [ 504.163209] ret_from_fork+0x35/0x40 > > > > > [ 504.163503] Modules linked in: bridge stp llc > > > > > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]--- > > > > There was a race with release and swapping code that I thought > > > > I fixed, and this is looks like a race there. Have to recheck > > > > what I did not consider. Anyway, though to share this if you > > > > have time to look at it. That is the part where something is > > > > now unsync most probably. > > > I think I found it. I was careless to make sgx_encl_release() to > > > use sgx_invalidate(), which does not delete pages in the case > > > when enclave is already marked as dead. This was after I had > > > fixed the race that I had there in the first place. That is why > > > I was puzzled why it suddenly reappeared. > > > Would be nice to use sgx_invalidate() also in release for consistency in > > > semantics sake so maybe just delete this: > > > > > > if (encl->flags & SGX_ENCL_DEAD) > > > return; > > > > Updated master, not at this point next. > If I checked this right was that mmu_notifier_unregister() cause > DEAD to set, and thus when sgx_invalidate() is executed, it returns > without doing anything... On a pristine jarkko-sgx/next local branch we commented out the 'if (encl->flags & SGX_ENCL_DEAD) return' clause in the following file/function: arch/x86/kernel/cpu/sgx/driver/encl.c:sgx_invalidate() And tested the kernel. This fix seems to prevent the memory manager from getting catastrophically corrupted but the EINIT ioctl still fails. On the first invocation after a fresh boot the EINIT ioctl returns -1. On subsequent invocations of the loader it returns EBUSY. Every 8-10 invocations we get the -1 (EPERM -?) from the EINIT call and then it returns to issueing EBUSY. Here is a representative call trace from the loader utility: --- address: 7ff5cbe0, create address: 7ff5cbe0 Non-token initialization requested. EINIT retn: -1 / No error information [SGXenclave.c,init_enclave,652]: Error location. [sgx-load.c,main,180]: Error location. address: 7f425520, create address: 7f425520
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Dec 15, 2018 at 05:22:31PM -0600, Dr. Greg wrote: > On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > > Good afternoon, I hope the weekend is going well for everyone. > > > On Fri, Dec 14, 2018 at 05:59:17PM -0600, Dr. Greg wrote: > > > On Wed, Dec 12, 2018 at 08:00:36PM +0200, Jarkko Sakkinen wrote: > > > > > > Good evening, I hope the week has gone well for everyone. > > > > > > > On Mon, Dec 10, 2018 at 04:49:08AM -0600, Dr. Greg wrote: > > > > > In the meantime, I wanted to confirm that your jarkko-sgx/master > > > > > branch contains the proposed driver that is headed upstream. > > > > > Before adding the SFLC patches we thought it best to run the > > > > > driver through some testing in order to verify that any problems > > > > > we generated where attributable to our work and not the base > > > > > driver. > > > > > > > > The master branch is by definition unstable at the moment i.e. it > > > > can sometimes (not often) contain unfinished changes. Use next for > > > > testing. I update next when I consider the master contents "stable > > > > enough". > > > > > > I noticed in the last day or so that you appeared to sync > > > jarkko-sgx/master with jarkko-sgx/next, so I checked out a local > > > branch against jarkko-sgx/next and ran it against our unit tests. > > > Based on what we are seeing the driver is still experiencing issues > > > with initialization of a non-trivial enclave. > > > master branch is broken, looks like the VMA code Jarkko is reworking is > > buggy. I should be able to help debug this next week. > > > > [ 504.149548] [ cut here ] > > [ 504.149550] kernel BUG at > > /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > > Rodger, dodger. > > Let us know when you think you have something working pushed up into > one of the branches and we will put it on the bench here in the lab > and see what our runtime is able to do with it. > > BTW, your new vDSO work appears to be shaping up well. Just out of > curiosity though, how are you testing and validating the new vDSO > based exception handler if it isn't possible to initialize and run an > enclave with the new driver? Cherry-pick the patches to a stable version of the driver, the vDSO code doesn't use any of the SGX APIs.
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 04:08:11PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 03:39:28PM +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 17, 2018 at 03:28:59PM +0200, Jarkko Sakkinen wrote: > > > On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > > > > [ 504.149548] [ cut here ] > > > > [ 504.149550] kernel BUG at > > > > /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > > > > [ 504.150288] invalid opcode: [#1] SMP > > > > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted > > > > 4.20.0-rc2+ #267 > > > > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > > 0.0.0 02/06/2015 > > > > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker > > > > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 > > > > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 > > > > 45 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f > > > > c3 <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee > > > > [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 > > > > [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: > > > > 888467af16e0 > > > > [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: > > > > 88846bf9e798 > > > > [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: > > > > > > > > [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: > > > > > > > > [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: > > > > 888467af1b60 > > > > [ 504.157378] FS: () GS:88846f70() > > > > knlGS: > > > > [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 > > > > [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: > > > > 00360ee0 > > > > [ 504.159054] DR0: DR1: DR2: > > > > > > > > [ 504.159623] DR3: DR6: fffe0ff0 DR7: > > > > 0400 > > > > [ 504.160193] Call Trace: > > > > [ 504.160406] __split_vma+0x16f/0x180 > > > > [ 504.160706] ? __switch_to_asm+0x40/0x70 > > > > [ 504.161024] __do_munmap+0xfb/0x450 > > > > [ 504.161308] sgx_encl_release_worker+0x44/0x70 > > > > [ 504.161675] process_one_work+0x200/0x3f0 > > > > [ 504.162004] worker_thread+0x2d/0x3d0 > > > > [ 504.162301] ? process_one_work+0x3f0/0x3f0 > > > > [ 504.162645] kthread+0x113/0x130 > > > > [ 504.162912] ? kthread_park+0x90/0x90 > > > > [ 504.163209] ret_from_fork+0x35/0x40 > > > > [ 504.163503] Modules linked in: bridge stp llc > > > > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]--- > > > > > > There was a race with release and swapping code that I thought I fixed, > > > and this is looks like a race there. Have to recheck what I did not > > > consider. Anyway, though to share this if you have time to look at it. > > > That is the part where something is now unsync most probably. > > > > I think I found it. I was careless to make sgx_encl_release() to use > > sgx_invalidate(), which does not delete pages in the case when enclave > > is already marked as dead. This was after I had fixed the race that I > > had there in the first place. That is why I was puzzled why it suddenly > > reappeared. > > > > Would be nice to use sgx_invalidate() also in release for consistency in > > semantics sake so maybe just delete this: > > > > if (encl->flags & SGX_ENCL_DEAD) > > return; > > Updated master, not at this point next. If I checked this right was that mmu_notifier_unregister() cause DEAD to set, and thus when sgx_invalidate() is executed, it returns without doing anything... /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 03:39:28PM +0200, Jarkko Sakkinen wrote: > On Mon, Dec 17, 2018 at 03:28:59PM +0200, Jarkko Sakkinen wrote: > > On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > > > [ 504.149548] [ cut here ] > > > [ 504.149550] kernel BUG at > > > /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > > > [ 504.150288] invalid opcode: [#1] SMP > > > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted > > > 4.20.0-rc2+ #267 > > > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > 0.0.0 02/06/2015 > > > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker > > > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 > > > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 > > > 45 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f > > > c3 <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee > > > [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 > > > [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: > > > 888467af16e0 > > > [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: > > > 88846bf9e798 > > > [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: > > > > > > [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: > > > > > > [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: > > > 888467af1b60 > > > [ 504.157378] FS: () GS:88846f70() > > > knlGS: > > > [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 > > > [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: > > > 00360ee0 > > > [ 504.159054] DR0: DR1: DR2: > > > > > > [ 504.159623] DR3: DR6: fffe0ff0 DR7: > > > 0400 > > > [ 504.160193] Call Trace: > > > [ 504.160406] __split_vma+0x16f/0x180 > > > [ 504.160706] ? __switch_to_asm+0x40/0x70 > > > [ 504.161024] __do_munmap+0xfb/0x450 > > > [ 504.161308] sgx_encl_release_worker+0x44/0x70 > > > [ 504.161675] process_one_work+0x200/0x3f0 > > > [ 504.162004] worker_thread+0x2d/0x3d0 > > > [ 504.162301] ? process_one_work+0x3f0/0x3f0 > > > [ 504.162645] kthread+0x113/0x130 > > > [ 504.162912] ? kthread_park+0x90/0x90 > > > [ 504.163209] ret_from_fork+0x35/0x40 > > > [ 504.163503] Modules linked in: bridge stp llc > > > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]--- > > > > There was a race with release and swapping code that I thought I fixed, > > and this is looks like a race there. Have to recheck what I did not > > consider. Anyway, though to share this if you have time to look at it. > > That is the part where something is now unsync most probably. > > I think I found it. I was careless to make sgx_encl_release() to use > sgx_invalidate(), which does not delete pages in the case when enclave > is already marked as dead. This was after I had fixed the race that I > had there in the first place. That is why I was puzzled why it suddenly > reappeared. > > Would be nice to use sgx_invalidate() also in release for consistency in > semantics sake so maybe just delete this: > > if (encl->flags & SGX_ENCL_DEAD) > return; Updated master, not at this point next. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 17, 2018 at 03:28:59PM +0200, Jarkko Sakkinen wrote: > On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > > [ 504.149548] [ cut here ] > > [ 504.149550] kernel BUG at > > /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > > [ 504.150288] invalid opcode: [#1] SMP > > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted 4.20.0-rc2+ > > #267 > > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > 0.0.0 02/06/2015 > > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker > > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 > > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 45 > > 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 > > <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee > > [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 > > [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: > > 888467af16e0 > > [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: > > 88846bf9e798 > > [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: > > > > [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: > > > > [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: > > 888467af1b60 > > [ 504.157378] FS: () GS:88846f70() > > knlGS: > > [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 > > [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: > > 00360ee0 > > [ 504.159054] DR0: DR1: DR2: > > > > [ 504.159623] DR3: DR6: fffe0ff0 DR7: > > 0400 > > [ 504.160193] Call Trace: > > [ 504.160406] __split_vma+0x16f/0x180 > > [ 504.160706] ? __switch_to_asm+0x40/0x70 > > [ 504.161024] __do_munmap+0xfb/0x450 > > [ 504.161308] sgx_encl_release_worker+0x44/0x70 > > [ 504.161675] process_one_work+0x200/0x3f0 > > [ 504.162004] worker_thread+0x2d/0x3d0 > > [ 504.162301] ? process_one_work+0x3f0/0x3f0 > > [ 504.162645] kthread+0x113/0x130 > > [ 504.162912] ? kthread_park+0x90/0x90 > > [ 504.163209] ret_from_fork+0x35/0x40 > > [ 504.163503] Modules linked in: bridge stp llc > > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]--- > > There was a race with release and swapping code that I thought I fixed, > and this is looks like a race there. Have to recheck what I did not > consider. Anyway, though to share this if you have time to look at it. > That is the part where something is now unsync most probably. I think I found it. I was careless to make sgx_encl_release() to use sgx_invalidate(), which does not delete pages in the case when enclave is already marked as dead. This was after I had fixed the race that I had there in the first place. That is why I was puzzled why it suddenly reappeared. Would be nice to use sgx_invalidate() also in release for consistency in semantics sake so maybe just delete this: if (encl->flags & SGX_ENCL_DEAD) return; ? /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: > [ 504.149548] [ cut here ] > [ 504.149550] kernel BUG at /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! > [ 504.150288] invalid opcode: [#1] SMP > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted 4.20.0-rc2+ > #267 > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 > 02/06/2015 > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 45 31 > e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b > 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee > [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 > [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: > 888467af16e0 > [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: > 88846bf9e798 > [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: > > [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: > > [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: > 888467af1b60 > [ 504.157378] FS: () GS:88846f70() > knlGS: > [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 > [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: > 00360ee0 > [ 504.159054] DR0: DR1: DR2: > > [ 504.159623] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 504.160193] Call Trace: > [ 504.160406] __split_vma+0x16f/0x180 > [ 504.160706] ? __switch_to_asm+0x40/0x70 > [ 504.161024] __do_munmap+0xfb/0x450 > [ 504.161308] sgx_encl_release_worker+0x44/0x70 > [ 504.161675] process_one_work+0x200/0x3f0 > [ 504.162004] worker_thread+0x2d/0x3d0 > [ 504.162301] ? process_one_work+0x3f0/0x3f0 > [ 504.162645] kthread+0x113/0x130 > [ 504.162912] ? kthread_park+0x90/0x90 > [ 504.163209] ret_from_fork+0x35/0x40 > [ 504.163503] Modules linked in: bridge stp llc > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]--- There was a race with release and swapping code that I thought I fixed, and this is looks like a race there. Have to recheck what I did not consider. Anyway, though to share this if you have time to look at it. That is the part where something is now unsync most probably. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote: Good afternoon, I hope the weekend is going well for everyone. > On Fri, Dec 14, 2018 at 05:59:17PM -0600, Dr. Greg wrote: > > On Wed, Dec 12, 2018 at 08:00:36PM +0200, Jarkko Sakkinen wrote: > > > > Good evening, I hope the week has gone well for everyone. > > > > > On Mon, Dec 10, 2018 at 04:49:08AM -0600, Dr. Greg wrote: > > > > In the meantime, I wanted to confirm that your jarkko-sgx/master > > > > branch contains the proposed driver that is headed upstream. > > > > Before adding the SFLC patches we thought it best to run the > > > > driver through some testing in order to verify that any problems > > > > we generated where attributable to our work and not the base > > > > driver. > > > > > > The master branch is by definition unstable at the moment i.e. it > > > can sometimes (not often) contain unfinished changes. Use next for > > > testing. I update next when I consider the master contents "stable > > > enough". > > > > I noticed in the last day or so that you appeared to sync > > jarkko-sgx/master with jarkko-sgx/next, so I checked out a local > > branch against jarkko-sgx/next and ran it against our unit tests. > > Based on what we are seeing the driver is still experiencing issues > > with initialization of a non-trivial enclave. > master branch is broken, looks like the VMA code Jarkko is reworking is > buggy. I should be able to help debug this next week. > > [ 504.149548] [ cut here ] > [ 504.149550] kernel BUG at /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! Rodger, dodger. Let us know when you think you have something working pushed up into one of the branches and we will put it on the bench here in the lab and see what our runtime is able to do with it. BTW, your new vDSO work appears to be shaping up well. Just out of curiosity though, how are you testing and validating the new vDSO based exception handler if it isn't possible to initialize and run an enclave with the new driver? We will look forward to hearing from you. Have a good remainder of the weekend. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: g...@enjellic.com -- "Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." -- Howard Aiken
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Fri, Dec 14, 2018 at 05:59:17PM -0600, Dr. Greg wrote: > On Wed, Dec 12, 2018 at 08:00:36PM +0200, Jarkko Sakkinen wrote: > > Good evening, I hope the week has gone well for everyone. > > > On Mon, Dec 10, 2018 at 04:49:08AM -0600, Dr. Greg wrote: > > > In the meantime, I wanted to confirm that your jarkko-sgx/master > > > branch contains the proposed driver that is headed upstream. > > > Before adding the SFLC patches we thought it best to run the > > > driver through some testing in order to verify that any problems > > > we generated where attributable to our work and not the base > > > driver. > > > > The master branch is by definition unstable at the moment i.e. it > > can sometimes (not often) contain unfinished changes. Use next for > > testing. I update next when I consider the master contents "stable > > enough". > > I noticed in the last day or so that you appeared to sync > jarkko-sgx/master with jarkko-sgx/next, so I checked out a local > branch against jarkko-sgx/next and ran it against our unit tests. > Based on what we are seeing the driver is still experiencing issues > with initialization of a non-trivial enclave. master branch is broken, looks like the VMA code Jarkko is reworking is buggy. I should be able to help debug this next week. [ 504.149548] [ cut here ] [ 504.149550] kernel BUG at /home/sean/go/src/kernel.org/linux/mm/mmap.c:669! [ 504.150288] invalid opcode: [#1] SMP [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted 4.20.0-rc2+ #267 [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820 [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 45 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee [ 504.154109] RSP: :c94ebd60 EFLAGS: 00010206 [ 504.154535] RAX: 7fd92ef7e000 RBX: 888467af16c0 RCX: 888467af16e0 [ 504.155104] RDX: 888458fd09e0 RSI: 7fd954021000 RDI: 88846bf9e798 [ 504.155673] RBP: 888467af1480 R08: 88845bea2000 R09: [ 504.156242] R10: 8000 R11: fefefefefefefeff R12: [ 504.156810] R13: 88846bf9e790 R14: 888467af1b70 R15: 888467af1b60 [ 504.157378] FS: () GS:88846f70() knlGS: [ 504.158021] CS: 0010 DS: ES: CR0: 80050033 [ 504.158483] CR2: 7f2c56e99000 CR3: 05009001 CR4: 00360ee0 [ 504.159054] DR0: DR1: DR2: [ 504.159623] DR3: DR6: fffe0ff0 DR7: 0400 [ 504.160193] Call Trace: [ 504.160406] __split_vma+0x16f/0x180 [ 504.160706] ? __switch_to_asm+0x40/0x70 [ 504.161024] __do_munmap+0xfb/0x450 [ 504.161308] sgx_encl_release_worker+0x44/0x70 [ 504.161675] process_one_work+0x200/0x3f0 [ 504.162004] worker_thread+0x2d/0x3d0 [ 504.162301] ? process_one_work+0x3f0/0x3f0 [ 504.162645] kthread+0x113/0x130 [ 504.162912] ? kthread_park+0x90/0x90 [ 504.163209] ret_from_fork+0x35/0x40 [ 504.163503] Modules linked in: bridge stp llc [ 504.163866] ---[ end trace 83076139fc25e3e0 ]---
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Dec 12, 2018 at 08:00:36PM +0200, Jarkko Sakkinen wrote: Good evening, I hope the week has gone well for everyone. > On Mon, Dec 10, 2018 at 04:49:08AM -0600, Dr. Greg wrote: > > In the meantime, I wanted to confirm that your jarkko-sgx/master > > branch contains the proposed driver that is headed upstream. > > Before adding the SFLC patches we thought it best to run the > > driver through some testing in order to verify that any problems > > we generated where attributable to our work and not the base > > driver. > > The master branch is by definition unstable at the moment i.e. it > can sometimes (not often) contain unfinished changes. Use next for > testing. I update next when I consider the master contents "stable > enough". I noticed in the last day or so that you appeared to sync jarkko-sgx/master with jarkko-sgx/next, so I checked out a local branch against jarkko-sgx/next and ran it against our unit tests. Based on what we are seeing the driver is still experiencing issues with initialization of a non-trivial enclave. On the first test boot of the new kernel, the EINIT ioctl consistently returned EBUSY over multiple invocations of the unit test. This did not appear to generate any negative issues with the kernel at large. We rebooted the box to run the test against a fresh kernel load. This time around we experienced issues similar to what we had previously described. The EINIT ioctl generates a segmentation fault which seems to largely incapacitate the kernel. Here are the logs and first backtrace from the test: --- Dec 14 13:25:06 nuc2 kernel: PGD 4f001067 P4D 4f001067 PUD 0 Dec 14 13:25:06 nuc2 kernel: BUG: unable to handle kernel paging request at 97bf3ae916fe Dec 14 13:25:06 nuc2 kernel: Oops: 0002 [#1] SMP PTI Dec 14 13:25:06 nuc2 kernel: CPU: 1 PID: 34 Comm: kworker/1:1 Not tainted 4.20.0-rc2-sgx-nuc2+ #12 Dec 14 13:25:06 nuc2 kernel: Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0046.2018.1103.1316 11/03/2018 Dec 14 13:25:06 nuc2 kernel: Workqueue: events cache_reap Dec 14 13:25:06 nuc2 kernel: RIP: 0010:free_block+0xe3/0x182 Dec 14 13:25:06 nuc2 kernel: Code: 20 45 29 d4 41 d3 ec 0f b6 4f 1d 45 01 e2 41 d3 ea 41 8b 49 30 ff c9 49 83 79 20 00 41 89 49 30 75 04 4d 89 59 20 4d 8b 59 20 <45> 88 14 0b 49 8d 49 08 41 83 79 30 00 75 1a 4c 8b 50 28 49 89 4a Dec 14 13:25:06 nuc2 kernel: RSP: 0018:b90800123db0 EFLAGS: 00210046 Dec 14 13:25:06 nuc2 kernel: RAX: 97be3b419080 RBX: 000f RCX: ff7e Dec 14 13:25:06 nuc2 kernel: RDX: 0018 RSI: d907ffc82b70 RDI: 97be3b44c200 Dec 14 13:25:06 nuc2 kernel: RBP: b90800123dd8 R08: b90800123e10 R09: f9b345eba440 Dec 14 13:25:06 nuc2 kernel: R10: 0051f663 R11: 97be3ae91780 R12: 11ede5c3 Dec 14 13:25:06 nuc2 kernel: R13: 8000 R14: 97be3b419088 R15: 97be3b4190a8 Dec 14 13:25:06 nuc2 kernel: FS: () GS:97be3be8() knlGS: Dec 14 13:25:06 nuc2 kernel: CS: 0010 DS: ES: CR0: 80050033 Dec 14 13:25:06 nuc2 kernel: CR2: 97bf3ae916fe CR3: 4ec0a000 CR4: 00340ee0 Dec 14 13:25:06 nuc2 kernel: Call Trace: Dec 14 13:25:06 nuc2 kernel: drain_array_locked+0x50/0x75 Dec 14 13:25:06 nuc2 kernel: drain_array.constprop.67+0x57/0x72 Dec 14 13:25:06 nuc2 kernel: cache_reap+0x58/0x101 Dec 14 13:25:06 nuc2 kernel: process_one_work+0x183/0x271 Dec 14 13:25:06 nuc2 kernel: worker_thread+0x1e5/0x2b4 Dec 14 13:25:06 nuc2 kernel: ? cancel_delayed_work_sync+0x10/0x10 Dec 14 13:25:06 nuc2 kernel: kthread+0x116/0x11e Dec 14 13:25:06 nuc2 kernel: ? kthread_park+0x7e/0x7e Dec 14 13:25:06 nuc2 kernel: ret_from_fork+0x1f/0x40 Dec 14 13:25:06 nuc2 kernel: Modules linked in: Dec 14 13:25:06 nuc2 kernel: CR2: 97bf3ae916fe Dec 14 13:25:06 nuc2 kernel: ---[ end trace 7f5dc24edc7285b3 ]--- Dec 14 13:25:06 nuc2 kernel: RIP: 0010:free_block+0xe3/0x182 Dec 14 13:25:06 nuc2 kernel: Code: 20 45 29 d4 41 d3 ec 0f b6 4f 1d 45 01 e2 41 d3 ea 41 8b 49 30 ff c9 49 83 79 20 00 41 89 49 30 75 04 4d 89 59 20 4d 8b 59 20 <45> 88 14 0b 49 8d 49 08 41 83 79 30 00 75 1a 4c 8b 50 28 49 89 4a Dec 14 13:25:06 nuc2 kernel: RSP: 0018:b90800123db0 EFLAGS: 00210046 Dec 14 13:25:06 nuc2 kernel: RAX: 97be3b419080 RBX: 000f RCX: ff7e Dec 14 13:25:06 nuc2 kernel: RDX: 0018 RSI: d907ffc82b70 RDI: 97be3b44c200 Dec 14 13:25:06 nuc2 kernel: RBP: b90800123dd8 R08: b90800123e10 R09: f9b345eba440 Dec 14 13:25:06 nuc2 kernel: R10: 0051f663 R11: 97be3ae91780 R12: 11ede5c3 Dec 14 13:25:06 nuc2 kernel: R13: 8000 R14: 97be3b419088 R15: 97be3b4190a8 Dec 14 13:25:06 nuc2 kernel: FS: () GS:97be3be8() knlGS: Dec 14 13:25:06 nuc2 kernel: CS: 0010 DS: ES:
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Dec 10, 2018 at 04:49:08AM -0600, Dr. Greg wrote: > In the meantime, I wanted to confirm that your jarkko-sgx/master > branch contains the proposed driver that is headed upstream. Before > adding the SFLC patches we thought it best to run the driver through > some testing in order to verify that any problems we generated where > attributable to our work and not the base driver. The master branch is by definition unstable at the moment i.e. it can sometimes (not often) contain unfinished changes. Use next for testing. I update next when I consider the master contents "stable enough". Thanks. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sun, Dec 09, 2018 at 06:01:32PM +0100, Pavel Machek wrote: > Hi! Good morning to everyone. > > On Thu, Nov 15, 2018 at 5:08 PM Jarkko Sakkinen > > wrote: > > > > > > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that > > > can be used by applications to set aside private regions of code and > > > data. The code outside the enclave is disallowed to access the memory > > > inside the enclave by the CPU access control. > > > > > > SGX driver provides a ioctl API for loading and initializing enclaves. > > > Address range for enclaves is reserved with mmap() and they are > > > destroyed with munmap(). Enclave construction, measurement and > > > initialization is done with the provided the ioctl API. > > > > I brought this up a while back, and I think I should re-ask it now > > that this driver is getting close to ready: > > > > As it stands, there's just one SGX character device, and I imagine > > that it'll be available to unprivileged applications. I'm concerned > > that this isn't quite what we want. I certainly think that everyone, > > or at least almost everyone, ought to be able to run normal > > enclaves. > I don't think nobody or postfix or guest should be running enclaves > on my systems. First, I'd like to be able to debug my systems. > > Second, sgx quite complex and tricky. It may turn out to be secure > in the end, but I'd not be surprised if we got few CVEs before we > get there. > > Last, I'd hate to find out in few years that I can't switch to amd > cpu because firefox now requires sgx. > > Just make it root-only or 660 by default. Users can get permission > in similar way they get rights to audio.. I'm not sure that using root or group restricted access to a character device is going to stop an ISV from embracing a technology, but that is an alternate debate. Relying on discretionary, or mandatory for that matter, access controls is not consistent with the security architecture of SGX. The technology was designed to provide robustness in the face of aggressors who may have compromised the operating system or hardware platform. The lingua franca of SGX security and access controls are MRSIGNER values. The SFLC patches that we will be making available, once we are convinced the upstream driver is working, implement MRSIGNER based security controls with an absolutely minimal TCB footprint in the kernel. This strategy allows the platform owner to use SGX compliant and cryptographically enforced access controls. Just as an aside, secondary to our perception that this technology and what it can do is not widely understood, we are developing a 2-part LWN article series on SGX and its implications for Linux. > Pavel Best wishes for a good day and a productive week. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: g...@enjellic.com -- "... remember that innovation is saying 'no' to 1000 things." -- Moxie Marlinspike
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Nov 28, 2018 at 11:22:28AM -0800, Jarkko Sakkinen wrote: Good morning, I hope everyone had a pleasant weekend. > On Wed, Nov 28, 2018 at 04:49:41AM -0600, Dr. Greg wrote: > > We've been carrying a patch, that drops in on top of the proposed > > kernel driver, that implements the needed policy management framework > > for DAC fragile (FLC) platforms. After a meeting yesterday with the > > client that is funding the work, a decision was made to release the > > enhancements when the SGX driver goes mainline. That will at least > > give developers the option of creating solutions on Linux that > > implement the security guarantees that SGX was designed to deliver. > We do not need yet another policy management framework to the *kernel*. > > The token based approach that Andy is proposing is proven and well > established method to create a mechanism. You can then create a > daemon to user space that decides who it wants to send tokes. I guess there will be plenty of time to argue about all of that. In the meantime, I wanted to confirm that your jarkko-sgx/master branch contains the proposed driver that is headed upstream. Before adding the SFLC patches we thought it best to run the driver through some testing in order to verify that any problems we generated where attributable to our work and not the base driver. At the current time jarkko-sgx/master appears to be having difficulty initializing the unit test enclave for our trusted runtime API librarary. Enclave creation and loading appear to work fine, things go south after the EINIT ioctl is called on the loaded image. We specifically isolated the regressions to occur secondary to the EINIT ioctl being called. We modified our sgx-load test utility to pause with the image loaded, but not initialized. We generated a fair amount of system activity while the process was holding the enclave image open and there were no issues. The process was then allowed to unmap the virtual memory image without calling EINIT and the system was fine after that as well. Symptoms vary, but in all cases appear to be linked to corruption of the virtual memory infrastructure. In all cases, the kernel ends up at a point where any attempt to start a new process hangs and becomes uninterruptible. The full kernel failure does not appear to be synchronous with when EINIT is called, which would support the notion that something is going wrong with the VM management that is being workqueue deferred. This is with your MPX patch applied that corrects issues with the wrong memory management context being acted upon by that system. In any event, the kernel configuration being used for testing does not have MPX support even enabled. Given that the changelog for the patch is indicating the new driver is attempting something unique with workqueue deferred VM management, it would seem possible that the driver is tickling bad and possibly untested behavior elsewhere in the kernel as well. The enclave in question is not terribly sophisticated by the standards of our other enclaves, but it is a non-trivial test of SGX functionality. It weighs in at about 156K and is generated and signed in debug mode with version 1.4 compliant metadata. Obviously it initializes and runs fine with the out-of-tree driver. We managed to capture two separate sets of error logs/backtraces that are included below. As I'm sure you know, without module support, working on all of this is a bit painful as it requires the classic edit-compile-link-boot-whimper procedure :-) Given that the self-test committed to the kernel sources is a trivial one page enclave and the proposed driver ABI is incompatible with the released Intel Linux PSW/SDK, this may be the most challenging test the driver has been put through. Unless your PSW/SDK team is testing the new driver behind the scenes. Obviously let us know if jarkko-master/sgx is not where the action is at or if you would like us to move forward with alternative testing. Regression traces follow: Event 1: --- Dec 9 07:35:15 nuc2 kernel: general protection fault: [#1] SMP PTI Dec 9 07:35:15 nuc2 kernel: CPU: 1 PID: 1594 Comm: less Not tainted 4.20.0-rc2-sgx-nuc2+ #11 Dec 9 07:35:15 nuc2 kernel: Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0046.2018.1103.1316 11/03/2018 Dec 9 07:35:15 nuc2 kernel: RIP: 0010:unmap_vmas+0x3c/0x83 Dec 9 07:35:15 nuc2 kernel: Code: 49 89 cc 53 48 89 f3 4c 8b 6e 40 49 83 bd a0 03 00 00 00 74 32 b9 01 00 00 00 4c 89 e2 4c 89 f6 4c 89 ef e8 db be 01 00 eb 1d <4c> 39 23 73 1d 48 89 de 45 31 c0 4c 89 e1 4c 89 f2 4c 89 ff e8 cb Dec 9 07:35:15 nuc2 kernel: RSP: 0018:9fd7404c7d90 EFLAGS: 00010282 Dec 9 07:35:15 nuc2 kernel: RAX: 0007755e RBX: 0f66fad412e0 RCX: Dec 9 07:35:15 nuc2 kernel: RDX: 8b66f9e42ee0 RSI: 8b66f9e42c00 RDI: 9fd7404c7dc8 Dec 9 07:35:15 nuc2 kernel: RBP:
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
Hi! > On Thu, Nov 15, 2018 at 5:08 PM Jarkko Sakkinen > wrote: > > > > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that > > can be used by applications to set aside private regions of code and > > data. The code outside the enclave is disallowed to access the memory > > inside the enclave by the CPU access control. > > > > SGX driver provides a ioctl API for loading and initializing enclaves. > > Address range for enclaves is reserved with mmap() and they are > > destroyed with munmap(). Enclave construction, measurement and > > initialization is done with the provided the ioctl API. > > > > I brought this up a while back, and I think I should re-ask it now > that this driver is getting close to ready: > > As it stands, there's just one SGX character device, and I imagine > that it'll be available to unprivileged applications. I'm concerned > that this isn't quite what we want. I certainly think that everyone, > or at least almost everyone, ought to be able to run normal > enclaves. I don't think nobody or postfix or guest should be running enclaves on my systems. First, I'd like to be able to debug my systems. Second, sgx quite complex and tricky. It may turn out to be secure in the end, but I'd not be surprised if we got few CVEs before we get there. Last, I'd hate to find out in few years that I can't switch to amd cpu because firefox now requires sgx. Just make it root-only or 660 by default. Users can get permission in similar way they get rights to audio.. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
Hi! > > There would be three types of users: > > > > 1. Ones that have access to neither of the devices. > > 2. Ones that have access to unprivileged. Who are these? > > Either 0666 (world) or an sgx group. Sgx group, please. Or even better, what is generic term for sgx? We probably want to use that, as sgx is likely trademarked and certainly Intelism. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 12:55 AM Dr. Greg wrote: > Since the thread has become a bit divergent I wanted to note that we > have offered a proposal for a general policy management framework > based on MRSIGNER values. This framework is consistent with the SGX > security model, ie. cryptographic rather then DAC based policy > controls. This framework also allows a much more flexible policy > implementation that doesn't result in combinatoric issues. Can you give a concrete explanation of a problem that your proposal would solve? As far as I can tell, it gets rid of a case in which an unprivileged attacker who can run enclaves but hasn't compromised the kernel can learn the PPID and other SGX-related permanent platform identifiers, but it does nothing to prevent the same attacker from learning non-SGX-related permanent identifiers, nor does it prevent the attacker from using the Intel quoting enclave (unless configured in a surprising way) and thus attesting to a remote system. So what problem does it solve?
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 12:55 AM Dr. Greg wrote: > Since the thread has become a bit divergent I wanted to note that we > have offered a proposal for a general policy management framework > based on MRSIGNER values. This framework is consistent with the SGX > security model, ie. cryptographic rather then DAC based policy > controls. This framework also allows a much more flexible policy > implementation that doesn't result in combinatoric issues. Can you give a concrete explanation of a problem that your proposal would solve? As far as I can tell, it gets rid of a case in which an unprivileged attacker who can run enclaves but hasn't compromised the kernel can learn the PPID and other SGX-related permanent platform identifiers, but it does nothing to prevent the same attacker from learning non-SGX-related permanent identifiers, nor does it prevent the attacker from using the Intel quoting enclave (unless configured in a surprising way) and thus attesting to a remote system. So what problem does it solve?
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Nov 28, 2018 at 04:49:41AM -0600, Dr. Greg wrote: > We've been carrying a patch, that drops in on top of the proposed > kernel driver, that implements the needed policy management framework > for DAC fragile (FLC) platforms. After a meeting yesterday with the > client that is funding the work, a decision was made to release the > enhancements when the SGX driver goes mainline. That will at least > give developers the option of creating solutions on Linux that > implement the security guarantees that SGX was designed to deliver. We do not need yet another policy management framework to the *kernel*. The token based approach that Andy is proposing is proven and well established method to create a mechanism. You can then create a daemon to user space that decides who it wants to send tokes. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Wed, Nov 28, 2018 at 04:49:41AM -0600, Dr. Greg wrote: > We've been carrying a patch, that drops in on top of the proposed > kernel driver, that implements the needed policy management framework > for DAC fragile (FLC) platforms. After a meeting yesterday with the > client that is funding the work, a decision was made to release the > enhancements when the SGX driver goes mainline. That will at least > give developers the option of creating solutions on Linux that > implement the security guarantees that SGX was designed to deliver. We do not need yet another policy management framework to the *kernel*. The token based approach that Andy is proposing is proven and well established method to create a mechanism. You can then create a daemon to user space that decides who it wants to send tokes. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 09:55:45AM -0800, Andy Lutomirski wrote: > > On Nov 27, 2018, at 8:41 AM, Jarkko Sakkinen > > wrote: > > > >> On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: > >> Since the thread has become a bit divergent I wanted to note that we > >> have offered a proposal for a general policy management framework > >> based on MRSIGNER values. This framework is consistent with the SGX > >> security model, ie. cryptographic rather then DAC based policy > >> controls. This framework also allows a much more flexible policy > >> implementation that doesn't result in combinatoric issues. > >> > >> Our framework also allows the preservation of the current ABI which > >> allows an EINITTOKEN to be passed in from userspace. The framework > >> also supports the ability to specify that only a kernel based launch > >> enclave (LE) should be available if the platform owner or distribution > >> should desire to implement such a model. > >> > >> The policy management framework is straight forward. Three linked > >> lists or their equivalent which are populated through /sysfs > >> pseudo-files or equivalent plumbing. Each list is populated with > >> MRSIGNER values for signing keys that are allowed to initialize > >> enclaves under three separate conditions. > >> > >> 1.) General enclaves without special attribute bits. > >> > >> 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., > >> 'Provisioning Enclaves'. > >> > >> 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch > >> Enclaves'. > >> > >> An all-null MRSIGNER value serves as a 'sealing' value that locks a > >> list from any further modifications. > >> > >> This architecture allows platform policies to be specified and then > >> sealed at early boot by the root user. At that point cryptographic > >> policy controls are in place rather then DAC based controls, the > >> latter of which have perpetual security liabilities in addition to the > >> useability constraints inherent in a DAC or device node model. > >> > >> We have developed an independent implementation of the PSW and > >> arguably have as much experience with issues surrounding how to > >> interact with the device driver as anyone. We have spent a lot of > >> time thinking about these issues and the above framework provides the > >> most flexible architecture available. > > > > Sounds like a lot bloat and policy added to the kernel whereas with > > Andy's proposal you can implement logic to a daemon and provide only > > mechanism to do it. > Well, almost. We'd need SGX_IOC_FREEZE_MR{ENCLAVE,SIGNER} or > similar. Or maybe the daemon could handle the entire loading process. > But this can wait until after the main driver is upstream. > > This does lead to a question: enclaves are kind-of-sort-of mapped > into a given address space. What happens if you issue the various > ioctls in the context of a different mm? For that matter, can two > processes mmap the same enclave? Fascinating. We've been carrying a patch, that drops in on top of the proposed kernel driver, that implements the needed policy management framework for DAC fragile (FLC) platforms. After a meeting yesterday with the client that is funding the work, a decision was made to release the enhancements when the SGX driver goes mainline. That will at least give developers the option of creating solutions on Linux that implement the security guarantees that SGX was designed to deliver. Most importantly, since it implements a driver consistent with the design of SGX, it has the added benefit of allowing system administrators the ability to enable the driver to work on non-FLC (locked) platforms. Since Jarkko confirmed that FLC is the option of platform vendors, this would seem to be important as SGX on Linux will only work in a random fashion dependent on the whims of hardware OEM's in probably a SKU dependent fashion. Which is why the client has interest in the work. Best wishes for a productive remainder of the week. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: g...@enjellic.com -- "Five year projections, are you kidding me. We don't know what we are supposed to be doing at the 4 o'clock meeting this afternoon." -- Terry Wieland Resurrection
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 09:55:45AM -0800, Andy Lutomirski wrote: > > On Nov 27, 2018, at 8:41 AM, Jarkko Sakkinen > > wrote: > > > >> On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: > >> Since the thread has become a bit divergent I wanted to note that we > >> have offered a proposal for a general policy management framework > >> based on MRSIGNER values. This framework is consistent with the SGX > >> security model, ie. cryptographic rather then DAC based policy > >> controls. This framework also allows a much more flexible policy > >> implementation that doesn't result in combinatoric issues. > >> > >> Our framework also allows the preservation of the current ABI which > >> allows an EINITTOKEN to be passed in from userspace. The framework > >> also supports the ability to specify that only a kernel based launch > >> enclave (LE) should be available if the platform owner or distribution > >> should desire to implement such a model. > >> > >> The policy management framework is straight forward. Three linked > >> lists or their equivalent which are populated through /sysfs > >> pseudo-files or equivalent plumbing. Each list is populated with > >> MRSIGNER values for signing keys that are allowed to initialize > >> enclaves under three separate conditions. > >> > >> 1.) General enclaves without special attribute bits. > >> > >> 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., > >> 'Provisioning Enclaves'. > >> > >> 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch > >> Enclaves'. > >> > >> An all-null MRSIGNER value serves as a 'sealing' value that locks a > >> list from any further modifications. > >> > >> This architecture allows platform policies to be specified and then > >> sealed at early boot by the root user. At that point cryptographic > >> policy controls are in place rather then DAC based controls, the > >> latter of which have perpetual security liabilities in addition to the > >> useability constraints inherent in a DAC or device node model. > >> > >> We have developed an independent implementation of the PSW and > >> arguably have as much experience with issues surrounding how to > >> interact with the device driver as anyone. We have spent a lot of > >> time thinking about these issues and the above framework provides the > >> most flexible architecture available. > > > > Sounds like a lot bloat and policy added to the kernel whereas with > > Andy's proposal you can implement logic to a daemon and provide only > > mechanism to do it. > Well, almost. We'd need SGX_IOC_FREEZE_MR{ENCLAVE,SIGNER} or > similar. Or maybe the daemon could handle the entire loading process. > But this can wait until after the main driver is upstream. > > This does lead to a question: enclaves are kind-of-sort-of mapped > into a given address space. What happens if you issue the various > ioctls in the context of a different mm? For that matter, can two > processes mmap the same enclave? Fascinating. We've been carrying a patch, that drops in on top of the proposed kernel driver, that implements the needed policy management framework for DAC fragile (FLC) platforms. After a meeting yesterday with the client that is funding the work, a decision was made to release the enhancements when the SGX driver goes mainline. That will at least give developers the option of creating solutions on Linux that implement the security guarantees that SGX was designed to deliver. Most importantly, since it implements a driver consistent with the design of SGX, it has the added benefit of allowing system administrators the ability to enable the driver to work on non-FLC (locked) platforms. Since Jarkko confirmed that FLC is the option of platform vendors, this would seem to be important as SGX on Linux will only work in a random fashion dependent on the whims of hardware OEM's in probably a SKU dependent fashion. Which is why the client has interest in the work. Best wishes for a productive remainder of the week. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: g...@enjellic.com -- "Five year projections, are you kidding me. We don't know what we are supposed to be doing at the 4 o'clock meeting this afternoon." -- Terry Wieland Resurrection
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Nov 24, 2018 at 08:45:34AM -0800, Jarkko Sakkinen wrote: > On Fri, Nov 23, 2018 at 04:39:23AM -0600, Dr. Greg wrote: > > Jarkko, when this driver lands it will set the SGX ABI in stone for > > Linux. It would be very, very helpful to the development community if > > there was some official guidance from Intel on whether or not FLC will > > be a universal feature on all hardware and the date that is going to > > happen or has happened. > > I seriously don't know but I can take this message to the mothership... LC enabling is essentially a platform vendors choice, not Intels choice, like many other CPU features that Linux is dependent on. Of course, if Linux ends supporting only LC that will without doubt have a big impact on vendors so in that way it is indirectly also communitys choice. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Nov 24, 2018 at 08:45:34AM -0800, Jarkko Sakkinen wrote: > On Fri, Nov 23, 2018 at 04:39:23AM -0600, Dr. Greg wrote: > > Jarkko, when this driver lands it will set the SGX ABI in stone for > > Linux. It would be very, very helpful to the development community if > > there was some official guidance from Intel on whether or not FLC will > > be a universal feature on all hardware and the date that is going to > > happen or has happened. > > I seriously don't know but I can take this message to the mothership... LC enabling is essentially a platform vendors choice, not Intels choice, like many other CPU features that Linux is dependent on. Of course, if Linux ends supporting only LC that will without doubt have a big impact on vendors so in that way it is indirectly also communitys choice. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
> On Nov 27, 2018, at 8:41 AM, Jarkko Sakkinen > wrote: > >> On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: >> Since the thread has become a bit divergent I wanted to note that we >> have offered a proposal for a general policy management framework >> based on MRSIGNER values. This framework is consistent with the SGX >> security model, ie. cryptographic rather then DAC based policy >> controls. This framework also allows a much more flexible policy >> implementation that doesn't result in combinatoric issues. >> >> Our framework also allows the preservation of the current ABI which >> allows an EINITTOKEN to be passed in from userspace. The framework >> also supports the ability to specify that only a kernel based launch >> enclave (LE) should be available if the platform owner or distribution >> should desire to implement such a model. >> >> The policy management framework is straight forward. Three linked >> lists or their equivalent which are populated through /sysfs >> pseudo-files or equivalent plumbing. Each list is populated with >> MRSIGNER values for signing keys that are allowed to initialize >> enclaves under three separate conditions. >> >> 1.) General enclaves without special attribute bits. >> >> 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., >> 'Provisioning Enclaves'. >> >> 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch >> Enclaves'. >> >> An all-null MRSIGNER value serves as a 'sealing' value that locks a >> list from any further modifications. >> >> This architecture allows platform policies to be specified and then >> sealed at early boot by the root user. At that point cryptographic >> policy controls are in place rather then DAC based controls, the >> latter of which have perpetual security liabilities in addition to the >> useability constraints inherent in a DAC or device node model. >> >> We have developed an independent implementation of the PSW and >> arguably have as much experience with issues surrounding how to >> interact with the device driver as anyone. We have spent a lot of >> time thinking about these issues and the above framework provides the >> most flexible architecture available. > > Sounds like a lot bloat and policy added to the kernel whereas with > Andy's proposal you can implement logic to a daemon and provide only > mechanism to do it. > > Well, almost. We’d need SGX_IOC_FREEZE_MR{ENCLAVE,SIGNER} or similar. Or maybe the daemon could handle the entire loading process. But this can wait until after the main driver is upstream. This does lead to a question: enclaves are kind-of-sort-of mapped into a given address space. What happens if you issue the various ioctls in the context of a different mm? For that matter, can two processes mmap the same enclave?
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
> On Nov 27, 2018, at 8:41 AM, Jarkko Sakkinen > wrote: > >> On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: >> Since the thread has become a bit divergent I wanted to note that we >> have offered a proposal for a general policy management framework >> based on MRSIGNER values. This framework is consistent with the SGX >> security model, ie. cryptographic rather then DAC based policy >> controls. This framework also allows a much more flexible policy >> implementation that doesn't result in combinatoric issues. >> >> Our framework also allows the preservation of the current ABI which >> allows an EINITTOKEN to be passed in from userspace. The framework >> also supports the ability to specify that only a kernel based launch >> enclave (LE) should be available if the platform owner or distribution >> should desire to implement such a model. >> >> The policy management framework is straight forward. Three linked >> lists or their equivalent which are populated through /sysfs >> pseudo-files or equivalent plumbing. Each list is populated with >> MRSIGNER values for signing keys that are allowed to initialize >> enclaves under three separate conditions. >> >> 1.) General enclaves without special attribute bits. >> >> 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., >> 'Provisioning Enclaves'. >> >> 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch >> Enclaves'. >> >> An all-null MRSIGNER value serves as a 'sealing' value that locks a >> list from any further modifications. >> >> This architecture allows platform policies to be specified and then >> sealed at early boot by the root user. At that point cryptographic >> policy controls are in place rather then DAC based controls, the >> latter of which have perpetual security liabilities in addition to the >> useability constraints inherent in a DAC or device node model. >> >> We have developed an independent implementation of the PSW and >> arguably have as much experience with issues surrounding how to >> interact with the device driver as anyone. We have spent a lot of >> time thinking about these issues and the above framework provides the >> most flexible architecture available. > > Sounds like a lot bloat and policy added to the kernel whereas with > Andy's proposal you can implement logic to a daemon and provide only > mechanism to do it. > > Well, almost. We’d need SGX_IOC_FREEZE_MR{ENCLAVE,SIGNER} or similar. Or maybe the daemon could handle the entire loading process. But this can wait until after the main driver is upstream. This does lead to a question: enclaves are kind-of-sort-of mapped into a given address space. What happens if you issue the various ioctls in the context of a different mm? For that matter, can two processes mmap the same enclave?
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: > 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch > Enclaves'. Kernel does not have to manage this. If the MSRs are read-only, they should match your LE. If the MSRs writable, you don't need an LE. This whole scheme sounds like adding own SELinux for SGX and it is only words. No code available. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: > 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch > Enclaves'. Kernel does not have to manage this. If the MSRs are read-only, they should match your LE. If the MSRs writable, you don't need an LE. This whole scheme sounds like adding own SELinux for SGX and it is only words. No code available. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: > Since the thread has become a bit divergent I wanted to note that we > have offered a proposal for a general policy management framework > based on MRSIGNER values. This framework is consistent with the SGX > security model, ie. cryptographic rather then DAC based policy > controls. This framework also allows a much more flexible policy > implementation that doesn't result in combinatoric issues. > > Our framework also allows the preservation of the current ABI which > allows an EINITTOKEN to be passed in from userspace. The framework > also supports the ability to specify that only a kernel based launch > enclave (LE) should be available if the platform owner or distribution > should desire to implement such a model. > > The policy management framework is straight forward. Three linked > lists or their equivalent which are populated through /sysfs > pseudo-files or equivalent plumbing. Each list is populated with > MRSIGNER values for signing keys that are allowed to initialize > enclaves under three separate conditions. > > 1.) General enclaves without special attribute bits. > > 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., > 'Provisioning Enclaves'. > > 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch > Enclaves'. > > An all-null MRSIGNER value serves as a 'sealing' value that locks a > list from any further modifications. > > This architecture allows platform policies to be specified and then > sealed at early boot by the root user. At that point cryptographic > policy controls are in place rather then DAC based controls, the > latter of which have perpetual security liabilities in addition to the > useability constraints inherent in a DAC or device node model. > > We have developed an independent implementation of the PSW and > arguably have as much experience with issues surrounding how to > interact with the device driver as anyone. We have spent a lot of > time thinking about these issues and the above framework provides the > most flexible architecture available. Sounds like a lot bloat and policy added to the kernel whereas with Andy's proposal you can implement logic to a daemon and provide only mechanism to do it. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Tue, Nov 27, 2018 at 02:55:33AM -0600, Dr. Greg wrote: > Since the thread has become a bit divergent I wanted to note that we > have offered a proposal for a general policy management framework > based on MRSIGNER values. This framework is consistent with the SGX > security model, ie. cryptographic rather then DAC based policy > controls. This framework also allows a much more flexible policy > implementation that doesn't result in combinatoric issues. > > Our framework also allows the preservation of the current ABI which > allows an EINITTOKEN to be passed in from userspace. The framework > also supports the ability to specify that only a kernel based launch > enclave (LE) should be available if the platform owner or distribution > should desire to implement such a model. > > The policy management framework is straight forward. Three linked > lists or their equivalent which are populated through /sysfs > pseudo-files or equivalent plumbing. Each list is populated with > MRSIGNER values for signing keys that are allowed to initialize > enclaves under three separate conditions. > > 1.) General enclaves without special attribute bits. > > 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., > 'Provisioning Enclaves'. > > 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch > Enclaves'. > > An all-null MRSIGNER value serves as a 'sealing' value that locks a > list from any further modifications. > > This architecture allows platform policies to be specified and then > sealed at early boot by the root user. At that point cryptographic > policy controls are in place rather then DAC based controls, the > latter of which have perpetual security liabilities in addition to the > useability constraints inherent in a DAC or device node model. > > We have developed an independent implementation of the PSW and > arguably have as much experience with issues surrounding how to > interact with the device driver as anyone. We have spent a lot of > time thinking about these issues and the above framework provides the > most flexible architecture available. Sounds like a lot bloat and policy added to the kernel whereas with Andy's proposal you can implement logic to a daemon and provide only mechanism to do it. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 03:04:36PM -0800, Jarkko Sakkinen wrote: Good morning to everyone. > On Mon, Nov 26, 2018 at 01:51:45PM -0800, Jarkko Sakkinen wrote: > > > ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); > > > > > > This requires extra syscalls, but it doesn???t have the combinatorial > > > explosion problem. > > > > I like this design because it is extendable. I'm now also in the same > > page why we need to protect provisioning in the first place. I would > > slight restructure this as > > > > /dev/sgx/control > > /dev/sgx/attributes/provision > I guess it would be OK to upstream only control node first as long > as provision attribute is denied in order to keep the already huge > patch set a tiny bit smaller? At this point in time I believe there is a consensus that the driver needs a policy management framework of some type for an optimum implementation. The PROVISION attribute has privacy implications and unrestricted access to release mode (full security) is problematic. Since the thread has become a bit divergent I wanted to note that we have offered a proposal for a general policy management framework based on MRSIGNER values. This framework is consistent with the SGX security model, ie. cryptographic rather then DAC based policy controls. This framework also allows a much more flexible policy implementation that doesn't result in combinatoric issues. Our framework also allows the preservation of the current ABI which allows an EINITTOKEN to be passed in from userspace. The framework also supports the ability to specify that only a kernel based launch enclave (LE) should be available if the platform owner or distribution should desire to implement such a model. The policy management framework is straight forward. Three linked lists or their equivalent which are populated through /sysfs pseudo-files or equivalent plumbing. Each list is populated with MRSIGNER values for signing keys that are allowed to initialize enclaves under three separate conditions. 1.) General enclaves without special attribute bits. 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., 'Provisioning Enclaves'. 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch Enclaves'. An all-null MRSIGNER value serves as a 'sealing' value that locks a list from any further modifications. This architecture allows platform policies to be specified and then sealed at early boot by the root user. At that point cryptographic policy controls are in place rather then DAC based controls, the latter of which have perpetual security liabilities in addition to the useability constraints inherent in a DAC or device node model. We have developed an independent implementation of the PSW and arguably have as much experience with issues surrounding how to interact with the device driver as anyone. We have spent a lot of time thinking about these issues and the above framework provides the most flexible architecture available. > /Jarkko We would be happy to discuss specific aspects of the implementation. Have a good day. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: g...@enjellic.com -- "Remember that when you take down the fishhouse you can't put the minnows back into the lake, so throw them out on the ice. Make sure you stomp on any of the live ones so they don't suffer." -- Fritz Wettstein At the lake
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 03:04:36PM -0800, Jarkko Sakkinen wrote: Good morning to everyone. > On Mon, Nov 26, 2018 at 01:51:45PM -0800, Jarkko Sakkinen wrote: > > > ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); > > > > > > This requires extra syscalls, but it doesn???t have the combinatorial > > > explosion problem. > > > > I like this design because it is extendable. I'm now also in the same > > page why we need to protect provisioning in the first place. I would > > slight restructure this as > > > > /dev/sgx/control > > /dev/sgx/attributes/provision > I guess it would be OK to upstream only control node first as long > as provision attribute is denied in order to keep the already huge > patch set a tiny bit smaller? At this point in time I believe there is a consensus that the driver needs a policy management framework of some type for an optimum implementation. The PROVISION attribute has privacy implications and unrestricted access to release mode (full security) is problematic. Since the thread has become a bit divergent I wanted to note that we have offered a proposal for a general policy management framework based on MRSIGNER values. This framework is consistent with the SGX security model, ie. cryptographic rather then DAC based policy controls. This framework also allows a much more flexible policy implementation that doesn't result in combinatoric issues. Our framework also allows the preservation of the current ABI which allows an EINITTOKEN to be passed in from userspace. The framework also supports the ability to specify that only a kernel based launch enclave (LE) should be available if the platform owner or distribution should desire to implement such a model. The policy management framework is straight forward. Three linked lists or their equivalent which are populated through /sysfs pseudo-files or equivalent plumbing. Each list is populated with MRSIGNER values for signing keys that are allowed to initialize enclaves under three separate conditions. 1.) General enclaves without special attribute bits. 2.) Enclaves with the SGX_FLAGS_PROVISION_KEY attribute set. - i.e., 'Provisioning Enclaves'. 3.) Enclaves with the SGX_FLAGS_LICENSE_KEY attribute set - i.e., 'Launch Enclaves'. An all-null MRSIGNER value serves as a 'sealing' value that locks a list from any further modifications. This architecture allows platform policies to be specified and then sealed at early boot by the root user. At that point cryptographic policy controls are in place rather then DAC based controls, the latter of which have perpetual security liabilities in addition to the useability constraints inherent in a DAC or device node model. We have developed an independent implementation of the PSW and arguably have as much experience with issues surrounding how to interact with the device driver as anyone. We have spent a lot of time thinking about these issues and the above framework provides the most flexible architecture available. > /Jarkko We would be happy to discuss specific aspects of the implementation. Have a good day. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: g...@enjellic.com -- "Remember that when you take down the fishhouse you can't put the minnows back into the lake, so throw them out on the ice. Make sure you stomp on any of the live ones so they don't suffer." -- Fritz Wettstein At the lake
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 01:51:45PM -0800, Jarkko Sakkinen wrote: > > ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); > > > > This requires extra syscalls, but it doesn’t have the combinatorial > > explosion problem. > > I like this design because it is extendable. I'm now also in the same > page why we need to protect provisioning in the first place. I would > slight restructure this as > > /dev/sgx/control > /dev/sgx/attributes/provision I guess it would be OK to upstream only control node first as long as provision attribute is denied in order to keep the already huge patch set a tiny bit smaller? /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 01:51:45PM -0800, Jarkko Sakkinen wrote: > > ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); > > > > This requires extra syscalls, but it doesn’t have the combinatorial > > explosion problem. > > I like this design because it is extendable. I'm now also in the same > page why we need to protect provisioning in the first place. I would > slight restructure this as > > /dev/sgx/control > /dev/sgx/attributes/provision I guess it would be OK to upstream only control node first as long as provision attribute is denied in order to keep the already huge patch set a tiny bit smaller? /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 05:00:39AM -0600, Dr. Greg wrote: > We will be interested in your comments as to why the proposal is > insufficient in the real world of FLC. > > I believe the proposed architecture can be defended as being effective > in the real world, as it allows the root user to use cryptographic > protections of access to the PROVISION bit and to enclave execution in > general. On FLC that is the strongest guarantee that can be > delivered. > > When we speak of 'unauthorized' users I believe we are speaking in the > parlance of discretionary access controls which has a much wider TCB > scope then the cryptographic model we are proposing. The model we > propose allows the platform owner (root) to effectively implement the > same level of security over the PROVISION bit that current locked > platforms have, in a free and open fashion of course. > > We can certainly attempt to explain our position further. I think kernel controlled provision would in all cases lower the mitigations of thread scenarios (at least what you've presented so far) assuming that a compromissed kernel could be detected fairly quickly, wouldn't it? Even without SGX, having a compromissed kernel, you can anyhow stealth your malware in many ways. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 05:00:39AM -0600, Dr. Greg wrote: > We will be interested in your comments as to why the proposal is > insufficient in the real world of FLC. > > I believe the proposed architecture can be defended as being effective > in the real world, as it allows the root user to use cryptographic > protections of access to the PROVISION bit and to enclave execution in > general. On FLC that is the strongest guarantee that can be > delivered. > > When we speak of 'unauthorized' users I believe we are speaking in the > parlance of discretionary access controls which has a much wider TCB > scope then the cryptographic model we are proposing. The model we > propose allows the platform owner (root) to effectively implement the > same level of security over the PROVISION bit that current locked > platforms have, in a free and open fashion of course. > > We can certainly attempt to explain our position further. I think kernel controlled provision would in all cases lower the mitigations of thread scenarios (at least what you've presented so far) assuming that a compromissed kernel could be detected fairly quickly, wouldn't it? Even without SGX, having a compromissed kernel, you can anyhow stealth your malware in many ways. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sun, Nov 25, 2018 at 08:22:35AM -0800, Andy Lutomirski wrote: > Agreed. What I’m proposing adds additional security if the kernel is > *not* compromised. And even if the kernel is compromised evil use will detected quicker i.e. compromissed kernel is "better" than a kernel that allows to use provisioning freely. > There are other ways to accomplish it that might be better in some > respects. For example, there could be /dev/sgx and > /dev/sgx_rights/provision. The former exposes the whole sgx API, > except that it doesn’t allow provisioning by default. The latter does > nothing by itself. To run a provisioning enclave, you open both nodes, > then do something like: > > ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); > > This requires extra syscalls, but it doesn’t have the combinatorial > explosion problem. I like this design because it is extendable. I'm now also in the same page why we need to protect provisioning in the first place. I would slight restructure this as /dev/sgx/control /dev/sgx/attributes/provision Looks cleaner and the root /dev directory is less polluted. BTW, off-topic from this but should we remove ENCLAVE from IOC names as they all concern enclaves anyway? Seems kind of redundant. I.e. SGX_IOC_ENCLAVE_CREATE -> SGX_IOC_CREATE SGX_IOC_ENCLAVE_ADD_PAGE -> SGX_IOC_ADD_PAGE SGX_IOC_ENCLAVE_INIT -> SGX_IOC_INIT /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sun, Nov 25, 2018 at 08:22:35AM -0800, Andy Lutomirski wrote: > Agreed. What I’m proposing adds additional security if the kernel is > *not* compromised. And even if the kernel is compromised evil use will detected quicker i.e. compromissed kernel is "better" than a kernel that allows to use provisioning freely. > There are other ways to accomplish it that might be better in some > respects. For example, there could be /dev/sgx and > /dev/sgx_rights/provision. The former exposes the whole sgx API, > except that it doesn’t allow provisioning by default. The latter does > nothing by itself. To run a provisioning enclave, you open both nodes, > then do something like: > > ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); > > This requires extra syscalls, but it doesn’t have the combinatorial > explosion problem. I like this design because it is extendable. I'm now also in the same page why we need to protect provisioning in the first place. I would slight restructure this as /dev/sgx/control /dev/sgx/attributes/provision Looks cleaner and the root /dev directory is less polluted. BTW, off-topic from this but should we remove ENCLAVE from IOC names as they all concern enclaves anyway? Seems kind of redundant. I.e. SGX_IOC_ENCLAVE_CREATE -> SGX_IOC_CREATE SGX_IOC_ENCLAVE_ADD_PAGE -> SGX_IOC_ADD_PAGE SGX_IOC_ENCLAVE_INIT -> SGX_IOC_INIT /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Nov 24, 2018 at 02:13:18PM -0600, Dr. Greg wrote: > This isn't about an enclave being able to tell that it is really an > enclave. As I noted in my previous reply, access to the provisioning > bit allows an enclave author to create a perpetual hardware identifier > for a platform based on a signing key of their choosing, along with a > few other incidentals, all of which are completely under the control > of the enclave author. I think I'm now in the same page with the issue now. Thanks for the patience explaining this. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Nov 24, 2018 at 02:13:18PM -0600, Dr. Greg wrote: > This isn't about an enclave being able to tell that it is really an > enclave. As I noted in my previous reply, access to the provisioning > bit allows an enclave author to create a perpetual hardware identifier > for a platform based on a signing key of their choosing, along with a > few other incidentals, all of which are completely under the control > of the enclave author. I think I'm now in the same page with the issue now. Thanks for the patience explaining this. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Nov 24, 2018 at 01:24:54PM -0600, Dr. Greg wrote: > On Sat, Nov 24, 2018 at 08:15:21AM -0800, Jarkko Sakkinen wrote: > > > On Tue, Nov 20, 2018 at 05:15:08AM -0600, Dr. Greg wrote: > > > Malware would not necessarily need the Intel attestation service. > > > Once access to the PROVISION bit is available, malware teams could > > > simply build their own attestation service. > > > AFAIK not possible as they wouldn't have access to the root > > provisioning key. Can be confirmed from the SDM's key derivation > > table (41-56). > > What provisioning and attestation is all about is establishing an > identity binding for a platform in question. The standard Intel > service binds the identity of a platform to an EPID private key. > > With access to the SGX_FLAGS_PROVISION_BIT an enclave can generate a > perpetual identity for a platform based on the identity modulus > signature (MRSIGNER) of the key that signs the signature structure of > the enclave. Without access to the root provisioning key a security > quorum or group has to be implemented via a subscription or enrollment > model but that is arguably not much of an obstacle. > > That is pretty much the way standard botware works now. > > Without provisions for cryptographically secure authorization and > policy enforcement in the driver, we will be creating infrastructure > for a new generation of botware/malware whose mothership will know > that a participating platform is running with full confidentiality and > integrity protections. OK, I think I got what you mean. With free access to the provision the bot net controller could be sure that a node is running inside an enclave. Is this what you are worried about? Please correct if not or even if there is a slight drift on what you are trying to state. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sat, Nov 24, 2018 at 01:24:54PM -0600, Dr. Greg wrote: > On Sat, Nov 24, 2018 at 08:15:21AM -0800, Jarkko Sakkinen wrote: > > > On Tue, Nov 20, 2018 at 05:15:08AM -0600, Dr. Greg wrote: > > > Malware would not necessarily need the Intel attestation service. > > > Once access to the PROVISION bit is available, malware teams could > > > simply build their own attestation service. > > > AFAIK not possible as they wouldn't have access to the root > > provisioning key. Can be confirmed from the SDM's key derivation > > table (41-56). > > What provisioning and attestation is all about is establishing an > identity binding for a platform in question. The standard Intel > service binds the identity of a platform to an EPID private key. > > With access to the SGX_FLAGS_PROVISION_BIT an enclave can generate a > perpetual identity for a platform based on the identity modulus > signature (MRSIGNER) of the key that signs the signature structure of > the enclave. Without access to the root provisioning key a security > quorum or group has to be implemented via a subscription or enrollment > model but that is arguably not much of an obstacle. > > That is pretty much the way standard botware works now. > > Without provisions for cryptographically secure authorization and > policy enforcement in the driver, we will be creating infrastructure > for a new generation of botware/malware whose mothership will know > that a participating platform is running with full confidentiality and > integrity protections. OK, I think I got what you mean. With free access to the provision the bot net controller could be sure that a node is running inside an enclave. Is this what you are worried about? Please correct if not or even if there is a slight drift on what you are trying to state. /Jarkko
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 3:00 AM Dr. Greg wrote: > > On Sun, Nov 25, 2018 at 04:37:00PM -0800, Andy Lutomirski wrote: > > > Bah, I hit send on a partially written draft. I???ll try again soon. > > > > --Andy > > Your first issue seems to be complete so I will respond to that in > order to make sure we are not talking past one another. It wasn't, but your answer is enlightening! I've read the SGX *manual*, but I hadn't dug through the actual Intel-supplied enclaves. So, when I said that the LE isn't an important part of the overall trust model, I meant that it isn't *in hardware*. It's certainly possible to write SGX software that weakens the security of the overall system, and Intel seems to have done so: > > The Intel supplied launch enclave (LE) specifically denies > initialization of enclaves which have the PROVISION attribute set, > with the exception of enclaves whose MRSIGNER values match those of > the keys that Intel uses to sign the PCE and PVE enclaves. See line > 219 of psw/ae/le/launch_enclave.cpp in the Intel SDK. This seems entirely reasonable. (But see below...) > This report, along with the PEK, is submitted to the PCE enclave in > order to generate the Platform Provisioning IDentity (PPID), which is > the privacy critical identifier. The PCE verifies the report > generated by the PVE and rejects the request to generate the PPID if > the report was generated by an enclave that was not initialized with > the PROVISION bit set, see line 109 of psw/ae/pce/pce.cpp. ...and this seems entirely unreasonable. Your description does indeed appear consistent with the code: the PCE will hand out the PPID to any requesting enclave that has the PROVISION bit set, so you are correct that: > Once again, the important design factor in all of this is the premise > that the launch enclave will not allow enclaves other then the PCE and > PVE to access the PROVISION bit. But here's where the whole thing goes off the rails. I would argue that the Intel-supplied (and Intel-signed, apparently!) PCE is just straight-up buggy. What Intel is *trying* to do is to hand out the PPID to an appropriately signed enclave. What they actually did is to hand out the PPID to any enclave that has the PROVISION bit set. This is poor design because it overload the PROVISION bit. That bit is supposed to mean "may use EGETKEY to obtain provisioning and provisioning seal keys", which is not actually what Intel wants here. It's also poor design without FLC because it pointlessly relies on the LE to enforce a restriction on the use of provisioning enclaves, when the code could instead have checked MRSIGNER.. And it's just straight up wrong with FLC because there is no guarantee whatsoever that the LE being used is Intel's. And, for that matter, there is no guarantee that the requesting enclave doesn't have the DEBUG bit set. (It's also poor design because the PCE doesn't appear to verify that the report passed in is actually intended to be associated with a call to get_ppid(). There appear to be reports associated with provision "msg1" and "msg3". If it's possible to get a valid report for msg3 to be accepted as a msg1 report of vice versa, then it might be game over.) Sorry, but this is not Linux's problem. The right fix is, in my opinion, entirely clear: the PCE should check MRSIGNER and possibly even MRENCLAVE in the report it receives. Intel needs to fix their PCE, sign a fixed version, and find some way to revoke, to the extent possible, the old one. And the SGX enclave authors need to understand something that is apparently subtle: THE LAUNCH POLICY IS NOT PART OF THE TCB AND SHOULD NOT BE RELIED UPON. Enclaves can and should be written to remain secure even in the complete absence of any form of launch control. I went and filed a bug on github. Let's see what happens: https://github.com/intel/linux-sgx/issues/345 Also, the whole SGX report mechanism seems to be misused in the SDK. An SGX report is a cryptographic primitive that essentially acts like a signed blob. Building a secure protocol on top of signed messages or on top of reports takes more than just making up ad hoc blob formats and signing them. There needs to be domain separation between different messages, and this seems to be entirely missing. Do you know if Intel has had a serious audit of their platform enclave code done? > > > No, I have no clue why Intel did it this way. I consider it to be a > > mistake. > > Are you referring to there being a mistake in the trust relationships > that the provisioning protocol implements or the overall concept of a > provisioning key? I'm referring to the hardware's policy as to when keys that don't depend on OWNEREPOCH can be obtained. As far as I know, the only real need for such keys is to verify that the running platform is a real Intel platform, which means that access to the provisioning key is only useful to Intel-approved services. Why didn't Intel enforce this in hardware or
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Mon, Nov 26, 2018 at 3:00 AM Dr. Greg wrote: > > On Sun, Nov 25, 2018 at 04:37:00PM -0800, Andy Lutomirski wrote: > > > Bah, I hit send on a partially written draft. I???ll try again soon. > > > > --Andy > > Your first issue seems to be complete so I will respond to that in > order to make sure we are not talking past one another. It wasn't, but your answer is enlightening! I've read the SGX *manual*, but I hadn't dug through the actual Intel-supplied enclaves. So, when I said that the LE isn't an important part of the overall trust model, I meant that it isn't *in hardware*. It's certainly possible to write SGX software that weakens the security of the overall system, and Intel seems to have done so: > > The Intel supplied launch enclave (LE) specifically denies > initialization of enclaves which have the PROVISION attribute set, > with the exception of enclaves whose MRSIGNER values match those of > the keys that Intel uses to sign the PCE and PVE enclaves. See line > 219 of psw/ae/le/launch_enclave.cpp in the Intel SDK. This seems entirely reasonable. (But see below...) > This report, along with the PEK, is submitted to the PCE enclave in > order to generate the Platform Provisioning IDentity (PPID), which is > the privacy critical identifier. The PCE verifies the report > generated by the PVE and rejects the request to generate the PPID if > the report was generated by an enclave that was not initialized with > the PROVISION bit set, see line 109 of psw/ae/pce/pce.cpp. ...and this seems entirely unreasonable. Your description does indeed appear consistent with the code: the PCE will hand out the PPID to any requesting enclave that has the PROVISION bit set, so you are correct that: > Once again, the important design factor in all of this is the premise > that the launch enclave will not allow enclaves other then the PCE and > PVE to access the PROVISION bit. But here's where the whole thing goes off the rails. I would argue that the Intel-supplied (and Intel-signed, apparently!) PCE is just straight-up buggy. What Intel is *trying* to do is to hand out the PPID to an appropriately signed enclave. What they actually did is to hand out the PPID to any enclave that has the PROVISION bit set. This is poor design because it overload the PROVISION bit. That bit is supposed to mean "may use EGETKEY to obtain provisioning and provisioning seal keys", which is not actually what Intel wants here. It's also poor design without FLC because it pointlessly relies on the LE to enforce a restriction on the use of provisioning enclaves, when the code could instead have checked MRSIGNER.. And it's just straight up wrong with FLC because there is no guarantee whatsoever that the LE being used is Intel's. And, for that matter, there is no guarantee that the requesting enclave doesn't have the DEBUG bit set. (It's also poor design because the PCE doesn't appear to verify that the report passed in is actually intended to be associated with a call to get_ppid(). There appear to be reports associated with provision "msg1" and "msg3". If it's possible to get a valid report for msg3 to be accepted as a msg1 report of vice versa, then it might be game over.) Sorry, but this is not Linux's problem. The right fix is, in my opinion, entirely clear: the PCE should check MRSIGNER and possibly even MRENCLAVE in the report it receives. Intel needs to fix their PCE, sign a fixed version, and find some way to revoke, to the extent possible, the old one. And the SGX enclave authors need to understand something that is apparently subtle: THE LAUNCH POLICY IS NOT PART OF THE TCB AND SHOULD NOT BE RELIED UPON. Enclaves can and should be written to remain secure even in the complete absence of any form of launch control. I went and filed a bug on github. Let's see what happens: https://github.com/intel/linux-sgx/issues/345 Also, the whole SGX report mechanism seems to be misused in the SDK. An SGX report is a cryptographic primitive that essentially acts like a signed blob. Building a secure protocol on top of signed messages or on top of reports takes more than just making up ad hoc blob formats and signing them. There needs to be domain separation between different messages, and this seems to be entirely missing. Do you know if Intel has had a serious audit of their platform enclave code done? > > > No, I have no clue why Intel did it this way. I consider it to be a > > mistake. > > Are you referring to there being a mistake in the trust relationships > that the provisioning protocol implements or the overall concept of a > provisioning key? I'm referring to the hardware's policy as to when keys that don't depend on OWNEREPOCH can be obtained. As far as I know, the only real need for such keys is to verify that the running platform is a real Intel platform, which means that access to the provisioning key is only useful to Intel-approved services. Why didn't Intel enforce this in hardware or
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sun, Nov 25, 2018 at 04:37:00PM -0800, Andy Lutomirski wrote: > Bah, I hit send on a partially written draft. I???ll try again soon. > > --Andy Your first issue seems to be complete so I will respond to that in order to make sure we are not talking past one another. > > On Nov 25, 2018, at 1:59 PM, Andy Lutomirski wrote: > >> On Nov 25, 2018, at 10:55 AM, Dr. Greg wrote: > >> > >> The notion of a launch enclave is critical to establishing these > >> guarantees. As soon as the kernel becomes involved in implementing > >> SGX security policy the architecture becomes vulnerable to kernel > >> and/or privilege modification attacks. > >> > >> We've talked at length about the provisioning bit, I won't go into > >> details unless people are interested, but the EPID provisioning > >> protocol implements an SGX mediated cryptographic contract that a > >> perpetual platform identifier will not be disclosed to anyone but > >> Intel. > As a reviewer, and as an occasional academic cryptographer, I need > to put my foot down here. This is inaccurate. I certainly wouldn't try to engage in a debate at the level of academic cryptography, so I want to clarify that I was speaking specifically with respect to the ability to use the Intel supplied Platform Certification Enclave (PCE) to obtain a perpetual platform identifier. There could certainly be an academic level weakness in SGX, or in the provisioning protocol, but at the end of the day the important issue seems to be whether or not a PCE enclave can be exploited by anyone with execution access to the enclave to generate a perpetual identifier for a platform. > There is an SGX-mediated contract that says: > > 1. For any given public key p, a perpetual platform identifier ID_p > exists and will only be disclosed to the holder of the corresponding > private key p_priv or to someone to whom the private key holder > permits (intentionally or otherwise) to use that identifier. > > 2. The ability described in #1 is available to anyone whom the > kernel and launch enclave (if the MSRs are locked ) permits > (intentionally or otherwise) to use it. Let me see if I can respond directly to point two as it seems the most important. In the EPID provisioning model, the PCE and the ProVisioning Enclave (PVE) both have posession of the PROVISION attribute and thus access to the derivation of an MRSIGNER specific provisioning key. The Intel supplied launch enclave (LE) specifically denies initialization of enclaves which have the PROVISION attribute set, with the exception of enclaves whose MRSIGNER values match those of the keys that Intel uses to sign the PCE and PVE enclaves. See line 219 of psw/ae/le/launch_enclave.cpp in the Intel SDK. In the message one phase of the provisioning protocol Intel supplies the platform with a 3K RSA public key (PEK). The identity of that key is confirmed by an ECC256 based signature over the key. Intel embeds the public gx and gy curve points for the signature key in their SDK, see ae/data/constants/linux/peksk_pub.hh. The PVE verifies the signature of the PEK and generates a SHA256 verification hash over a message containing the key and uses that as the data field in an attestation report that is generated against the target information for the PCE. The PVE rejects generation of the report if the PCE target information does not have the PROVISION attribute set. See line 124 of psw/ae/pve/provision_msg1.cpp. This report, along with the PEK, is submitted to the PCE enclave in order to generate the Platform Provisioning IDentity (PPID), which is the privacy critical identifier. The PCE verifies the report generated by the PVE and rejects the request to generate the PPID if the report was generated by an enclave that was not initialized with the PROVISION bit set, see line 109 of psw/ae/pce/pce.cpp. The PCE enclave then recomputes the message hash using the PEK that it is provided and verifies that the value matches the value in the data field of the attestation report from the PVE enclave. If the values do not match the PPID generation request is denied. The PCE enclave then encrypts the PPID with the PEK key and the encrypted PPID is returned to Intel to use as the platform identifier. The PPID is the CMAC over a 16 byte null message which uses a derived provisioning key based on CPUSVN and ISVSVN values set to zero. See the get_ppid() function in psw/ae/pce/pce_helper.cpp. I believe this effectively denies the ability of anyone other then Intel, who holds the private portion of the ECC256 signature key used to authenticate the PEK, from using the PCE enclave to generate a platform identifier. As I conceded above, there could be an academic deficiency in all this, I'm not qualified to comment. I believe there is a reasonably solid functional guarantee that on a locked platform the process cannot be easily subverted by a privacy aggressor. We contend that the model we propose below can also deliver this guarantee
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
On Sun, Nov 25, 2018 at 04:37:00PM -0800, Andy Lutomirski wrote: > Bah, I hit send on a partially written draft. I???ll try again soon. > > --Andy Your first issue seems to be complete so I will respond to that in order to make sure we are not talking past one another. > > On Nov 25, 2018, at 1:59 PM, Andy Lutomirski wrote: > >> On Nov 25, 2018, at 10:55 AM, Dr. Greg wrote: > >> > >> The notion of a launch enclave is critical to establishing these > >> guarantees. As soon as the kernel becomes involved in implementing > >> SGX security policy the architecture becomes vulnerable to kernel > >> and/or privilege modification attacks. > >> > >> We've talked at length about the provisioning bit, I won't go into > >> details unless people are interested, but the EPID provisioning > >> protocol implements an SGX mediated cryptographic contract that a > >> perpetual platform identifier will not be disclosed to anyone but > >> Intel. > As a reviewer, and as an occasional academic cryptographer, I need > to put my foot down here. This is inaccurate. I certainly wouldn't try to engage in a debate at the level of academic cryptography, so I want to clarify that I was speaking specifically with respect to the ability to use the Intel supplied Platform Certification Enclave (PCE) to obtain a perpetual platform identifier. There could certainly be an academic level weakness in SGX, or in the provisioning protocol, but at the end of the day the important issue seems to be whether or not a PCE enclave can be exploited by anyone with execution access to the enclave to generate a perpetual identifier for a platform. > There is an SGX-mediated contract that says: > > 1. For any given public key p, a perpetual platform identifier ID_p > exists and will only be disclosed to the holder of the corresponding > private key p_priv or to someone to whom the private key holder > permits (intentionally or otherwise) to use that identifier. > > 2. The ability described in #1 is available to anyone whom the > kernel and launch enclave (if the MSRs are locked ) permits > (intentionally or otherwise) to use it. Let me see if I can respond directly to point two as it seems the most important. In the EPID provisioning model, the PCE and the ProVisioning Enclave (PVE) both have posession of the PROVISION attribute and thus access to the derivation of an MRSIGNER specific provisioning key. The Intel supplied launch enclave (LE) specifically denies initialization of enclaves which have the PROVISION attribute set, with the exception of enclaves whose MRSIGNER values match those of the keys that Intel uses to sign the PCE and PVE enclaves. See line 219 of psw/ae/le/launch_enclave.cpp in the Intel SDK. In the message one phase of the provisioning protocol Intel supplies the platform with a 3K RSA public key (PEK). The identity of that key is confirmed by an ECC256 based signature over the key. Intel embeds the public gx and gy curve points for the signature key in their SDK, see ae/data/constants/linux/peksk_pub.hh. The PVE verifies the signature of the PEK and generates a SHA256 verification hash over a message containing the key and uses that as the data field in an attestation report that is generated against the target information for the PCE. The PVE rejects generation of the report if the PCE target information does not have the PROVISION attribute set. See line 124 of psw/ae/pve/provision_msg1.cpp. This report, along with the PEK, is submitted to the PCE enclave in order to generate the Platform Provisioning IDentity (PPID), which is the privacy critical identifier. The PCE verifies the report generated by the PVE and rejects the request to generate the PPID if the report was generated by an enclave that was not initialized with the PROVISION bit set, see line 109 of psw/ae/pce/pce.cpp. The PCE enclave then recomputes the message hash using the PEK that it is provided and verifies that the value matches the value in the data field of the attestation report from the PVE enclave. If the values do not match the PPID generation request is denied. The PCE enclave then encrypts the PPID with the PEK key and the encrypted PPID is returned to Intel to use as the platform identifier. The PPID is the CMAC over a 16 byte null message which uses a derived provisioning key based on CPUSVN and ISVSVN values set to zero. See the get_ppid() function in psw/ae/pce/pce_helper.cpp. I believe this effectively denies the ability of anyone other then Intel, who holds the private portion of the ECC256 signature key used to authenticate the PEK, from using the PCE enclave to generate a platform identifier. As I conceded above, there could be an academic deficiency in all this, I'm not qualified to comment. I believe there is a reasonably solid functional guarantee that on a locked platform the process cannot be easily subverted by a privacy aggressor. We contend that the model we propose below can also deliver this guarantee
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
Bah, I hit send on a partially written draft. I’ll try again soon. --Andy > On Nov 25, 2018, at 1:59 PM, Andy Lutomirski wrote: > > > >> On Nov 25, 2018, at 10:55 AM, Dr. Greg wrote: >> > >> >> >> The notion of a launch enclave is critical to establishing these >> guarantees. As soon as the kernel becomes involved in implementing >> SGX security policy the architecture becomes vulnerable to kernel >> and/or privilege modification attacks. >> >> We've talked at length about the provisioning bit, I won't go into >> details unless people are interested, but the EPID provisioning >> protocol implements an SGX mediated cryptographic contract that a >> perpetual platform identifier will not be disclosed to anyone but >> Intel. > > As a reviewer, and as an occasional academic cryptographer, I need to put my > foot down here. This is inaccurate. > > There is an SGX-mediated contract that says: > > 1. For any given public key p, a perpetual platform identifier ID_p exists > and will only be disclosed to the holder of the corresponding private key > p_priv or to someone to whom the private key holder permits (intentionally or > otherwise) to use that identifier. > > 2. The ability described in #1 is available to anyone whom the kernel and > launch enclave (if the MSRs are locked ) permits (intentionally or otherwise) > to use it. > > No, I have no clue why Intel did it this way. I consider it to be a mistake. > >> The launch enclave is critical to that guarantee. >> >> It is completely understandable why a locked down, (non-FLC) hardware >> platform, is undesirable in this community. That doesn't mean that a >> launch enclave as a concept is unneeded or necessarily evil. >> >> In an FLC environment the kernel assumes responsibility for SGX >> privacy and security. This means that we need to do at least as well >> with a kernel based model as to what is currently available. >> >>> There are other ways to accomplish it that might be better in some >>> respects. For example, there could be /dev/sgx and >>> /dev/sgx_rights/provision. The former exposes the whole sgx API, >>> except that it doesn???t allow provisioning by default. The latter >>> does nothing by itself. To run a provisioning enclave, you open both >>> nodes, then do something like: >>> >>> ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); >>> >>> This requires extra syscalls, but it doesn't have the combinatorial >>> explosion problem. >> >> Here is a proposal for the driver to add the needed policy control >> that is 'SGXy' in nature. The 'SGXy' way is to use MRSIGNER values as >> the currency for security policy management. >> >> The driver should establish the equivalent of three linked lists, >> maintainable via /sysfs pseudo-files or equivalent plumbing. The >> lists are referenced by the kernel to enforce the following policies. >> >> 1.) The right to initialize an enclave without special attributes. >> >> 2.) The right to initialize an enclave with the PROVISION_KEY attribute set. >> >> 3.) The right to initialize an enclave with the LICENSE_KEY attribute set. >> >> The lists are populated with MRSIGNER values of enclaves that are >> allowed to initialize under the specified conditions. > > NAK because this is insufficient. You’re thinking of a model in which > SGX-like protection is all that’s needed. This is an inadequate model of the > real world. The attack I’m most concerned about wrt provisioning is an > attack in which an unauthorized user is permitted > > The use case I see for attestation *privacy* > >> >> The driver should either establish a 'seal' file or value, >> ie. MRSIGNER value of all zero's, that once written will not allow >> further modifications of the list(s). This will allow >> cryptographically guaranteed policies to be setup at early boot that >> will limit the ability of subsequent DAC compromises to affect policy >> management. >> >> The lists are obviously vulnerable to a kernel compromise but the >> vulnerability scope is significantly limited vs. 'can I get root or >> some other userid'. If we are really concerned about the scope of >> that vulnerability there could be an option on TPM based systems to >> verify a hash value over the lists once sealed on each enclave >> initialization. We have already conceded that EINIT isn't going to be >> any type of speed daemon. >> >> On an FLC system the driver verifies that the submitted enclave has an >> MRSIGNER value on one of the lists consistent with the attributes of >> the enclave before loading the value into the identity modulus >> signature registers. >> >> In this model, I would argue that the driver does not need to >> arbitrarily exclude launch enclaves as it does now, since the kernel >> has the ability to specify acceptable launch enclaves. The driver API >> can alaso continue to accept an EINITTOKEN which maintains >> compatibility with the current ABI. Punishment can be inflicted on >> non-FLC
Re: [PATCH v17 18/23] platform/x86: Intel SGX driver
Bah, I hit send on a partially written draft. I’ll try again soon. --Andy > On Nov 25, 2018, at 1:59 PM, Andy Lutomirski wrote: > > > >> On Nov 25, 2018, at 10:55 AM, Dr. Greg wrote: >> > >> >> >> The notion of a launch enclave is critical to establishing these >> guarantees. As soon as the kernel becomes involved in implementing >> SGX security policy the architecture becomes vulnerable to kernel >> and/or privilege modification attacks. >> >> We've talked at length about the provisioning bit, I won't go into >> details unless people are interested, but the EPID provisioning >> protocol implements an SGX mediated cryptographic contract that a >> perpetual platform identifier will not be disclosed to anyone but >> Intel. > > As a reviewer, and as an occasional academic cryptographer, I need to put my > foot down here. This is inaccurate. > > There is an SGX-mediated contract that says: > > 1. For any given public key p, a perpetual platform identifier ID_p exists > and will only be disclosed to the holder of the corresponding private key > p_priv or to someone to whom the private key holder permits (intentionally or > otherwise) to use that identifier. > > 2. The ability described in #1 is available to anyone whom the kernel and > launch enclave (if the MSRs are locked ) permits (intentionally or otherwise) > to use it. > > No, I have no clue why Intel did it this way. I consider it to be a mistake. > >> The launch enclave is critical to that guarantee. >> >> It is completely understandable why a locked down, (non-FLC) hardware >> platform, is undesirable in this community. That doesn't mean that a >> launch enclave as a concept is unneeded or necessarily evil. >> >> In an FLC environment the kernel assumes responsibility for SGX >> privacy and security. This means that we need to do at least as well >> with a kernel based model as to what is currently available. >> >>> There are other ways to accomplish it that might be better in some >>> respects. For example, there could be /dev/sgx and >>> /dev/sgx_rights/provision. The former exposes the whole sgx API, >>> except that it doesn???t allow provisioning by default. The latter >>> does nothing by itself. To run a provisioning enclave, you open both >>> nodes, then do something like: >>> >>> ioctl(sgx, SGX_IOC_ADD_RIGHT, sgx_provisioning); >>> >>> This requires extra syscalls, but it doesn't have the combinatorial >>> explosion problem. >> >> Here is a proposal for the driver to add the needed policy control >> that is 'SGXy' in nature. The 'SGXy' way is to use MRSIGNER values as >> the currency for security policy management. >> >> The driver should establish the equivalent of three linked lists, >> maintainable via /sysfs pseudo-files or equivalent plumbing. The >> lists are referenced by the kernel to enforce the following policies. >> >> 1.) The right to initialize an enclave without special attributes. >> >> 2.) The right to initialize an enclave with the PROVISION_KEY attribute set. >> >> 3.) The right to initialize an enclave with the LICENSE_KEY attribute set. >> >> The lists are populated with MRSIGNER values of enclaves that are >> allowed to initialize under the specified conditions. > > NAK because this is insufficient. You’re thinking of a model in which > SGX-like protection is all that’s needed. This is an inadequate model of the > real world. The attack I’m most concerned about wrt provisioning is an > attack in which an unauthorized user is permitted > > The use case I see for attestation *privacy* > >> >> The driver should either establish a 'seal' file or value, >> ie. MRSIGNER value of all zero's, that once written will not allow >> further modifications of the list(s). This will allow >> cryptographically guaranteed policies to be setup at early boot that >> will limit the ability of subsequent DAC compromises to affect policy >> management. >> >> The lists are obviously vulnerable to a kernel compromise but the >> vulnerability scope is significantly limited vs. 'can I get root or >> some other userid'. If we are really concerned about the scope of >> that vulnerability there could be an option on TPM based systems to >> verify a hash value over the lists once sealed on each enclave >> initialization. We have already conceded that EINIT isn't going to be >> any type of speed daemon. >> >> On an FLC system the driver verifies that the submitted enclave has an >> MRSIGNER value on one of the lists consistent with the attributes of >> the enclave before loading the value into the identity modulus >> signature registers. >> >> In this model, I would argue that the driver does not need to >> arbitrarily exclude launch enclaves as it does now, since the kernel >> has the ability to specify acceptable launch enclaves. The driver API >> can alaso continue to accept an EINITTOKEN which maintains >> compatibility with the current ABI. Punishment can be inflicted on >> non-FLC