Re: [PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
On 10/31/2014 06:38 AM, Dave Hansen wrote: +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ + struct mpx_insn insn; + uint8_t bndregno; + unsigned long addr_vio; + + addr_vio = mpx_insn_decode(, regs); + + bndregno = X86_MODRM_REG(insn.modrm.value); + if (bndregno > 3) + return; + + /* Note: the upper 32 bits are ignored in 32-bit mode. */ + info->si_lower = (void __user *)(unsigned long) + (xsave_buf->bndregs.bndregs[2*bndregno]); + info->si_upper = (void __user *)(unsigned long) + (~xsave_buf->bndregs.bndregs[2*bndregno+1]); + info->si_addr_lsb = 0; + info->si_signo = SIGSEGV; + info->si_errno = 0; + info->si_code = SEGV_BNDERR; + info->si_addr = (void __user *)addr_vio; +} diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 611b6ec..b2a916b 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -284,6 +284,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) unsigned long status; struct xsave_struct *xsave_buf; struct task_struct *tsk = current; + siginfo_t info; prev_state = exception_enter(); if (notify_die(DIE_TRAP, "bounds", regs, error_code, @@ -316,6 +317,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) break; case 1: /* Bound violation. */ + do_mpx_bounds(regs, , xsave_buf); + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, + error_code, ); + break; + case 0: /* No exception caused by Intel MPX operations. */ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); break; So, siginfo is stack-allocarted here. do_mpx_bounds() can error out if it sees an invalid bndregno. We still send the signal with the whether or not we filled the 'info' in do_mpx_bounds(). Can't this leak some kernel stack out in the 'info'? This should check the return value of do_mpx_bounds and should be fixed. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
On 10/31/2014 06:38 AM, Dave Hansen wrote: +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ + struct mpx_insn insn; + uint8_t bndregno; + unsigned long addr_vio; + + addr_vio = mpx_insn_decode(insn, regs); + + bndregno = X86_MODRM_REG(insn.modrm.value); + if (bndregno 3) + return; + + /* Note: the upper 32 bits are ignored in 32-bit mode. */ + info-si_lower = (void __user *)(unsigned long) + (xsave_buf-bndregs.bndregs[2*bndregno]); + info-si_upper = (void __user *)(unsigned long) + (~xsave_buf-bndregs.bndregs[2*bndregno+1]); + info-si_addr_lsb = 0; + info-si_signo = SIGSEGV; + info-si_errno = 0; + info-si_code = SEGV_BNDERR; + info-si_addr = (void __user *)addr_vio; +} diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 611b6ec..b2a916b 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -284,6 +284,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) unsigned long status; struct xsave_struct *xsave_buf; struct task_struct *tsk = current; + siginfo_t info; prev_state = exception_enter(); if (notify_die(DIE_TRAP, bounds, regs, error_code, @@ -316,6 +317,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) break; case 1: /* Bound violation. */ + do_mpx_bounds(regs, info, xsave_buf); + do_trap(X86_TRAP_BR, SIGSEGV, bounds, regs, + error_code, info); + break; + case 0: /* No exception caused by Intel MPX operations. */ do_trap(X86_TRAP_BR, SIGSEGV, bounds, regs, error_code, NULL); break; So, siginfo is stack-allocarted here. do_mpx_bounds() can error out if it sees an invalid bndregno. We still send the signal with the info whether or not we filled the 'info' in do_mpx_bounds(). Can't this leak some kernel stack out in the 'info'? This should check the return value of do_mpx_bounds and should be fixed. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
On 10/28/2014 04:36 AM, Thomas Gleixner wrote: On Mon, 27 Oct 2014, Ren, Qiaowei wrote: On 2014-10-24, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. This patch does't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. My question still stands why using the existing decoder is an issue. Performance is a complete non issue in case of a bounds violation and the interface argument is just silly, really. As hpa said, we only need to decode several mpx instructions including BNDCL/BNDCU, and general decoder looks like a little heavy. Peter, what do you think about it? You're repeating yourself. Care to read the discussion about this from the last round of review again? Ok. I will go through it again. Thanks. - Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 11/12] x86, mpx: cleanup unused bound tables
On 10/28/2014 04:49 AM, Thomas Gleixner wrote: On Mon, 27 Oct 2014, Ren Qiaowei wrote: If so, I guess that there are some questions needed to be considered: 1) Almost all palces which call do_munmap() will need to add mpx_pre_unmap/post_unmap calls, like vm_munmap(), mremap(), shmdt(), etc.. What's the problem with that? For example: shmdt() down_write(mm->mmap_sem); vma = find_vma(); while (vma) do_munmap(); up_write(mm->mmap_sem); We could not simply add mpx_pre_unmap() before do_munmap() or down_write(). And seems like it is a little hard for shmdt() to be changed to match this solution, right? 2) before mpx_post_unmap() call, it is possible for those bounds tables within mm->bd_remove_vmas to be re-used. In this case, userspace may do new mapping and access one address which will cover one of those bounds tables. During this period, HW will check if one bounds table exist, if yes one fault won't be produced. Errm. Before user space can use the bounds table for the new mapping it needs to add the entries, right? So: CPU 0 CPU 1 down_write(mm->bd_sem); mpx_pre_unmap(); clear bounds directory entries unmap(); map() write_bounds_entry() trap() down_read(mm->bd_sem); mpx_post_unmap(); up_write(mm->bd_sem); allocate_bounds_table(); That's the whole point of bd_sem. Yes. Got it. 3) According to Dave, those bounds tables related to adjacent VMAs within the start and the end possibly don't have to be fully unmmaped, and we only need free the part of backing physical memory. Care to explain why that's a problem? I guess you mean one new field mm->bd_remove_vmas should be added into staruct mm, right? For those VMAs which we only need to free part of backing physical memory, we could not clear bounds directory entries and should also mark the range of backing physical memory within this vma. If so, maybe there are too many new fields which will be added into mm struct, right? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 10/12] x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT
On 10/28/2014 04:38 AM, Thomas Gleixner wrote: On Mon, 27 Oct 2014, Ren, Qiaowei wrote: On 2014-10-24, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: +int mpx_enable_management(struct task_struct *tsk) { + struct mm_struct *mm = tsk->mm; + void __user *bd_base = MPX_INVALID_BOUNDS_DIR; What's the point of initializing bd_base here. I had to look twice to figure out that it gets overwritten by task_get_bounds_dir() I just want to put task_get_bounds_dir() outside mm->mmap_sem holding. What you want is not interesting at all. What's interesting is what you do and what you send for review. I see. Thanks. - Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 11/12] x86, mpx: cleanup unused bound tables
On 10/28/2014 04:49 AM, Thomas Gleixner wrote: On Mon, 27 Oct 2014, Ren Qiaowei wrote: If so, I guess that there are some questions needed to be considered: 1) Almost all palces which call do_munmap() will need to add mpx_pre_unmap/post_unmap calls, like vm_munmap(), mremap(), shmdt(), etc.. What's the problem with that? For example: shmdt() down_write(mm-mmap_sem); vma = find_vma(); while (vma) do_munmap(); up_write(mm-mmap_sem); We could not simply add mpx_pre_unmap() before do_munmap() or down_write(). And seems like it is a little hard for shmdt() to be changed to match this solution, right? 2) before mpx_post_unmap() call, it is possible for those bounds tables within mm-bd_remove_vmas to be re-used. In this case, userspace may do new mapping and access one address which will cover one of those bounds tables. During this period, HW will check if one bounds table exist, if yes one fault won't be produced. Errm. Before user space can use the bounds table for the new mapping it needs to add the entries, right? So: CPU 0 CPU 1 down_write(mm-bd_sem); mpx_pre_unmap(); clear bounds directory entries unmap(); map() write_bounds_entry() trap() down_read(mm-bd_sem); mpx_post_unmap(); up_write(mm-bd_sem); allocate_bounds_table(); That's the whole point of bd_sem. Yes. Got it. 3) According to Dave, those bounds tables related to adjacent VMAs within the start and the end possibly don't have to be fully unmmaped, and we only need free the part of backing physical memory. Care to explain why that's a problem? I guess you mean one new field mm-bd_remove_vmas should be added into staruct mm, right? For those VMAs which we only need to free part of backing physical memory, we could not clear bounds directory entries and should also mark the range of backing physical memory within this vma. If so, maybe there are too many new fields which will be added into mm struct, right? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 10/12] x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT
On 10/28/2014 04:38 AM, Thomas Gleixner wrote: On Mon, 27 Oct 2014, Ren, Qiaowei wrote: On 2014-10-24, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: +int mpx_enable_management(struct task_struct *tsk) { + struct mm_struct *mm = tsk-mm; + void __user *bd_base = MPX_INVALID_BOUNDS_DIR; What's the point of initializing bd_base here. I had to look twice to figure out that it gets overwritten by task_get_bounds_dir() I just want to put task_get_bounds_dir() outside mm-mmap_sem holding. What you want is not interesting at all. What's interesting is what you do and what you send for review. I see. Thanks. - Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
On 10/28/2014 04:36 AM, Thomas Gleixner wrote: On Mon, 27 Oct 2014, Ren, Qiaowei wrote: On 2014-10-24, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. This patch does't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. My question still stands why using the existing decoder is an issue. Performance is a complete non issue in case of a bounds violation and the interface argument is just silly, really. As hpa said, we only need to decode several mpx instructions including BNDCL/BNDCU, and general decoder looks like a little heavy. Peter, what do you think about it? You're repeating yourself. Care to read the discussion about this from the last round of review again? Ok. I will go through it again. Thanks. - Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 05/12] x86, mpx: on-demand kernel allocation of bounds tables
On 10/24/2014 08:08 PM, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: + /* +* Go poke the address of the new bounds table in to the +* bounds directory entry out in userspace memory. Note: +* we may race with another CPU instantiating the same table. +* In that case the cmpxchg will see an unexpected +* 'actual_old_val'. +*/ + ret = user_atomic_cmpxchg_inatomic(_old_val, bd_entry, + expected_old_val, bt_addr); This is fully preemptible non-atomic context, right? So this wants a proper comment, why using user_atomic_cmpxchg_inatomic() is the right thing to do here. Well, we will address it. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 11/12] x86, mpx: cleanup unused bound tables
On 10/24/2014 10:40 PM, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. For now, to avoid deadlock, we disable page faults while touching the bounds directory entry. This keeps us from being able to free the tables in this case. This deficiency will be addressed in later patches. This is wrong to begin with. You need a proper design for addressing this short coming in the first place. Retrofitting it into your current design will just not work at all or end up with some major mess. The problem to solve is, that the kernel needs to unmap the bounds table if the data mapping which it covers goes away. You decided to do that at the end of unmap with mmap_sem write held. As I explained last time, that's not an absolute requirement, it's just a random choice. And that choice is obvioulsy not a really good one as your own observation of the page fault issue proves. So perhaps we should look at it the other way round. We already know before the actual unmap happens what's the start and the end of the area. So we can be way smarter and do the following: mpx_pre_unmap(from, len); down_write(mmap_sem); do_unmap(from, len); up_write(mmap_sem); mpx_post_unmap(mpx_unmap); int mpx_pre_unmap(...) { down_write(mm->bd_sem); down_read(mm->mmap_sem); Here we do the following: 1) Invalidate the bounds table entries for the to be unmapped area in the bounds directory. This can fault as we only hold mmap sem for read. 2) Mark the corresponding VMAs which should be unmapped and removed. This can be done with mmap sem down read held as the VMA chain cannot be changed and the 'Mark for removal" field is protected by mm->bd_sem. For each to be removed VMA we increment mm->bd_remove_vmas Holding mm->bd_sem also prevents that a new bound table to be inserted, if we do the whole protection thing right. up_read(mm->mmap_sem); } void mpx_post_unmap(void) { if (mm->bd_remove_vmas) { down_write(mm->mmap_sem); cleanup_to_be_removed_vmas(); up_write(mm->mmap_sem); } up_write(mm->bd_sem); } If so, I guess that there are some questions needed to be considered: 1) Almost all palces which call do_munmap() will need to add mpx_pre_unmap/post_unmap calls, like vm_munmap(), mremap(), shmdt(), etc.. 2) before mpx_post_unmap() call, it is possible for those bounds tables within mm->bd_remove_vmas to be re-used. In this case, userspace may do new mapping and access one address which will cover one of those bounds tables. During this period, HW will check if one bounds table exist, if yes one fault won't be produced. 3) According to Dave, those bounds tables related to adjacent VMAs within the start and the end possibly don't have to be fully unmmaped, and we only need free the part of backing physical memory. The mpx_pre_unmap/post_unmap calls can be either added explicit to the relevant down_write/unmap/up_write code pathes or you hide it behind some wrapper function. Now you need to acquire mm->bd_sem for the fault handling code as well in order to serialize the creation of new bound table entries. In that case a down_read(mm->bd_sem) is sufficient as the cmpxchg() prevents the insertion of multiple tables for the same directory entries. So we'd end up with the following serialization scheme: prctl enable/disable management: down_write(mm->bd_sem); bounds violation: down_read(mm->bd_sem); directory entry allocation: down_read(mm->bd_sem); directory entry removal: down_write(mm->bd_sem); Now we need to check whether there is a potential deadlock lurking around the corner. We get the following nesting scenarios: - prctl enable/disable management: down_write(mm->bd_sem); No mmap_sem nesting at all - bounds violation: down_read(mm->bd_sem); Might nest down_read(mm->mmap_sem) if the copy code from user space faults. - directory entry allocation:down_read(mm->bd_sem); Nests down_write(mm->mmap_sem) for the VMA allocation. Might nest down_read(mm->mmap_sem) if the cmpxchg() for the directory entry faults - directory entry removal:down_write(mm->bd_sem); Might nest down_read(mm->mmap_sem) if the invalidation of the directory entry faults In other words here is the possible nesting: #1 down_read(mm->bd_sem); down_read(mm->mmap_sem); #2 down_read(mm->bd_sem); down_write(mm->mmap_sem); #3 down_write(mm->bd_sem); down_read(mm->mmap_sem); #4 down_write(mm->bd_sem); down_write(mm->mmap_sem); We never execute any of those nested on a single task. The bounds fault handler is never nested as it always comes
RE: [PATCH v9 10/12] x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT
On 2014-10-24, Thomas Gleixner wrote: > On Sun, 12 Oct 2014, Qiaowei Ren wrote: >> +int mpx_enable_management(struct task_struct *tsk) { >> +struct mm_struct *mm = tsk->mm; >> +void __user *bd_base = MPX_INVALID_BOUNDS_DIR; > > What's the point of initializing bd_base here. I had to look twice to > figure out that it gets overwritten by task_get_bounds_dir() > I just want to put task_get_bounds_dir() outside mm->mmap_sem holding. >> @@ -285,6 +285,7 @@ dotraplinkage void do_bounds(struct pt_regs >> *regs, > long error_code) >> struct xsave_struct *xsave_buf; >> struct task_struct *tsk = current; >> siginfo_t info; >> +int ret = 0; >> >> prev_state = exception_enter(); >> if (notify_die(DIE_TRAP, "bounds", regs, error_code, @@ -312,8 >> +313,35 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long > error_code) >> */ >> switch (status & MPX_BNDSTA_ERROR_CODE) { >> case 2: /* Bound directory has invalid entry. */ >> -if (do_mpx_bt_fault(xsave_buf)) >> +down_write(>mm->mmap_sem); > > The handling of mm->mmap_sem here is horrible. The only reason why you > want to hold mmap_sem write locked in the first place is that you want > to cover the allocation and the mm->bd_addr check. > > I think it's wrong to tie this to mmap_sem in the first place. If MPX > is enabled then you should have mm->bd_addr and an explicit mutex to protect > it. > > So the logic would look like this: > >mutex_lock(>bd_mutex); >if (!kernel_managed(mm)) > do_trap(); else if (do_mpx_bt_fault()) force_sig(); >mutex_unlock(>bd_mutex); > No tricks with mmap_sem, no special return value handling. Straight > forward code instead of a convoluted and error prone mess. > > Hmm? > I guess this is a good solution. If so, new field 'bd_sem' have to be added into struct mm_struct. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
On 2014-10-24, Thomas Gleixner wrote: > On Sun, 12 Oct 2014, Qiaowei Ren wrote: > >> This patch sets bound violation fields of siginfo struct in #BR >> exception handler by decoding the user instruction and constructing >> the faulting pointer. >> >> This patch does't use the generic decoder, and implements a limited >> special-purpose decoder to decode MPX instructions, simply because >> the generic decoder is very heavyweight not just in terms of >> performance but in terms of interface -- because it has to. > > My question still stands why using the existing decoder is an issue. > Performance is a complete non issue in case of a bounds violation and > the interface argument is just silly, really. > As hpa said, we only need to decode several mpx instructions including BNDCL/BNDCU, and general decoder looks like a little heavy. Peter, what do you think about it? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
On 2014-10-24, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. This patch does't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. My question still stands why using the existing decoder is an issue. Performance is a complete non issue in case of a bounds violation and the interface argument is just silly, really. As hpa said, we only need to decode several mpx instructions including BNDCL/BNDCU, and general decoder looks like a little heavy. Peter, what do you think about it? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v9 10/12] x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT
On 2014-10-24, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: +int mpx_enable_management(struct task_struct *tsk) { +struct mm_struct *mm = tsk-mm; +void __user *bd_base = MPX_INVALID_BOUNDS_DIR; What's the point of initializing bd_base here. I had to look twice to figure out that it gets overwritten by task_get_bounds_dir() I just want to put task_get_bounds_dir() outside mm-mmap_sem holding. @@ -285,6 +285,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) struct xsave_struct *xsave_buf; struct task_struct *tsk = current; siginfo_t info; +int ret = 0; prev_state = exception_enter(); if (notify_die(DIE_TRAP, bounds, regs, error_code, @@ -312,8 +313,35 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) */ switch (status MPX_BNDSTA_ERROR_CODE) { case 2: /* Bound directory has invalid entry. */ -if (do_mpx_bt_fault(xsave_buf)) +down_write(current-mm-mmap_sem); The handling of mm-mmap_sem here is horrible. The only reason why you want to hold mmap_sem write locked in the first place is that you want to cover the allocation and the mm-bd_addr check. I think it's wrong to tie this to mmap_sem in the first place. If MPX is enabled then you should have mm-bd_addr and an explicit mutex to protect it. So the logic would look like this: mutex_lock(mm-bd_mutex); if (!kernel_managed(mm)) do_trap(); else if (do_mpx_bt_fault()) force_sig(); mutex_unlock(mm-bd_mutex); No tricks with mmap_sem, no special return value handling. Straight forward code instead of a convoluted and error prone mess. Hmm? I guess this is a good solution. If so, new field 'bd_sem' have to be added into struct mm_struct. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 11/12] x86, mpx: cleanup unused bound tables
On 10/24/2014 10:40 PM, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. For now, to avoid deadlock, we disable page faults while touching the bounds directory entry. This keeps us from being able to free the tables in this case. This deficiency will be addressed in later patches. This is wrong to begin with. You need a proper design for addressing this short coming in the first place. Retrofitting it into your current design will just not work at all or end up with some major mess. The problem to solve is, that the kernel needs to unmap the bounds table if the data mapping which it covers goes away. You decided to do that at the end of unmap with mmap_sem write held. As I explained last time, that's not an absolute requirement, it's just a random choice. And that choice is obvioulsy not a really good one as your own observation of the page fault issue proves. So perhaps we should look at it the other way round. We already know before the actual unmap happens what's the start and the end of the area. So we can be way smarter and do the following: mpx_pre_unmap(from, len); down_write(mmap_sem); do_unmap(from, len); up_write(mmap_sem); mpx_post_unmap(mpx_unmap); int mpx_pre_unmap(...) { down_write(mm-bd_sem); down_read(mm-mmap_sem); Here we do the following: 1) Invalidate the bounds table entries for the to be unmapped area in the bounds directory. This can fault as we only hold mmap sem for read. 2) Mark the corresponding VMAs which should be unmapped and removed. This can be done with mmap sem down read held as the VMA chain cannot be changed and the 'Mark for removal field is protected by mm-bd_sem. For each to be removed VMA we increment mm-bd_remove_vmas Holding mm-bd_sem also prevents that a new bound table to be inserted, if we do the whole protection thing right. up_read(mm-mmap_sem); } void mpx_post_unmap(void) { if (mm-bd_remove_vmas) { down_write(mm-mmap_sem); cleanup_to_be_removed_vmas(); up_write(mm-mmap_sem); } up_write(mm-bd_sem); } If so, I guess that there are some questions needed to be considered: 1) Almost all palces which call do_munmap() will need to add mpx_pre_unmap/post_unmap calls, like vm_munmap(), mremap(), shmdt(), etc.. 2) before mpx_post_unmap() call, it is possible for those bounds tables within mm-bd_remove_vmas to be re-used. In this case, userspace may do new mapping and access one address which will cover one of those bounds tables. During this period, HW will check if one bounds table exist, if yes one fault won't be produced. 3) According to Dave, those bounds tables related to adjacent VMAs within the start and the end possibly don't have to be fully unmmaped, and we only need free the part of backing physical memory. The mpx_pre_unmap/post_unmap calls can be either added explicit to the relevant down_write/unmap/up_write code pathes or you hide it behind some wrapper function. Now you need to acquire mm-bd_sem for the fault handling code as well in order to serialize the creation of new bound table entries. In that case a down_read(mm-bd_sem) is sufficient as the cmpxchg() prevents the insertion of multiple tables for the same directory entries. So we'd end up with the following serialization scheme: prctl enable/disable management: down_write(mm-bd_sem); bounds violation: down_read(mm-bd_sem); directory entry allocation: down_read(mm-bd_sem); directory entry removal: down_write(mm-bd_sem); Now we need to check whether there is a potential deadlock lurking around the corner. We get the following nesting scenarios: - prctl enable/disable management: down_write(mm-bd_sem); No mmap_sem nesting at all - bounds violation: down_read(mm-bd_sem); Might nest down_read(mm-mmap_sem) if the copy code from user space faults. - directory entry allocation:down_read(mm-bd_sem); Nests down_write(mm-mmap_sem) for the VMA allocation. Might nest down_read(mm-mmap_sem) if the cmpxchg() for the directory entry faults - directory entry removal:down_write(mm-bd_sem); Might nest down_read(mm-mmap_sem) if the invalidation of the directory entry faults In other words here is the possible nesting: #1 down_read(mm-bd_sem); down_read(mm-mmap_sem); #2 down_read(mm-bd_sem); down_write(mm-mmap_sem); #3 down_write(mm-bd_sem); down_read(mm-mmap_sem); #4 down_write(mm-bd_sem); down_write(mm-mmap_sem); We never execute any of those nested on a single task. The bounds fault handler is never nested as it always comes from user space. So we should be
Re: [PATCH v9 05/12] x86, mpx: on-demand kernel allocation of bounds tables
On 10/24/2014 08:08 PM, Thomas Gleixner wrote: On Sun, 12 Oct 2014, Qiaowei Ren wrote: + /* +* Go poke the address of the new bounds table in to the +* bounds directory entry out in userspace memory. Note: +* we may race with another CPU instantiating the same table. +* In that case the cmpxchg will see an unexpected +* 'actual_old_val'. +*/ + ret = user_atomic_cmpxchg_inatomic(actual_old_val, bd_entry, + expected_old_val, bt_addr); This is fully preemptible non-atomic context, right? So this wants a proper comment, why using user_atomic_cmpxchg_inatomic() is the right thing to do here. Well, we will address it. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-10-14, Hansen, Dave wrote: > On 07/20/2014 11:09 PM, Andi Kleen wrote: >> Qiaowei Ren writes: >>> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() >>> commands. These commands can be used to register and unregister MPX >>> related resource on the x86 platform. >> >> Please provide a manpage for the API. This is needed for proper >> review. Your description is far too vague. > > Qiaowei, have you written this manpage yet? I see the new patches, > but no manpage yet. It will be added into subsequent version or another mainline patchset. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-10-14, Hansen, Dave wrote: On 07/20/2014 11:09 PM, Andi Kleen wrote: Qiaowei Ren qiaowei@intel.com writes: This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resource on the x86 platform. Please provide a manpage for the API. This is needed for proper review. Your description is far too vague. Qiaowei, have you written this manpage yet? I see the new patches, but no manpage yet. It will be added into subsequent version or another mainline patchset. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-18, Kevin Easton wrote: > On Thu, Sep 18, 2014 at 12:40:29AM +0000, Ren, Qiaowei wrote: >>> Would it be prudent to use an error code other than EINVAL for the >>> "hardware doesn't support it" case? >>> >> Seems like no specific error code for this case. > > ENXIO would probably be OK. It's not too important as long as it's > documented. > Yes. Looks like that ENXIO will be better. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-16, Kevin Easton wrote: > On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote: >> + >> +int mpx_register(struct task_struct *tsk) { >> +struct mm_struct *mm = tsk->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +/* >> + * runtime in the userspace will be responsible for allocation of >> + * the bounds directory. Then, it will save the base of the bounds >> + * directory into XSAVE/XRSTOR Save Area and enable MPX through >> + * XRSTOR instruction. >> + * >> + * fpu_xsave() is expected to be very expensive. In order to do >> + * performance optimization, here we get the base of the bounds >> + * directory and then save it into mm_struct to be used in future. >> + */ >> +mm->bd_addr = task_get_bounds_dir(tsk); >> +if (!mm->bd_addr) >> +return -EINVAL; >> + >> +return 0; >> +} >> + >> +int mpx_unregister(struct task_struct *tsk) { >> +struct mm_struct *mm = current->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +mm->bd_addr = NULL; >> +return 0; >> +} > > If that's changed, then mpx_register() and mpx_unregister() don't need > a task_struct, just an mm_struct. > Yes. An mm_struct is enough. > Probably these functions should be locking mmap_sem. > > Would it be prudent to use an error code other than EINVAL for the > "hardware doesn't support it" case? > Seems like no specific error code for this case. >> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned > long, arg2, unsigned long, arg3, >> me->mm->def_flags &= ~VM_NOHUGEPAGE; >> up_write(>mm->mmap_sem); >> break; >> +case PR_MPX_REGISTER: >> +error = MPX_REGISTER(me); >> +break; >> +case PR_MPX_UNREGISTER: >> +error = MPX_UNREGISTER(me); >> +break; > > If you pass me->mm from prctl, that makes it clear that it's > per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. > > This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if > it's not using them, otherwise you'll be sunk if you ever want to use them > later. > > It seems like it only makes sense for all threads using the mm to have > the same bounds directory set. If the interface was changed to > directly pass the address, then could the kernel take care of setting > it for *all* of the threads in the process? This seems like something > that would be easier for the kernel to do than userspace. > If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-16, Kevin Easton wrote: On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote: + +int mpx_register(struct task_struct *tsk) { +struct mm_struct *mm = tsk-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +/* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ +mm-bd_addr = task_get_bounds_dir(tsk); +if (!mm-bd_addr) +return -EINVAL; + +return 0; +} + +int mpx_unregister(struct task_struct *tsk) { +struct mm_struct *mm = current-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +mm-bd_addr = NULL; +return 0; +} If that's changed, then mpx_register() and mpx_unregister() don't need a task_struct, just an mm_struct. Yes. An mm_struct is enough. Probably these functions should be locking mmap_sem. Would it be prudent to use an error code other than EINVAL for the hardware doesn't support it case? Seems like no specific error code for this case. @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, me-mm-def_flags = ~VM_NOHUGEPAGE; up_write(me-mm-mmap_sem); break; +case PR_MPX_REGISTER: +error = MPX_REGISTER(me); +break; +case PR_MPX_UNREGISTER: +error = MPX_UNREGISTER(me); +break; If you pass me-mm from prctl, that makes it clear that it's per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's not using them, otherwise you'll be sunk if you ever want to use them later. It seems like it only makes sense for all threads using the mm to have the same bounds directory set. If the interface was changed to directly pass the address, then could the kernel take care of setting it for *all* of the threads in the process? This seems like something that would be easier for the kernel to do than userspace. If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-18, Kevin Easton wrote: On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote: Would it be prudent to use an error code other than EINVAL for the hardware doesn't support it case? Seems like no specific error code for this case. ENXIO would probably be OK. It's not too important as long as it's documented. Yes. Looks like that ENXIO will be better. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
On 2014-09-16, Hansen, Dave wrote: > On 09/11/2014 01:46 AM, Qiaowei Ren wrote: >> +/* >> + * Free the backing physical pages of bounds table 'bt_addr'. >> + * Assume start...end is within that bounds table. >> + */ >> +static int __must_check zap_bt_entries(struct mm_struct *mm, >> +unsigned long bt_addr, >> +unsigned long start, unsigned long end) { >> +struct vm_area_struct *vma; >> + >> +/* Find the vma which overlaps this bounds table */ >> +vma = find_vma(mm, bt_addr); >> +/* >> + * The table entry comes from userspace and could be >> + * pointing anywhere, so make sure it is at least >> + * pointing to valid memory. >> + */ >> +if (!vma || !(vma->vm_flags & VM_MPX) || >> +vma->vm_start > bt_addr || >> +vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES) >> +return -EINVAL; > > If someone did *ANYTHING* to split the VMA, this check would fail. I > think that's a little draconian, considering that somebody could do a > NUMA policy on part of a VM_MPX VMA and cause it to be split. > > This check should look across the entire 'bt_addr -> > bt_addr+MPX_BT_SIZE_BYTES' range, find all of the VM_MPX VMAs, and zap > only those. > > If we encounter a non-VM_MPX vma, it should be ignored. > Ok. >> +if (ret == -EFAULT) >> +return ret; >> + >> +/* >> + * unmap those bounds table which are entirely covered in this >> + * virtual address region. >> + */ > > Entirely covered *AND* not at the edges, right? > Yes. >> +bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start); >> +bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1); >> +for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) { > > This needs a big fat comment that it is only freeing the bounds tables that > are 1. > fully covered 2. not at the edges of the mapping, even if full aligned > > Does this get any nicer if we have unmap_side_bts() *ONLY* go after > bounds tables that are partially owned by the region being unmapped? > > It seems like we really should do this: > > for (each bt fully owned) > unmap_single_bt() > if (start edge unaligned) > free start edge > if (end edge unaligned) > free end edge > > I bet the unmap_side_bts() code gets simpler if we do that, too. > Maybe. I will try this. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
On 2014-09-16, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: +/* + * Free the backing physical pages of bounds table 'bt_addr'. + * Assume start...end is within that bounds table. + */ +static int __must_check zap_bt_entries(struct mm_struct *mm, +unsigned long bt_addr, +unsigned long start, unsigned long end) { +struct vm_area_struct *vma; + +/* Find the vma which overlaps this bounds table */ +vma = find_vma(mm, bt_addr); +/* + * The table entry comes from userspace and could be + * pointing anywhere, so make sure it is at least + * pointing to valid memory. + */ +if (!vma || !(vma-vm_flags VM_MPX) || +vma-vm_start bt_addr || +vma-vm_end bt_addr+MPX_BT_SIZE_BYTES) +return -EINVAL; If someone did *ANYTHING* to split the VMA, this check would fail. I think that's a little draconian, considering that somebody could do a NUMA policy on part of a VM_MPX VMA and cause it to be split. This check should look across the entire 'bt_addr - bt_addr+MPX_BT_SIZE_BYTES' range, find all of the VM_MPX VMAs, and zap only those. If we encounter a non-VM_MPX vma, it should be ignored. Ok. +if (ret == -EFAULT) +return ret; + +/* + * unmap those bounds table which are entirely covered in this + * virtual address region. + */ Entirely covered *AND* not at the edges, right? Yes. +bde_start = mm-bd_addr + MPX_GET_BD_ENTRY_OFFSET(start); +bde_end = mm-bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1); +for (bd_entry = bde_start + 1; bd_entry bde_end; bd_entry++) { This needs a big fat comment that it is only freeing the bounds tables that are 1. fully covered 2. not at the edges of the mapping, even if full aligned Does this get any nicer if we have unmap_side_bts() *ONLY* go after bounds tables that are partially owned by the region being unmapped? It seems like we really should do this: for (each bt fully owned) unmap_single_bt() if (start edge unaligned) free start edge if (end edge unaligned) free end edge I bet the unmap_side_bts() code gets simpler if we do that, too. Maybe. I will try this. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-15, One Thousand Gnomes wrote: >> The base of the bounds directory is set into mm_struct during >> PR_MPX_REGISTER command execution. This member can be used to check >> whether one application is mpx enabled. > > Not really because by the time you ask the question another thread > might have decided to unregister it. > > >> +int mpx_register(struct task_struct *tsk) { >> +struct mm_struct *mm = tsk->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +/* >> + * runtime in the userspace will be responsible for allocation of >> + * the bounds directory. Then, it will save the base of the bounds >> + * directory into XSAVE/XRSTOR Save Area and enable MPX through >> + * XRSTOR instruction. >> + * >> + * fpu_xsave() is expected to be very expensive. In order to do >> + * performance optimization, here we get the base of the bounds >> + * directory and then save it into mm_struct to be used in future. >> + */ >> +mm->bd_addr = task_get_bounds_dir(tsk); >> +if (!mm->bd_addr) >> +return -EINVAL; > > What stops two threads calling this in parallel ? >> + >> +return 0; >> +} >> + >> +int mpx_unregister(struct task_struct *tsk) { >> +struct mm_struct *mm = current->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +mm->bd_addr = NULL; > > or indeed calling this in parallel > > What are the semantics across execve() ? > This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 It would be extremely unusual for an application to have some MPX and some non-MPX threads, since they would share the same address space and the non-MPX threads would mess up the bounds. That is to say, it looks like be unusual for one of these threads to call prctl() to enable or disable MPX. I guess we need to add some rules into documentation. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-15, One Thousand Gnomes wrote: The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. Not really because by the time you ask the question another thread might have decided to unregister it. +int mpx_register(struct task_struct *tsk) { +struct mm_struct *mm = tsk-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +/* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ +mm-bd_addr = task_get_bounds_dir(tsk); +if (!mm-bd_addr) +return -EINVAL; What stops two threads calling this in parallel ? + +return 0; +} + +int mpx_unregister(struct task_struct *tsk) { +struct mm_struct *mm = current-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +mm-bd_addr = NULL; or indeed calling this in parallel What are the semantics across execve() ? This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 It would be extremely unusual for an application to have some MPX and some non-MPX threads, since they would share the same address space and the non-MPX threads would mess up the bounds. That is to say, it looks like be unusual for one of these threads to call prctl() to enable or disable MPX. I guess we need to add some rules into documentation. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
On 2014-09-13, Hansen, Dave wrote: > On 09/11/2014 01:46 AM, Qiaowei Ren wrote: >> +static int allocate_bt(long __user *bd_entry) { >> +unsigned long bt_addr, old_val = 0; >> +int ret = 0; >> + >> +bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); >> +if (IS_ERR((void *)bt_addr)) >> +return bt_addr; >> +bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | > MPX_BD_ENTRY_VALID_FLAG; > > Qiaowei, why do we need the "& MPX_BT_ADDR_MASK" here? It should be not necessary, and can be removed. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
On 2014-09-12, Thomas Gleixner wrote: > On Fri, 12 Sep 2014, Ren, Qiaowei wrote: >> On 2014-09-12, Thomas Gleixner wrote: >>> On Thu, 11 Sep 2014, Qiaowei Ren wrote: >>> >>>> Due to new fields about bound violation added into struct >>>> siginfo, this patch syncs it with general version to avoid build issue. >>> >>> You completely fail to explain which build issue is addressed by >>> this patch. The code you added to kernel/signal.c which accesses >>> _addr_bnd is guarded by >>> >>> +#ifdef SEGV_BNDERR >>> >>> which is not defined my MIPS. Also why is this only affecting MIPS >>> and not any other architecture which provides its own struct siginfo ? >>> >>> That patch makes no sense at all, at least not without a proper explanation. >>> >> For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will >> include general siginfo.h, and only replace general stuct siginfo >> with mips specific struct siginfo. So SEGV_BNDERR will be defined >> for all archs, and we will get error like "no _lower in struct >> siginfo" when arch=mips. >> >> In addition, only MIPS arch define its own struct siginfo, so this >> is only affecting MIPS. > > So IA64 does not count as an architecture and therefor does not need > the same treatment, right? > struct siginfo for IA64 should be also synced. I will do this next post. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
On 2014-09-12, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Ren, Qiaowei wrote: On 2014-09-12, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Qiaowei Ren wrote: Due to new fields about bound violation added into struct siginfo, this patch syncs it with general version to avoid build issue. You completely fail to explain which build issue is addressed by this patch. The code you added to kernel/signal.c which accesses _addr_bnd is guarded by +#ifdef SEGV_BNDERR which is not defined my MIPS. Also why is this only affecting MIPS and not any other architecture which provides its own struct siginfo ? That patch makes no sense at all, at least not without a proper explanation. For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will include general siginfo.h, and only replace general stuct siginfo with mips specific struct siginfo. So SEGV_BNDERR will be defined for all archs, and we will get error like no _lower in struct siginfo when arch=mips. In addition, only MIPS arch define its own struct siginfo, so this is only affecting MIPS. So IA64 does not count as an architecture and therefor does not need the same treatment, right? struct siginfo for IA64 should be also synced. I will do this next post. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
On 2014-09-13, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: +static int allocate_bt(long __user *bd_entry) { +unsigned long bt_addr, old_val = 0; +int ret = 0; + +bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); +if (IS_ERR((void *)bt_addr)) +return bt_addr; +bt_addr = (bt_addr MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; Qiaowei, why do we need the MPX_BT_ADDR_MASK here? It should be not necessary, and can be removed. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-11, Hansen, Dave wrote: > On 09/11/2014 01:46 AM, Qiaowei Ren wrote: >> + >> +return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & >> +MPX_BNDCFG_ADDR_MASK); >> +} > > I don't think casting a u64 to a ulong, then to a pointer is useful. > Just take the '(unsigned long)' out. If so, this will spits out a warning on 32-bit: arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir': arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
On 2014-09-11, Hansen, Dave wrote: > On 09/11/2014 01:46 AM, Qiaowei Ren wrote: >> + * This function will be called by do_munmap(), and the VMAs >> + covering >> + * the virtual address region start...end have already been split >> + if >> + * necessary and remvoed from the VMA list. > > "remvoed" -> "removed" > >> +void mpx_unmap(struct mm_struct *mm, >> +unsigned long start, unsigned long end) { >> +int ret; >> + >> +ret = mpx_try_unmap(mm, start, end); >> +if (ret == -EINVAL) >> +force_sig(SIGSEGV, current); >> +} > > In the case of a fault during an unmap, this just ignores the > situation and returns silently. Where is the code to retry the > freeing operation outside of mmap_sem? Dave, you mean delayed_work code? According to our discussion, it will be deferred to another mainline post. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
On 2014-09-12, Thomas Gleixner wrote: > On Thu, 11 Sep 2014, Qiaowei Ren wrote: > >> Due to new fields about bound violation added into struct siginfo, >> this patch syncs it with general version to avoid build issue. > > You completely fail to explain which build issue is addressed by this > patch. The code you added to kernel/signal.c which accesses _addr_bnd > is guarded by > > +#ifdef SEGV_BNDERR > > which is not defined my MIPS. Also why is this only affecting MIPS and > not any other architecture which provides its own struct siginfo ? > > That patch makes no sense at all, at least not without a proper explanation. > For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will include general siginfo.h, and only replace general stuct siginfo with mips specific struct siginfo. So SEGV_BNDERR will be defined for all archs, and we will get error like "no _lower in struct siginfo" when arch=mips. In addition, only MIPS arch define its own struct siginfo, so this is only affecting MIPS. Thanks, Qiaowei > >> Signed-off-by: Qiaowei Ren >> --- >> arch/mips/include/uapi/asm/siginfo.h |4 >> 1 files changed, 4 insertions(+), 0 deletions(-) >> diff --git a/arch/mips/include/uapi/asm/siginfo.h >> b/arch/mips/include/uapi/asm/siginfo.h >> index e811744..d08f83f 100644 >> --- a/arch/mips/include/uapi/asm/siginfo.h >> +++ b/arch/mips/include/uapi/asm/siginfo.h >> @@ -92,6 +92,10 @@ typedef struct siginfo { >> int _trapno;/* TRAP # which caused the signal */ >> #endif >> short _addr_lsb; >> +struct { >> +void __user *_lower; >> +void __user *_upper; >> +} _addr_bnd; >> } _sigfault; >> >> /* SIGPOLL, SIGXFSZ (To do ...) */ >> -- >> 1.7.1 >> >> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
On 2014-09-12, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Qiaowei Ren wrote: Due to new fields about bound violation added into struct siginfo, this patch syncs it with general version to avoid build issue. You completely fail to explain which build issue is addressed by this patch. The code you added to kernel/signal.c which accesses _addr_bnd is guarded by +#ifdef SEGV_BNDERR which is not defined my MIPS. Also why is this only affecting MIPS and not any other architecture which provides its own struct siginfo ? That patch makes no sense at all, at least not without a proper explanation. For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will include general siginfo.h, and only replace general stuct siginfo with mips specific struct siginfo. So SEGV_BNDERR will be defined for all archs, and we will get error like no _lower in struct siginfo when arch=mips. In addition, only MIPS arch define its own struct siginfo, so this is only affecting MIPS. Thanks, Qiaowei Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/mips/include/uapi/asm/siginfo.h |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; +struct { +void __user *_lower; +void __user *_upper; +} _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
On 2014-09-11, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: + * This function will be called by do_munmap(), and the VMAs + covering + * the virtual address region start...end have already been split + if + * necessary and remvoed from the VMA list. remvoed - removed +void mpx_unmap(struct mm_struct *mm, +unsigned long start, unsigned long end) { +int ret; + +ret = mpx_try_unmap(mm, start, end); +if (ret == -EINVAL) +force_sig(SIGSEGV, current); +} In the case of a fault during an unmap, this just ignores the situation and returns silently. Where is the code to retry the freeing operation outside of mmap_sem? Dave, you mean delayed_work code? According to our discussion, it will be deferred to another mainline post. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-11, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: + +return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u +MPX_BNDCFG_ADDR_MASK); +} I don't think casting a u64 to a ulong, then to a pointer is useful. Just take the '(unsigned long)' out. If so, this will spits out a warning on 32-bit: arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir': arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
On 2014-07-24, Hansen, Dave wrote: > On 07/23/2014 05:56 PM, Ren, Qiaowei wrote: >> On 2014-07-24, Hansen, Dave wrote: >>> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote: >>>> The checking about MPX feature should be as follow: >>>> >>>> if (pcntxt_mask & XSTATE_EAGER) { >>>> if (eagerfpu == DISABLE) { >>>> pr_err("eagerfpu not present, disabling > some >>> xstate features: 0x%llx\n", >>>> pcntxt_mask & >>> XSTATE_EAGER); >>>> pcntxt_mask &= ~XSTATE_EAGER; } else { >>>> eagerfpu = ENABLE; >>>> } >>>> } >>>> This patch was merged into kernel the ending of last year >>>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/c >>>> om >>>> mi >>>> t/?id=e7d820a5e549b3eb6c3f9467507566565646a669 ) >>> >>> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here? >>> >>> This isn't major, but I can't _ever_ imagine a user being able to >>> track down why MPX is not working from this message. Should we >>> spruce it up somehow? >> >> Maybe. If the error log "disabling some xstate features:" is changed >> to "disabling MPX xstate features:", do you think it is OK? > > That's better. Is it really disabling MPX, though? > > And shouldn't we clear the cpu feature bit too? I am not sure. I am suspecting whether this checking should be moved before xstate_enable(). Peter, what do you think of it? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables
On 2014-07-24, Hansen, Dave wrote: > On 07/23/2014 05:49 PM, Ren, Qiaowei wrote: >> I can check a lot of debug information when one VMA and related >> bounds tables are allocated and freed through adding a lot of >> print() like log into kernel/runtime. Do you think this is enough? > > I thought the entire reason we grabbed a VM_ flag was to make it > possible to figure out without resorting to this. All cleanup work certainly depends on this VM_ flag. In addition, as we discussed, this new VM_ flag can mainly have runtime know how much memory is occupied by MPX. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
On 2014-07-24, Hansen, Dave wrote: > On 07/22/2014 07:35 PM, Ren, Qiaowei wrote: >> The checking about MPX feature should be as follow: >> >> if (pcntxt_mask & XSTATE_EAGER) { >> if (eagerfpu == DISABLE) { >> pr_err("eagerfpu not present, disabling some > xstate features: 0x%llx\n", >> pcntxt_mask & > XSTATE_EAGER); >> pcntxt_mask &= ~XSTATE_EAGER; } else { eagerfpu >> = ENABLE; >> } >> } >> This patch was merged into kernel the ending of last year >> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/com >> mi >> t/?id=e7d820a5e549b3eb6c3f9467507566565646a669 ) > > Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here? > > This isn't major, but I can't _ever_ imagine a user being able to > track down why MPX is not working from this message. Should we spruce it up > somehow? Maybe. If the error log "disabling some xstate features:" is changed to "disabling MPX xstate features:", do you think it is OK? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables
On 2014-07-24, Hansen, Dave wrote: > On 07/20/2014 10:38 PM, Qiaowei Ren wrote: >> Since the kernel allocated those tables on-demand without userspace >> knowledge, it is also responsible for freeing them when the >> associated mappings go away. >> >> Here, the solution for this issue is to hook do_munmap() to check >> whether one process is MPX enabled. If yes, those bounds tables >> covered in the virtual address region which is being unmapped will >> be freed > also. > > This is the part of the code that I'm the most concerned about. > > Could you elaborate on how you've tested this to make sure it works OK? I can check a lot of debug information when one VMA and related bounds tables are allocated and freed through adding a lot of print() like log into kernel/runtime. Do you think this is enough? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables
On 2014-07-24, Hansen, Dave wrote: On 07/20/2014 10:38 PM, Qiaowei Ren wrote: Since the kernel allocated those tables on-demand without userspace knowledge, it is also responsible for freeing them when the associated mappings go away. Here, the solution for this issue is to hook do_munmap() to check whether one process is MPX enabled. If yes, those bounds tables covered in the virtual address region which is being unmapped will be freed also. This is the part of the code that I'm the most concerned about. Could you elaborate on how you've tested this to make sure it works OK? I can check a lot of debug information when one VMA and related bounds tables are allocated and freed through adding a lot of print() like log into kernel/runtime. Do you think this is enough? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
On 2014-07-24, Hansen, Dave wrote: On 07/22/2014 07:35 PM, Ren, Qiaowei wrote: The checking about MPX feature should be as follow: if (pcntxt_mask XSTATE_EAGER) { if (eagerfpu == DISABLE) { pr_err(eagerfpu not present, disabling some xstate features: 0x%llx\n, pcntxt_mask XSTATE_EAGER); pcntxt_mask = ~XSTATE_EAGER; } else { eagerfpu = ENABLE; } } This patch was merged into kernel the ending of last year (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/com mi t/?id=e7d820a5e549b3eb6c3f9467507566565646a669 ) Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here? This isn't major, but I can't _ever_ imagine a user being able to track down why MPX is not working from this message. Should we spruce it up somehow? Maybe. If the error log disabling some xstate features: is changed to disabling MPX xstate features:, do you think it is OK? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables
On 2014-07-24, Hansen, Dave wrote: On 07/23/2014 05:49 PM, Ren, Qiaowei wrote: I can check a lot of debug information when one VMA and related bounds tables are allocated and freed through adding a lot of print() like log into kernel/runtime. Do you think this is enough? I thought the entire reason we grabbed a VM_ flag was to make it possible to figure out without resorting to this. All cleanup work certainly depends on this VM_ flag. In addition, as we discussed, this new VM_ flag can mainly have runtime know how much memory is occupied by MPX. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
On 2014-07-24, Hansen, Dave wrote: On 07/23/2014 05:56 PM, Ren, Qiaowei wrote: On 2014-07-24, Hansen, Dave wrote: On 07/22/2014 07:35 PM, Ren, Qiaowei wrote: The checking about MPX feature should be as follow: if (pcntxt_mask XSTATE_EAGER) { if (eagerfpu == DISABLE) { pr_err(eagerfpu not present, disabling some xstate features: 0x%llx\n, pcntxt_mask XSTATE_EAGER); pcntxt_mask = ~XSTATE_EAGER; } else { eagerfpu = ENABLE; } } This patch was merged into kernel the ending of last year (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/c om mi t/?id=e7d820a5e549b3eb6c3f9467507566565646a669 ) Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here? This isn't major, but I can't _ever_ imagine a user being able to track down why MPX is not working from this message. Should we spruce it up somehow? Maybe. If the error log disabling some xstate features: is changed to disabling MPX xstate features:, do you think it is OK? That's better. Is it really disabling MPX, though? And shouldn't we clear the cpu feature bit too? I am not sure. I am suspecting whether this checking should be moved before xstate_enable(). Peter, what do you think of it? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
On 2014-07-23, Hansen, Dave wrote: > On 07/20/2014 10:38 PM, Qiaowei Ren wrote: >> +#ifdef CONFIG_X86_INTEL_MPX >> +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) #else #define >> +cpu_has_mpx 0 #endif /* CONFIG_X86_INTEL_MPX */ > > Is this enough checking? Looking at the extension reference, it says: > >> 9.3.3 Enabling of Intel MPX States An OS can enable Intel MPX states to >> support software operation using bounds registers with the following >> steps: * Verify the processor supports XSAVE/XRSTOR/XSETBV/XGETBV >> instructions and XCR0 by checking CPUID.1.ECX.XSAVE[bit 26]=1. > > That, I assume the xsave code is already doing. > >> * Verify the processor supports both Intel MPX states by checking > CPUID.(EAX=0x0D, ECX=0):EAX[4:3] is 11b. > > I see these bits _attempting_ to get set in pcntxt_mask via XCNTXT_MASK. > But, I don't see us ever actually checking that they _do_ get set. > For instance, we do this for: > >> if ((pcntxt_mask & XSTATE_FPSSE) != XSTATE_FPSSE) { >> pr_err("FP/SSE not shown under xsave features > 0x%llx\n", >>pcntxt_mask); >> BUG(); >> } The checking about MPX feature should be as follow: if (pcntxt_mask & XSTATE_EAGER) { if (eagerfpu == DISABLE) { pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n", pcntxt_mask & XSTATE_EAGER); pcntxt_mask &= ~XSTATE_EAGER; } else { eagerfpu = ENABLE; } } This patch was merged into kernel the ending of last year (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 ) Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
On 2014-07-23, Hansen, Dave wrote: On 07/20/2014 10:38 PM, Qiaowei Ren wrote: +#ifdef CONFIG_X86_INTEL_MPX +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) #else #define +cpu_has_mpx 0 #endif /* CONFIG_X86_INTEL_MPX */ Is this enough checking? Looking at the extension reference, it says: 9.3.3 Enabling of Intel MPX States An OS can enable Intel MPX states to support software operation using bounds registers with the following steps: * Verify the processor supports XSAVE/XRSTOR/XSETBV/XGETBV instructions and XCR0 by checking CPUID.1.ECX.XSAVE[bit 26]=1. That, I assume the xsave code is already doing. * Verify the processor supports both Intel MPX states by checking CPUID.(EAX=0x0D, ECX=0):EAX[4:3] is 11b. I see these bits _attempting_ to get set in pcntxt_mask via XCNTXT_MASK. But, I don't see us ever actually checking that they _do_ get set. For instance, we do this for: if ((pcntxt_mask XSTATE_FPSSE) != XSTATE_FPSSE) { pr_err(FP/SSE not shown under xsave features 0x%llx\n, pcntxt_mask); BUG(); } The checking about MPX feature should be as follow: if (pcntxt_mask XSTATE_EAGER) { if (eagerfpu == DISABLE) { pr_err(eagerfpu not present, disabling some xstate features: 0x%llx\n, pcntxt_mask XSTATE_EAGER); pcntxt_mask = ~XSTATE_EAGER; } else { eagerfpu = ENABLE; } } This patch was merged into kernel the ending of last year (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 ) Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-07-21, Andi Kleen wrote: > Qiaowei Ren writes: >> + */ >> +#ifdef CONFIG_X86_64 >> +insn->x86_64 = 1; >> +insn->addr_bytes = 8; >> +#else >> +insn->x86_64 = 0; >> +insn->addr_bytes = 4; >> +#endif > > How would that handle compat mode on a 64bit kernel? > Should likely look at the code segment instead of ifdef. >> +/* Note: the upper 32 bits are ignored in 32-bit mode. */ > > Again correct for compat mode? I believe the upper bits are undefined. > Compat mode will be supported on next patch in future, as 0/ patch mentioned. ^-^ Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-07-21, Andi Kleen wrote: Qiaowei Ren qiaowei@intel.com writes: + */ +#ifdef CONFIG_X86_64 +insn-x86_64 = 1; +insn-addr_bytes = 8; +#else +insn-x86_64 = 0; +insn-addr_bytes = 4; +#endif How would that handle compat mode on a 64bit kernel? Should likely look at the code segment instead of ifdef. +/* Note: the upper 32 bits are ignored in 32-bit mode. */ Again correct for compat mode? I believe the upper bits are undefined. Compat mode will be supported on next patch in future, as 0/ patch mentioned. ^-^ Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
On 2014-06-25, Andy Lutomirski wrote: > On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei > wrote: >> On 2014-06-24, Andy Lutomirski wrote: >>>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>>> Can the new vm_operation "name" be use for this? The magic >>>>> "always written to core dumps" feature might need to be reconsidered. >>>> >>>> One thing I'd like to avoid is an MPX vma getting merged with a >>>> non-MPX vma. I don't see any code to prevent two VMAs with >>>> different vm_ops->names from getting merged. That seems like a >>>> bit of a design oversight for ->name. Right? >>> >>> AFAIK there are no ->name users that don't also set ->close, for >>> exactly that reason. I'd be okay with adding a check for ->name, too. >>> >>> Hmm. If MPX vmas had a real struct file attached, this would all >>> come for free. Maybe vmas with non-default vm_ops and file != NULL >>> should never be mergeable? >>> >>>> >>>> Thinking out loud a bit... There are also some more complicated >>>> but more performant cleanup mechanisms that I'd like to go after in the >>>> future. >>>> Given a page, we might want to figure out if it is an MPX page or not. >>>> I wonder if we'll ever collide with some other user of vm_ops->name. >>>> It looks fairly narrowly used at the moment, but would this keep >>>> us from putting these pages on, say, a tmpfs mount? Doesn't look >>>> that way at the moment. >>> >>> You could always check the vm_ops pointer to see if it's MPX. >>> >>> One feature I've wanted: a way to have special per-process vmas that >>> can be easily found. For example, I want to be able to efficiently >>> find out where the vdso and vvar vmas are. I don't think this is >>> currently supported. >>> >> Andy, if you add a check for ->name to avoid the MPX vmas merged >> with > non-MPX vmas, I guess the work flow should be as follow (use > _install_special_mapping to get a new vma): >> >> unsigned long mpx_mmap(unsigned long len) { >> .. >> static struct vm_special_mapping mpx_mapping = { >> .name = "[mpx]", >> .pages = no_pages, >> }; >> >> ... vma = _install_special_mapping(mm, addr, len, vm_flags, >> _mapping); .. >> } >> >> Then, we could check the ->name to see if the VMA is MPX specific. Right? > > Does this actually create a vma backed with real memory? Doesn't this > need to go through anon_vma or something? _install_special_mapping > completely prevents merging. > Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. Thanks, Qiaowei N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
On 2014-06-25, Andy Lutomirski wrote: On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei qiaowei@intel.com wrote: On 2014-06-24, Andy Lutomirski wrote: On 06/23/2014 01:06 PM, Andy Lutomirski wrote: Can the new vm_operation name be use for this? The magic always written to core dumps feature might need to be reconsidered. One thing I'd like to avoid is an MPX vma getting merged with a non-MPX vma. I don't see any code to prevent two VMAs with different vm_ops-names from getting merged. That seems like a bit of a design oversight for -name. Right? AFAIK there are no -name users that don't also set -close, for exactly that reason. I'd be okay with adding a check for -name, too. Hmm. If MPX vmas had a real struct file attached, this would all come for free. Maybe vmas with non-default vm_ops and file != NULL should never be mergeable? Thinking out loud a bit... There are also some more complicated but more performant cleanup mechanisms that I'd like to go after in the future. Given a page, we might want to figure out if it is an MPX page or not. I wonder if we'll ever collide with some other user of vm_ops-name. It looks fairly narrowly used at the moment, but would this keep us from putting these pages on, say, a tmpfs mount? Doesn't look that way at the moment. You could always check the vm_ops pointer to see if it's MPX. One feature I've wanted: a way to have special per-process vmas that can be easily found. For example, I want to be able to efficiently find out where the vdso and vvar vmas are. I don't think this is currently supported. Andy, if you add a check for -name to avoid the MPX vmas merged with non-MPX vmas, I guess the work flow should be as follow (use _install_special_mapping to get a new vma): unsigned long mpx_mmap(unsigned long len) { .. static struct vm_special_mapping mpx_mapping = { .name = [mpx], .pages = no_pages, }; ... vma = _install_special_mapping(mm, addr, len, vm_flags, mpx_mapping); .. } Then, we could check the -name to see if the VMA is MPX specific. Right? Does this actually create a vma backed with real memory? Doesn't this need to go through anon_vma or something? _install_special_mapping completely prevents merging. Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. So, could you tell me how to set MPX specific -name to the vma when it is created? Seems like that I could not find such interface. Thanks, Qiaowei N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
On 2014-06-24, Andy Lutomirski wrote: >> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>> Can the new vm_operation "name" be use for this? The magic "always >>> written to core dumps" feature might need to be reconsidered. >> >> One thing I'd like to avoid is an MPX vma getting merged with a >> non-MPX vma. I don't see any code to prevent two VMAs with >> different vm_ops->names from getting merged. That seems like a bit >> of a design oversight for ->name. Right? > > AFAIK there are no ->name users that don't also set ->close, for > exactly that reason. I'd be okay with adding a check for ->name, too. > > Hmm. If MPX vmas had a real struct file attached, this would all come > for free. Maybe vmas with non-default vm_ops and file != NULL should > never be mergeable? > >> >> Thinking out loud a bit... There are also some more complicated but >> more performant cleanup mechanisms that I'd like to go after in the future. >> Given a page, we might want to figure out if it is an MPX page or not. >> I wonder if we'll ever collide with some other user of vm_ops->name. >> It looks fairly narrowly used at the moment, but would this keep us >> from putting these pages on, say, a tmpfs mount? Doesn't look that >> way at the moment. > > You could always check the vm_ops pointer to see if it's MPX. > > One feature I've wanted: a way to have special per-process vmas that > can be easily found. For example, I want to be able to efficiently > find out where the vdso and vvar vmas are. I don't think this is currently > supported. > Andy, if you add a check for ->name to avoid the MPX vmas merged with non-MPX vmas, I guess the work flow should be as follow (use _install_special_mapping to get a new vma): unsigned long mpx_mmap(unsigned long len) { .. static struct vm_special_mapping mpx_mapping = { .name = "[mpx]", .pages = no_pages, }; ... vma = _install_special_mapping(mm, addr, len, vm_flags, _mapping); .. } Then, we could check the ->name to see if the VMA is MPX specific. Right? Thanks, Qiaowei
RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
On 2014-06-24, Andy Lutomirski wrote: >> +/* Make bounds tables and bouds directory unlocked. */ >> +if (vm_flags & VM_LOCKED) >> +vm_flags &= ~VM_LOCKED; > > Why? I would expect MCL_FUTURE to lock these. > Andy, I was just a little confused about LOCKED & POPULATE earlier and I thought VM_LOCKED is not necessary for MPX specific bounds tables. Now, this checking should be removed, and there should be mm_populate() for VM_LOCKED case after mmap_region(): if (!IS_ERR_VALUE(addr) && (vm_flags & VM_LOCKED)) mm_populate(addr, len); Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
On 2014-06-24, Andy Lutomirski wrote: > On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >> +/* >> + * The error code field of the BNDSTATUS register communicates status >> + * information of a bound range exception #BR or operation involving >> + * bound directory. >> + */ >> +switch (status & MPX_BNDSTA_ERROR_CODE) { >> +case 2: >> +/* >> + * Bound directory has invalid entry. >> + * No signal will be sent to the user space. > > This comment is a lie. > Hmm, thanks. >> + */ >> +if (do_mpx_bt_fault(xsave_buf)) >> +force_sig(SIGBUS, tsk); > > Would it make sense to assign and use a new si_code value here? > There is a new si_code SEGV_BNDERR for bounds violation reported by MPX. But in this case, it is mainly due to the failure caused by allocation of bounds table. I guess it is not necessary to add another new si_code value. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
On 2014-06-24, Andy Lutomirski wrote: On 06/18/2014 02:44 AM, Qiaowei Ren wrote: +/* + * The error code field of the BNDSTATUS register communicates status + * information of a bound range exception #BR or operation involving + * bound directory. + */ +switch (status MPX_BNDSTA_ERROR_CODE) { +case 2: +/* + * Bound directory has invalid entry. + * No signal will be sent to the user space. This comment is a lie. Hmm, thanks. + */ +if (do_mpx_bt_fault(xsave_buf)) +force_sig(SIGBUS, tsk); Would it make sense to assign and use a new si_code value here? There is a new si_code SEGV_BNDERR for bounds violation reported by MPX. But in this case, it is mainly due to the failure caused by allocation of bounds table. I guess it is not necessary to add another new si_code value. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
On 2014-06-24, Andy Lutomirski wrote: +/* Make bounds tables and bouds directory unlocked. */ +if (vm_flags VM_LOCKED) +vm_flags = ~VM_LOCKED; Why? I would expect MCL_FUTURE to lock these. Andy, I was just a little confused about LOCKED POPULATE earlier and I thought VM_LOCKED is not necessary for MPX specific bounds tables. Now, this checking should be removed, and there should be mm_populate() for VM_LOCKED case after mmap_region(): if (!IS_ERR_VALUE(addr) (vm_flags VM_LOCKED)) mm_populate(addr, len); Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
On 2014-06-24, Andy Lutomirski wrote: On 06/23/2014 01:06 PM, Andy Lutomirski wrote: Can the new vm_operation name be use for this? The magic always written to core dumps feature might need to be reconsidered. One thing I'd like to avoid is an MPX vma getting merged with a non-MPX vma. I don't see any code to prevent two VMAs with different vm_ops-names from getting merged. That seems like a bit of a design oversight for -name. Right? AFAIK there are no -name users that don't also set -close, for exactly that reason. I'd be okay with adding a check for -name, too. Hmm. If MPX vmas had a real struct file attached, this would all come for free. Maybe vmas with non-default vm_ops and file != NULL should never be mergeable? Thinking out loud a bit... There are also some more complicated but more performant cleanup mechanisms that I'd like to go after in the future. Given a page, we might want to figure out if it is an MPX page or not. I wonder if we'll ever collide with some other user of vm_ops-name. It looks fairly narrowly used at the moment, but would this keep us from putting these pages on, say, a tmpfs mount? Doesn't look that way at the moment. You could always check the vm_ops pointer to see if it's MPX. One feature I've wanted: a way to have special per-process vmas that can be easily found. For example, I want to be able to efficiently find out where the vdso and vvar vmas are. I don't think this is currently supported. Andy, if you add a check for -name to avoid the MPX vmas merged with non-MPX vmas, I guess the work flow should be as follow (use _install_special_mapping to get a new vma): unsigned long mpx_mmap(unsigned long len) { .. static struct vm_special_mapping mpx_mapping = { .name = [mpx], .pages = no_pages, }; ... vma = _install_special_mapping(mm, addr, len, vm_flags, mpx_mapping); .. } Then, we could check the -name to see if the VMA is MPX specific. Right? Thanks, Qiaowei
RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-06-20, H. Peter Anvin wrote: > On 06/19/2014 10:04 AM, Dave Hansen wrote: >> >> Could you please support this position with some data? I'm a bit >> skeptical that instruction decoding is going to be a >> performance-critical path. >> >> I also don't see the extra field that you talked about in the >> previous thread? What's the extra field? I see a 'limit' vs. >> 'length', but you don't use 'length' at all, so I think you can use >> it instead, or at least union it. >> >> I've taken a quick stab at trying to consolidate things. I think I >> may have screwed up this: >> >> insn->limit = MAX_MPX_INSN_SIZE - bytes; >> >> Qiaowei, is there anything fundamentally broken with what I've got here? >> > Firstly instructions will be got from user pointer stored in 'ip', and then validate_next() will use 'limit' to make sure that next sizeof(t) bytes can be on the same instruction. As hpa said, generic decoder, including struct insn and implementation of decoding, is very heavyweight because it has to. So MPX specific decoding should be better choice. > So I encouraged Qiaowei to do a limited special-purpose decoder, > simply because the glue to use the generic decoder was almost as > large. I am overall not a huge fan of using the generic decoder in > constrained situation, because the generic decoder is very heavyweight > not just in terms of performance but in terms of interface -- because it has > to. > Thanks, Qiaowei
RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-06-19, Borislav Petkov wrote: > On Thu, Jun 19, 2014 at 01:13:48AM +0000, Ren, Qiaowei wrote: >> On 2014-06-18, Borislav Petkov wrote: >>> On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: >>> >>> This whole insn decoding machinery above looks like adapted from >>> arch/x86/lib/insn.c. You should merge it with the generic code in >>> insn.c instead of homegrowing it here only for the purposes of MPX. >>> And if it doesn't work for your needs, you should should extend >>> the generic code to do so. > >> Petkov, as we discussed on initial version of this patchset, general >> insn framework didn't work out well and I have tried to use generic >> struct insn, insn_field, etc. for obvious benefits. > > Let me repeat myself: "And if it doesn't work for your needs, you > should extend the generic code to do so." > > We don't do homegrown almost-copies of generic code. > I see. If possible, I will be very happy to use or extend generic code. But due to extra overhead caused by MPX, I have to use MPX specific decoding to do performance optimization. You can check the discussion on this: https://lkml.org/lkml/2014/1/11/190 Thanks, Qiaowei N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-06-19, Borislav Petkov wrote: On Thu, Jun 19, 2014 at 01:13:48AM +, Ren, Qiaowei wrote: On 2014-06-18, Borislav Petkov wrote: On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: This whole insn decoding machinery above looks like adapted from arch/x86/lib/insn.c. You should merge it with the generic code in insn.c instead of homegrowing it here only for the purposes of MPX. And if it doesn't work for your needs, you should should extend the generic code to do so. Petkov, as we discussed on initial version of this patchset, general insn framework didn't work out well and I have tried to use generic struct insn, insn_field, etc. for obvious benefits. Let me repeat myself: And if it doesn't work for your needs, you should extend the generic code to do so. We don't do homegrown almost-copies of generic code. I see. If possible, I will be very happy to use or extend generic code. But due to extra overhead caused by MPX, I have to use MPX specific decoding to do performance optimization. You can check the discussion on this: https://lkml.org/lkml/2014/1/11/190 Thanks, Qiaowei N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-06-20, H. Peter Anvin wrote: On 06/19/2014 10:04 AM, Dave Hansen wrote: Could you please support this position with some data? I'm a bit skeptical that instruction decoding is going to be a performance-critical path. I also don't see the extra field that you talked about in the previous thread? What's the extra field? I see a 'limit' vs. 'length', but you don't use 'length' at all, so I think you can use it instead, or at least union it. I've taken a quick stab at trying to consolidate things. I think I may have screwed up this: insn-limit = MAX_MPX_INSN_SIZE - bytes; Qiaowei, is there anything fundamentally broken with what I've got here? Firstly instructions will be got from user pointer stored in 'ip', and then validate_next() will use 'limit' to make sure that next sizeof(t) bytes can be on the same instruction. As hpa said, generic decoder, including struct insn and implementation of decoding, is very heavyweight because it has to. So MPX specific decoding should be better choice. So I encouraged Qiaowei to do a limited special-purpose decoder, simply because the glue to use the generic decoder was almost as large. I am overall not a huge fan of using the generic decoder in constrained situation, because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. Thanks, Qiaowei
RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-06-18, Borislav Petkov wrote: > On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: > > This whole insn decoding machinery above looks like adapted from > arch/x86/lib/insn.c. You should merge it with the generic code in > insn.c instead of homegrowing it here only for the purposes of MPX. > And if it doesn't work for your needs, you should should extend the > generic code to do so. I think we even talked about this last time. > > Also, make sure you run all your patches through checkpatch.pl before > submitting. > Petkov, as we discussed on initial version of this patchset, general insn framework didn't work out well and I have tried to use generic struct insn, insn_field, etc. for obvious benefits. Thanks, Qiaowei
RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 2014-06-18, Borislav Petkov wrote: On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: This whole insn decoding machinery above looks like adapted from arch/x86/lib/insn.c. You should merge it with the generic code in insn.c instead of homegrowing it here only for the purposes of MPX. And if it doesn't work for your needs, you should should extend the generic code to do so. I think we even talked about this last time. Also, make sure you run all your patches through checkpatch.pl before submitting. Petkov, as we discussed on initial version of this patchset, general insn framework didn't work out well and I have tried to use generic struct insn, insn_field, etc. for obvious benefits. Thanks, Qiaowei
Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX
On 02/27/2014 05:44 AM, H. Peter Anvin wrote: On 02/26/2014 11:17 AM, Dave Hansen wrote: On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +Bounds Directory (BD) and Bounds Tables (BT) are stored in +application memory and are allocated by the application (in case +of kernel use, the structures will be in kernel memory). The +bound directory and each instance of bound table are in contiguous +linear memory. Hi Qiaowei, Does this mean that if userspace decided to map something in the way of the bounds tables that it would break MPX? Also, in the description, could we s/linear/virtual/? Linear seems to be the term that Intel likes to use in its documents, but we almost universally call them virtual addresses in the kernel. It might be useful to clarify, though, just so noone gets confused with effective addresses (offsets). Ok. Some words here come from some MPX public documents. I need to go through more carefully. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX
On 02/27/2014 05:19 AM, Dave Hansen wrote: On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +The other case that generates a #BR is when a BNDSTX instruction +attempts to save bounds to a BD entry marked as invalid. This is +an indication that no BT exists for this entry. In this case the +fault handler will allocate a new BT. Hi Qiaowei, Can you make an effort to move these comments over to the code? The bounds-table allocation function is a much more appropriate place for this text than the Documentation/ file. Ok. They will be moved. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX
On 02/27/2014 05:19 AM, Dave Hansen wrote: On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +The other case that generates a #BR is when a BNDSTX instruction +attempts to save bounds to a BD entry marked as invalid. This is +an indication that no BT exists for this entry. In this case the +fault handler will allocate a new BT. Hi Qiaowei, Can you make an effort to move these comments over to the code? The bounds-table allocation function is a much more appropriate place for this text than the Documentation/ file. Ok. They will be moved. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX
On 02/27/2014 05:44 AM, H. Peter Anvin wrote: On 02/26/2014 11:17 AM, Dave Hansen wrote: On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +Bounds Directory (BD) and Bounds Tables (BT) are stored in +application memory and are allocated by the application (in case +of kernel use, the structures will be in kernel memory). The +bound directory and each instance of bound table are in contiguous +linear memory. Hi Qiaowei, Does this mean that if userspace decided to map something in the way of the bounds tables that it would break MPX? Also, in the description, could we s/linear/virtual/? Linear seems to be the term that Intel likes to use in its documents, but we almost universally call them virtual addresses in the kernel. It might be useful to clarify, though, just so noone gets confused with effective addresses (offsets). Ok. Some words here come from some MPX public documents. I need to go through more carefully. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables
> -Original Message- > From: H. Peter Anvin [mailto:h...@zytor.com] > Sent: Tuesday, February 25, 2014 1:52 AM > To: Hansen, Dave; Ren, Qiaowei; Thomas Gleixner; Ingo Molnar > Cc: x...@kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate > bound tables > > On 02/24/2014 09:27 AM, Dave Hansen wrote: > > > > Can you talk a little bit about what the design is here? Why does the > > kernel have to do the allocation of the virtual address space? Does > > it really need to MAP_POPULATE? bt_size looks like 4MB, and that's an > > awful lot of memory to eat up at once. Shouldn't we just let the > > kernel demand-fault this like everything else? > > > > MAP_POPULATE definitely seems like the wrong thing. > Oh. This option should be removed. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables
-Original Message- From: H. Peter Anvin [mailto:h...@zytor.com] Sent: Tuesday, February 25, 2014 1:52 AM To: Hansen, Dave; Ren, Qiaowei; Thomas Gleixner; Ingo Molnar Cc: x...@kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables On 02/24/2014 09:27 AM, Dave Hansen wrote: Can you talk a little bit about what the design is here? Why does the kernel have to do the allocation of the virtual address space? Does it really need to MAP_POPULATE? bt_size looks like 4MB, and that's an awful lot of memory to eat up at once. Shouldn't we just let the kernel demand-fault this like everything else? MAP_POPULATE definitely seems like the wrong thing. Oh. This option should be removed. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/3] x86, mpx: hook #BR exception handler to allocate bound tables
On 02/13/2014 04:19 AM, Andy Lutomirski wrote: On 02/12/2014 10:36 AM, Qiaowei Ren wrote: An access to an invalid bound directory entry will cause a #BR exception. This patch hook #BR exception handler to allocate one bound table and bind it with that buond directory entry. This will avoid the need of forwarding the #BR exception to the user space when bound directory has invalid entry. Signed-off-by: Qiaowei Ren --- +void do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + unsigned long status; + unsigned long bd_entry, bd_base; + unsigned long bd_size = 1UL << (MPX_L1_BITS+MPX_L1_SHIFT); + + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; + status = xsave_buf->bndcsr.status_reg; + + bd_entry = status & MPX_BNDSTA_ADDR_MASK; + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size)) + allocate_bt(bd_entry); +} This still just loops on failure, right? Seems like that SIGBUS should be raised if the allocation fail. if (!do_mpx_bt_fault(xsave_buf)) force_sig(SIGBUS, tsk); Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/3] x86, mpx: hook #BR exception handler to allocate bound tables
On 02/13/2014 04:19 AM, Andy Lutomirski wrote: On 02/12/2014 10:36 AM, Qiaowei Ren wrote: An access to an invalid bound directory entry will cause a #BR exception. This patch hook #BR exception handler to allocate one bound table and bind it with that buond directory entry. This will avoid the need of forwarding the #BR exception to the user space when bound directory has invalid entry. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- +void do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + unsigned long status; + unsigned long bd_entry, bd_base; + unsigned long bd_size = 1UL (MPX_L1_BITS+MPX_L1_SHIFT); + + bd_base = xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ADDR_MASK; + status = xsave_buf-bndcsr.status_reg; + + bd_entry = status MPX_BNDSTA_ADDR_MASK; + if ((bd_entry = bd_base) (bd_entry bd_base + bd_size)) + allocate_bt(bd_entry); +} This still just loops on failure, right? Seems like that SIGBUS should be raised if the allocation fail. if (!do_mpx_bt_fault(xsave_buf)) force_sig(SIGBUS, tsk); Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/4] Intel MPX support
On 01/28/2014 02:42 PM, Ingo Molnar wrote: * Ren Qiaowei wrote: MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. AFAICS the kernel side implementation causes no runtime overhead for non-MPX workloads, and also causes no runtime overhead for non-MPX hardware, right? Yes. Actually, I think that's not entirely true. For example if within the same mm there's a lot of non-MPX threads and an MPX thread, then the MMU notifier will be called for MMU operations of every non-MPX thread as well! So MPX state of a thread will slow down all the other non-MPX threads as well. The statement is only true for non-MPX tasks that have their separate mm's that does not have a single MPX thread. Yes. Though all non-MPX threads are slowed down, the whole process benefit from MPX. Anyway, HPA suggest these syscalls, which use MMU notifier, should be not needed, we can do what they do in userspace runtime. What do you think about it? I guess that I should remove the third patch which adds new prctl() syscalls in next version of this patchset. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables
On 01/28/2014 02:42 PM, Andy Lutomirski wrote: I just read it. do_trap_no_signal presumably calls fixup_exception because #UD uses it and #UD needs that handling. (I'm guessing that there is actually a legitimate use for a kernel fixup on #UD somewhere -- there's probably something that isn't covered by cpuid.) There should not be a #BR from the kernel using the fixup mechanism. IMO if the exception comes from the kernel, it should unconditionally call die. Oh. I agree with you, and if a #BR from the kernel it should unconditionally call die. if (!user_mode(regs)) die("bounds", regs, error_code); Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables
On 01/28/2014 01:21 PM, Andy Lutomirski wrote: On Mon, Jan 27, 2014 at 7:35 PM, Ren Qiaowei wrote: On 01/28/2014 04:36 AM, Andy Lutomirski wrote: + bd_entry = status & MPX_BNDSTA_ADDR_MASK; + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size)) + allocate_bt(bd_entry); What happens if this fails? Retrying forever isn't very nice. If allocation of the bound table fail, the related entry in the bound directory is still invalid. The following access to this entry still produce #BR fault. By the "following access" I think you mean the same instruction that just trapped -- it will trap again because the exception hasn't been fixed up. Then mmap will fail again, and you'll retry again, leading to an infinite loop. I don't mean the same instruction that just trapped. I think that failure to fix up the exception should either let the normal bounds error through or should raise SIGBUS. Maybe we need HPA help answer this question. Peter, what do you think about it? If allocation of the bound table fail, what should we do? + if (!user_mode(regs)) { + if (!fixup_exception(regs)) { + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_BR; + die("bounds", regs, error_code); + } Why the fixup? Unless I'm missing something, the kernel has no business getting #BR on access to a user address. Or are you adding code to allow the kernel to use MPX itself? If so, shouldn't this use an MPX-specific fixup to allow normal C code to use this stuff? It checks whether #BR come from user-space. You can see do_trap_no_signal(). Wasn't #BR using do_trap before? do_trap doesn't call fixup_exception. I don't see why it should do it now. (I also don't think it should come from kernel space until someone adds kernel-mode MPX support.) do_trap() -> do_trap_no_signal() call similar code to check if the fault occurred in userspace or kernel space. You can see previous discussion for the first version of this patchset. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX
On 01/28/2014 04:27 AM, Andy Lutomirski wrote: On 01/26/2014 01:08 AM, Qiaowei Ren wrote: +1) Providing handlers for bounds faults (#BR). + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. If a bounds overflow occurs then a #BR is generated. +The fault handler will decode MPX instructions to get violation +address and set this address into extended struct siginfo. Can you document exactly where the insn address and pointer value end up? (If it's in the (IMO hideous) cr2 field in ucontext, this needs to be documented. If it's somewhere useful in siginfo, that should also be documented to save people the time it takes to figure it out.) Ok. I will describe extended siginfo at this documentation. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables
On 01/28/2014 04:36 AM, Andy Lutomirski wrote: + bd_entry = status & MPX_BNDSTA_ADDR_MASK; + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size)) + allocate_bt(bd_entry); What happens if this fails? Retrying forever isn't very nice. If allocation of the bound table fail, the related entry in the bound directory is still invalid. The following access to this entry still produce #BR fault. + if (!user_mode(regs)) { + if (!fixup_exception(regs)) { + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_BR; + die("bounds", regs, error_code); + } Why the fixup? Unless I'm missing something, the kernel has no business getting #BR on access to a user address. Or are you adding code to allow the kernel to use MPX itself? If so, shouldn't this use an MPX-specific fixup to allow normal C code to use this stuff? It checks whether #BR come from user-space. You can see do_trap_no_signal(). + goto exit; + } + + if (!boot_cpu_has(X86_FEATURE_MPX)) { + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); + goto exit; This, as well as the status == 0 case, should probably document that the exception is from BOUND, not MPX. Ok. I will add one comment for this. + break; + + case 1: /* Bound violation. */ + case 0: /* No MPX exception. */ + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); + break; What does "No Intel MPX exception" mean? Surely that has business sending #BR. Oh. It comes from spec, and just mean it is not from MPX. :) I will change it to be accurate. + + default: + break; What does status 3 mean? The docs say "reserved". Presumably this should log and kill the process. I guess it should be a good suggestion. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information
On 01/28/2014 05:58 AM, Andy Lutomirski wrote: On 01/26/2014 01:08 AM, Qiaowei Ren wrote: (Why on earth does Intel not expose this stuff in cr2 or an MSR or something?) I guess it is due to some design reason. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information
On 01/28/2014 05:58 AM, Andy Lutomirski wrote: On 01/26/2014 01:08 AM, Qiaowei Ren wrote: (Why on earth does Intel not expose this stuff in cr2 or an MSR or something?) I guess it is due to some design reason. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables
On 01/28/2014 04:36 AM, Andy Lutomirski wrote: + bd_entry = status MPX_BNDSTA_ADDR_MASK; + if ((bd_entry = bd_base) (bd_entry bd_base + bd_size)) + allocate_bt(bd_entry); What happens if this fails? Retrying forever isn't very nice. If allocation of the bound table fail, the related entry in the bound directory is still invalid. The following access to this entry still produce #BR fault. + if (!user_mode(regs)) { + if (!fixup_exception(regs)) { + tsk-thread.error_code = error_code; + tsk-thread.trap_nr = X86_TRAP_BR; + die(bounds, regs, error_code); + } Why the fixup? Unless I'm missing something, the kernel has no business getting #BR on access to a user address. Or are you adding code to allow the kernel to use MPX itself? If so, shouldn't this use an MPX-specific fixup to allow normal C code to use this stuff? It checks whether #BR come from user-space. You can see do_trap_no_signal(). + goto exit; + } + + if (!boot_cpu_has(X86_FEATURE_MPX)) { + do_trap(X86_TRAP_BR, SIGSEGV, bounds, regs, error_code, NULL); + goto exit; This, as well as the status == 0 case, should probably document that the exception is from BOUND, not MPX. Ok. I will add one comment for this. + break; + + case 1: /* Bound violation. */ + case 0: /* No MPX exception. */ + do_trap(X86_TRAP_BR, SIGSEGV, bounds, regs, error_code, NULL); + break; What does No Intel MPX exception mean? Surely that has business sending #BR. Oh. It comes from spec, and just mean it is not from MPX. :) I will change it to be accurate. + + default: + break; What does status 3 mean? The docs say reserved. Presumably this should log and kill the process. I guess it should be a good suggestion. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX
On 01/28/2014 04:27 AM, Andy Lutomirski wrote: On 01/26/2014 01:08 AM, Qiaowei Ren wrote: +1) Providing handlers for bounds faults (#BR). + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. If a bounds overflow occurs then a #BR is generated. +The fault handler will decode MPX instructions to get violation +address and set this address into extended struct siginfo. Can you document exactly where the insn address and pointer value end up? (If it's in the (IMO hideous) cr2 field in ucontext, this needs to be documented. If it's somewhere useful in siginfo, that should also be documented to save people the time it takes to figure it out.) Ok. I will describe extended siginfo at this documentation. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables
On 01/28/2014 01:21 PM, Andy Lutomirski wrote: On Mon, Jan 27, 2014 at 7:35 PM, Ren Qiaowei qiaowei@intel.com wrote: On 01/28/2014 04:36 AM, Andy Lutomirski wrote: + bd_entry = status MPX_BNDSTA_ADDR_MASK; + if ((bd_entry = bd_base) (bd_entry bd_base + bd_size)) + allocate_bt(bd_entry); What happens if this fails? Retrying forever isn't very nice. If allocation of the bound table fail, the related entry in the bound directory is still invalid. The following access to this entry still produce #BR fault. By the following access I think you mean the same instruction that just trapped -- it will trap again because the exception hasn't been fixed up. Then mmap will fail again, and you'll retry again, leading to an infinite loop. I don't mean the same instruction that just trapped. I think that failure to fix up the exception should either let the normal bounds error through or should raise SIGBUS. Maybe we need HPA help answer this question. Peter, what do you think about it? If allocation of the bound table fail, what should we do? + if (!user_mode(regs)) { + if (!fixup_exception(regs)) { + tsk-thread.error_code = error_code; + tsk-thread.trap_nr = X86_TRAP_BR; + die(bounds, regs, error_code); + } Why the fixup? Unless I'm missing something, the kernel has no business getting #BR on access to a user address. Or are you adding code to allow the kernel to use MPX itself? If so, shouldn't this use an MPX-specific fixup to allow normal C code to use this stuff? It checks whether #BR come from user-space. You can see do_trap_no_signal(). Wasn't #BR using do_trap before? do_trap doesn't call fixup_exception. I don't see why it should do it now. (I also don't think it should come from kernel space until someone adds kernel-mode MPX support.) do_trap() - do_trap_no_signal() call similar code to check if the fault occurred in userspace or kernel space. You can see previous discussion for the first version of this patchset. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables
On 01/28/2014 02:42 PM, Andy Lutomirski wrote: I just read it. do_trap_no_signal presumably calls fixup_exception because #UD uses it and #UD needs that handling. (I'm guessing that there is actually a legitimate use for a kernel fixup on #UD somewhere -- there's probably something that isn't covered by cpuid.) There should not be a #BR from the kernel using the fixup mechanism. IMO if the exception comes from the kernel, it should unconditionally call die. Oh. I agree with you, and if a #BR from the kernel it should unconditionally call die. if (!user_mode(regs)) die(bounds, regs, error_code); Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/4] Intel MPX support
On 01/28/2014 02:42 PM, Ingo Molnar wrote: * Ren Qiaowei qiaowei@intel.com wrote: MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. AFAICS the kernel side implementation causes no runtime overhead for non-MPX workloads, and also causes no runtime overhead for non-MPX hardware, right? Yes. Actually, I think that's not entirely true. For example if within the same mm there's a lot of non-MPX threads and an MPX thread, then the MMU notifier will be called for MMU operations of every non-MPX thread as well! So MPX state of a thread will slow down all the other non-MPX threads as well. The statement is only true for non-MPX tasks that have their separate mm's that does not have a single MPX thread. Yes. Though all non-MPX threads are slowed down, the whole process benefit from MPX. Anyway, HPA suggest these syscalls, which use MMU notifier, should be not needed, we can do what they do in userspace runtime. What do you think about it? I guess that I should remove the third patch which adds new prctl() syscalls in next version of this patchset. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/27/2014 10:10 AM, H. Peter Anvin wrote: On 01/26/2014 05:55 PM, Ren Qiaowei wrote: Peter, you mean we should remove these two call and do what they do in user-space, right? Unless we think there is a benefit to the kernel to have a on/off switch for the #BR exception (if disabled, all #BR exceptions are signals, regardless of source.) There might be, would like other people's opinion. These two syscalls are only used to release automaticlly the bound tables, and they will not set BNDSTATUS and BNDCFGU. I want to remove them also. :) But if so, we have to release the bound tables in user-space. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/26/2014 11:14 PM, Ingo Molnar wrote: * Ren, Qiaowei wrote: The size of one bound table is 4M bytes for 64bit, and 16K bytes for 32bit. It can not be accessed by user-space, and it will be accessed automatically by hardware. So, here's the bound-table allocation AFAICS: +static bool allocate_bt(unsigned long bd_entry) +{ + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT); + unsigned long bt_addr, old_val = 0; + + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0); What ensures that user-space cannot access (and in particular, modify) the pages at bt_addr? It's a read-write anonymous mapping AFAICS. Looks like that we can not be able to ensure this. I just mean that user-space doesn't know the bound tables, and it should not access them also. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information
On 01/27/2014 09:53 AM, H. Peter Anvin wrote: On 01/26/2014 05:34 PM, Ren Qiaowei wrote: I agree with you and we should suppress all the warnings as possible. If I use (unsgined long) to cast like the following code, what do you think about it? sizeof(long) will be 4 for 32-bit. info->si_lower = (void __user *)(unsigned long) (xsave_buf->bndregs.bndregs[2*bndregno]); info->si_upper = (void __user *)(unsigned long) (~xsave_buf->bndregs.bndregs[2*bndregno+1]); That is the way it is usually done, yes. Add a comment saying something like: /* Note: the upper 32 bits are ignored in 32-bit mode. */ It is worth watching out a bit here, though, because you could be running a 32-bit process on top of a 64-bit kernel. Ok. I will update it in next version. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/27/2014 09:50 AM, H. Peter Anvin wrote: On 01/26/2014 12:39 AM, Ingo Molnar wrote: It will be only once per startup. In that case it would be more efficient to make this part of the binary execution environment so that exec() sets it up automatically, not a separate prctl() syscall. This is not necessarily possible, and in particular it might need to be deferred until the MPX runtime has initialized. What isn't clear to me is if these syscalls are needed at all, or if it would be better to just let the MPX runtile set BNDSTATUS and BNDCFGU directly in userspace. The kernel cannot rely on them staying consistent across userspace anyway. Now, it might be beneficial for the kernel to have them anyway. It's a bit of a tough call. -hpa Peter, you mean we should remove these two call and do what they do in user-space, right? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information
On 01/27/2014 05:38 AM, David Rientjes wrote: On Sun, 26 Jan 2014, Ren Qiaowei wrote: arch/x86/kernel/mpx.c: In function ‘do_mpx_bounds’: arch/x86/kernel/mpx.c:407:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] arch/x86/kernel/mpx.c:409:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] and the documentation says you explicitly want to support this config. These types of warnings are usually indicative of real problems when you're storing upper and lower bits in 32-bit fields after casting them from 64-bit values. I'm also not sure if the added fields to the generic struct siginfo can be justified for this. According to MPX spec, for 32-bit case, the upper 32-bits of 64-bits bound register are ignored, and so casting to pointer from 64-bit values should be not produce any problems. Ok, so this is intended per the spec which nobody reading the code is going to know and people who report the compile warnings are going to continue to question it. How are you planning on suppressing the warnings? It will probably require either - separate 64-bit and 32-bit helper functions to do_mpx_bounds() to do appropriate casts before casting to a pointer, or - a macro defined as a no-op for 64-bit and as a cast to 32-bit value for 32-bit configs that will be used in do_mpx_bounds() and casted to the pointer. I agree with you and we should suppress all the warnings as possible. If I use (unsgined long) to cast like the following code, what do you think about it? sizeof(long) will be 4 for 32-bit. info->si_lower = (void __user *)(unsigned long) (xsave_buf->bndregs.bndregs[2*bndregno]); info->si_upper = (void __user *)(unsigned long) (~xsave_buf->bndregs.bndregs[2*bndregno+1]); Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
> -Original Message- > From: Ingo Molnar [mailto:mingo.kernel@gmail.com] On Behalf Of Ingo > Molnar > Sent: Sunday, January 26, 2014 5:08 PM > To: Ren, Qiaowei > Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; x...@kernel.org; > linux-kernel@vger.kernel.org; Peter Zijlstra; Linus Torvalds; Andrew Morton > Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, > PR_MPX_RELEASE > > > * Qiaowei Ren wrote: > > > @@ -7,6 +9,88 @@ > > #include > > #include > > > > +static struct mmu_notifier mpx_mn; > > > +static struct mmu_notifier_ops mpx_mmuops = { > > + .invalidate_range_end = mpx_invl_range_end, }; > > + > > +int mpx_init(struct task_struct *tsk) { > > + if (!boot_cpu_has(X86_FEATURE_MPX)) > > + return -EINVAL; > > + > > + /* register mmu_notifier to delallocate the bound tables */ > > + mpx_mn.ops = _mmuops; > > + mmu_notifier_register(_mn, current->mm); > > 0) > > I do think MPX should be supported by Linux, but this is the best thing I can > say > about the code at the moment: > > 1) > > The above MMU notifier bit is absolutely disgusting: it adds an > O(nr_mpx_tasks) overhead to every MMU operation! > > MPX needs to be called from architecture methods. (And the MMU notifier > should probably be removed, it literally invites such abuse.) > > 2) > > The whole MPX_INIT() macro wrappery is ugly beyond relief. > > 3) > > The kernel/sys.c bits are x86-only, it probably won't even build on other > architectures. > > 4) > > I also argue that MPX setup should be automatic through the ELF > loader: > > - MPX setup could also be initiated through the ELF notes section or >so - similar to the executable bit stack attribute in ELF binaries. >(See PT_GNU_STACK handling in fs/binfmt_elf.c.) > > - What is the typical life time of the bounds table? Does user-space >get access to it? I don't see where it's discoverable to >user-space. (for example for debuggers) > > 5) > > Teardown can be done through regular munmap() if the notifier is eliminated, > so that step falls away as well. > > 6) > > How many entries are in the bounds table? Because mpx_invl_range_end() > looks utterly unscalable if it has any size beyond 1-2 entries. > The size of one bound table is 4M bytes for 64bit, and 16K bytes for 32bit. It can not be accessed by user-space, and it will be accessed automatically by hardware. When #BR faults come, and the entry of the bound directory is invalid, one bound table have to be allocated to save value of the bounds. It is what the second patch does. As for the lifetime of the bound tables, we can see the following case: User application use malloc or mmap to dynamically allocate memory. when address in the memory region is first accessed, related entry in the bound directory will be checked and it should be invalid. Then one new bound table will be allocated. After a time, this memory area is released, and so the bound tables related with this memory area become meaningless and may be released. If we don't release these unnecessary bound tables, it will save the workload of allocation of new bound tables when the memory area related with the entry of the bound directory is allocated and accessed again. But in this way the loss of virtual space will have to be worried. Anyway, what this patch does is only that when a piece of address space is unmapped, we destroy/unmap the bounds tables that correspond to that same address space. MMU notifier will add some overhead for those tasks which use MPX. But I have no better way to know when one address space is unmapped, and then destroy the related bounds tables. Do you have any suggestion? Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Ingo Molnar > Sent: Sunday, January 26, 2014 4:40 PM > To: Ren, Qiaowei > Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; x...@kernel.org; > linux-kernel@vger.kernel.org; Peter Zijlstra > Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, > PR_MPX_RELEASE > > > * Ren Qiaowei wrote: > > > On 01/26/2014 04:22 PM, Ingo Molnar wrote: > > > > > >* Qiaowei Ren wrote: > > > > > >>This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands > > >>on the x86 platform. These commands can be used to init and release > > >>MPX related resource. > > >> > > >>A MMU notifier will be registered during PR_MPX_INIT command > > >>execution. So the bound tables can be automatically deallocated when > > >>one memory area is unmapped. > > >> > > >>Signed-off-by: Qiaowei Ren > > >>--- > > >> arch/x86/Kconfig |4 ++ > > >> arch/x86/include/asm/mpx.h |9 > > >> arch/x86/include/asm/processor.h | 16 +++ > > >> arch/x86/kernel/mpx.c| 84 > ++ > > >> include/uapi/linux/prctl.h |6 +++ > > >> kernel/sys.c | 12 + > > >> 6 files changed, 131 insertions(+), 0 deletions(-) > > > > > > Hm. What is the expected typical frequency of these syscalls for a > > > larger application like a web browser? Only once per startup > > > typically, or will they be called more frequently? > > > > It will be only once per startup. > > In that case it would be more efficient to make this part of the binary > execution > environment so that exec() sets it up automatically, not a separate prctl() > syscall. > Sorry, I guess what I said is not accurate. Normally it will be only once per startup. But user application maybe only partly want to use MPX, e.g. only one thread of process. For those cases, user application need to enable MPX for specific part of the code. So it is not enough to set it up automatically. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/26/2014 04:22 PM, Ingo Molnar wrote: * Qiaowei Ren wrote: This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands on the x86 platform. These commands can be used to init and release MPX related resource. A MMU notifier will be registered during PR_MPX_INIT command execution. So the bound tables can be automatically deallocated when one memory area is unmapped. Signed-off-by: Qiaowei Ren --- arch/x86/Kconfig |4 ++ arch/x86/include/asm/mpx.h |9 arch/x86/include/asm/processor.h | 16 +++ arch/x86/kernel/mpx.c| 84 ++ include/uapi/linux/prctl.h |6 +++ kernel/sys.c | 12 + 6 files changed, 131 insertions(+), 0 deletions(-) Hm. What is the expected typical frequency of these syscalls for a larger application like a web browser? Only once per startup typically, or will they be called more frequently? It will be only once per startup. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/4] Intel MPX support
On 01/26/2014 04:19 PM, Ingo Molnar wrote: * Qiaowei Ren wrote: This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors. MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components. MPX architecture is designed allow a machine to run both MPX enabled software and legacy software that is MPX unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance. More information about Intel MPX can be found in "Intel(R) Architecture Instruction Set Extensions Programming Reference". To get the advantage of MPX, changes are required in the OS kernel, binutils, compiler, system libraries support. New GCC option -fmpx is introduced to utilize MPX instructions. Currently GCC compiler sources with MPX support is available in a separate branch in common GCC SVN repository. See GCC SVN page (http://gcc.gnu.org/svn.html) for details. To have the full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written on assembler, and compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled Glibc source can be found in Glibc git repository. Enabling an application to use MPX will generally not require source code updates but there is some runtime code, which is responsible for configuring and enabling MPX, needed in order to make use of MPX. For most applications this runtime support will be available by linking to a library supplied by the compiler or possibly it will come directly from the OS once OS versions that support MPX are available. MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. AFAICS the kernel side implementation causes no runtime overhead for non-MPX workloads, and also causes no runtime overhead for non-MPX hardware, right? Yes. Currently no hardware with MPX ISA is available but it is always possible to use SDE (Intel(R) software Development Emulator) instead, which can be downloaded from http://software.intel.com/en-us/articles/intel-software-development-emulator Changes since v1: * check to see if #BR occurred in userspace or kernel space. * use generic structure and macro as much as possible when decode mpx instructions. Changes since v2: * fix some compile warnings. * update documentation. Qiaowei Ren (4): x86, mpx: add documentation on Intel MPX x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE x86, mpx: extend siginfo structure to include bound violation information Documentation/x86/intel_mpx.txt| 226 arch/x86/Kconfig |4 + arch/x86/include/asm/mpx.h | 63 ++ arch/x86/include/asm/processor.h | 16 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 415 arch/x86/kernel/traps.c| 61 +- include/uapi/asm-generic/siginfo.h |9 +- include/uapi/linux/prctl.h |6 + kernel/signal.c|4 + kernel/sys.c | 12 + 11 files changed, 815 insertions(+), 2 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/kernel/mpx.c Ok, this summary was pretty good - it addresses my prior objections wrt. submission quality. Once the details are fleshed out this sure looks like a useful feature. Thanks. I apologize for previous submission. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/4] Intel MPX support
On 01/26/2014 04:19 PM, Ingo Molnar wrote: * Qiaowei Ren qiaowei@intel.com wrote: This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors. MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components. MPX architecture is designed allow a machine to run both MPX enabled software and legacy software that is MPX unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance. More information about Intel MPX can be found in Intel(R) Architecture Instruction Set Extensions Programming Reference. To get the advantage of MPX, changes are required in the OS kernel, binutils, compiler, system libraries support. New GCC option -fmpx is introduced to utilize MPX instructions. Currently GCC compiler sources with MPX support is available in a separate branch in common GCC SVN repository. See GCC SVN page (http://gcc.gnu.org/svn.html) for details. To have the full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written on assembler, and compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled Glibc source can be found in Glibc git repository. Enabling an application to use MPX will generally not require source code updates but there is some runtime code, which is responsible for configuring and enabling MPX, needed in order to make use of MPX. For most applications this runtime support will be available by linking to a library supplied by the compiler or possibly it will come directly from the OS once OS versions that support MPX are available. MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. AFAICS the kernel side implementation causes no runtime overhead for non-MPX workloads, and also causes no runtime overhead for non-MPX hardware, right? Yes. Currently no hardware with MPX ISA is available but it is always possible to use SDE (Intel(R) software Development Emulator) instead, which can be downloaded from http://software.intel.com/en-us/articles/intel-software-development-emulator Changes since v1: * check to see if #BR occurred in userspace or kernel space. * use generic structure and macro as much as possible when decode mpx instructions. Changes since v2: * fix some compile warnings. * update documentation. Qiaowei Ren (4): x86, mpx: add documentation on Intel MPX x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE x86, mpx: extend siginfo structure to include bound violation information Documentation/x86/intel_mpx.txt| 226 arch/x86/Kconfig |4 + arch/x86/include/asm/mpx.h | 63 ++ arch/x86/include/asm/processor.h | 16 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 415 arch/x86/kernel/traps.c| 61 +- include/uapi/asm-generic/siginfo.h |9 +- include/uapi/linux/prctl.h |6 + kernel/signal.c|4 + kernel/sys.c | 12 + 11 files changed, 815 insertions(+), 2 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/kernel/mpx.c Ok, this summary was pretty good - it addresses my prior objections wrt. submission quality. Once the details are fleshed out this sure looks like a useful feature. Thanks. I apologize for previous submission. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/26/2014 04:22 PM, Ingo Molnar wrote: * Qiaowei Ren qiaowei@intel.com wrote: This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands on the x86 platform. These commands can be used to init and release MPX related resource. A MMU notifier will be registered during PR_MPX_INIT command execution. So the bound tables can be automatically deallocated when one memory area is unmapped. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/Kconfig |4 ++ arch/x86/include/asm/mpx.h |9 arch/x86/include/asm/processor.h | 16 +++ arch/x86/kernel/mpx.c| 84 ++ include/uapi/linux/prctl.h |6 +++ kernel/sys.c | 12 + 6 files changed, 131 insertions(+), 0 deletions(-) Hm. What is the expected typical frequency of these syscalls for a larger application like a web browser? Only once per startup typically, or will they be called more frequently? It will be only once per startup. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
-Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Ingo Molnar Sent: Sunday, January 26, 2014 4:40 PM To: Ren, Qiaowei Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; x...@kernel.org; linux-kernel@vger.kernel.org; Peter Zijlstra Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE * Ren Qiaowei qiaowei@intel.com wrote: On 01/26/2014 04:22 PM, Ingo Molnar wrote: * Qiaowei Ren qiaowei@intel.com wrote: This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands on the x86 platform. These commands can be used to init and release MPX related resource. A MMU notifier will be registered during PR_MPX_INIT command execution. So the bound tables can be automatically deallocated when one memory area is unmapped. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/Kconfig |4 ++ arch/x86/include/asm/mpx.h |9 arch/x86/include/asm/processor.h | 16 +++ arch/x86/kernel/mpx.c| 84 ++ include/uapi/linux/prctl.h |6 +++ kernel/sys.c | 12 + 6 files changed, 131 insertions(+), 0 deletions(-) Hm. What is the expected typical frequency of these syscalls for a larger application like a web browser? Only once per startup typically, or will they be called more frequently? It will be only once per startup. In that case it would be more efficient to make this part of the binary execution environment so that exec() sets it up automatically, not a separate prctl() syscall. Sorry, I guess what I said is not accurate. Normally it will be only once per startup. But user application maybe only partly want to use MPX, e.g. only one thread of process. For those cases, user application need to enable MPX for specific part of the code. So it is not enough to set it up automatically. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
-Original Message- From: Ingo Molnar [mailto:mingo.kernel@gmail.com] On Behalf Of Ingo Molnar Sent: Sunday, January 26, 2014 5:08 PM To: Ren, Qiaowei Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; x...@kernel.org; linux-kernel@vger.kernel.org; Peter Zijlstra; Linus Torvalds; Andrew Morton Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE * Qiaowei Ren qiaowei@intel.com wrote: @@ -7,6 +9,88 @@ #include asm/fpu-internal.h #include asm/alternative.h +static struct mmu_notifier mpx_mn; +static struct mmu_notifier_ops mpx_mmuops = { + .invalidate_range_end = mpx_invl_range_end, }; + +int mpx_init(struct task_struct *tsk) { + if (!boot_cpu_has(X86_FEATURE_MPX)) + return -EINVAL; + + /* register mmu_notifier to delallocate the bound tables */ + mpx_mn.ops = mpx_mmuops; + mmu_notifier_register(mpx_mn, current-mm); 0) I do think MPX should be supported by Linux, but this is the best thing I can say about the code at the moment: 1) The above MMU notifier bit is absolutely disgusting: it adds an O(nr_mpx_tasks) overhead to every MMU operation! MPX needs to be called from architecture methods. (And the MMU notifier should probably be removed, it literally invites such abuse.) 2) The whole MPX_INIT() macro wrappery is ugly beyond relief. 3) The kernel/sys.c bits are x86-only, it probably won't even build on other architectures. 4) I also argue that MPX setup should be automatic through the ELF loader: - MPX setup could also be initiated through the ELF notes section or so - similar to the executable bit stack attribute in ELF binaries. (See PT_GNU_STACK handling in fs/binfmt_elf.c.) - What is the typical life time of the bounds table? Does user-space get access to it? I don't see where it's discoverable to user-space. (for example for debuggers) 5) Teardown can be done through regular munmap() if the notifier is eliminated, so that step falls away as well. 6) How many entries are in the bounds table? Because mpx_invl_range_end() looks utterly unscalable if it has any size beyond 1-2 entries. The size of one bound table is 4M bytes for 64bit, and 16K bytes for 32bit. It can not be accessed by user-space, and it will be accessed automatically by hardware. When #BR faults come, and the entry of the bound directory is invalid, one bound table have to be allocated to save value of the bounds. It is what the second patch does. As for the lifetime of the bound tables, we can see the following case: User application use malloc or mmap to dynamically allocate memory. when address in the memory region is first accessed, related entry in the bound directory will be checked and it should be invalid. Then one new bound table will be allocated. After a time, this memory area is released, and so the bound tables related with this memory area become meaningless and may be released. If we don't release these unnecessary bound tables, it will save the workload of allocation of new bound tables when the memory area related with the entry of the bound directory is allocated and accessed again. But in this way the loss of virtual space will have to be worried. Anyway, what this patch does is only that when a piece of address space is unmapped, we destroy/unmap the bounds tables that correspond to that same address space. MMU notifier will add some overhead for those tasks which use MPX. But I have no better way to know when one address space is unmapped, and then destroy the related bounds tables. Do you have any suggestion? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information
On 01/27/2014 05:38 AM, David Rientjes wrote: On Sun, 26 Jan 2014, Ren Qiaowei wrote: arch/x86/kernel/mpx.c: In function ‘do_mpx_bounds’: arch/x86/kernel/mpx.c:407:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] arch/x86/kernel/mpx.c:409:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] and the documentation says you explicitly want to support this config. These types of warnings are usually indicative of real problems when you're storing upper and lower bits in 32-bit fields after casting them from 64-bit values. I'm also not sure if the added fields to the generic struct siginfo can be justified for this. According to MPX spec, for 32-bit case, the upper 32-bits of 64-bits bound register are ignored, and so casting to pointer from 64-bit values should be not produce any problems. Ok, so this is intended per the spec which nobody reading the code is going to know and people who report the compile warnings are going to continue to question it. How are you planning on suppressing the warnings? It will probably require either - separate 64-bit and 32-bit helper functions to do_mpx_bounds() to do appropriate casts before casting to a pointer, or - a macro defined as a no-op for 64-bit and as a cast to 32-bit value for 32-bit configs that will be used in do_mpx_bounds() and casted to the pointer. I agree with you and we should suppress all the warnings as possible. If I use (unsgined long) to cast like the following code, what do you think about it? sizeof(long) will be 4 for 32-bit. info-si_lower = (void __user *)(unsigned long) (xsave_buf-bndregs.bndregs[2*bndregno]); info-si_upper = (void __user *)(unsigned long) (~xsave_buf-bndregs.bndregs[2*bndregno+1]); Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/27/2014 09:50 AM, H. Peter Anvin wrote: On 01/26/2014 12:39 AM, Ingo Molnar wrote: It will be only once per startup. In that case it would be more efficient to make this part of the binary execution environment so that exec() sets it up automatically, not a separate prctl() syscall. This is not necessarily possible, and in particular it might need to be deferred until the MPX runtime has initialized. What isn't clear to me is if these syscalls are needed at all, or if it would be better to just let the MPX runtile set BNDSTATUS and BNDCFGU directly in userspace. The kernel cannot rely on them staying consistent across userspace anyway. Now, it might be beneficial for the kernel to have them anyway. It's a bit of a tough call. -hpa Peter, you mean we should remove these two call and do what they do in user-space, right? Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information
On 01/27/2014 09:53 AM, H. Peter Anvin wrote: On 01/26/2014 05:34 PM, Ren Qiaowei wrote: I agree with you and we should suppress all the warnings as possible. If I use (unsgined long) to cast like the following code, what do you think about it? sizeof(long) will be 4 for 32-bit. info-si_lower = (void __user *)(unsigned long) (xsave_buf-bndregs.bndregs[2*bndregno]); info-si_upper = (void __user *)(unsigned long) (~xsave_buf-bndregs.bndregs[2*bndregno+1]); That is the way it is usually done, yes. Add a comment saying something like: /* Note: the upper 32 bits are ignored in 32-bit mode. */ It is worth watching out a bit here, though, because you could be running a 32-bit process on top of a 64-bit kernel. Ok. I will update it in next version. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/26/2014 11:14 PM, Ingo Molnar wrote: * Ren, Qiaowei qiaowei@intel.com wrote: The size of one bound table is 4M bytes for 64bit, and 16K bytes for 32bit. It can not be accessed by user-space, and it will be accessed automatically by hardware. So, here's the bound-table allocation AFAICS: +static bool allocate_bt(unsigned long bd_entry) +{ + unsigned long bt_size = 1UL (MPX_L2_BITS+MPX_L2_SHIFT); + unsigned long bt_addr, old_val = 0; + + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0); What ensures that user-space cannot access (and in particular, modify) the pages at bt_addr? It's a read-write anonymous mapping AFAICS. Looks like that we can not be able to ensure this. I just mean that user-space doesn't know the bound tables, and it should not access them also. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
On 01/27/2014 10:10 AM, H. Peter Anvin wrote: On 01/26/2014 05:55 PM, Ren Qiaowei wrote: Peter, you mean we should remove these two call and do what they do in user-space, right? Unless we think there is a benefit to the kernel to have a on/off switch for the #BR exception (if disabled, all #BR exceptions are signals, regardless of source.) There might be, would like other people's opinion. These two syscalls are only used to release automaticlly the bound tables, and they will not set BNDSTATUS and BNDCFGU. I want to remove them also. :) But if so, we have to release the bound tables in user-space. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/