Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/18/2014 12:17 AM, Kevin Easton wrote: > I was assuming that if an application did want to enable MPX after threads > had already been created, it would generally want to enable it > simultaneously across all threads. This would be a lot easier for the > kernel than for the application. The current gcc setup mechanism would set up MPX even before main(). So I think it's pretty unlikely that help is needed to coordinate threads. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Wed, Sep 17, 2014 at 09:43:09PM -0700, Dave Hansen wrote: > On 09/17/2014 08:23 PM, Kevin Easton wrote: > > I was actually thinking that the kernel would take care of the xsave / > > xrstor (for current), updating tsk->thread.fpu.state (for non-running > > threads) and sending an IPI for threads running on other CPUs. > > > > Of course userspace can always then manually change the bounds directory > > address itself, but then it's quite clear that they're doing something > > unsupported. Just an idea, anyway. > > What's the benefit of that? > > As it stands now, MPX is likely to be enabled well before any threads > are created, and the MPX enabling state will be inherited by the new > thread at clone() time. The current mechanism allows a thread to > individually enable or disable MPX independently of the other threads. > > I think it makes it both more complicated and less flexible. I was assuming that if an application did want to enable MPX after threads had already been created, it would generally want to enable it simultaneously across all threads. This would be a lot easier for the kernel than for the application. - Kevin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Wed, Sep 17, 2014 at 09:43:09PM -0700, Dave Hansen wrote: On 09/17/2014 08:23 PM, Kevin Easton wrote: I was actually thinking that the kernel would take care of the xsave / xrstor (for current), updating tsk-thread.fpu.state (for non-running threads) and sending an IPI for threads running on other CPUs. Of course userspace can always then manually change the bounds directory address itself, but then it's quite clear that they're doing something unsupported. Just an idea, anyway. What's the benefit of that? As it stands now, MPX is likely to be enabled well before any threads are created, and the MPX enabling state will be inherited by the new thread at clone() time. The current mechanism allows a thread to individually enable or disable MPX independently of the other threads. I think it makes it both more complicated and less flexible. I was assuming that if an application did want to enable MPX after threads had already been created, it would generally want to enable it simultaneously across all threads. This would be a lot easier for the kernel than for the application. - Kevin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/18/2014 12:17 AM, Kevin Easton wrote: I was assuming that if an application did want to enable MPX after threads had already been created, it would generally want to enable it simultaneously across all threads. This would be a lot easier for the kernel than for the application. The current gcc setup mechanism would set up MPX even before main(). So I think it's pretty unlikely that help is needed to coordinate threads. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/17/2014 08:23 PM, Kevin Easton wrote: > I was actually thinking that the kernel would take care of the xsave / > xrstor (for current), updating tsk->thread.fpu.state (for non-running > threads) and sending an IPI for threads running on other CPUs. > > Of course userspace can always then manually change the bounds directory > address itself, but then it's quite clear that they're doing something > unsupported. Just an idea, anyway. What's the benefit of that? As it stands now, MPX is likely to be enabled well before any threads are created, and the MPX enabling state will be inherited by the new thread at clone() time. The current mechanism allows a thread to individually enable or disable MPX independently of the other threads. I think it makes it both more complicated and less flexible. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-18, Kevin Easton wrote: > On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote: >>> Would it be prudent to use an error code other than EINVAL for the >>> "hardware doesn't support it" case? >>> >> Seems like no specific error code for this case. > > ENXIO would probably be OK. It's not too important as long as it's > documented. > Yes. Looks like that ENXIO will be better. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote: > > Would it be prudent to use an error code other than EINVAL for the > > "hardware doesn't support it" case? > > > Seems like no specific error code for this case. ENXIO would probably be OK. It's not too important as long as it's documented. > > >> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned > > long, arg2, unsigned long, arg3, > >>me->mm->def_flags &= ~VM_NOHUGEPAGE; > >>up_write(>mm->mmap_sem); > >>break; > >> + case PR_MPX_REGISTER: > >> + error = MPX_REGISTER(me); > >> + break; > >> + case PR_MPX_UNREGISTER: > >> + error = MPX_UNREGISTER(me); > >> + break; > > > > If you pass me->mm from prctl, that makes it clear that it's > > per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. > > > > This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if > > it's not using them, otherwise you'll be sunk if you ever want to use them > > later. > > > > It seems like it only makes sense for all threads using the mm to have > > the same bounds directory set. If the interface was changed to > > directly pass the address, then could the kernel take care of setting > > it for *all* of the threads in the process? This seems like something > > that would be easier for the kernel to do than userspace. > > > If the interface was changed to this, it will be possible for insane > application to pass error bounds directory address to kernel. We still > have to call fpu_xsave() to check this. I was actually thinking that the kernel would take care of the xsave / xrstor (for current), updating tsk->thread.fpu.state (for non-running threads) and sending an IPI for threads running on other CPUs. Of course userspace can always then manually change the bounds directory address itself, but then it's quite clear that they're doing something unsupported. Just an idea, anyway. - Kevin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-16, Kevin Easton wrote: > On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote: >> + >> +int mpx_register(struct task_struct *tsk) { >> +struct mm_struct *mm = tsk->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +/* >> + * runtime in the userspace will be responsible for allocation of >> + * the bounds directory. Then, it will save the base of the bounds >> + * directory into XSAVE/XRSTOR Save Area and enable MPX through >> + * XRSTOR instruction. >> + * >> + * fpu_xsave() is expected to be very expensive. In order to do >> + * performance optimization, here we get the base of the bounds >> + * directory and then save it into mm_struct to be used in future. >> + */ >> +mm->bd_addr = task_get_bounds_dir(tsk); >> +if (!mm->bd_addr) >> +return -EINVAL; >> + >> +return 0; >> +} >> + >> +int mpx_unregister(struct task_struct *tsk) { >> +struct mm_struct *mm = current->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +mm->bd_addr = NULL; >> +return 0; >> +} > > If that's changed, then mpx_register() and mpx_unregister() don't need > a task_struct, just an mm_struct. > Yes. An mm_struct is enough. > Probably these functions should be locking mmap_sem. > > Would it be prudent to use an error code other than EINVAL for the > "hardware doesn't support it" case? > Seems like no specific error code for this case. >> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned > long, arg2, unsigned long, arg3, >> me->mm->def_flags &= ~VM_NOHUGEPAGE; >> up_write(>mm->mmap_sem); >> break; >> +case PR_MPX_REGISTER: >> +error = MPX_REGISTER(me); >> +break; >> +case PR_MPX_UNREGISTER: >> +error = MPX_UNREGISTER(me); >> +break; > > If you pass me->mm from prctl, that makes it clear that it's > per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. > > This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if > it's not using them, otherwise you'll be sunk if you ever want to use them > later. > > It seems like it only makes sense for all threads using the mm to have > the same bounds directory set. If the interface was changed to > directly pass the address, then could the kernel take care of setting > it for *all* of the threads in the process? This seems like something > that would be easier for the kernel to do than userspace. > If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-16, Kevin Easton wrote: On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote: + +int mpx_register(struct task_struct *tsk) { +struct mm_struct *mm = tsk-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +/* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ +mm-bd_addr = task_get_bounds_dir(tsk); +if (!mm-bd_addr) +return -EINVAL; + +return 0; +} + +int mpx_unregister(struct task_struct *tsk) { +struct mm_struct *mm = current-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +mm-bd_addr = NULL; +return 0; +} If that's changed, then mpx_register() and mpx_unregister() don't need a task_struct, just an mm_struct. Yes. An mm_struct is enough. Probably these functions should be locking mmap_sem. Would it be prudent to use an error code other than EINVAL for the hardware doesn't support it case? Seems like no specific error code for this case. @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, me-mm-def_flags = ~VM_NOHUGEPAGE; up_write(me-mm-mmap_sem); break; +case PR_MPX_REGISTER: +error = MPX_REGISTER(me); +break; +case PR_MPX_UNREGISTER: +error = MPX_UNREGISTER(me); +break; If you pass me-mm from prctl, that makes it clear that it's per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's not using them, otherwise you'll be sunk if you ever want to use them later. It seems like it only makes sense for all threads using the mm to have the same bounds directory set. If the interface was changed to directly pass the address, then could the kernel take care of setting it for *all* of the threads in the process? This seems like something that would be easier for the kernel to do than userspace. If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote: Would it be prudent to use an error code other than EINVAL for the hardware doesn't support it case? Seems like no specific error code for this case. ENXIO would probably be OK. It's not too important as long as it's documented. @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, me-mm-def_flags = ~VM_NOHUGEPAGE; up_write(me-mm-mmap_sem); break; + case PR_MPX_REGISTER: + error = MPX_REGISTER(me); + break; + case PR_MPX_UNREGISTER: + error = MPX_UNREGISTER(me); + break; If you pass me-mm from prctl, that makes it clear that it's per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's not using them, otherwise you'll be sunk if you ever want to use them later. It seems like it only makes sense for all threads using the mm to have the same bounds directory set. If the interface was changed to directly pass the address, then could the kernel take care of setting it for *all* of the threads in the process? This seems like something that would be easier for the kernel to do than userspace. If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this. I was actually thinking that the kernel would take care of the xsave / xrstor (for current), updating tsk-thread.fpu.state (for non-running threads) and sending an IPI for threads running on other CPUs. Of course userspace can always then manually change the bounds directory address itself, but then it's quite clear that they're doing something unsupported. Just an idea, anyway. - Kevin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-18, Kevin Easton wrote: On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote: Would it be prudent to use an error code other than EINVAL for the hardware doesn't support it case? Seems like no specific error code for this case. ENXIO would probably be OK. It's not too important as long as it's documented. Yes. Looks like that ENXIO will be better. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/17/2014 08:23 PM, Kevin Easton wrote: I was actually thinking that the kernel would take care of the xsave / xrstor (for current), updating tsk-thread.fpu.state (for non-running threads) and sending an IPI for threads running on other CPUs. Of course userspace can always then manually change the bounds directory address itself, but then it's quite clear that they're doing something unsupported. Just an idea, anyway. What's the benefit of that? As it stands now, MPX is likely to be enabled well before any threads are created, and the MPX enabling state will be inherited by the new thread at clone() time. The current mechanism allows a thread to individually enable or disable MPX independently of the other threads. I think it makes it both more complicated and less flexible. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote: > +static __user void *task_get_bounds_dir(struct task_struct *tsk) > +{ > + struct xsave_struct *xsave_buf; > + > + fpu_xsave(>thread.fpu); > + xsave_buf = &(tsk->thread.fpu.state->xsave); > + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) > + return NULL; > + > + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & > + MPX_BNDCFG_ADDR_MASK); > +} This only makes sense if called with 'current', so is there any need for the function argument? > + > +int mpx_register(struct task_struct *tsk) > +{ > + struct mm_struct *mm = tsk->mm; > + > + if (!cpu_has_mpx) > + return -EINVAL; > + > + /* > + * runtime in the userspace will be responsible for allocation of > + * the bounds directory. Then, it will save the base of the bounds > + * directory into XSAVE/XRSTOR Save Area and enable MPX through > + * XRSTOR instruction. > + * > + * fpu_xsave() is expected to be very expensive. In order to do > + * performance optimization, here we get the base of the bounds > + * directory and then save it into mm_struct to be used in future. > + */ > + mm->bd_addr = task_get_bounds_dir(tsk); > + if (!mm->bd_addr) > + return -EINVAL; > + > + return 0; > +} > + > +int mpx_unregister(struct task_struct *tsk) > +{ > + struct mm_struct *mm = current->mm; > + > + if (!cpu_has_mpx) > + return -EINVAL; > + > + mm->bd_addr = NULL; > + return 0; > +} If that's changed, then mpx_register() and mpx_unregister() don't need a task_struct, just an mm_struct. Probably these functions should be locking mmap_sem. Would it be prudent to use an error code other than EINVAL for the "hardware doesn't support it" case? > @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, > arg2, unsigned long, arg3, > me->mm->def_flags &= ~VM_NOHUGEPAGE; > up_write(>mm->mmap_sem); > break; > + case PR_MPX_REGISTER: > + error = MPX_REGISTER(me); > + break; > + case PR_MPX_UNREGISTER: > + error = MPX_UNREGISTER(me); > + break; If you pass me->mm from prctl, that makes it clear that it's per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's not using them, otherwise you'll be sunk if you ever want to use them later. It seems like it only makes sense for all threads using the mm to have the same bounds directory set. If the interface was changed to directly pass the address, then could the kernel take care of setting it for *all* of the threads in the process? This seems like something that would be easier for the kernel to do than userspace. - Kevin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote: +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ + struct xsave_struct *xsave_buf; + + fpu_xsave(tsk-thread.fpu); + xsave_buf = (tsk-thread.fpu.state-xsave); + if (!(xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ENABLE_FLAG)) + return NULL; + + return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u + MPX_BNDCFG_ADDR_MASK); +} This only makes sense if called with 'current', so is there any need for the function argument? + +int mpx_register(struct task_struct *tsk) +{ + struct mm_struct *mm = tsk-mm; + + if (!cpu_has_mpx) + return -EINVAL; + + /* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ + mm-bd_addr = task_get_bounds_dir(tsk); + if (!mm-bd_addr) + return -EINVAL; + + return 0; +} + +int mpx_unregister(struct task_struct *tsk) +{ + struct mm_struct *mm = current-mm; + + if (!cpu_has_mpx) + return -EINVAL; + + mm-bd_addr = NULL; + return 0; +} If that's changed, then mpx_register() and mpx_unregister() don't need a task_struct, just an mm_struct. Probably these functions should be locking mmap_sem. Would it be prudent to use an error code other than EINVAL for the hardware doesn't support it case? @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, me-mm-def_flags = ~VM_NOHUGEPAGE; up_write(me-mm-mmap_sem); break; + case PR_MPX_REGISTER: + error = MPX_REGISTER(me); + break; + case PR_MPX_UNREGISTER: + error = MPX_UNREGISTER(me); + break; If you pass me-mm from prctl, that makes it clear that it's per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE. This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's not using them, otherwise you'll be sunk if you ever want to use them later. It seems like it only makes sense for all threads using the mm to have the same bounds directory set. If the interface was changed to directly pass the address, then could the kernel take care of setting it for *all* of the threads in the process? This seems like something that would be easier for the kernel to do than userspace. - Kevin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/15/2014 08:20 PM, Ren, Qiaowei wrote: >> What are the semantics across execve() ? >> > This will not impact on the semantics of execve(). One runtime > library > for MPX will be provided (or merged into Glibc), and when the > application starts, this runtime will be called to initialize MPX > runtime environment, including calling prctl() to notify the kernel to > start managing the bounds directories. You can see the discussion > about exec(): https://lkml.org/lkml/2014/1/26/199 I think he's asking what happens to the kernel value at execve() time. The short answer is that it is zero'd along with the rest of a new mm. It probably _shouldn't_ be, though. It's actually valid to have a bound directory at 0x0. We probably need to initialize it to -1 instead, and that means initializing to -1 at execve() time. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-15, One Thousand Gnomes wrote: >> The base of the bounds directory is set into mm_struct during >> PR_MPX_REGISTER command execution. This member can be used to check >> whether one application is mpx enabled. > > Not really because by the time you ask the question another thread > might have decided to unregister it. > > >> +int mpx_register(struct task_struct *tsk) { >> +struct mm_struct *mm = tsk->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +/* >> + * runtime in the userspace will be responsible for allocation of >> + * the bounds directory. Then, it will save the base of the bounds >> + * directory into XSAVE/XRSTOR Save Area and enable MPX through >> + * XRSTOR instruction. >> + * >> + * fpu_xsave() is expected to be very expensive. In order to do >> + * performance optimization, here we get the base of the bounds >> + * directory and then save it into mm_struct to be used in future. >> + */ >> +mm->bd_addr = task_get_bounds_dir(tsk); >> +if (!mm->bd_addr) >> +return -EINVAL; > > What stops two threads calling this in parallel ? >> + >> +return 0; >> +} >> + >> +int mpx_unregister(struct task_struct *tsk) { >> +struct mm_struct *mm = current->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +mm->bd_addr = NULL; > > or indeed calling this in parallel > > What are the semantics across execve() ? > This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 It would be extremely unusual for an application to have some MPX and some non-MPX threads, since they would share the same address space and the non-MPX threads would mess up the bounds. That is to say, it looks like be unusual for one of these threads to call prctl() to enable or disable MPX. I guess we need to add some rules into documentation. Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-15, One Thousand Gnomes wrote: The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. Not really because by the time you ask the question another thread might have decided to unregister it. +int mpx_register(struct task_struct *tsk) { +struct mm_struct *mm = tsk-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +/* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ +mm-bd_addr = task_get_bounds_dir(tsk); +if (!mm-bd_addr) +return -EINVAL; What stops two threads calling this in parallel ? + +return 0; +} + +int mpx_unregister(struct task_struct *tsk) { +struct mm_struct *mm = current-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +mm-bd_addr = NULL; or indeed calling this in parallel What are the semantics across execve() ? This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 It would be extremely unusual for an application to have some MPX and some non-MPX threads, since they would share the same address space and the non-MPX threads would mess up the bounds. That is to say, it looks like be unusual for one of these threads to call prctl() to enable or disable MPX. I guess we need to add some rules into documentation. Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/15/2014 08:20 PM, Ren, Qiaowei wrote: What are the semantics across execve() ? This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 I think he's asking what happens to the kernel value at execve() time. The short answer is that it is zero'd along with the rest of a new mm. It probably _shouldn't_ be, though. It's actually valid to have a bound directory at 0x0. We probably need to initialize it to -1 instead, and that means initializing to -1 at execve() time. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
> The base of the bounds directory is set into mm_struct during > PR_MPX_REGISTER command execution. This member can be used to > check whether one application is mpx enabled. Not really because by the time you ask the question another thread might have decided to unregister it. > +int mpx_register(struct task_struct *tsk) > +{ > + struct mm_struct *mm = tsk->mm; > + > + if (!cpu_has_mpx) > + return -EINVAL; > + > + /* > + * runtime in the userspace will be responsible for allocation of > + * the bounds directory. Then, it will save the base of the bounds > + * directory into XSAVE/XRSTOR Save Area and enable MPX through > + * XRSTOR instruction. > + * > + * fpu_xsave() is expected to be very expensive. In order to do > + * performance optimization, here we get the base of the bounds > + * directory and then save it into mm_struct to be used in future. > + */ > + mm->bd_addr = task_get_bounds_dir(tsk); > + if (!mm->bd_addr) > + return -EINVAL; What stops two threads calling this in parallel ? > + > + return 0; > +} > + > +int mpx_unregister(struct task_struct *tsk) > +{ > + struct mm_struct *mm = current->mm; > + > + if (!cpu_has_mpx) > + return -EINVAL; > + > + mm->bd_addr = NULL; or indeed calling this in parallel What are the semantics across execve() ? Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. Not really because by the time you ask the question another thread might have decided to unregister it. +int mpx_register(struct task_struct *tsk) +{ + struct mm_struct *mm = tsk-mm; + + if (!cpu_has_mpx) + return -EINVAL; + + /* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ + mm-bd_addr = task_get_bounds_dir(tsk); + if (!mm-bd_addr) + return -EINVAL; What stops two threads calling this in parallel ? + + return 0; +} + +int mpx_unregister(struct task_struct *tsk) +{ + struct mm_struct *mm = current-mm; + + if (!cpu_has_mpx) + return -EINVAL; + + mm-bd_addr = NULL; or indeed calling this in parallel What are the semantics across execve() ? Alan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote: > On 09/12/2014 10:34 AM, Thomas Gleixner wrote: > > On Fri, 12 Sep 2014, Dave Hansen wrote: > >> There are two mappings in play: > >> 1. The mapping with the actual data, which userspace is munmap()ing or > >>brk()ing away, etc... (never tagged VM_MPX) > > > > It's not tagged that way because it is mapped by user space. > > Correct. It is not tagged because it is mapped by user space. > > > This is the directory, right? > > No. The untagged mapping in question here is for normal user data, like > an mmap() or brk(), unrelated to MPX. Ok. That makes sense. > The directory is a separate matter. It is also (currently) untagged > with VM_MPX since it is also allocated by userspace. So if that gets unmapped my observation holds. You still try to access the directory, take the fault, queue work and in the work you dont know how to handle it either. So if the unmapped region affects bd_addr then we should just release the affected BT mappings, i.e. all vmas flagged with VMA_MPX. > > With the allocation from #BR you make that behaviour > > dynamic and you just provide an empty "no bounds" table to make the > > bound checker happy. > > Kinda. We do provide an empty table, but the first access will always > be a write, so it doesn't stay empty for long. So this comes from adding an entry to a not yet mapped table not from an actual bound check? I still need to digest the details in the manual. > The bounds directory is not being unmapped here. I _think_ I covered > that above, but don't be shy if I'm not being clear. ;) Fair enough. My confusion. > If the bounds directory moved around, this would make sense. Otherwise, > it's a waste of space because all vmas in a given mm would have the > exact same bd_addr, and we might as well just store it in mm->bd_something. Ok. But we really want to do some sanity checking on all of this. > Are you suggesting that we support moving the bounds directory around? No, but the stupid thing CAN move around and we want to think about it now instead of figuring out what to do about it later. So if we go and store bd_addr with the prctl then you can do in the #BR "Invalid BD entry": bd_addr = xsave->xsave_buf->bndcsr.cfg_reg_u; /* * Catch the case that this is not enabled, i.e. mm->bd_addr == 0, * and the case that stupid user space moved the directory * around. */ if (mm->bd_addr != bd_addr) { Yell and whack stupid app over the head; } > Also, the bd_entry can be _calculated_ from vma->vm_start and the > bd_addr. It seems a bit redundant to store it like this. Fair enough. > If you are talking about the VM_MPX VMA that was allocated to hold the > bounds table, this won't work. Sorry yes, that only works for unmapping the bound directory itself. > Once we unmap the bounds table, we would have a bounds directory entry > pointing at empty address space. That address space could now be > allocated for some other (random) use, and the MPX hardware is now going > to go trying to walk it as if it were a bounds table. That would be bad. > > Any unmapping of a bounds table has to be accompanied by a corresponding > write to the bounds directory entry. That write to the bounds directory > can fault. So if it fails you need to keep the bound table around until you can handle that somewhere else, i.e. outside of the mmap sem held region. That's what you are planning to do with the work queue thing. Now I'm asking myself, whether we are forced to do that from the end of do_unmap() rather than doing it from the call site outside of the mmap_sem held region. I can see that adding arch_unmap() to do_unmap() is a very simple solution, but it comes with the price of dealing with faults inside of the mmap_sem held region. It might be worthwhile to think about the following: down_write(mmap_sem); do_stuff() do_munmap(mm, start, len) ... arch_munmap(mm, start, len) { if (!mm->bd_addr) return; bt_work = kmalloc(sizeof(struct bt_work)); bt_work->start = start; bt_work->len = len; hlist_add(_work->list, >bt_work_head); } And then instead of up_write(mmap_sem); arch_up_write(mmap_sem); Which by default is mapped to up_write(mmap_sem); Now for the MPX case you can do: { HLIST_HEAD(bt_work_head); hlist_move_list(>bt_work_head, _work_head); up_write(mmap_sem); hlist_for_each_entry_safe() handle_bt_work(); } So that needs a few more changes vs. the up_write(mmap_sem) at the callsites of do_munmap(), but we might even make that a generic thing, i.e. replace up_write(mmap_sem) with release_write(mmap_sem). I can imagine that we have other use cases for this. Thoughts? tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote: On 09/12/2014 10:34 AM, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Dave Hansen wrote: There are two mappings in play: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... (never tagged VM_MPX) It's not tagged that way because it is mapped by user space. Correct. It is not tagged because it is mapped by user space. This is the directory, right? No. The untagged mapping in question here is for normal user data, like an mmap() or brk(), unrelated to MPX. Ok. That makes sense. The directory is a separate matter. It is also (currently) untagged with VM_MPX since it is also allocated by userspace. So if that gets unmapped my observation holds. You still try to access the directory, take the fault, queue work and in the work you dont know how to handle it either. So if the unmapped region affects bd_addr then we should just release the affected BT mappings, i.e. all vmas flagged with VMA_MPX. With the allocation from #BR you make that behaviour dynamic and you just provide an empty no bounds table to make the bound checker happy. Kinda. We do provide an empty table, but the first access will always be a write, so it doesn't stay empty for long. So this comes from adding an entry to a not yet mapped table not from an actual bound check? I still need to digest the details in the manual. The bounds directory is not being unmapped here. I _think_ I covered that above, but don't be shy if I'm not being clear. ;) Fair enough. My confusion. If the bounds directory moved around, this would make sense. Otherwise, it's a waste of space because all vmas in a given mm would have the exact same bd_addr, and we might as well just store it in mm-bd_something. Ok. But we really want to do some sanity checking on all of this. Are you suggesting that we support moving the bounds directory around? No, but the stupid thing CAN move around and we want to think about it now instead of figuring out what to do about it later. So if we go and store bd_addr with the prctl then you can do in the #BR Invalid BD entry: bd_addr = xsave-xsave_buf-bndcsr.cfg_reg_u; /* * Catch the case that this is not enabled, i.e. mm-bd_addr == 0, * and the case that stupid user space moved the directory * around. */ if (mm-bd_addr != bd_addr) { Yell and whack stupid app over the head; } Also, the bd_entry can be _calculated_ from vma-vm_start and the bd_addr. It seems a bit redundant to store it like this. Fair enough. If you are talking about the VM_MPX VMA that was allocated to hold the bounds table, this won't work. Sorry yes, that only works for unmapping the bound directory itself. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space could now be allocated for some other (random) use, and the MPX hardware is now going to go trying to walk it as if it were a bounds table. That would be bad. Any unmapping of a bounds table has to be accompanied by a corresponding write to the bounds directory entry. That write to the bounds directory can fault. So if it fails you need to keep the bound table around until you can handle that somewhere else, i.e. outside of the mmap sem held region. That's what you are planning to do with the work queue thing. Now I'm asking myself, whether we are forced to do that from the end of do_unmap() rather than doing it from the call site outside of the mmap_sem held region. I can see that adding arch_unmap() to do_unmap() is a very simple solution, but it comes with the price of dealing with faults inside of the mmap_sem held region. It might be worthwhile to think about the following: down_write(mmap_sem); do_stuff() do_munmap(mm, start, len) ... arch_munmap(mm, start, len) { if (!mm-bd_addr) return; bt_work = kmalloc(sizeof(struct bt_work)); bt_work-start = start; bt_work-len = len; hlist_add(bt_work-list, mm-bt_work_head); } And then instead of up_write(mmap_sem); arch_up_write(mmap_sem); Which by default is mapped to up_write(mmap_sem); Now for the MPX case you can do: { HLIST_HEAD(bt_work_head); hlist_move_list(mm-bt_work_head, bt_work_head); up_write(mmap_sem); hlist_for_each_entry_safe() handle_bt_work(); } So that needs a few more changes vs. the up_write(mmap_sem) at the callsites of do_munmap(), but we might even make that a generic thing, i.e. replace up_write(mmap_sem) with release_write(mmap_sem). I can imagine that we have other use cases for this. Thoughts? tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 11:42 AM, Thomas Gleixner wrote: > On Fri, 12 Sep 2014, Thomas Gleixner wrote: >> On Fri, 12 Sep 2014, Dave Hansen wrote: >> The proper solution to this problem is: >> >> do_bounds() >> bd_addr = get_bd_addr_from_xsave(); >> bd_entry = bndstatus & ADDR_MASK: > > Just for clarification. You CANNOT avoid the xsave here because it's > the only way to access BNDSTATUS according to the manual. > > "The BNDCFGU and BNDSTATUS registers are accessible only with > XSAVE/XRSTOR family of instructions" > > So there is no point to cache BNDCFGU as you get it anyway when you > need to retrieve the invalid BD entry. Agreed. It serves no purpose during a bounds fault. However, it does keep you from having to do an xsave during the bounds table free operations, like at unmap() time. That is actually a much more critical path than bounds faults. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 10:42 AM, Thomas Gleixner wrote: > On Fri, 12 Sep 2014, Dave Hansen wrote: >> The prctl(register) is meant to be a signal from userspace to the kernel >> to say, "I would like your help in managing these bounds tables". >> prctl(unregister) is the opposite, meaning "I don't want your help any >> more". > > Fine, but that's a totally different story. I can see the usefulness > of this, but then it's a complete misnomer. It should be: > >prctl(EN/DISABLE_MPX_BT_MANAGEMENT) Agreed. Those are much better names. > So this wants to be a boolean value and not some random user space > address collected at some random point and then ignored until you do > the magic cleanup. See the other reply. I know at this point you think the kernel can not or should not keep a copy of the bounds directory location around. I understand that. Bear with me for a moment, and please just assume for a moment that we need it. It's far from a random userspace address. When you make a syscall, we put the arguments in registers. The register we're putting it in here just happens to be used by the hardware. Right now, we do (ignoring the actual xsave/xrstr): bndcfgu = bnd_dir_ptr | ENABLE_BIT; prctl(ENABLE_MPX_BT_MANAGEMENT); // kernel grabs from xsave buf We could pass it explicitly in %rdi as a syscall argument and not have the prctl() code fetch it from the xsave buffer. I'm just not sure what this buys us: bndcfgu = bnd_dir_ptr | ENABLE_BIT; prctl(ENABLE_MPX_BT_MANAGEMENT, bndcfgu); Also, the "random cleanup" just happens to correspond with memory deallocation, which is something we want to go fast. I'd _prefer_ to keep xsaves out of the unmap path if possible. It's not a strict requirement, but it does seem prudent as an xsave eats a dozen or so cachelines. It's also not "sampled". I can't imagine a situation where the register will change values during the execution of any sane program. It really is essentially fixed. It's probably one of the reasons it is so expensive to access: there's *no* reason to do it frequently. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 10:34 AM, Thomas Gleixner wrote: > On Fri, 12 Sep 2014, Dave Hansen wrote: >> There are two mappings in play: >> 1. The mapping with the actual data, which userspace is munmap()ing or >>brk()ing away, etc... (never tagged VM_MPX) > > It's not tagged that way because it is mapped by user space. Correct. It is not tagged because it is mapped by user space. > This is the directory, right? No. The untagged mapping in question here is for normal user data, like an mmap() or brk(), unrelated to MPX. The directory is a separate matter. It is also (currently) untagged with VM_MPX since it is also allocated by userspace. >> 2. The mapping for the bounds table *backing* the data (is tagged with >>VM_MPX) > > That's the stuff, which gets magically allocated from do_bounds(). And > the reason you do that from the #BR is that user space would have to > allocate a gazillion of bound tables to make sure that every corner > case is covered. Yes. > With the allocation from #BR you make that behaviour > dynamic and you just provide an empty "no bounds" table to make the > bound checker happy. Kinda. We do provide an empty table, but the first access will always be a write, so it doesn't stay empty for long. ... > Now, I have a hard time to see how that is supposed to work. > > do_unmap() > detach_vmas_to_be_unmapped() > unmap_region() >free_pgtables() > arch_unmap() >mpx_unmap() > > So at the point where you try to access the directory to gather the > information about the entries which might be affected, that stuff is > unmapped already and the page tables are gone. > > Brilliant idea, really. And if you run into the fault in mpx_unmap() > you plan to delegate the fixup to a work queue. How is that thing > going to find what belonged to the unmapped directory? The bounds directory is not being unmapped here. I _think_ I covered that above, but don't be shy if I'm not being clear. ;) > Even if the stuff would be accessible at that point, it is a damned > stupid idea to rely on anything userspace is providing to you. I > learned that the hard way in futex.c > > The proper solution to this problem is: > > do_bounds() > bd_addr = get_bd_addr_from_xsave(); > bd_entry = bndstatus & ADDR_MASK: > > bt = mpx_mmap(bd_addr, bd_entry, len); > > set_bt_entry_in_bd(bd_entry, bt); > > And in mpx_mmap() > >. >vma = find_vma(); > >vma->bd_addr = bd_addr; >vma->bd_entry = bd_entry; If the bounds directory moved around, this would make sense. Otherwise, it's a waste of space because all vmas in a given mm would have the exact same bd_addr, and we might as well just store it in mm->bd_something. Are you suggesting that we support moving the bounds directory around? Also, the bd_entry can be _calculated_ from vma->vm_start and the bd_addr. It seems a bit redundant to store it like this. Also this would add 16 bytes to the currently 184-byte VMA. That seems suboptimal to me. It would eat over a megabyte of memory on my *laptop* alone. > Now on mpx_unmap() > > for_each_vma() > if (is_affected(vma->bd_addr, vma->bd_entry)) > unmap(vma); > > That does not require a prctl, no fault handling in the unmap path, it > just works and is robust by design because it does not rely on any > user space crappola. You store the directory context at allocation > time and free it when that context goes away. It's that simple, really. If you are talking about the VM_MPX VMA that was allocated to hold the bounds table, this won't work. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space could now be allocated for some other (random) use, and the MPX hardware is now going to go trying to walk it as if it were a bounds table. That would be bad. Any unmapping of a bounds table has to be accompanied by a corresponding write to the bounds directory entry. That write to the bounds directory can fault. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Thomas Gleixner wrote: > On Fri, 12 Sep 2014, Dave Hansen wrote: > The proper solution to this problem is: > > do_bounds() > bd_addr = get_bd_addr_from_xsave(); > bd_entry = bndstatus & ADDR_MASK: Just for clarification. You CANNOT avoid the xsave here because it's the only way to access BNDSTATUS according to the manual. "The BNDCFGU and BNDSTATUS registers are accessible only with XSAVE/XRSTOR family of instructions" So there is no point to cache BNDCFGU as you get it anyway when you need to retrieve the invalid BD entry. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote: > On 09/12/2014 01:11 AM, Thomas Gleixner wrote: > > So what you are saying is, that if user space sets the pointer to NULL > > via the unregister prctl, kernel can safely ignore vmas which have the > > VM_MPX flag set. I really can't follow that logic. > > > > mmap_mpx(); > > prctl(enable mpx); > > do lots of crap which uses mpx; > > prctl(disable mpx); > > > > So after that point the previous use of MPX is irrelevant, just > > because we set a pointer to NULL? Does it just look like crap because > > I do not get the big picture how all of this is supposed to work? > > The prctl(register) is meant to be a signal from userspace to the kernel > to say, "I would like your help in managing these bounds tables". > prctl(unregister) is the opposite, meaning "I don't want your help any > more". Fine, but that's a totally different story. I can see the usefulness of this, but then it's a complete misnomer. It should be: prctl(EN/DISABLE_MPX_BT_MANAGEMENT) So this wants to be a boolean value and not some random user space address collected at some random point and then ignored until you do the magic cleanup. See the other reply. > If userspace uses MPX, it does not necessarily want the kernel to do > bounds table management all the time (or ever in some cases). Without > the prctl(), the kernel has no way of distinguishing what userspace wants. Fine with me, but it needs to be done proper. And proper means: ON/OFF The kernel has to handle the information for which context it allocated stuff and then tear it down when the context goes away. Relying on a user space address sampled at some random prctl point is just stupid. > > And then you need another bunch of logic in the prctl(disable mpx) > > path to cleanup the mess instead of just setting a random pointer to > > NULL. > > The bounds tables potentially represent a *lot* of state. If userspace > wants to temporarily turn off the kernel's MPX bounds table management, > it does not necessarily want that state destroyed. On the other hand, > if userspace feels the need to go destroying all the state, it is free > to do so and does not need any help to do so from the kernel. Fine with me, but the above still stands. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote: > On 09/12/2014 02:24 AM, Thomas Gleixner wrote: > > On Fri, 12 Sep 2014, Thomas Gleixner wrote: > >> On Thu, 11 Sep 2014, Dave Hansen wrote: > >>> Well, we use it to figure out whether we _potentially_ need to tear down > >>> an VM_MPX-flagged area. There's no guarantee that there will be one. > >> > >> So what you are saying is, that if user space sets the pointer to NULL > >> via the unregister prctl, kernel can safely ignore vmas which have the > >> VM_MPX flag set. I really can't follow that logic. > >> > >>mmap_mpx(); > >>prctl(enable mpx); > >>do lots of crap which uses mpx; > >>prctl(disable mpx); > >> > >> So after that point the previous use of MPX is irrelevant, just > >> because we set a pointer to NULL? Does it just look like crap because > >> I do not get the big picture how all of this is supposed to work? > > > > do_bounds() will happily map new BTs no matter whether the prctl was > > invoked or not. So what's the value of the prctl at all? > > The behavior as it stands is wrong. We should at least have the kernel > refuse to map new BTs if the prctl() hasn't been issued. We'll fix it up. > > > The mapping is flagged VM_MPX. Why is this not sufficient? > > The comment is confusing and only speaks to half of what the if() in > question is doing. We'll get a better comment in there. But, for the > sake of explaining it fully: > > There are two mappings in play: > 1. The mapping with the actual data, which userspace is munmap()ing or >brk()ing away, etc... (never tagged VM_MPX) It's not tagged that way because it is mapped by user space. This is the directory, right? > 2. The mapping for the bounds table *backing* the data (is tagged with >VM_MPX) That's the stuff, which gets magically allocated from do_bounds(). And the reason you do that from the #BR is that user space would have to allocate a gazillion of bound tables to make sure that every corner case is covered. With the allocation from #BR you make that behaviour dynamic and you just provide an empty "no bounds" table to make the bound checker happy. > The code ends up looking like this: > > vm_munmap() > { > do_unmap(vma); // #1 above > if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) > // lookup the backing vma (#2 above) > vm_munmap(vma2) > } > > The bd_addr check is intended to say "could the kernel have possibly > created some VM_MPX vmas?" As you noted above, we will happily go > creating VM_MPX vmas without mm->bd_addr being set. That's will get fixed. > > The VM_MPX _flags_ check on the VMA is there simply to prevent > recursion. vm_munmap() of the VM_MPX vma is called _under_ vm_munmap() > of the data VMA, and we've got to ensure it doesn't recurse. *This* > part of the if() in question is not addressed in the comment. That's > something we can fix up in the next version. Ok, slowly I get the puzzle together :) Now, the question is whether this magic fragile fixup is the right thing to do in the context of unmap/brk. So if the directory is unmapped, you want to free the bounds tables which are referenced from the directory, i.e. those which you allocated in do_bounds(). So you call arch_unmap() at the very end of do_unmap(). This walks the directory to look at the entries and unmaps the bounds table which is referenced from the directory and then clears the directory entry. Now, I have a hard time to see how that is supposed to work. do_unmap() detach_vmas_to_be_unmapped() unmap_region() free_pgtables() arch_unmap() mpx_unmap() So at the point where you try to access the directory to gather the information about the entries which might be affected, that stuff is unmapped already and the page tables are gone. Brilliant idea, really. And if you run into the fault in mpx_unmap() you plan to delegate the fixup to a work queue. How is that thing going to find what belonged to the unmapped directory? Even if the stuff would be accessible at that point, it is a damned stupid idea to rely on anything userspace is providing to you. I learned that the hard way in futex.c The proper solution to this problem is: do_bounds() bd_addr = get_bd_addr_from_xsave(); bd_entry = bndstatus & ADDR_MASK: bt = mpx_mmap(bd_addr, bd_entry, len); set_bt_entry_in_bd(bd_entry, bt); And in mpx_mmap() . vma = find_vma(); vma->bd_addr = bd_addr; vma->bd_entry = bd_entry; Now on mpx_unmap() for_each_vma() if (is_affected(vma->bd_addr, vma->bd_entry)) unmap(vma); That does not require a prctl, no fault handling in the unmap path, it just works and is robust by design because it does not rely on any user space crappola. You store the directory context at allocation time and free it when that context goes away. It's that simple, really. So you can still think about a prctl in order to enable/disable the automatic mapping
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 01:11 AM, Thomas Gleixner wrote: > So what you are saying is, that if user space sets the pointer to NULL > via the unregister prctl, kernel can safely ignore vmas which have the > VM_MPX flag set. I really can't follow that logic. > > mmap_mpx(); > prctl(enable mpx); > do lots of crap which uses mpx; > prctl(disable mpx); > > So after that point the previous use of MPX is irrelevant, just > because we set a pointer to NULL? Does it just look like crap because > I do not get the big picture how all of this is supposed to work? The prctl(register) is meant to be a signal from userspace to the kernel to say, "I would like your help in managing these bounds tables". prctl(unregister) is the opposite, meaning "I don't want your help any more". The kernel won't really ignore VM_MPX vmas, it just won't go looking for them actively in response to the unmapping of other non-VM_MPX vmas. >> Yes. The only other way the kernel can possibly know that it needs to >> go tearing things down is with a potentially frequent and expensive xsave. >> >> Either we change mmap to say "this mmap() is for a bounds directory", or >> we have some other interface that says "the mmap() for the bounds >> directory is at $foo". We could also record the bounds directory the >> first time that we catch userspace using it. I'd rather have an >> explicit interface than an implicit one like that, though I don't feel >> that strongly about it. > > I really have to disagree here. If I follow your logic then we would > have a prctl for using floating point as well instead of catching the > use and handle it from there. Just get it, if you make it simple for > user space to do stupid things, they will happen in all provided ways > and some more. Here's what it boils down to: If userspace uses a floating point register, it wants it saved. If userspace uses MPX, it does not necessarily want the kernel to do bounds table management all the time (or ever in some cases). Without the prctl(), the kernel has no way of distinguishing what userspace wants. >>> The design to support this feature makes no sense at all to me. We >>> have a special mmap interface, some magic kernel side mapping >>> functionality and then on top of it a prctl telling the kernel to >>> ignore/respect it. >> >> That's a good point. We don't seem to have anything in the >> allocate_bt() side of things to tell the kernel to refuse to create >> things if the prctl() hasn't been called. That needs to get added. > > And then you need another bunch of logic in the prctl(disable mpx) > path to cleanup the mess instead of just setting a random pointer to > NULL. The bounds tables potentially represent a *lot* of state. If userspace wants to temporarily turn off the kernel's MPX bounds table management, it does not necessarily want that state destroyed. On the other hand, if userspace feels the need to go destroying all the state, it is free to do so and does not need any help to do so from the kernel. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 02:24 AM, Thomas Gleixner wrote: > On Fri, 12 Sep 2014, Thomas Gleixner wrote: >> On Thu, 11 Sep 2014, Dave Hansen wrote: >>> Well, we use it to figure out whether we _potentially_ need to tear down >>> an VM_MPX-flagged area. There's no guarantee that there will be one. >> >> So what you are saying is, that if user space sets the pointer to NULL >> via the unregister prctl, kernel can safely ignore vmas which have the >> VM_MPX flag set. I really can't follow that logic. >> >> mmap_mpx(); >> prctl(enable mpx); >> do lots of crap which uses mpx; >> prctl(disable mpx); >> >> So after that point the previous use of MPX is irrelevant, just >> because we set a pointer to NULL? Does it just look like crap because >> I do not get the big picture how all of this is supposed to work? > > do_bounds() will happily map new BTs no matter whether the prctl was > invoked or not. So what's the value of the prctl at all? The behavior as it stands is wrong. We should at least have the kernel refuse to map new BTs if the prctl() hasn't been issued. We'll fix it up. > The mapping is flagged VM_MPX. Why is this not sufficient? The comment is confusing and only speaks to half of what the if() in question is doing. We'll get a better comment in there. But, for the sake of explaining it fully: There are two mappings in play: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... (never tagged VM_MPX) 2. The mapping for the bounds table *backing* the data (is tagged with VM_MPX) The code ends up looking like this: vm_munmap() { do_unmap(vma); // #1 above if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) // lookup the backing vma (#2 above) vm_munmap(vma2) } The bd_addr check is intended to say "could the kernel have possibly created some VM_MPX vmas?" As you noted above, we will happily go creating VM_MPX vmas without mm->bd_addr being set. That's will get fixed. The VM_MPX _flags_ check on the VMA is there simply to prevent recursion. vm_munmap() of the VM_MPX vma is called _under_ vm_munmap() of the data VMA, and we've got to ensure it doesn't recurse. *This* part of the if() in question is not addressed in the comment. That's something we can fix up in the next version. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Thomas Gleixner wrote: > On Thu, 11 Sep 2014, Dave Hansen wrote: > > Well, we use it to figure out whether we _potentially_ need to tear down > > an VM_MPX-flagged area. There's no guarantee that there will be one. > > So what you are saying is, that if user space sets the pointer to NULL > via the unregister prctl, kernel can safely ignore vmas which have the > VM_MPX flag set. I really can't follow that logic. > > mmap_mpx(); > prctl(enable mpx); > do lots of crap which uses mpx; > prctl(disable mpx); > > So after that point the previous use of MPX is irrelevant, just > because we set a pointer to NULL? Does it just look like crap because > I do not get the big picture how all of this is supposed to work? do_bounds() will happily map new BTs no matter whether the prctl was invoked or not. So what's the value of the prctl at all? The mapping is flagged VM_MPX. Why is this not sufficient? Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, 11 Sep 2014, Dave Hansen wrote: > On 09/11/2014 04:28 PM, Thomas Gleixner wrote: > > On Thu, 11 Sep 2014, Qiaowei Ren wrote: > >> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() > >> commands. These commands can be used to register and unregister MPX > >> related resource on the x86 platform. > > > > I cant see anything which is registered/unregistered. > > This registers the location of the bounds directory with the kernel. > > >From the app's perspective, it says "I'm using MPX, and here is where I > put the root data structure". > > Without this, the kernel would have to do an (expensive) xsave operation > every time it wanted to see if MPX was in use. This also makes the > user/kernel interaction more explicit. We would be in a world of hurt > if userspace was allowed to move the bounds directory around. With this > interface, it's a bit more obvious that userspace can't just move it > around willy-nilly. And what prevents it to do so? Just the fact that you have a prctl does not make userspace better. > >> The base of the bounds directory is set into mm_struct during > >> PR_MPX_REGISTER command execution. This member can be used to > >> check whether one application is mpx enabled. > > > > This changelog is completely useless. > > Yeah, it's pretty bare-bones. Let me know if the explanation above > makes sense, and we'll get it updated. Well, it at least explains what its supposed to do. Whether that itself makes sense is a completely different question. > >> + */ > >> +static __user void *task_get_bounds_dir(struct task_struct *tsk) > >> +{ > >> + struct xsave_struct *xsave_buf; > >> + > >> + fpu_xsave(>thread.fpu); > >> + xsave_buf = &(tsk->thread.fpu.state->xsave); > >> + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) > >> + return NULL; > > > > Now this might be understandable with a proper comment. Right now it's > > a magic check for something uncomprehensible. > > It's a bit ugly to access, but it seems pretty blatantly obvious that > this is a check for "Is the enable flag in a hardware register set?" > > Yes, the registers have names only a mother could love. But that is > what they're really called. > > I guess we could add some comments about why we need to do the xsave. Exactly. > > So we use that information to check, whether we need to tear down a > > VM_MPX flagged region with mpx_unmap(), right? > > Well, we use it to figure out whether we _potentially_ need to tear down > an VM_MPX-flagged area. There's no guarantee that there will be one. So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? > Yes. The only other way the kernel can possibly know that it needs to > go tearing things down is with a potentially frequent and expensive xsave. > > Either we change mmap to say "this mmap() is for a bounds directory", or > we have some other interface that says "the mmap() for the bounds > directory is at $foo". We could also record the bounds directory the > first time that we catch userspace using it. I'd rather have an > explicit interface than an implicit one like that, though I don't feel > that strongly about it. I really have to disagree here. If I follow your logic then we would have a prctl for using floating point as well instead of catching the use and handle it from there. Just get it, if you make it simple for user space to do stupid things, they will happen in all provided ways and some more. > > The design to support this feature makes no sense at all to me. We > > have a special mmap interface, some magic kernel side mapping > > functionality and then on top of it a prctl telling the kernel to > > ignore/respect it. > > That's a good point. We don't seem to have anything in the > allocate_bt() side of things to tell the kernel to refuse to create > things if the prctl() hasn't been called. That needs to get added. And then you need another bunch of logic in the prctl(disable mpx) path to cleanup the mess instead of just setting a random pointer to NULL. > If you don't want to share them in public, I'm happy to take this > off-list, but please do share. I'll let you know once I verified that it might work. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, 11 Sep 2014, Dave Hansen wrote: On 09/11/2014 04:28 PM, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Qiaowei Ren wrote: This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resource on the x86 platform. I cant see anything which is registered/unregistered. This registers the location of the bounds directory with the kernel. From the app's perspective, it says I'm using MPX, and here is where I put the root data structure. Without this, the kernel would have to do an (expensive) xsave operation every time it wanted to see if MPX was in use. This also makes the user/kernel interaction more explicit. We would be in a world of hurt if userspace was allowed to move the bounds directory around. With this interface, it's a bit more obvious that userspace can't just move it around willy-nilly. And what prevents it to do so? Just the fact that you have a prctl does not make userspace better. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. This changelog is completely useless. Yeah, it's pretty bare-bones. Let me know if the explanation above makes sense, and we'll get it updated. Well, it at least explains what its supposed to do. Whether that itself makes sense is a completely different question. + */ +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ + struct xsave_struct *xsave_buf; + + fpu_xsave(tsk-thread.fpu); + xsave_buf = (tsk-thread.fpu.state-xsave); + if (!(xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ENABLE_FLAG)) + return NULL; Now this might be understandable with a proper comment. Right now it's a magic check for something uncomprehensible. It's a bit ugly to access, but it seems pretty blatantly obvious that this is a check for Is the enable flag in a hardware register set? Yes, the registers have names only a mother could love. But that is what they're really called. I guess we could add some comments about why we need to do the xsave. Exactly. So we use that information to check, whether we need to tear down a VM_MPX flagged region with mpx_unmap(), right? Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? Yes. The only other way the kernel can possibly know that it needs to go tearing things down is with a potentially frequent and expensive xsave. Either we change mmap to say this mmap() is for a bounds directory, or we have some other interface that says the mmap() for the bounds directory is at $foo. We could also record the bounds directory the first time that we catch userspace using it. I'd rather have an explicit interface than an implicit one like that, though I don't feel that strongly about it. I really have to disagree here. If I follow your logic then we would have a prctl for using floating point as well instead of catching the use and handle it from there. Just get it, if you make it simple for user space to do stupid things, they will happen in all provided ways and some more. The design to support this feature makes no sense at all to me. We have a special mmap interface, some magic kernel side mapping functionality and then on top of it a prctl telling the kernel to ignore/respect it. That's a good point. We don't seem to have anything in the allocate_bt() side of things to tell the kernel to refuse to create things if the prctl() hasn't been called. That needs to get added. And then you need another bunch of logic in the prctl(disable mpx) path to cleanup the mess instead of just setting a random pointer to NULL. If you don't want to share them in public, I'm happy to take this off-list, but please do share. I'll let you know once I verified that it might work. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Dave Hansen wrote: Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? do_bounds() will happily map new BTs no matter whether the prctl was invoked or not. So what's the value of the prctl at all? The mapping is flagged VM_MPX. Why is this not sufficient? Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 02:24 AM, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Dave Hansen wrote: Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? do_bounds() will happily map new BTs no matter whether the prctl was invoked or not. So what's the value of the prctl at all? The behavior as it stands is wrong. We should at least have the kernel refuse to map new BTs if the prctl() hasn't been issued. We'll fix it up. The mapping is flagged VM_MPX. Why is this not sufficient? The comment is confusing and only speaks to half of what the if() in question is doing. We'll get a better comment in there. But, for the sake of explaining it fully: There are two mappings in play: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... (never tagged VM_MPX) 2. The mapping for the bounds table *backing* the data (is tagged with VM_MPX) The code ends up looking like this: vm_munmap() { do_unmap(vma); // #1 above if (mm-bd_addr !(vma-vm_flags VM_MPX)) // lookup the backing vma (#2 above) vm_munmap(vma2) } The bd_addr check is intended to say could the kernel have possibly created some VM_MPX vmas? As you noted above, we will happily go creating VM_MPX vmas without mm-bd_addr being set. That's will get fixed. The VM_MPX _flags_ check on the VMA is there simply to prevent recursion. vm_munmap() of the VM_MPX vma is called _under_ vm_munmap() of the data VMA, and we've got to ensure it doesn't recurse. *This* part of the if() in question is not addressed in the comment. That's something we can fix up in the next version. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 01:11 AM, Thomas Gleixner wrote: So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? The prctl(register) is meant to be a signal from userspace to the kernel to say, I would like your help in managing these bounds tables. prctl(unregister) is the opposite, meaning I don't want your help any more. The kernel won't really ignore VM_MPX vmas, it just won't go looking for them actively in response to the unmapping of other non-VM_MPX vmas. Yes. The only other way the kernel can possibly know that it needs to go tearing things down is with a potentially frequent and expensive xsave. Either we change mmap to say this mmap() is for a bounds directory, or we have some other interface that says the mmap() for the bounds directory is at $foo. We could also record the bounds directory the first time that we catch userspace using it. I'd rather have an explicit interface than an implicit one like that, though I don't feel that strongly about it. I really have to disagree here. If I follow your logic then we would have a prctl for using floating point as well instead of catching the use and handle it from there. Just get it, if you make it simple for user space to do stupid things, they will happen in all provided ways and some more. Here's what it boils down to: If userspace uses a floating point register, it wants it saved. If userspace uses MPX, it does not necessarily want the kernel to do bounds table management all the time (or ever in some cases). Without the prctl(), the kernel has no way of distinguishing what userspace wants. The design to support this feature makes no sense at all to me. We have a special mmap interface, some magic kernel side mapping functionality and then on top of it a prctl telling the kernel to ignore/respect it. That's a good point. We don't seem to have anything in the allocate_bt() side of things to tell the kernel to refuse to create things if the prctl() hasn't been called. That needs to get added. And then you need another bunch of logic in the prctl(disable mpx) path to cleanup the mess instead of just setting a random pointer to NULL. The bounds tables potentially represent a *lot* of state. If userspace wants to temporarily turn off the kernel's MPX bounds table management, it does not necessarily want that state destroyed. On the other hand, if userspace feels the need to go destroying all the state, it is free to do so and does not need any help to do so from the kernel. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote: On 09/12/2014 02:24 AM, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Dave Hansen wrote: Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? do_bounds() will happily map new BTs no matter whether the prctl was invoked or not. So what's the value of the prctl at all? The behavior as it stands is wrong. We should at least have the kernel refuse to map new BTs if the prctl() hasn't been issued. We'll fix it up. The mapping is flagged VM_MPX. Why is this not sufficient? The comment is confusing and only speaks to half of what the if() in question is doing. We'll get a better comment in there. But, for the sake of explaining it fully: There are two mappings in play: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... (never tagged VM_MPX) It's not tagged that way because it is mapped by user space. This is the directory, right? 2. The mapping for the bounds table *backing* the data (is tagged with VM_MPX) That's the stuff, which gets magically allocated from do_bounds(). And the reason you do that from the #BR is that user space would have to allocate a gazillion of bound tables to make sure that every corner case is covered. With the allocation from #BR you make that behaviour dynamic and you just provide an empty no bounds table to make the bound checker happy. The code ends up looking like this: vm_munmap() { do_unmap(vma); // #1 above if (mm-bd_addr !(vma-vm_flags VM_MPX)) // lookup the backing vma (#2 above) vm_munmap(vma2) } The bd_addr check is intended to say could the kernel have possibly created some VM_MPX vmas? As you noted above, we will happily go creating VM_MPX vmas without mm-bd_addr being set. That's will get fixed. The VM_MPX _flags_ check on the VMA is there simply to prevent recursion. vm_munmap() of the VM_MPX vma is called _under_ vm_munmap() of the data VMA, and we've got to ensure it doesn't recurse. *This* part of the if() in question is not addressed in the comment. That's something we can fix up in the next version. Ok, slowly I get the puzzle together :) Now, the question is whether this magic fragile fixup is the right thing to do in the context of unmap/brk. So if the directory is unmapped, you want to free the bounds tables which are referenced from the directory, i.e. those which you allocated in do_bounds(). So you call arch_unmap() at the very end of do_unmap(). This walks the directory to look at the entries and unmaps the bounds table which is referenced from the directory and then clears the directory entry. Now, I have a hard time to see how that is supposed to work. do_unmap() detach_vmas_to_be_unmapped() unmap_region() free_pgtables() arch_unmap() mpx_unmap() So at the point where you try to access the directory to gather the information about the entries which might be affected, that stuff is unmapped already and the page tables are gone. Brilliant idea, really. And if you run into the fault in mpx_unmap() you plan to delegate the fixup to a work queue. How is that thing going to find what belonged to the unmapped directory? Even if the stuff would be accessible at that point, it is a damned stupid idea to rely on anything userspace is providing to you. I learned that the hard way in futex.c The proper solution to this problem is: do_bounds() bd_addr = get_bd_addr_from_xsave(); bd_entry = bndstatus ADDR_MASK: bt = mpx_mmap(bd_addr, bd_entry, len); set_bt_entry_in_bd(bd_entry, bt); And in mpx_mmap() . vma = find_vma(); vma-bd_addr = bd_addr; vma-bd_entry = bd_entry; Now on mpx_unmap() for_each_vma() if (is_affected(vma-bd_addr, vma-bd_entry)) unmap(vma); That does not require a prctl, no fault handling in the unmap path, it just works and is robust by design because it does not rely on any user space crappola. You store the directory context at allocation time and free it when that context goes away. It's that simple, really. So you can still think about a prctl in order to enable/disable the automatic mapping stuff, but that's a completely different story. Thanks, tglx -- To unsubscribe from this list:
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote: On 09/12/2014 01:11 AM, Thomas Gleixner wrote: So what you are saying is, that if user space sets the pointer to NULL via the unregister prctl, kernel can safely ignore vmas which have the VM_MPX flag set. I really can't follow that logic. mmap_mpx(); prctl(enable mpx); do lots of crap which uses mpx; prctl(disable mpx); So after that point the previous use of MPX is irrelevant, just because we set a pointer to NULL? Does it just look like crap because I do not get the big picture how all of this is supposed to work? The prctl(register) is meant to be a signal from userspace to the kernel to say, I would like your help in managing these bounds tables. prctl(unregister) is the opposite, meaning I don't want your help any more. Fine, but that's a totally different story. I can see the usefulness of this, but then it's a complete misnomer. It should be: prctl(EN/DISABLE_MPX_BT_MANAGEMENT) So this wants to be a boolean value and not some random user space address collected at some random point and then ignored until you do the magic cleanup. See the other reply. If userspace uses MPX, it does not necessarily want the kernel to do bounds table management all the time (or ever in some cases). Without the prctl(), the kernel has no way of distinguishing what userspace wants. Fine with me, but it needs to be done proper. And proper means: ON/OFF The kernel has to handle the information for which context it allocated stuff and then tear it down when the context goes away. Relying on a user space address sampled at some random prctl point is just stupid. And then you need another bunch of logic in the prctl(disable mpx) path to cleanup the mess instead of just setting a random pointer to NULL. The bounds tables potentially represent a *lot* of state. If userspace wants to temporarily turn off the kernel's MPX bounds table management, it does not necessarily want that state destroyed. On the other hand, if userspace feels the need to go destroying all the state, it is free to do so and does not need any help to do so from the kernel. Fine with me, but the above still stands. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Dave Hansen wrote: The proper solution to this problem is: do_bounds() bd_addr = get_bd_addr_from_xsave(); bd_entry = bndstatus ADDR_MASK: Just for clarification. You CANNOT avoid the xsave here because it's the only way to access BNDSTATUS according to the manual. The BNDCFGU and BNDSTATUS registers are accessible only with XSAVE/XRSTOR family of instructions So there is no point to cache BNDCFGU as you get it anyway when you need to retrieve the invalid BD entry. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 10:34 AM, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Dave Hansen wrote: There are two mappings in play: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... (never tagged VM_MPX) It's not tagged that way because it is mapped by user space. Correct. It is not tagged because it is mapped by user space. This is the directory, right? No. The untagged mapping in question here is for normal user data, like an mmap() or brk(), unrelated to MPX. The directory is a separate matter. It is also (currently) untagged with VM_MPX since it is also allocated by userspace. 2. The mapping for the bounds table *backing* the data (is tagged with VM_MPX) That's the stuff, which gets magically allocated from do_bounds(). And the reason you do that from the #BR is that user space would have to allocate a gazillion of bound tables to make sure that every corner case is covered. Yes. With the allocation from #BR you make that behaviour dynamic and you just provide an empty no bounds table to make the bound checker happy. Kinda. We do provide an empty table, but the first access will always be a write, so it doesn't stay empty for long. ... Now, I have a hard time to see how that is supposed to work. do_unmap() detach_vmas_to_be_unmapped() unmap_region() free_pgtables() arch_unmap() mpx_unmap() So at the point where you try to access the directory to gather the information about the entries which might be affected, that stuff is unmapped already and the page tables are gone. Brilliant idea, really. And if you run into the fault in mpx_unmap() you plan to delegate the fixup to a work queue. How is that thing going to find what belonged to the unmapped directory? The bounds directory is not being unmapped here. I _think_ I covered that above, but don't be shy if I'm not being clear. ;) Even if the stuff would be accessible at that point, it is a damned stupid idea to rely on anything userspace is providing to you. I learned that the hard way in futex.c The proper solution to this problem is: do_bounds() bd_addr = get_bd_addr_from_xsave(); bd_entry = bndstatus ADDR_MASK: bt = mpx_mmap(bd_addr, bd_entry, len); set_bt_entry_in_bd(bd_entry, bt); And in mpx_mmap() . vma = find_vma(); vma-bd_addr = bd_addr; vma-bd_entry = bd_entry; If the bounds directory moved around, this would make sense. Otherwise, it's a waste of space because all vmas in a given mm would have the exact same bd_addr, and we might as well just store it in mm-bd_something. Are you suggesting that we support moving the bounds directory around? Also, the bd_entry can be _calculated_ from vma-vm_start and the bd_addr. It seems a bit redundant to store it like this. Also this would add 16 bytes to the currently 184-byte VMA. That seems suboptimal to me. It would eat over a megabyte of memory on my *laptop* alone. Now on mpx_unmap() for_each_vma() if (is_affected(vma-bd_addr, vma-bd_entry)) unmap(vma); That does not require a prctl, no fault handling in the unmap path, it just works and is robust by design because it does not rely on any user space crappola. You store the directory context at allocation time and free it when that context goes away. It's that simple, really. If you are talking about the VM_MPX VMA that was allocated to hold the bounds table, this won't work. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space could now be allocated for some other (random) use, and the MPX hardware is now going to go trying to walk it as if it were a bounds table. That would be bad. Any unmapping of a bounds table has to be accompanied by a corresponding write to the bounds directory entry. That write to the bounds directory can fault. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 10:42 AM, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Dave Hansen wrote: The prctl(register) is meant to be a signal from userspace to the kernel to say, I would like your help in managing these bounds tables. prctl(unregister) is the opposite, meaning I don't want your help any more. Fine, but that's a totally different story. I can see the usefulness of this, but then it's a complete misnomer. It should be: prctl(EN/DISABLE_MPX_BT_MANAGEMENT) Agreed. Those are much better names. So this wants to be a boolean value and not some random user space address collected at some random point and then ignored until you do the magic cleanup. See the other reply. I know at this point you think the kernel can not or should not keep a copy of the bounds directory location around. I understand that. Bear with me for a moment, and please just assume for a moment that we need it. It's far from a random userspace address. When you make a syscall, we put the arguments in registers. The register we're putting it in here just happens to be used by the hardware. Right now, we do (ignoring the actual xsave/xrstr): bndcfgu = bnd_dir_ptr | ENABLE_BIT; prctl(ENABLE_MPX_BT_MANAGEMENT); // kernel grabs from xsave buf We could pass it explicitly in %rdi as a syscall argument and not have the prctl() code fetch it from the xsave buffer. I'm just not sure what this buys us: bndcfgu = bnd_dir_ptr | ENABLE_BIT; prctl(ENABLE_MPX_BT_MANAGEMENT, bndcfgu); Also, the random cleanup just happens to correspond with memory deallocation, which is something we want to go fast. I'd _prefer_ to keep xsaves out of the unmap path if possible. It's not a strict requirement, but it does seem prudent as an xsave eats a dozen or so cachelines. It's also not sampled. I can't imagine a situation where the register will change values during the execution of any sane program. It really is essentially fixed. It's probably one of the reasons it is so expensive to access: there's *no* reason to do it frequently. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/12/2014 11:42 AM, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Thomas Gleixner wrote: On Fri, 12 Sep 2014, Dave Hansen wrote: The proper solution to this problem is: do_bounds() bd_addr = get_bd_addr_from_xsave(); bd_entry = bndstatus ADDR_MASK: Just for clarification. You CANNOT avoid the xsave here because it's the only way to access BNDSTATUS according to the manual. The BNDCFGU and BNDSTATUS registers are accessible only with XSAVE/XRSTOR family of instructions So there is no point to cache BNDCFGU as you get it anyway when you need to retrieve the invalid BD entry. Agreed. It serves no purpose during a bounds fault. However, it does keep you from having to do an xsave during the bounds table free operations, like at unmap() time. That is actually a much more critical path than bounds faults. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-11, Hansen, Dave wrote: > On 09/11/2014 01:46 AM, Qiaowei Ren wrote: >> + >> +return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & >> +MPX_BNDCFG_ADDR_MASK); >> +} > > I don't think casting a u64 to a ulong, then to a pointer is useful. > Just take the '(unsigned long)' out. If so, this will spits out a warning on 32-bit: arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir': arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Thanks, Qiaowei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/11/2014 04:28 PM, Thomas Gleixner wrote: > On Thu, 11 Sep 2014, Qiaowei Ren wrote: >> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() >> commands. These commands can be used to register and unregister MPX >> related resource on the x86 platform. > > I cant see anything which is registered/unregistered. This registers the location of the bounds directory with the kernel. >From the app's perspective, it says "I'm using MPX, and here is where I put the root data structure". Without this, the kernel would have to do an (expensive) xsave operation every time it wanted to see if MPX was in use. This also makes the user/kernel interaction more explicit. We would be in a world of hurt if userspace was allowed to move the bounds directory around. With this interface, it's a bit more obvious that userspace can't just move it around willy-nilly. >> The base of the bounds directory is set into mm_struct during >> PR_MPX_REGISTER command execution. This member can be used to >> check whether one application is mpx enabled. > > This changelog is completely useless. Yeah, it's pretty bare-bones. Let me know if the explanation above makes sense, and we'll get it updated. >> +/* >> + * This should only be called when cpuid has been checked >> + * and we are sure that MPX is available. > > Groan. Why can't you put that cpuid check into that function right > away instead of adding a worthless comment? Sounds reasonable to me. We should just move the cpuid check in to task_get_bounds_dir(). >> + */ >> +static __user void *task_get_bounds_dir(struct task_struct *tsk) >> +{ >> +struct xsave_struct *xsave_buf; >> + >> +fpu_xsave(>thread.fpu); >> +xsave_buf = &(tsk->thread.fpu.state->xsave); >> +if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) >> +return NULL; > > Now this might be understandable with a proper comment. Right now it's > a magic check for something uncomprehensible. It's a bit ugly to access, but it seems pretty blatantly obvious that this is a check for "Is the enable flag in a hardware register set?" Yes, the registers have names only a mother could love. But that is what they're really called. I guess we could add some comments about why we need to do the xsave. >> +int mpx_register(struct task_struct *tsk) >> +{ >> +struct mm_struct *mm = tsk->mm; >> + >> +if (!cpu_has_mpx) >> +return -EINVAL; >> + >> +/* >> + * runtime in the userspace will be responsible for allocation of >> + * the bounds directory. Then, it will save the base of the bounds >> + * directory into XSAVE/XRSTOR Save Area and enable MPX through >> + * XRSTOR instruction. >> + * >> + * fpu_xsave() is expected to be very expensive. In order to do >> + * performance optimization, here we get the base of the bounds >> + * directory and then save it into mm_struct to be used in future. >> + */ > > Ah. Now we get some information what this might do. But that does not > make any sense at all. > > So all it does is: > > tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; > > or: > > tsk->mm.bd_addr = NULL; > > So we use that information to check, whether we need to tear down a > VM_MPX flagged region with mpx_unmap(), right? Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. >> + /* >> + * Check whether this vma comes from MPX-enabled application. >> + * If so, release this vma related bound tables. >> + */ >> + if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) >> + mpx_unmap(mm, start, end); > > You really must be kidding. The application maps that table and never > calls that prctl so do_unmap() will happily ignore it? Yes. The only other way the kernel can possibly know that it needs to go tearing things down is with a potentially frequent and expensive xsave. Either we change mmap to say "this mmap() is for a bounds directory", or we have some other interface that says "the mmap() for the bounds directory is at $foo". We could also record the bounds directory the first time that we catch userspace using it. I'd rather have an explicit interface than an implicit one like that, though I don't feel that strongly about it. > The design to support this feature makes no sense at all to me. We > have a special mmap interface, some magic kernel side mapping > functionality and then on top of it a prctl telling the kernel to > ignore/respect it. That's a good point. We don't seem to have anything in the allocate_bt() side of things to tell the kernel to refuse to create things if the prctl() hasn't been called. That needs to get added. > All I have seen so far is the hint to read some intel feature > documentation, but no coherent explanation how this patch set makes > use of that very feature. The last patch in the series
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, 11 Sep 2014, Qiaowei Ren wrote: > This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() > commands. These commands can be used to register and unregister MPX > related resource on the x86 platform. I cant see anything which is registered/unregistered. > The base of the bounds directory is set into mm_struct during > PR_MPX_REGISTER command execution. This member can be used to > check whether one application is mpx enabled. This changelog is completely useless. What's the actual point of this prctl? > +/* > + * This should only be called when cpuid has been checked > + * and we are sure that MPX is available. Groan. Why can't you put that cpuid check into that function right away instead of adding a worthless comment? It's obviously more important to have a comment about somthing which is obvious than explaining what the function is actually doing, right? > + */ > +static __user void *task_get_bounds_dir(struct task_struct *tsk) > +{ > + struct xsave_struct *xsave_buf; > + > + fpu_xsave(>thread.fpu); > + xsave_buf = &(tsk->thread.fpu.state->xsave); > + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) > + return NULL; Now this might be understandable with a proper comment. Right now it's a magic check for something uncomprehensible. > + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & > + MPX_BNDCFG_ADDR_MASK); > +} > + > +int mpx_register(struct task_struct *tsk) > +{ > + struct mm_struct *mm = tsk->mm; > + > + if (!cpu_has_mpx) > + return -EINVAL; > + > + /* > + * runtime in the userspace will be responsible for allocation of > + * the bounds directory. Then, it will save the base of the bounds > + * directory into XSAVE/XRSTOR Save Area and enable MPX through > + * XRSTOR instruction. > + * > + * fpu_xsave() is expected to be very expensive. In order to do > + * performance optimization, here we get the base of the bounds > + * directory and then save it into mm_struct to be used in future. > + */ Ah. Now we get some information what this might do. But that does not make any sense at all. So all it does is: tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; or: tsk->mm.bd_addr = NULL; So we use that information to check, whether we need to tear down a VM_MPX flagged region with mpx_unmap(), right? > + /* > + * Check whether this vma comes from MPX-enabled application. > + * If so, release this vma related bound tables. > + */ > + if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) > + mpx_unmap(mm, start, end); You really must be kidding. The application maps that table and never calls that prctl so do_unmap() will happily ignore it? The design to support this feature makes no sense at all to me. We have a special mmap interface, some magic kernel side mapping functionality and then on top of it a prctl telling the kernel to ignore/respect it. All I have seen so far is the hint to read some intel feature documentation, but no coherent explanation how this patch set makes use of that very feature. The last patch in the series does not count as coherent explanation. It merily documents parts of the implementation details which are required to make use of it but completely lacks of a coherent description how all of this is supposed to work. Despite the fact that this is V8, I can't suppress the feeling that this is just cobbled together to make it work somehow and we'll deal with the fallout later. I wouldn't be surprised if some of the fallout is going to be security related. I have a pretty good idea how to exploit it even without understanding the non-malicious intent of the whole thing. So: NAK to the whole series for now until someone comes up with a coherent explanation. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/11/2014 01:46 AM, Qiaowei Ren wrote: > + > + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & > + MPX_BNDCFG_ADDR_MASK); > +} I don't think casting a u64 to a ulong, then to a pointer is useful. Just take the '(unsigned long)' out. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On Thu, 11 Sep 2014, Qiaowei Ren wrote: This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resource on the x86 platform. I cant see anything which is registered/unregistered. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. This changelog is completely useless. What's the actual point of this prctl? +/* + * This should only be called when cpuid has been checked + * and we are sure that MPX is available. Groan. Why can't you put that cpuid check into that function right away instead of adding a worthless comment? It's obviously more important to have a comment about somthing which is obvious than explaining what the function is actually doing, right? + */ +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ + struct xsave_struct *xsave_buf; + + fpu_xsave(tsk-thread.fpu); + xsave_buf = (tsk-thread.fpu.state-xsave); + if (!(xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ENABLE_FLAG)) + return NULL; Now this might be understandable with a proper comment. Right now it's a magic check for something uncomprehensible. + return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u + MPX_BNDCFG_ADDR_MASK); +} + +int mpx_register(struct task_struct *tsk) +{ + struct mm_struct *mm = tsk-mm; + + if (!cpu_has_mpx) + return -EINVAL; + + /* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ Ah. Now we get some information what this might do. But that does not make any sense at all. So all it does is: tsk-mm.bd_addr = xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ADDR_MASK; or: tsk-mm.bd_addr = NULL; So we use that information to check, whether we need to tear down a VM_MPX flagged region with mpx_unmap(), right? + /* + * Check whether this vma comes from MPX-enabled application. + * If so, release this vma related bound tables. + */ + if (mm-bd_addr !(vma-vm_flags VM_MPX)) + mpx_unmap(mm, start, end); You really must be kidding. The application maps that table and never calls that prctl so do_unmap() will happily ignore it? The design to support this feature makes no sense at all to me. We have a special mmap interface, some magic kernel side mapping functionality and then on top of it a prctl telling the kernel to ignore/respect it. All I have seen so far is the hint to read some intel feature documentation, but no coherent explanation how this patch set makes use of that very feature. The last patch in the series does not count as coherent explanation. It merily documents parts of the implementation details which are required to make use of it but completely lacks of a coherent description how all of this is supposed to work. Despite the fact that this is V8, I can't suppress the feeling that this is just cobbled together to make it work somehow and we'll deal with the fallout later. I wouldn't be surprised if some of the fallout is going to be security related. I have a pretty good idea how to exploit it even without understanding the non-malicious intent of the whole thing. So: NAK to the whole series for now until someone comes up with a coherent explanation. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/11/2014 04:28 PM, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Qiaowei Ren wrote: This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resource on the x86 platform. I cant see anything which is registered/unregistered. This registers the location of the bounds directory with the kernel. From the app's perspective, it says I'm using MPX, and here is where I put the root data structure. Without this, the kernel would have to do an (expensive) xsave operation every time it wanted to see if MPX was in use. This also makes the user/kernel interaction more explicit. We would be in a world of hurt if userspace was allowed to move the bounds directory around. With this interface, it's a bit more obvious that userspace can't just move it around willy-nilly. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. This changelog is completely useless. Yeah, it's pretty bare-bones. Let me know if the explanation above makes sense, and we'll get it updated. +/* + * This should only be called when cpuid has been checked + * and we are sure that MPX is available. Groan. Why can't you put that cpuid check into that function right away instead of adding a worthless comment? Sounds reasonable to me. We should just move the cpuid check in to task_get_bounds_dir(). + */ +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ +struct xsave_struct *xsave_buf; + +fpu_xsave(tsk-thread.fpu); +xsave_buf = (tsk-thread.fpu.state-xsave); +if (!(xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ENABLE_FLAG)) +return NULL; Now this might be understandable with a proper comment. Right now it's a magic check for something uncomprehensible. It's a bit ugly to access, but it seems pretty blatantly obvious that this is a check for Is the enable flag in a hardware register set? Yes, the registers have names only a mother could love. But that is what they're really called. I guess we could add some comments about why we need to do the xsave. +int mpx_register(struct task_struct *tsk) +{ +struct mm_struct *mm = tsk-mm; + +if (!cpu_has_mpx) +return -EINVAL; + +/* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ Ah. Now we get some information what this might do. But that does not make any sense at all. So all it does is: tsk-mm.bd_addr = xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ADDR_MASK; or: tsk-mm.bd_addr = NULL; So we use that information to check, whether we need to tear down a VM_MPX flagged region with mpx_unmap(), right? Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. + /* + * Check whether this vma comes from MPX-enabled application. + * If so, release this vma related bound tables. + */ + if (mm-bd_addr !(vma-vm_flags VM_MPX)) + mpx_unmap(mm, start, end); You really must be kidding. The application maps that table and never calls that prctl so do_unmap() will happily ignore it? Yes. The only other way the kernel can possibly know that it needs to go tearing things down is with a potentially frequent and expensive xsave. Either we change mmap to say this mmap() is for a bounds directory, or we have some other interface that says the mmap() for the bounds directory is at $foo. We could also record the bounds directory the first time that we catch userspace using it. I'd rather have an explicit interface than an implicit one like that, though I don't feel that strongly about it. The design to support this feature makes no sense at all to me. We have a special mmap interface, some magic kernel side mapping functionality and then on top of it a prctl telling the kernel to ignore/respect it. That's a good point. We don't seem to have anything in the allocate_bt() side of things to tell the kernel to refuse to create things if the prctl() hasn't been called. That needs to get added. All I have seen so far is the hint to read some intel feature documentation, but no coherent explanation how this patch set makes use of that very feature. The last patch in the series does not count as coherent explanation. It merily documents parts of the implementation details which are required to make use of it but
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-11, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: + +return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u +MPX_BNDCFG_ADDR_MASK); +} I don't think casting a u64 to a ulong, then to a pointer is useful. Just take the '(unsigned long)' out. If so, this will spits out a warning on 32-bit: arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir': arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Thanks, Qiaowei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/11/2014 01:46 AM, Qiaowei Ren wrote: + + return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u + MPX_BNDCFG_ADDR_MASK); +} I don't think casting a u64 to a ulong, then to a pointer is useful. Just take the '(unsigned long)' out. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/