subject:"RE\: \[PATCH v8 08\/10\] x86, mpx\: add prctl commands PR_MPX_REGISTER, PR_MPX

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-18 Thread Dave Hansen

On 09/18/2014 12:17 AM, Kevin Easton wrote:
> I was assuming that if an application did want to enable MPX after threads
> had already been created, it would generally want to enable it
> simultaneously across all threads.  This would be a lot easier for the
> kernel than for the application.

The current gcc setup mechanism would set up MPX even before main().  So
I think it's pretty unlikely that help is needed to coordinate threads.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-18 Thread Kevin Easton

On Wed, Sep 17, 2014 at 09:43:09PM -0700, Dave Hansen wrote:
> On 09/17/2014 08:23 PM, Kevin Easton wrote:
> > I was actually thinking that the kernel would take care of the xsave / 
> > xrstor (for current), updating tsk->thread.fpu.state (for non-running
> > threads) and sending an IPI for threads running on other CPUs.
> > 
> > Of course userspace can always then manually change the bounds directory
> > address itself, but then it's quite clear that they're doing something
> > unsupported.  Just an idea, anyway.
> 
> What's the benefit of that?
> 
> As it stands now, MPX is likely to be enabled well before any threads
> are created, and the MPX enabling state will be inherited by the new
> thread at clone() time.  The current mechanism allows a thread to
> individually enable or disable MPX independently of the other threads.
> 
> I think it makes it both more complicated and less flexible.

I was assuming that if an application did want to enable MPX after threads
had already been created, it would generally want to enable it
simultaneously across all threads.  This would be a lot easier for the
kernel than for the application.

- Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-18 Thread Kevin Easton

On Wed, Sep 17, 2014 at 09:43:09PM -0700, Dave Hansen wrote:
 On 09/17/2014 08:23 PM, Kevin Easton wrote:
  I was actually thinking that the kernel would take care of the xsave / 
  xrstor (for current), updating tsk-thread.fpu.state (for non-running
  threads) and sending an IPI for threads running on other CPUs.
  
  Of course userspace can always then manually change the bounds directory
  address itself, but then it's quite clear that they're doing something
  unsupported.  Just an idea, anyway.
 
 What's the benefit of that?
 
 As it stands now, MPX is likely to be enabled well before any threads
 are created, and the MPX enabling state will be inherited by the new
 thread at clone() time.  The current mechanism allows a thread to
 individually enable or disable MPX independently of the other threads.
 
 I think it makes it both more complicated and less flexible.

I was assuming that if an application did want to enable MPX after threads
had already been created, it would generally want to enable it
simultaneously across all threads.  This would be a lot easier for the
kernel than for the application.

- Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-18 Thread Dave Hansen

On 09/18/2014 12:17 AM, Kevin Easton wrote:
 I was assuming that if an application did want to enable MPX after threads
 had already been created, it would generally want to enable it
 simultaneously across all threads.  This would be a lot easier for the
 kernel than for the application.

The current gcc setup mechanism would set up MPX even before main().  So
I think it's pretty unlikely that help is needed to coordinate threads.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Dave Hansen

On 09/17/2014 08:23 PM, Kevin Easton wrote:
> I was actually thinking that the kernel would take care of the xsave / 
> xrstor (for current), updating tsk->thread.fpu.state (for non-running
> threads) and sending an IPI for threads running on other CPUs.
> 
> Of course userspace can always then manually change the bounds directory
> address itself, but then it's quite clear that they're doing something
> unsupported.  Just an idea, anyway.

What's the benefit of that?

As it stands now, MPX is likely to be enabled well before any threads
are created, and the MPX enabling state will be inherited by the new
thread at clone() time.  The current mechanism allows a thread to
individually enable or disable MPX independently of the other threads.

I think it makes it both more complicated and less flexible.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Ren, Qiaowei



On 2014-09-18, Kevin Easton wrote:
> On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote:
>>> Would it be prudent to use an error code other than EINVAL for the
>>> "hardware doesn't support it" case?
>>> 
>> Seems like no specific error code for this case.
> 
> ENXIO would probably be OK.  It's not too important as long as it's
> documented.
> 
Yes. Looks like that ENXIO will be better.

Thanks,
Qiaowei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Kevin Easton

On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote:
> > Would it be prudent to use an error code other than EINVAL for the
> > "hardware doesn't support it" case?
> >
> Seems like no specific error code for this case.

ENXIO would probably be OK.  It's not too important as long as it's
documented.

> 
> >> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> > long, arg2, unsigned long, arg3,
> >>me->mm->def_flags &= ~VM_NOHUGEPAGE;
> >>up_write(>mm->mmap_sem);
> >>break;
> >> +  case PR_MPX_REGISTER:
> >> +  error = MPX_REGISTER(me);
> >> +  break;
> >> +  case PR_MPX_UNREGISTER:
> >> +  error = MPX_UNREGISTER(me);
> >> +  break;
> > 
> > If you pass me->mm from prctl, that makes it clear that it's
> > per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
> > 
> > This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
> > it's not using them, otherwise you'll be sunk if you ever want to use them 
> > later.
> > 
> > It seems like it only makes sense for all threads using the mm to have
> > the same bounds directory set.  If the interface was changed to
> > directly pass the address, then could the kernel take care of setting
> > it for *all* of the threads in the process? This seems like something
> > that would be easier for the kernel to do than userspace.
> > 
> If the interface was changed to this, it will be possible for insane 
> application to pass error bounds directory address to kernel. We still 
> have to call fpu_xsave() to check this.

I was actually thinking that the kernel would take care of the xsave / 
xrstor (for current), updating tsk->thread.fpu.state (for non-running
threads) and sending an IPI for threads running on other CPUs.

Of course userspace can always then manually change the bounds directory
address itself, but then it's quite clear that they're doing something
unsupported.  Just an idea, anyway.

- Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Ren, Qiaowei



On 2014-09-16, Kevin Easton wrote:
> On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:
>> +
>> +int mpx_register(struct task_struct *tsk) {
>> +struct mm_struct *mm = tsk->mm;
>> +
>> +if (!cpu_has_mpx)
>> +return -EINVAL;
>> +
>> +/*
>> + * runtime in the userspace will be responsible for allocation of
>> + * the bounds directory. Then, it will save the base of the bounds
>> + * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> + * XRSTOR instruction.
>> + *
>> + * fpu_xsave() is expected to be very expensive. In order to do
>> + * performance optimization, here we get the base of the bounds
>> + * directory and then save it into mm_struct to be used in future.
>> + */
>> +mm->bd_addr = task_get_bounds_dir(tsk);
>> +if (!mm->bd_addr)
>> +return -EINVAL;
>> +
>> +return 0;
>> +}
>> +
>> +int mpx_unregister(struct task_struct *tsk) {
>> +struct mm_struct *mm = current->mm;
>> +
>> +if (!cpu_has_mpx)
>> +return -EINVAL;
>> +
>> +mm->bd_addr = NULL;
>> +return 0;
>> +}
> 
> If that's changed, then mpx_register() and mpx_unregister() don't need
> a task_struct, just an mm_struct.
> 
Yes. An mm_struct is enough.

> Probably these functions should be locking mmap_sem.
> 
> Would it be prudent to use an error code other than EINVAL for the
> "hardware doesn't support it" case?
>
Seems like no specific error code for this case.

>> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> long, arg2, unsigned long, arg3,
>>  me->mm->def_flags &= ~VM_NOHUGEPAGE;
>>  up_write(>mm->mmap_sem);
>>  break;
>> +case PR_MPX_REGISTER:
>> +error = MPX_REGISTER(me);
>> +break;
>> +case PR_MPX_UNREGISTER:
>> +error = MPX_UNREGISTER(me);
>> +break;
> 
> If you pass me->mm from prctl, that makes it clear that it's
> per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
> 
> This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
> it's not using them, otherwise you'll be sunk if you ever want to use them 
> later.
> 
> It seems like it only makes sense for all threads using the mm to have
> the same bounds directory set.  If the interface was changed to
> directly pass the address, then could the kernel take care of setting
> it for *all* of the threads in the process? This seems like something
> that would be easier for the kernel to do than userspace.
> 
If the interface was changed to this, it will be possible for insane 
application to pass error bounds directory address to kernel. We still have to 
call fpu_xsave() to check this.

Thanks,
Qiaowei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Ren, Qiaowei



On 2014-09-16, Kevin Easton wrote:
 On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:
 +
 +int mpx_register(struct task_struct *tsk) {
 +struct mm_struct *mm = tsk-mm;
 +
 +if (!cpu_has_mpx)
 +return -EINVAL;
 +
 +/*
 + * runtime in the userspace will be responsible for allocation of
 + * the bounds directory. Then, it will save the base of the bounds
 + * directory into XSAVE/XRSTOR Save Area and enable MPX through
 + * XRSTOR instruction.
 + *
 + * fpu_xsave() is expected to be very expensive. In order to do
 + * performance optimization, here we get the base of the bounds
 + * directory and then save it into mm_struct to be used in future.
 + */
 +mm-bd_addr = task_get_bounds_dir(tsk);
 +if (!mm-bd_addr)
 +return -EINVAL;
 +
 +return 0;
 +}
 +
 +int mpx_unregister(struct task_struct *tsk) {
 +struct mm_struct *mm = current-mm;
 +
 +if (!cpu_has_mpx)
 +return -EINVAL;
 +
 +mm-bd_addr = NULL;
 +return 0;
 +}
 
 If that's changed, then mpx_register() and mpx_unregister() don't need
 a task_struct, just an mm_struct.
 
Yes. An mm_struct is enough.

 Probably these functions should be locking mmap_sem.
 
 Would it be prudent to use an error code other than EINVAL for the
 hardware doesn't support it case?

Seems like no specific error code for this case.

 @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
 long, arg2, unsigned long, arg3,
  me-mm-def_flags = ~VM_NOHUGEPAGE;
  up_write(me-mm-mmap_sem);
  break;
 +case PR_MPX_REGISTER:
 +error = MPX_REGISTER(me);
 +break;
 +case PR_MPX_UNREGISTER:
 +error = MPX_UNREGISTER(me);
 +break;
 
 If you pass me-mm from prctl, that makes it clear that it's
 per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
 
 This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
 it's not using them, otherwise you'll be sunk if you ever want to use them 
 later.
 
 It seems like it only makes sense for all threads using the mm to have
 the same bounds directory set.  If the interface was changed to
 directly pass the address, then could the kernel take care of setting
 it for *all* of the threads in the process? This seems like something
 that would be easier for the kernel to do than userspace.
 
If the interface was changed to this, it will be possible for insane 
application to pass error bounds directory address to kernel. We still have to 
call fpu_xsave() to check this.

Thanks,
Qiaowei

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Kevin Easton

On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote:
  Would it be prudent to use an error code other than EINVAL for the
  hardware doesn't support it case?
 
 Seems like no specific error code for this case.

ENXIO would probably be OK.  It's not too important as long as it's
documented.

 
  @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
  long, arg2, unsigned long, arg3,
 me-mm-def_flags = ~VM_NOHUGEPAGE;
 up_write(me-mm-mmap_sem);
 break;
  +  case PR_MPX_REGISTER:
  +  error = MPX_REGISTER(me);
  +  break;
  +  case PR_MPX_UNREGISTER:
  +  error = MPX_UNREGISTER(me);
  +  break;
  
  If you pass me-mm from prctl, that makes it clear that it's
  per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
  
  This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
  it's not using them, otherwise you'll be sunk if you ever want to use them 
  later.
  
  It seems like it only makes sense for all threads using the mm to have
  the same bounds directory set.  If the interface was changed to
  directly pass the address, then could the kernel take care of setting
  it for *all* of the threads in the process? This seems like something
  that would be easier for the kernel to do than userspace.
  
 If the interface was changed to this, it will be possible for insane 
 application to pass error bounds directory address to kernel. We still 
 have to call fpu_xsave() to check this.

I was actually thinking that the kernel would take care of the xsave / 
xrstor (for current), updating tsk-thread.fpu.state (for non-running
threads) and sending an IPI for threads running on other CPUs.

Of course userspace can always then manually change the bounds directory
address itself, but then it's quite clear that they're doing something
unsupported.  Just an idea, anyway.

- Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Ren, Qiaowei



On 2014-09-18, Kevin Easton wrote:
 On Thu, Sep 18, 2014 at 12:40:29AM +, Ren, Qiaowei wrote:
 Would it be prudent to use an error code other than EINVAL for the
 hardware doesn't support it case?
 
 Seems like no specific error code for this case.
 
 ENXIO would probably be OK.  It's not too important as long as it's
 documented.
 
Yes. Looks like that ENXIO will be better.

Thanks,
Qiaowei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-17 Thread Dave Hansen

On 09/17/2014 08:23 PM, Kevin Easton wrote:
 I was actually thinking that the kernel would take care of the xsave / 
 xrstor (for current), updating tsk-thread.fpu.state (for non-running
 threads) and sending an IPI for threads running on other CPUs.
 
 Of course userspace can always then manually change the bounds directory
 address itself, but then it's quite clear that they're doing something
 unsupported.  Just an idea, anyway.

What's the benefit of that?

As it stands now, MPX is likely to be enabled well before any threads
are created, and the MPX enabling state will be inherited by the new
thread at clone() time.  The current mechanism allows a thread to
individually enable or disable MPX independently of the other threads.

I think it makes it both more complicated and less flexible.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-16 Thread Kevin Easton

On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:

> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> +{
> + struct xsave_struct *xsave_buf;
> +
> + fpu_xsave(>thread.fpu);
> + xsave_buf = &(tsk->thread.fpu.state->xsave);
> + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> + return NULL;
> +
> + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> + MPX_BNDCFG_ADDR_MASK);
> +}

This only makes sense if called with 'current', so is there any need
for the function argument?

> +
> +int mpx_register(struct task_struct *tsk)
> +{
> + struct mm_struct *mm = tsk->mm;
> +
> + if (!cpu_has_mpx)
> + return -EINVAL;
> +
> + /*
> +  * runtime in the userspace will be responsible for allocation of
> +  * the bounds directory. Then, it will save the base of the bounds
> +  * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +  * XRSTOR instruction.
> +  *
> +  * fpu_xsave() is expected to be very expensive. In order to do
> +  * performance optimization, here we get the base of the bounds
> +  * directory and then save it into mm_struct to be used in future.
> +  */
> + mm->bd_addr = task_get_bounds_dir(tsk);
> + if (!mm->bd_addr)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +int mpx_unregister(struct task_struct *tsk)
> +{
> + struct mm_struct *mm = current->mm;
> +
> + if (!cpu_has_mpx)
> + return -EINVAL;
> +
> + mm->bd_addr = NULL;
> + return 0;
> +}

If that's changed, then mpx_register() and mpx_unregister() don't need
a task_struct, just an mm_struct.

Probably these functions should be locking mmap_sem.

Would it be prudent to use an error code other than EINVAL for the 
"hardware doesn't support it" case?

> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
> arg2, unsigned long, arg3,
>   me->mm->def_flags &= ~VM_NOHUGEPAGE;
>   up_write(>mm->mmap_sem);
>   break;
> + case PR_MPX_REGISTER:
> + error = MPX_REGISTER(me);
> + break;
> + case PR_MPX_UNREGISTER:
> + error = MPX_UNREGISTER(me);
> + break;

If you pass me->mm from prctl, that makes it clear that it's per-process
not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.

This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's
not using them, otherwise you'll be sunk if you ever want to use them later.

It seems like it only makes sense for all threads using the mm to have the
same bounds directory set.  If the interface was changed to directly pass
the address, then could the kernel take care of setting it for *all* of
the threads in the process? This seems like something that would be easier
for the kernel to do than userspace.

- Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-16 Thread Kevin Easton

On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:

 +static __user void *task_get_bounds_dir(struct task_struct *tsk)
 +{
 + struct xsave_struct *xsave_buf;
 +
 + fpu_xsave(tsk-thread.fpu);
 + xsave_buf = (tsk-thread.fpu.state-xsave);
 + if (!(xsave_buf-bndcsr.cfg_reg_u  MPX_BNDCFG_ENABLE_FLAG))
 + return NULL;
 +
 + return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u 
 + MPX_BNDCFG_ADDR_MASK);
 +}

This only makes sense if called with 'current', so is there any need
for the function argument?

 +
 +int mpx_register(struct task_struct *tsk)
 +{
 + struct mm_struct *mm = tsk-mm;
 +
 + if (!cpu_has_mpx)
 + return -EINVAL;
 +
 + /*
 +  * runtime in the userspace will be responsible for allocation of
 +  * the bounds directory. Then, it will save the base of the bounds
 +  * directory into XSAVE/XRSTOR Save Area and enable MPX through
 +  * XRSTOR instruction.
 +  *
 +  * fpu_xsave() is expected to be very expensive. In order to do
 +  * performance optimization, here we get the base of the bounds
 +  * directory and then save it into mm_struct to be used in future.
 +  */
 + mm-bd_addr = task_get_bounds_dir(tsk);
 + if (!mm-bd_addr)
 + return -EINVAL;
 +
 + return 0;
 +}
 +
 +int mpx_unregister(struct task_struct *tsk)
 +{
 + struct mm_struct *mm = current-mm;
 +
 + if (!cpu_has_mpx)
 + return -EINVAL;
 +
 + mm-bd_addr = NULL;
 + return 0;
 +}

If that's changed, then mpx_register() and mpx_unregister() don't need
a task_struct, just an mm_struct.

Probably these functions should be locking mmap_sem.

Would it be prudent to use an error code other than EINVAL for the 
hardware doesn't support it case?

 @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
 arg2, unsigned long, arg3,
   me-mm-def_flags = ~VM_NOHUGEPAGE;
   up_write(me-mm-mmap_sem);
   break;
 + case PR_MPX_REGISTER:
 + error = MPX_REGISTER(me);
 + break;
 + case PR_MPX_UNREGISTER:
 + error = MPX_UNREGISTER(me);
 + break;

If you pass me-mm from prctl, that makes it clear that it's per-process
not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.

This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's
not using them, otherwise you'll be sunk if you ever want to use them later.

It seems like it only makes sense for all threads using the mm to have the
same bounds directory set.  If the interface was changed to directly pass
the address, then could the kernel take care of setting it for *all* of
the threads in the process? This seems like something that would be easier
for the kernel to do than userspace.

- Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-15 Thread Dave Hansen

On 09/15/2014 08:20 PM, Ren, Qiaowei wrote:
>> What are the semantics across execve() ?
>> 
> This will not impact on the semantics of execve(). One runtime
> library
> for MPX will be provided (or merged into Glibc), and when the
> application starts, this runtime will be called to initialize MPX
> runtime environment, including calling prctl() to notify the kernel to
> start managing the bounds directories. You can see the discussion
> about exec(): https://lkml.org/lkml/2014/1/26/199

I think he's asking what happens to the kernel value at execve() time.

The short answer is that it is zero'd along with the rest of a new mm.
It probably _shouldn't_ be, though.  It's actually valid to have a bound
directory at 0x0.  We probably need to initialize it to -1 instead, and
that means initializing to -1 at execve() time.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-15 Thread Ren, Qiaowei



On 2014-09-15, One Thousand Gnomes wrote:
>> The base of the bounds directory is set into mm_struct during
>> PR_MPX_REGISTER command execution. This member can be used to check
>> whether one application is mpx enabled.
> 
> Not really because by the time you ask the question another thread
> might have decided to unregister it.
> 
> 
>> +int mpx_register(struct task_struct *tsk) {
>> +struct mm_struct *mm = tsk->mm;
>> +
>> +if (!cpu_has_mpx)
>> +return -EINVAL;
>> +
>> +/*
>> + * runtime in the userspace will be responsible for allocation of
>> + * the bounds directory. Then, it will save the base of the bounds
>> + * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> + * XRSTOR instruction.
>> + *
>> + * fpu_xsave() is expected to be very expensive. In order to do
>> + * performance optimization, here we get the base of the bounds
>> + * directory and then save it into mm_struct to be used in future.
>> + */
>> +mm->bd_addr = task_get_bounds_dir(tsk);
>> +if (!mm->bd_addr)
>> +return -EINVAL;
> 
> What stops two threads calling this in parallel ?
>> +
>> +return 0;
>> +}
>> +
>> +int mpx_unregister(struct task_struct *tsk) {
>> +struct mm_struct *mm = current->mm;
>> +
>> +if (!cpu_has_mpx)
>> +return -EINVAL;
>> +
>> +mm->bd_addr = NULL;
> 
> or indeed calling this in parallel
> 
> What are the semantics across execve() ?
> 
This will not impact on the semantics of execve(). One runtime library for MPX 
will be provided (or merged into Glibc), and when the application starts, this 
runtime will be called to initialize MPX runtime environment, including calling 
prctl() to notify the kernel to start managing the bounds directories. You can 
see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 

It would be extremely unusual for an application to have some MPX and some 
non-MPX threads, since they would share the same address space and the non-MPX 
threads would mess up the bounds. That is to say, it looks like be unusual for 
one of these threads to call prctl() to enable or disable MPX. I guess we need 
to add some rules into documentation.

Thanks,
Qiaowei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-15 Thread Ren, Qiaowei



On 2014-09-15, One Thousand Gnomes wrote:
 The base of the bounds directory is set into mm_struct during
 PR_MPX_REGISTER command execution. This member can be used to check
 whether one application is mpx enabled.
 
 Not really because by the time you ask the question another thread
 might have decided to unregister it.
 
 
 +int mpx_register(struct task_struct *tsk) {
 +struct mm_struct *mm = tsk-mm;
 +
 +if (!cpu_has_mpx)
 +return -EINVAL;
 +
 +/*
 + * runtime in the userspace will be responsible for allocation of
 + * the bounds directory. Then, it will save the base of the bounds
 + * directory into XSAVE/XRSTOR Save Area and enable MPX through
 + * XRSTOR instruction.
 + *
 + * fpu_xsave() is expected to be very expensive. In order to do
 + * performance optimization, here we get the base of the bounds
 + * directory and then save it into mm_struct to be used in future.
 + */
 +mm-bd_addr = task_get_bounds_dir(tsk);
 +if (!mm-bd_addr)
 +return -EINVAL;
 
 What stops two threads calling this in parallel ?
 +
 +return 0;
 +}
 +
 +int mpx_unregister(struct task_struct *tsk) {
 +struct mm_struct *mm = current-mm;
 +
 +if (!cpu_has_mpx)
 +return -EINVAL;
 +
 +mm-bd_addr = NULL;
 
 or indeed calling this in parallel
 
 What are the semantics across execve() ?
 
This will not impact on the semantics of execve(). One runtime library for MPX 
will be provided (or merged into Glibc), and when the application starts, this 
runtime will be called to initialize MPX runtime environment, including calling 
prctl() to notify the kernel to start managing the bounds directories. You can 
see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 

It would be extremely unusual for an application to have some MPX and some 
non-MPX threads, since they would share the same address space and the non-MPX 
threads would mess up the bounds. That is to say, it looks like be unusual for 
one of these threads to call prctl() to enable or disable MPX. I guess we need 
to add some rules into documentation.

Thanks,
Qiaowei

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-15 Thread Dave Hansen

On 09/15/2014 08:20 PM, Ren, Qiaowei wrote:
 What are the semantics across execve() ?
 
 This will not impact on the semantics of execve(). One runtime
 library
 for MPX will be provided (or merged into Glibc), and when the
 application starts, this runtime will be called to initialize MPX
 runtime environment, including calling prctl() to notify the kernel to
 start managing the bounds directories. You can see the discussion
 about exec(): https://lkml.org/lkml/2014/1/26/199

I think he's asking what happens to the kernel value at execve() time.

The short answer is that it is zero'd along with the rest of a new mm.
It probably _shouldn't_ be, though.  It's actually valid to have a bound
directory at 0x0.  We probably need to initialize it to -1 instead, and
that means initializing to -1 at execve() time.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-14 Thread One Thousand Gnomes

> The base of the bounds directory is set into mm_struct during
> PR_MPX_REGISTER command execution. This member can be used to
> check whether one application is mpx enabled.

Not really because by the time you ask the question another thread might
have decided to unregister it.


> +int mpx_register(struct task_struct *tsk)
> +{
> + struct mm_struct *mm = tsk->mm;
> +
> + if (!cpu_has_mpx)
> + return -EINVAL;
> +
> + /*
> +  * runtime in the userspace will be responsible for allocation of
> +  * the bounds directory. Then, it will save the base of the bounds
> +  * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +  * XRSTOR instruction.
> +  *
> +  * fpu_xsave() is expected to be very expensive. In order to do
> +  * performance optimization, here we get the base of the bounds
> +  * directory and then save it into mm_struct to be used in future.
> +  */
> + mm->bd_addr = task_get_bounds_dir(tsk);
> + if (!mm->bd_addr)
> + return -EINVAL;

What stops two threads calling this in parallel ?
> +
> + return 0;
> +}
> +
> +int mpx_unregister(struct task_struct *tsk)
> +{
> + struct mm_struct *mm = current->mm;
> +
> + if (!cpu_has_mpx)
> + return -EINVAL;
> +
> + mm->bd_addr = NULL;

or indeed calling this in parallel

What are the semantics across execve() ?

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-14 Thread One Thousand Gnomes

 The base of the bounds directory is set into mm_struct during
 PR_MPX_REGISTER command execution. This member can be used to
 check whether one application is mpx enabled.

Not really because by the time you ask the question another thread might
have decided to unregister it.


 +int mpx_register(struct task_struct *tsk)
 +{
 + struct mm_struct *mm = tsk-mm;
 +
 + if (!cpu_has_mpx)
 + return -EINVAL;
 +
 + /*
 +  * runtime in the userspace will be responsible for allocation of
 +  * the bounds directory. Then, it will save the base of the bounds
 +  * directory into XSAVE/XRSTOR Save Area and enable MPX through
 +  * XRSTOR instruction.
 +  *
 +  * fpu_xsave() is expected to be very expensive. In order to do
 +  * performance optimization, here we get the base of the bounds
 +  * directory and then save it into mm_struct to be used in future.
 +  */
 + mm-bd_addr = task_get_bounds_dir(tsk);
 + if (!mm-bd_addr)
 + return -EINVAL;

What stops two threads calling this in parallel ?
 +
 + return 0;
 +}
 +
 +int mpx_unregister(struct task_struct *tsk)
 +{
 + struct mm_struct *mm = current-mm;
 +
 + if (!cpu_has_mpx)
 + return -EINVAL;
 +
 + mm-bd_addr = NULL;

or indeed calling this in parallel

What are the semantics across execve() ?

Alan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-13 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Dave Hansen wrote:
> >> There are two mappings in play:
> >> 1. The mapping with the actual data, which userspace is munmap()ing or
> >>brk()ing away, etc... (never tagged VM_MPX)
> > 
> > It's not tagged that way because it is mapped by user space.
> 
> Correct.  It is not tagged because it is mapped by user space.
> 
> > This is the directory, right?
> 
> No.  The untagged mapping in question here is for normal user data, like
> an mmap() or brk(), unrelated to MPX.

Ok. That makes sense.

> The directory is a separate matter.  It is also (currently) untagged
> with VM_MPX since it is also allocated by userspace.

So if that gets unmapped my observation holds. You still try to access
the directory, take the fault, queue work and in the work you dont
know how to handle it either.

So if the unmapped region affects bd_addr then we should just release
the affected BT mappings, i.e. all vmas flagged with VMA_MPX.

> > With the allocation from #BR you make that behaviour
> > dynamic and you just provide an empty "no bounds" table to make the
> > bound checker happy.
> 
> Kinda.  We do provide an empty table, but the first access will always
> be a write, so it doesn't stay empty for long.

So this comes from adding an entry to a not yet mapped table not from
an actual bound check? I still need to digest the details in the
manual.

> The bounds directory is not being unmapped here.  I _think_ I covered
> that above, but don't be shy if I'm not being clear. ;)

Fair enough. My confusion.

> If the bounds directory moved around, this would make sense.  Otherwise,
> it's a waste of space because all vmas in a given mm would have the
> exact same bd_addr, and we might as well just store it in mm->bd_something.

Ok. But we really want to do some sanity checking on all of this.

> Are you suggesting that we support moving the bounds directory around?

No, but the stupid thing CAN move around and we want to think about it
now instead of figuring out what to do about it later.

So if we go and store bd_addr with the prctl then you can do in the
#BR "Invalid BD entry":

bd_addr = xsave->xsave_buf->bndcsr.cfg_reg_u;

/*
 * Catch the case that this is not enabled, i.e. mm->bd_addr == 0,
 * and the case that stupid user space moved the directory
 * around.
 */
if (mm->bd_addr != bd_addr) {
   Yell and whack stupid app over the head;
}

> Also, the bd_entry can be _calculated_ from vma->vm_start and the
> bd_addr.  It seems a bit redundant to store it like this.

Fair enough.

> If you are talking about the VM_MPX VMA that was allocated to hold the
> bounds table, this won't work.

Sorry yes, that only works for unmapping the bound directory itself.

> Once we unmap the bounds table, we would have a bounds directory entry
> pointing at empty address space.  That address space could now be
> allocated for some other (random) use, and the MPX hardware is now going
> to go trying to walk it as if it were a bounds table.  That would be bad.
> 
> Any unmapping of a bounds table has to be accompanied by a corresponding
> write to the bounds directory entry.  That write to the bounds directory
> can fault.

So if it fails you need to keep the bound table around until you can
handle that somewhere else, i.e. outside of the mmap sem held
region. That's what you are planning to do with the work queue thing.

Now I'm asking myself, whether we are forced to do that from the end
of do_unmap() rather than doing it from the call site outside of the
mmap_sem held region. I can see that adding arch_unmap() to do_unmap()
is a very simple solution, but it comes with the price of dealing with
faults inside of the mmap_sem held region.

It might be worthwhile to think about the following:

   down_write(mmap_sem);

   do_stuff()
 do_munmap(mm, start, len)
...
arch_munmap(mm, start, len) {
  if (!mm->bd_addr)
 return;
  bt_work = kmalloc(sizeof(struct bt_work));
  bt_work->start = start;
  bt_work->len = len;
  hlist_add(_work->list, >bt_work_head);
} 

And then instead of up_write(mmap_sem);

arch_up_write(mmap_sem);

Which by default is mapped to up_write(mmap_sem);

Now for the MPX case you can do:
{
HLIST_HEAD(bt_work_head);

hlist_move_list(>bt_work_head, _work_head);
up_write(mmap_sem);

hlist_for_each_entry_safe()
handle_bt_work();
}

So that needs a few more changes vs. the up_write(mmap_sem) at the
callsites of do_munmap(), but we might even make that a generic thing,
i.e. replace up_write(mmap_sem) with release_write(mmap_sem). I can
imagine that we have other use cases for this.

Thoughts?

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-13 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Dave Hansen wrote:
 On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
  On Fri, 12 Sep 2014, Dave Hansen wrote:
  There are two mappings in play:
  1. The mapping with the actual data, which userspace is munmap()ing or
 brk()ing away, etc... (never tagged VM_MPX)
  
  It's not tagged that way because it is mapped by user space.
 
 Correct.  It is not tagged because it is mapped by user space.
 
  This is the directory, right?
 
 No.  The untagged mapping in question here is for normal user data, like
 an mmap() or brk(), unrelated to MPX.

Ok. That makes sense.
 
 The directory is a separate matter.  It is also (currently) untagged
 with VM_MPX since it is also allocated by userspace.

So if that gets unmapped my observation holds. You still try to access
the directory, take the fault, queue work and in the work you dont
know how to handle it either.

So if the unmapped region affects bd_addr then we should just release
the affected BT mappings, i.e. all vmas flagged with VMA_MPX.

  With the allocation from #BR you make that behaviour
  dynamic and you just provide an empty no bounds table to make the
  bound checker happy.
 
 Kinda.  We do provide an empty table, but the first access will always
 be a write, so it doesn't stay empty for long.

So this comes from adding an entry to a not yet mapped table not from
an actual bound check? I still need to digest the details in the
manual.

 The bounds directory is not being unmapped here.  I _think_ I covered
 that above, but don't be shy if I'm not being clear. ;)

Fair enough. My confusion.
 
 If the bounds directory moved around, this would make sense.  Otherwise,
 it's a waste of space because all vmas in a given mm would have the
 exact same bd_addr, and we might as well just store it in mm-bd_something.

Ok. But we really want to do some sanity checking on all of this.
 
 Are you suggesting that we support moving the bounds directory around?

No, but the stupid thing CAN move around and we want to think about it
now instead of figuring out what to do about it later.

So if we go and store bd_addr with the prctl then you can do in the
#BR Invalid BD entry:

bd_addr = xsave-xsave_buf-bndcsr.cfg_reg_u;

/*
 * Catch the case that this is not enabled, i.e. mm-bd_addr == 0,
 * and the case that stupid user space moved the directory
 * around.
 */
if (mm-bd_addr != bd_addr) {
   Yell and whack stupid app over the head;
}

 Also, the bd_entry can be _calculated_ from vma-vm_start and the
 bd_addr.  It seems a bit redundant to store it like this.

Fair enough.

 If you are talking about the VM_MPX VMA that was allocated to hold the
 bounds table, this won't work.

Sorry yes, that only works for unmapping the bound directory itself.

 Once we unmap the bounds table, we would have a bounds directory entry
 pointing at empty address space.  That address space could now be
 allocated for some other (random) use, and the MPX hardware is now going
 to go trying to walk it as if it were a bounds table.  That would be bad.
 
 Any unmapping of a bounds table has to be accompanied by a corresponding
 write to the bounds directory entry.  That write to the bounds directory
 can fault.

So if it fails you need to keep the bound table around until you can
handle that somewhere else, i.e. outside of the mmap sem held
region. That's what you are planning to do with the work queue thing.

Now I'm asking myself, whether we are forced to do that from the end
of do_unmap() rather than doing it from the call site outside of the
mmap_sem held region. I can see that adding arch_unmap() to do_unmap()
is a very simple solution, but it comes with the price of dealing with
faults inside of the mmap_sem held region.

It might be worthwhile to think about the following:

   down_write(mmap_sem);
   
   do_stuff()
 do_munmap(mm, start, len)
...
arch_munmap(mm, start, len) {
  if (!mm-bd_addr)
 return;
  bt_work = kmalloc(sizeof(struct bt_work));
  bt_work-start = start;
  bt_work-len = len;
  hlist_add(bt_work-list, mm-bt_work_head);
} 

And then instead of up_write(mmap_sem);

arch_up_write(mmap_sem);

Which by default is mapped to up_write(mmap_sem);

Now for the MPX case you can do:
{
HLIST_HEAD(bt_work_head);

hlist_move_list(mm-bt_work_head, bt_work_head);
up_write(mmap_sem);

hlist_for_each_entry_safe()
handle_bt_work();
}
  
So that needs a few more changes vs. the up_write(mmap_sem) at the
callsites of do_munmap(), but we might even make that a generic thing,
i.e. replace up_write(mmap_sem) with release_write(mmap_sem). I can
imagine that we have other use cases for this.

Thoughts?

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 11:42 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Thomas Gleixner wrote:
>> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> The proper solution to this problem is:
>>
>> do_bounds()
>>  bd_addr = get_bd_addr_from_xsave();
>>  bd_entry = bndstatus & ADDR_MASK:
> 
> Just for clarification. You CANNOT avoid the xsave here because it's
> the only way to access BNDSTATUS according to the manual.
> 
> "The BNDCFGU and BNDSTATUS registers are accessible only with
>  XSAVE/XRSTOR family of instructions"
> 
> So there is no point to cache BNDCFGU as you get it anyway when you
> need to retrieve the invalid BD entry.

Agreed.  It serves no purpose during a bounds fault.

However, it does keep you from having to do an xsave during the bounds
table free operations, like at unmap() time.  That is actually a much
more critical path than bounds faults.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 10:42 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> The prctl(register) is meant to be a signal from userspace to the kernel
>> to say, "I would like your help in managing these bounds tables".
>> prctl(unregister) is the opposite, meaning "I don't want your help any
>> more".
> 
> Fine, but that's a totally different story. I can see the usefulness
> of this, but then it's a complete misnomer. It should be:
> 
>prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

Agreed.  Those are much better names.

> So this wants to be a boolean value and not some random user space
> address collected at some random point and then ignored until you do
> the magic cleanup. See the other reply.

I know at this point you think the kernel can not or should not keep a
copy of the bounds directory location around.  I understand that.  Bear
with me for a moment, and please just assume for a moment that we need it.

It's far from a random userspace address.  When you make a syscall, we
put the arguments in registers.  The register we're putting it in here
just happens to be used by the hardware.

Right now, we do (ignoring the actual xsave/xrstr):

bndcfgu = bnd_dir_ptr | ENABLE_BIT;
prctl(ENABLE_MPX_BT_MANAGEMENT); // kernel grabs from xsave buf

We could pass it explicitly in %rdi as a syscall argument and not have
the prctl() code fetch it from the xsave buffer.  I'm just not sure what
this buys us:

bndcfgu = bnd_dir_ptr | ENABLE_BIT;
prctl(ENABLE_MPX_BT_MANAGEMENT, bndcfgu);

Also, the "random cleanup" just happens to correspond with memory
deallocation, which is something we want to go fast.  I'd _prefer_ to
keep xsaves out of the unmap path if possible.  It's not a strict
requirement, but it does seem prudent as an xsave eats a dozen or so
cachelines.

It's also not "sampled".  I can't imagine a situation where the register
will change values during the execution of any sane program.  It really
is essentially fixed.  It's probably one of the reasons it is so
expensive to access: there's *no* reason to do it frequently.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> There are two mappings in play:
>> 1. The mapping with the actual data, which userspace is munmap()ing or
>>brk()ing away, etc... (never tagged VM_MPX)
> 
> It's not tagged that way because it is mapped by user space.

Correct.  It is not tagged because it is mapped by user space.

> This is the directory, right?

No.  The untagged mapping in question here is for normal user data, like
an mmap() or brk(), unrelated to MPX.

The directory is a separate matter.  It is also (currently) untagged
with VM_MPX since it is also allocated by userspace.

>> 2. The mapping for the bounds table *backing* the data (is tagged with
>>VM_MPX)
> 
> That's the stuff, which gets magically allocated from do_bounds(). And
> the reason you do that from the #BR is that user space would have to
> allocate a gazillion of bound tables to make sure that every corner
> case is covered.

Yes.

> With the allocation from #BR you make that behaviour
> dynamic and you just provide an empty "no bounds" table to make the
> bound checker happy.

Kinda.  We do provide an empty table, but the first access will always
be a write, so it doesn't stay empty for long.

...
> Now, I have a hard time to see how that is supposed to work.
> 
> do_unmap()
>  detach_vmas_to_be_unmapped()
>  unmap_region()
>free_pgtables()
>  arch_unmap()
>mpx_unmap()
> 
> So at the point where you try to access the directory to gather the
> information about the entries which might be affected, that stuff is
> unmapped already and the page tables are gone.
> 
> Brilliant idea, really. And if you run into the fault in mpx_unmap()
> you plan to delegate the fixup to a work queue. How is that thing
> going to find what belonged to the unmapped directory?

The bounds directory is not being unmapped here.  I _think_ I covered
that above, but don't be shy if I'm not being clear. ;)

> Even if the stuff would be accessible at that point, it is a damned
> stupid idea to rely on anything userspace is providing to you. I
> learned that the hard way in futex.c
> 
> The proper solution to this problem is:
> 
> do_bounds()
>   bd_addr = get_bd_addr_from_xsave();
>   bd_entry = bndstatus & ADDR_MASK:
> 
>   bt = mpx_mmap(bd_addr, bd_entry, len);
> 
>   set_bt_entry_in_bd(bd_entry, bt);
> 
> And in mpx_mmap()
> 
>.
>vma = find_vma();
> 
>vma->bd_addr = bd_addr;
>vma->bd_entry = bd_entry;

If the bounds directory moved around, this would make sense.  Otherwise,
it's a waste of space because all vmas in a given mm would have the
exact same bd_addr, and we might as well just store it in mm->bd_something.

Are you suggesting that we support moving the bounds directory around?

Also, the bd_entry can be _calculated_ from vma->vm_start and the
bd_addr.  It seems a bit redundant to store it like this.

Also this would add 16 bytes to the currently 184-byte VMA.  That seems
suboptimal to me.  It would eat over a megabyte of memory on my *laptop*
alone.

> Now on mpx_unmap()
> 
> for_each_vma()
>   if (is_affected(vma->bd_addr, vma->bd_entry))
>  unmap(vma);
> 
> That does not require a prctl, no fault handling in the unmap path, it
> just works and is robust by design because it does not rely on any
> user space crappola. You store the directory context at allocation
> time and free it when that context goes away. It's that simple, really.

If you are talking about the VM_MPX VMA that was allocated to hold the
bounds table, this won't work.

Once we unmap the bounds table, we would have a bounds directory entry
pointing at empty address space.  That address space could now be
allocated for some other (random) use, and the MPX hardware is now going
to go trying to walk it as if it were a bounds table.  That would be bad.

Any unmapping of a bounds table has to be accompanied by a corresponding
write to the bounds directory entry.  That write to the bounds directory
can fault.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
> The proper solution to this problem is:
> 
> do_bounds()
>   bd_addr = get_bd_addr_from_xsave();
>   bd_entry = bndstatus & ADDR_MASK:

Just for clarification. You CANNOT avoid the xsave here because it's
the only way to access BNDSTATUS according to the manual.

"The BNDCFGU and BNDSTATUS registers are accessible only with
 XSAVE/XRSTOR family of instructions"

So there is no point to cache BNDCFGU as you get it anyway when you
need to retrieve the invalid BD entry.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
> > So what you are saying is, that if user space sets the pointer to NULL
> > via the unregister prctl, kernel can safely ignore vmas which have the
> > VM_MPX flag set. I really can't follow that logic.
> >  
> > mmap_mpx();
> > prctl(enable mpx);
> > do lots of crap which uses mpx;
> > prctl(disable mpx);
> > 
> > So after that point the previous use of MPX is irrelevant, just
> > because we set a pointer to NULL? Does it just look like crap because
> > I do not get the big picture how all of this is supposed to work?
> 
> The prctl(register) is meant to be a signal from userspace to the kernel
> to say, "I would like your help in managing these bounds tables".
> prctl(unregister) is the opposite, meaning "I don't want your help any
> more".

Fine, but that's a totally different story. I can see the usefulness
of this, but then it's a complete misnomer. It should be:

   prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

So this wants to be a boolean value and not some random user space
address collected at some random point and then ignored until you do
the magic cleanup. See the other reply.

> If userspace uses MPX, it does not necessarily want the kernel to do
> bounds table management all the time (or ever in some cases).  Without
> the prctl(), the kernel has no way of distinguishing what userspace wants.

Fine with me, but it needs to be done proper. And proper means: ON/OFF

The kernel has to handle the information for which context it
allocated stuff and then tear it down when the context goes
away. Relying on a user space address sampled at some random prctl
point is just stupid.

> > And then you need another bunch of logic in the prctl(disable mpx)
> > path to cleanup the mess instead of just setting a random pointer to
> > NULL.
> 
> The bounds tables potentially represent a *lot* of state.  If userspace
> wants to temporarily turn off the kernel's MPX bounds table management,
> it does not necessarily want that state destroyed.  On the other hand,
> if userspace feels the need to go destroying all the state, it is free
> to do so and does not need any help to do so from the kernel.

Fine with me, but the above still stands.

Thanks,

tglx

 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Dave Hansen wrote:
> >>> Well, we use it to figure out whether we _potentially_ need to tear down
> >>> an VM_MPX-flagged area.  There's no guarantee that there will be one.
> >>
> >> So what you are saying is, that if user space sets the pointer to NULL
> >> via the unregister prctl, kernel can safely ignore vmas which have the
> >> VM_MPX flag set. I really can't follow that logic.
> >>  
> >>mmap_mpx();
> >>prctl(enable mpx);
> >>do lots of crap which uses mpx;
> >>prctl(disable mpx);
> >>
> >> So after that point the previous use of MPX is irrelevant, just
> >> because we set a pointer to NULL? Does it just look like crap because
> >> I do not get the big picture how all of this is supposed to work?
> > 
> > do_bounds() will happily map new BTs no matter whether the prctl was
> > invoked or not. So what's the value of the prctl at all?
> 
> The behavior as it stands is wrong.  We should at least have the kernel
> refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.
> 
> > The mapping is flagged VM_MPX. Why is this not sufficient?
> 
> The comment is confusing and only speaks to half of what the if() in
> question is doing.  We'll get a better comment in there.  But, for the
> sake of explaining it fully:
> 
> There are two mappings in play:
> 1. The mapping with the actual data, which userspace is munmap()ing or
>brk()ing away, etc... (never tagged VM_MPX)

It's not tagged that way because it is mapped by user space. This is
the directory, right?

> 2. The mapping for the bounds table *backing* the data (is tagged with
>VM_MPX)

That's the stuff, which gets magically allocated from do_bounds(). And
the reason you do that from the #BR is that user space would have to
allocate a gazillion of bound tables to make sure that every corner
case is covered. With the allocation from #BR you make that behaviour
dynamic and you just provide an empty "no bounds" table to make the
bound checker happy.

> The code ends up looking like this:
> 
> vm_munmap()
> {
>   do_unmap(vma); // #1 above
>   if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
>   // lookup the backing vma (#2 above)
>   vm_munmap(vma2)
> }
> 
> The bd_addr check is intended to say "could the kernel have possibly
> created some VM_MPX vmas?"  As you noted above, we will happily go
> creating VM_MPX vmas without mm->bd_addr being set.  That's will get fixed.
> 
> The VM_MPX _flags_ check on the VMA is there simply to prevent
> recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
> of the data VMA, and we've got to ensure it doesn't recurse.  *This*
> part of the if() in question is not addressed in the comment.  That's
> something we can fix up in the next version.

Ok, slowly I get the puzzle together :)

Now, the question is whether this magic fragile fixup is the right
thing to do in the context of unmap/brk.

So if the directory is unmapped, you want to free the bounds tables
which are referenced from the directory, i.e. those which you
allocated in do_bounds().

So you call arch_unmap() at the very end of do_unmap(). This walks the
directory to look at the entries and unmaps the bounds table which is
referenced from the directory and then clears the directory entry.

Now, I have a hard time to see how that is supposed to work.

do_unmap()
 detach_vmas_to_be_unmapped()
 unmap_region()
   free_pgtables()
 arch_unmap()
   mpx_unmap()

So at the point where you try to access the directory to gather the
information about the entries which might be affected, that stuff is
unmapped already and the page tables are gone.

Brilliant idea, really. And if you run into the fault in mpx_unmap()
you plan to delegate the fixup to a work queue. How is that thing
going to find what belonged to the unmapped directory?

Even if the stuff would be accessible at that point, it is a damned
stupid idea to rely on anything userspace is providing to you. I
learned that the hard way in futex.c

The proper solution to this problem is:

do_bounds()
bd_addr = get_bd_addr_from_xsave();
bd_entry = bndstatus & ADDR_MASK:

bt = mpx_mmap(bd_addr, bd_entry, len);

set_bt_entry_in_bd(bd_entry, bt);

And in mpx_mmap()

   .
   vma = find_vma();

   vma->bd_addr = bd_addr;
   vma->bd_entry = bd_entry;

Now on mpx_unmap()

for_each_vma()
if (is_affected(vma->bd_addr, vma->bd_entry))
   unmap(vma);

That does not require a prctl, no fault handling in the unmap path, it
just works and is robust by design because it does not rely on any
user space crappola. You store the directory context at allocation
time and free it when that context goes away. It's that simple, really.

So you can still think about a prctl in order to enable/disable the
automatic mapping

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
> So what you are saying is, that if user space sets the pointer to NULL
> via the unregister prctl, kernel can safely ignore vmas which have the
> VM_MPX flag set. I really can't follow that logic.
>  
>   mmap_mpx();
>   prctl(enable mpx);
>   do lots of crap which uses mpx;
>   prctl(disable mpx);
> 
> So after that point the previous use of MPX is irrelevant, just
> because we set a pointer to NULL? Does it just look like crap because
> I do not get the big picture how all of this is supposed to work?

The prctl(register) is meant to be a signal from userspace to the kernel
to say, "I would like your help in managing these bounds tables".
prctl(unregister) is the opposite, meaning "I don't want your help any
more".

The kernel won't really ignore VM_MPX vmas, it just won't go looking for
them actively in response to the unmapping of other non-VM_MPX vmas.

>> Yes.  The only other way the kernel can possibly know that it needs to
>> go tearing things down is with a potentially frequent and expensive xsave.
>>
>> Either we change mmap to say "this mmap() is for a bounds directory", or
>> we have some other interface that says "the mmap() for the bounds
>> directory is at $foo".  We could also record the bounds directory the
>> first time that we catch userspace using it.  I'd rather have an
>> explicit interface than an implicit one like that, though I don't feel
>> that strongly about it.
> 
> I really have to disagree here. If I follow your logic then we would
> have a prctl for using floating point as well instead of catching the
> use and handle it from there. Just get it, if you make it simple for
> user space to do stupid things, they will happen in all provided ways
> and some more.

Here's what it boils down to:

If userspace uses a floating point register, it wants it saved.

If userspace uses MPX, it does not necessarily want the kernel to do
bounds table management all the time (or ever in some cases).  Without
the prctl(), the kernel has no way of distinguishing what userspace wants.

>>> The design to support this feature makes no sense at all to me. We
>>> have a special mmap interface, some magic kernel side mapping
>>> functionality and then on top of it a prctl telling the kernel to
>>> ignore/respect it.
>>
>> That's a good point.  We don't seem to have anything in the
>> allocate_bt() side of things to tell the kernel to refuse to create
>> things if the prctl() hasn't been called.  That needs to get added.
> 
> And then you need another bunch of logic in the prctl(disable mpx)
> path to cleanup the mess instead of just setting a random pointer to
> NULL.

The bounds tables potentially represent a *lot* of state.  If userspace
wants to temporarily turn off the kernel's MPX bounds table management,
it does not necessarily want that state destroyed.  On the other hand,
if userspace feels the need to go destroying all the state, it is free
to do so and does not need any help to do so from the kernel.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Thomas Gleixner wrote:
>> On Thu, 11 Sep 2014, Dave Hansen wrote:
>>> Well, we use it to figure out whether we _potentially_ need to tear down
>>> an VM_MPX-flagged area.  There's no guarantee that there will be one.
>>
>> So what you are saying is, that if user space sets the pointer to NULL
>> via the unregister prctl, kernel can safely ignore vmas which have the
>> VM_MPX flag set. I really can't follow that logic.
>>  
>>  mmap_mpx();
>>  prctl(enable mpx);
>>  do lots of crap which uses mpx;
>>  prctl(disable mpx);
>>
>> So after that point the previous use of MPX is irrelevant, just
>> because we set a pointer to NULL? Does it just look like crap because
>> I do not get the big picture how all of this is supposed to work?
> 
> do_bounds() will happily map new BTs no matter whether the prctl was
> invoked or not. So what's the value of the prctl at all?

The behavior as it stands is wrong.  We should at least have the kernel
refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.

> The mapping is flagged VM_MPX. Why is this not sufficient?

The comment is confusing and only speaks to half of what the if() in
question is doing.  We'll get a better comment in there.  But, for the
sake of explaining it fully:

There are two mappings in play:
1. The mapping with the actual data, which userspace is munmap()ing or
   brk()ing away, etc... (never tagged VM_MPX)
2. The mapping for the bounds table *backing* the data (is tagged with
   VM_MPX)

The code ends up looking like this:

vm_munmap()
{
do_unmap(vma); // #1 above
if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
// lookup the backing vma (#2 above)
vm_munmap(vma2)
}

The bd_addr check is intended to say "could the kernel have possibly
created some VM_MPX vmas?"  As you noted above, we will happily go
creating VM_MPX vmas without mm->bd_addr being set.  That's will get fixed.

The VM_MPX _flags_ check on the VMA is there simply to prevent
recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
of the data VMA, and we've got to ensure it doesn't recurse.  *This*
part of the if() in question is not addressed in the comment.  That's
something we can fix up in the next version.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Dave Hansen wrote:
> > Well, we use it to figure out whether we _potentially_ need to tear down
> > an VM_MPX-flagged area.  There's no guarantee that there will be one.
> 
> So what you are saying is, that if user space sets the pointer to NULL
> via the unregister prctl, kernel can safely ignore vmas which have the
> VM_MPX flag set. I really can't follow that logic.
>  
>   mmap_mpx();
>   prctl(enable mpx);
>   do lots of crap which uses mpx;
>   prctl(disable mpx);
> 
> So after that point the previous use of MPX is irrelevant, just
> because we set a pointer to NULL? Does it just look like crap because
> I do not get the big picture how all of this is supposed to work?

do_bounds() will happily map new BTs no matter whether the prctl was
invoked or not. So what's the value of the prctl at all?

The mapping is flagged VM_MPX. Why is this not sufficient?

Thanks,

tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Thu, 11 Sep 2014, Dave Hansen wrote:
> On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
> > On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> >> commands. These commands can be used to register and unregister MPX
> >> related resource on the x86 platform.
> > 
> > I cant see anything which is registered/unregistered.
> 
> This registers the location of the bounds directory with the kernel.
> 
> >From the app's perspective, it says "I'm using MPX, and here is where I
> put the root data structure".
> 
> Without this, the kernel would have to do an (expensive) xsave operation
> every time it wanted to see if MPX was in use.  This also makes the
> user/kernel interaction more explicit.  We would be in a world of hurt
> if userspace was allowed to move the bounds directory around.  With this
> interface, it's a bit more obvious that userspace can't just move it
> around willy-nilly.

And what prevents it to do so? Just the fact that you have a prctl
does not make userspace better.

> >> The base of the bounds directory is set into mm_struct during
> >> PR_MPX_REGISTER command execution. This member can be used to
> >> check whether one application is mpx enabled.
> > 
> > This changelog is completely useless.
> 
> Yeah, it's pretty bare-bones.  Let me know if the explanation above
> makes sense, and we'll get it updated.

Well, it at least explains what its supposed to do. Whether that
itself makes sense is a completely different question.
 
> >> + */
> >> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> >> +{
> >> +  struct xsave_struct *xsave_buf;
> >> +
> >> +  fpu_xsave(>thread.fpu);
> >> +  xsave_buf = &(tsk->thread.fpu.state->xsave);
> >> +  if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> >> +  return NULL;
> > 
> > Now this might be understandable with a proper comment. Right now it's
> > a magic check for something uncomprehensible.
> 
> It's a bit ugly to access, but it seems pretty blatantly obvious that
> this is a check for "Is the enable flag in a hardware register set?"
> 
> Yes, the registers have names only a mother could love.  But that is
> what they're really called.
> 
> I guess we could add some comments about why we need to do the xsave.

Exactly.
 
> > So we use that information to check, whether we need to tear down a
> > VM_MPX flagged region with mpx_unmap(), right?
> 
> Well, we use it to figure out whether we _potentially_ need to tear down
> an VM_MPX-flagged area.  There's no guarantee that there will be one.

So what you are saying is, that if user space sets the pointer to NULL
via the unregister prctl, kernel can safely ignore vmas which have the
VM_MPX flag set. I really can't follow that logic.
 
mmap_mpx();
prctl(enable mpx);
do lots of crap which uses mpx;
prctl(disable mpx);

So after that point the previous use of MPX is irrelevant, just
because we set a pointer to NULL? Does it just look like crap because
I do not get the big picture how all of this is supposed to work?

> Yes.  The only other way the kernel can possibly know that it needs to
> go tearing things down is with a potentially frequent and expensive xsave.
> 
> Either we change mmap to say "this mmap() is for a bounds directory", or
> we have some other interface that says "the mmap() for the bounds
> directory is at $foo".  We could also record the bounds directory the
> first time that we catch userspace using it.  I'd rather have an
> explicit interface than an implicit one like that, though I don't feel
> that strongly about it.

I really have to disagree here. If I follow your logic then we would
have a prctl for using floating point as well instead of catching the
use and handle it from there. Just get it, if you make it simple for
user space to do stupid things, they will happen in all provided ways
and some more.

> > The design to support this feature makes no sense at all to me. We
> > have a special mmap interface, some magic kernel side mapping
> > functionality and then on top of it a prctl telling the kernel to
> > ignore/respect it.
> 
> That's a good point.  We don't seem to have anything in the
> allocate_bt() side of things to tell the kernel to refuse to create
> things if the prctl() hasn't been called.  That needs to get added.

And then you need another bunch of logic in the prctl(disable mpx)
path to cleanup the mess instead of just setting a random pointer to
NULL.

> If you don't want to share them in public, I'm happy to take this
> off-list, but please do share.

I'll let you know once I verified that it might work.

Thanks,

tglx
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Thu, 11 Sep 2014, Dave Hansen wrote:
 On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
  On Thu, 11 Sep 2014, Qiaowei Ren wrote:
  This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
  commands. These commands can be used to register and unregister MPX
  related resource on the x86 platform.
  
  I cant see anything which is registered/unregistered.
 
 This registers the location of the bounds directory with the kernel.
 
 From the app's perspective, it says I'm using MPX, and here is where I
 put the root data structure.
 
 Without this, the kernel would have to do an (expensive) xsave operation
 every time it wanted to see if MPX was in use.  This also makes the
 user/kernel interaction more explicit.  We would be in a world of hurt
 if userspace was allowed to move the bounds directory around.  With this
 interface, it's a bit more obvious that userspace can't just move it
 around willy-nilly.

And what prevents it to do so? Just the fact that you have a prctl
does not make userspace better.

  The base of the bounds directory is set into mm_struct during
  PR_MPX_REGISTER command execution. This member can be used to
  check whether one application is mpx enabled.
  
  This changelog is completely useless.
 
 Yeah, it's pretty bare-bones.  Let me know if the explanation above
 makes sense, and we'll get it updated.

Well, it at least explains what its supposed to do. Whether that
itself makes sense is a completely different question.
 
  + */
  +static __user void *task_get_bounds_dir(struct task_struct *tsk)
  +{
  +  struct xsave_struct *xsave_buf;
  +
  +  fpu_xsave(tsk-thread.fpu);
  +  xsave_buf = (tsk-thread.fpu.state-xsave);
  +  if (!(xsave_buf-bndcsr.cfg_reg_u  MPX_BNDCFG_ENABLE_FLAG))
  +  return NULL;
  
  Now this might be understandable with a proper comment. Right now it's
  a magic check for something uncomprehensible.
 
 It's a bit ugly to access, but it seems pretty blatantly obvious that
 this is a check for Is the enable flag in a hardware register set?
 
 Yes, the registers have names only a mother could love.  But that is
 what they're really called.
 
 I guess we could add some comments about why we need to do the xsave.

Exactly.
 
  So we use that information to check, whether we need to tear down a
  VM_MPX flagged region with mpx_unmap(), right?
 
 Well, we use it to figure out whether we _potentially_ need to tear down
 an VM_MPX-flagged area.  There's no guarantee that there will be one.

So what you are saying is, that if user space sets the pointer to NULL
via the unregister prctl, kernel can safely ignore vmas which have the
VM_MPX flag set. I really can't follow that logic.
 
mmap_mpx();
prctl(enable mpx);
do lots of crap which uses mpx;
prctl(disable mpx);

So after that point the previous use of MPX is irrelevant, just
because we set a pointer to NULL? Does it just look like crap because
I do not get the big picture how all of this is supposed to work?

 Yes.  The only other way the kernel can possibly know that it needs to
 go tearing things down is with a potentially frequent and expensive xsave.
 
 Either we change mmap to say this mmap() is for a bounds directory, or
 we have some other interface that says the mmap() for the bounds
 directory is at $foo.  We could also record the bounds directory the
 first time that we catch userspace using it.  I'd rather have an
 explicit interface than an implicit one like that, though I don't feel
 that strongly about it.

I really have to disagree here. If I follow your logic then we would
have a prctl for using floating point as well instead of catching the
use and handle it from there. Just get it, if you make it simple for
user space to do stupid things, they will happen in all provided ways
and some more.

  The design to support this feature makes no sense at all to me. We
  have a special mmap interface, some magic kernel side mapping
  functionality and then on top of it a prctl telling the kernel to
  ignore/respect it.
 
 That's a good point.  We don't seem to have anything in the
 allocate_bt() side of things to tell the kernel to refuse to create
 things if the prctl() hasn't been called.  That needs to get added.

And then you need another bunch of logic in the prctl(disable mpx)
path to cleanup the mess instead of just setting a random pointer to
NULL.

 If you don't want to share them in public, I'm happy to take this
 off-list, but please do share.

I'll let you know once I verified that it might work.

Thanks,

tglx
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
 On Thu, 11 Sep 2014, Dave Hansen wrote:
  Well, we use it to figure out whether we _potentially_ need to tear down
  an VM_MPX-flagged area.  There's no guarantee that there will be one.
 
 So what you are saying is, that if user space sets the pointer to NULL
 via the unregister prctl, kernel can safely ignore vmas which have the
 VM_MPX flag set. I really can't follow that logic.
  
   mmap_mpx();
   prctl(enable mpx);
   do lots of crap which uses mpx;
   prctl(disable mpx);
 
 So after that point the previous use of MPX is irrelevant, just
 because we set a pointer to NULL? Does it just look like crap because
 I do not get the big picture how all of this is supposed to work?

do_bounds() will happily map new BTs no matter whether the prctl was
invoked or not. So what's the value of the prctl at all?

The mapping is flagged VM_MPX. Why is this not sufficient?

Thanks,

tglx

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
 On Fri, 12 Sep 2014, Thomas Gleixner wrote:
 On Thu, 11 Sep 2014, Dave Hansen wrote:
 Well, we use it to figure out whether we _potentially_ need to tear down
 an VM_MPX-flagged area.  There's no guarantee that there will be one.

 So what you are saying is, that if user space sets the pointer to NULL
 via the unregister prctl, kernel can safely ignore vmas which have the
 VM_MPX flag set. I really can't follow that logic.
  
  mmap_mpx();
  prctl(enable mpx);
  do lots of crap which uses mpx;
  prctl(disable mpx);

 So after that point the previous use of MPX is irrelevant, just
 because we set a pointer to NULL? Does it just look like crap because
 I do not get the big picture how all of this is supposed to work?
 
 do_bounds() will happily map new BTs no matter whether the prctl was
 invoked or not. So what's the value of the prctl at all?

The behavior as it stands is wrong.  We should at least have the kernel
refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.

 The mapping is flagged VM_MPX. Why is this not sufficient?

The comment is confusing and only speaks to half of what the if() in
question is doing.  We'll get a better comment in there.  But, for the
sake of explaining it fully:

There are two mappings in play:
1. The mapping with the actual data, which userspace is munmap()ing or
   brk()ing away, etc... (never tagged VM_MPX)
2. The mapping for the bounds table *backing* the data (is tagged with
   VM_MPX)

The code ends up looking like this:

vm_munmap()
{
do_unmap(vma); // #1 above
if (mm-bd_addr  !(vma-vm_flags  VM_MPX))
// lookup the backing vma (#2 above)
vm_munmap(vma2)
}

The bd_addr check is intended to say could the kernel have possibly
created some VM_MPX vmas?  As you noted above, we will happily go
creating VM_MPX vmas without mm-bd_addr being set.  That's will get fixed.

The VM_MPX _flags_ check on the VMA is there simply to prevent
recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
of the data VMA, and we've got to ensure it doesn't recurse.  *This*
part of the if() in question is not addressed in the comment.  That's
something we can fix up in the next version.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
 So what you are saying is, that if user space sets the pointer to NULL
 via the unregister prctl, kernel can safely ignore vmas which have the
 VM_MPX flag set. I really can't follow that logic.
  
   mmap_mpx();
   prctl(enable mpx);
   do lots of crap which uses mpx;
   prctl(disable mpx);
 
 So after that point the previous use of MPX is irrelevant, just
 because we set a pointer to NULL? Does it just look like crap because
 I do not get the big picture how all of this is supposed to work?

The prctl(register) is meant to be a signal from userspace to the kernel
to say, I would like your help in managing these bounds tables.
prctl(unregister) is the opposite, meaning I don't want your help any
more.

The kernel won't really ignore VM_MPX vmas, it just won't go looking for
them actively in response to the unmapping of other non-VM_MPX vmas.

 Yes.  The only other way the kernel can possibly know that it needs to
 go tearing things down is with a potentially frequent and expensive xsave.

 Either we change mmap to say this mmap() is for a bounds directory, or
 we have some other interface that says the mmap() for the bounds
 directory is at $foo.  We could also record the bounds directory the
 first time that we catch userspace using it.  I'd rather have an
 explicit interface than an implicit one like that, though I don't feel
 that strongly about it.
 
 I really have to disagree here. If I follow your logic then we would
 have a prctl for using floating point as well instead of catching the
 use and handle it from there. Just get it, if you make it simple for
 user space to do stupid things, they will happen in all provided ways
 and some more.

Here's what it boils down to:

If userspace uses a floating point register, it wants it saved.

If userspace uses MPX, it does not necessarily want the kernel to do
bounds table management all the time (or ever in some cases).  Without
the prctl(), the kernel has no way of distinguishing what userspace wants.

 The design to support this feature makes no sense at all to me. We
 have a special mmap interface, some magic kernel side mapping
 functionality and then on top of it a prctl telling the kernel to
 ignore/respect it.

 That's a good point.  We don't seem to have anything in the
 allocate_bt() side of things to tell the kernel to refuse to create
 things if the prctl() hasn't been called.  That needs to get added.
 
 And then you need another bunch of logic in the prctl(disable mpx)
 path to cleanup the mess instead of just setting a random pointer to
 NULL.

The bounds tables potentially represent a *lot* of state.  If userspace
wants to temporarily turn off the kernel's MPX bounds table management,
it does not necessarily want that state destroyed.  On the other hand,
if userspace feels the need to go destroying all the state, it is free
to do so and does not need any help to do so from the kernel.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Dave Hansen wrote:
 On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
  On Fri, 12 Sep 2014, Thomas Gleixner wrote:
  On Thu, 11 Sep 2014, Dave Hansen wrote:
  Well, we use it to figure out whether we _potentially_ need to tear down
  an VM_MPX-flagged area.  There's no guarantee that there will be one.
 
  So what you are saying is, that if user space sets the pointer to NULL
  via the unregister prctl, kernel can safely ignore vmas which have the
  VM_MPX flag set. I really can't follow that logic.
   
 mmap_mpx();
 prctl(enable mpx);
 do lots of crap which uses mpx;
 prctl(disable mpx);
 
  So after that point the previous use of MPX is irrelevant, just
  because we set a pointer to NULL? Does it just look like crap because
  I do not get the big picture how all of this is supposed to work?
  
  do_bounds() will happily map new BTs no matter whether the prctl was
  invoked or not. So what's the value of the prctl at all?
 
 The behavior as it stands is wrong.  We should at least have the kernel
 refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.
 
  The mapping is flagged VM_MPX. Why is this not sufficient?
 
 The comment is confusing and only speaks to half of what the if() in
 question is doing.  We'll get a better comment in there.  But, for the
 sake of explaining it fully:
 
 There are two mappings in play:
 1. The mapping with the actual data, which userspace is munmap()ing or
brk()ing away, etc... (never tagged VM_MPX)

It's not tagged that way because it is mapped by user space. This is
the directory, right?

 2. The mapping for the bounds table *backing* the data (is tagged with
VM_MPX)

That's the stuff, which gets magically allocated from do_bounds(). And
the reason you do that from the #BR is that user space would have to
allocate a gazillion of bound tables to make sure that every corner
case is covered. With the allocation from #BR you make that behaviour
dynamic and you just provide an empty no bounds table to make the
bound checker happy.

 The code ends up looking like this:
 
 vm_munmap()
 {
   do_unmap(vma); // #1 above
   if (mm-bd_addr  !(vma-vm_flags  VM_MPX))
   // lookup the backing vma (#2 above)
   vm_munmap(vma2)
 }
 
 The bd_addr check is intended to say could the kernel have possibly
 created some VM_MPX vmas?  As you noted above, we will happily go
 creating VM_MPX vmas without mm-bd_addr being set.  That's will get fixed.
 
 The VM_MPX _flags_ check on the VMA is there simply to prevent
 recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
 of the data VMA, and we've got to ensure it doesn't recurse.  *This*
 part of the if() in question is not addressed in the comment.  That's
 something we can fix up in the next version.

Ok, slowly I get the puzzle together :)

Now, the question is whether this magic fragile fixup is the right
thing to do in the context of unmap/brk.

So if the directory is unmapped, you want to free the bounds tables
which are referenced from the directory, i.e. those which you
allocated in do_bounds().
 
So you call arch_unmap() at the very end of do_unmap(). This walks the
directory to look at the entries and unmaps the bounds table which is
referenced from the directory and then clears the directory entry.

Now, I have a hard time to see how that is supposed to work.

do_unmap()
 detach_vmas_to_be_unmapped()
 unmap_region()
   free_pgtables()
 arch_unmap()
   mpx_unmap()

So at the point where you try to access the directory to gather the
information about the entries which might be affected, that stuff is
unmapped already and the page tables are gone.

Brilliant idea, really. And if you run into the fault in mpx_unmap()
you plan to delegate the fixup to a work queue. How is that thing
going to find what belonged to the unmapped directory?

Even if the stuff would be accessible at that point, it is a damned
stupid idea to rely on anything userspace is providing to you. I
learned that the hard way in futex.c

The proper solution to this problem is:

do_bounds()
bd_addr = get_bd_addr_from_xsave();
bd_entry = bndstatus  ADDR_MASK:

bt = mpx_mmap(bd_addr, bd_entry, len);

set_bt_entry_in_bd(bd_entry, bt);

And in mpx_mmap()

   .
   vma = find_vma();

   vma-bd_addr = bd_addr;
   vma-bd_entry = bd_entry;

Now on mpx_unmap()

for_each_vma()
if (is_affected(vma-bd_addr, vma-bd_entry))
   unmap(vma);

That does not require a prctl, no fault handling in the unmap path, it
just works and is robust by design because it does not rely on any
user space crappola. You store the directory context at allocation
time and free it when that context goes away. It's that simple, really.

So you can still think about a prctl in order to enable/disable the
automatic mapping stuff, but that's a completely different story.
   
Thanks,

tglx
--
To unsubscribe from this list:

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Dave Hansen wrote:
 On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
  So what you are saying is, that if user space sets the pointer to NULL
  via the unregister prctl, kernel can safely ignore vmas which have the
  VM_MPX flag set. I really can't follow that logic.
   
  mmap_mpx();
  prctl(enable mpx);
  do lots of crap which uses mpx;
  prctl(disable mpx);
  
  So after that point the previous use of MPX is irrelevant, just
  because we set a pointer to NULL? Does it just look like crap because
  I do not get the big picture how all of this is supposed to work?
 
 The prctl(register) is meant to be a signal from userspace to the kernel
 to say, I would like your help in managing these bounds tables.
 prctl(unregister) is the opposite, meaning I don't want your help any
 more.

Fine, but that's a totally different story. I can see the usefulness
of this, but then it's a complete misnomer. It should be:

   prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

So this wants to be a boolean value and not some random user space
address collected at some random point and then ignored until you do
the magic cleanup. See the other reply.

 If userspace uses MPX, it does not necessarily want the kernel to do
 bounds table management all the time (or ever in some cases).  Without
 the prctl(), the kernel has no way of distinguishing what userspace wants.

Fine with me, but it needs to be done proper. And proper means: ON/OFF

The kernel has to handle the information for which context it
allocated stuff and then tear it down when the context goes
away. Relying on a user space address sampled at some random prctl
point is just stupid.

  And then you need another bunch of logic in the prctl(disable mpx)
  path to cleanup the mess instead of just setting a random pointer to
  NULL.
 
 The bounds tables potentially represent a *lot* of state.  If userspace
 wants to temporarily turn off the kernel's MPX bounds table management,
 it does not necessarily want that state destroyed.  On the other hand,
 if userspace feels the need to go destroying all the state, it is free
 to do so and does not need any help to do so from the kernel.

Fine with me, but the above still stands.

Thanks,

tglx

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Thomas Gleixner

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
 On Fri, 12 Sep 2014, Dave Hansen wrote:
 The proper solution to this problem is:
 
 do_bounds()
   bd_addr = get_bd_addr_from_xsave();
   bd_entry = bndstatus  ADDR_MASK:

Just for clarification. You CANNOT avoid the xsave here because it's
the only way to access BNDSTATUS according to the manual.

The BNDCFGU and BNDSTATUS registers are accessible only with
 XSAVE/XRSTOR family of instructions

So there is no point to cache BNDCFGU as you get it anyway when you
need to retrieve the invalid BD entry.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
 On Fri, 12 Sep 2014, Dave Hansen wrote:
 There are two mappings in play:
 1. The mapping with the actual data, which userspace is munmap()ing or
brk()ing away, etc... (never tagged VM_MPX)
 
 It's not tagged that way because it is mapped by user space.

Correct.  It is not tagged because it is mapped by user space.

 This is the directory, right?

No.  The untagged mapping in question here is for normal user data, like
an mmap() or brk(), unrelated to MPX.

The directory is a separate matter.  It is also (currently) untagged
with VM_MPX since it is also allocated by userspace.

 2. The mapping for the bounds table *backing* the data (is tagged with
VM_MPX)
 
 That's the stuff, which gets magically allocated from do_bounds(). And
 the reason you do that from the #BR is that user space would have to
 allocate a gazillion of bound tables to make sure that every corner
 case is covered.

Yes.

 With the allocation from #BR you make that behaviour
 dynamic and you just provide an empty no bounds table to make the
 bound checker happy.

Kinda.  We do provide an empty table, but the first access will always
be a write, so it doesn't stay empty for long.

...
 Now, I have a hard time to see how that is supposed to work.
 
 do_unmap()
  detach_vmas_to_be_unmapped()
  unmap_region()
free_pgtables()
  arch_unmap()
mpx_unmap()
 
 So at the point where you try to access the directory to gather the
 information about the entries which might be affected, that stuff is
 unmapped already and the page tables are gone.
 
 Brilliant idea, really. And if you run into the fault in mpx_unmap()
 you plan to delegate the fixup to a work queue. How is that thing
 going to find what belonged to the unmapped directory?

The bounds directory is not being unmapped here.  I _think_ I covered
that above, but don't be shy if I'm not being clear. ;)

 Even if the stuff would be accessible at that point, it is a damned
 stupid idea to rely on anything userspace is providing to you. I
 learned that the hard way in futex.c
 
 The proper solution to this problem is:
 
 do_bounds()
   bd_addr = get_bd_addr_from_xsave();
   bd_entry = bndstatus  ADDR_MASK:
 
   bt = mpx_mmap(bd_addr, bd_entry, len);
 
   set_bt_entry_in_bd(bd_entry, bt);
 
 And in mpx_mmap()
 
.
vma = find_vma();
 
vma-bd_addr = bd_addr;
vma-bd_entry = bd_entry;

If the bounds directory moved around, this would make sense.  Otherwise,
it's a waste of space because all vmas in a given mm would have the
exact same bd_addr, and we might as well just store it in mm-bd_something.

Are you suggesting that we support moving the bounds directory around?

Also, the bd_entry can be _calculated_ from vma-vm_start and the
bd_addr.  It seems a bit redundant to store it like this.

Also this would add 16 bytes to the currently 184-byte VMA.  That seems
suboptimal to me.  It would eat over a megabyte of memory on my *laptop*
alone.

 Now on mpx_unmap()
 
 for_each_vma()
   if (is_affected(vma-bd_addr, vma-bd_entry))
  unmap(vma);
 
 That does not require a prctl, no fault handling in the unmap path, it
 just works and is robust by design because it does not rely on any
 user space crappola. You store the directory context at allocation
 time and free it when that context goes away. It's that simple, really.

If you are talking about the VM_MPX VMA that was allocated to hold the
bounds table, this won't work.

Once we unmap the bounds table, we would have a bounds directory entry
pointing at empty address space.  That address space could now be
allocated for some other (random) use, and the MPX hardware is now going
to go trying to walk it as if it were a bounds table.  That would be bad.

Any unmapping of a bounds table has to be accompanied by a corresponding
write to the bounds directory entry.  That write to the bounds directory
can fault.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 10:42 AM, Thomas Gleixner wrote:
 On Fri, 12 Sep 2014, Dave Hansen wrote:
 The prctl(register) is meant to be a signal from userspace to the kernel
 to say, I would like your help in managing these bounds tables.
 prctl(unregister) is the opposite, meaning I don't want your help any
 more.
 
 Fine, but that's a totally different story. I can see the usefulness
 of this, but then it's a complete misnomer. It should be:
 
prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

Agreed.  Those are much better names.

 So this wants to be a boolean value and not some random user space
 address collected at some random point and then ignored until you do
 the magic cleanup. See the other reply.

I know at this point you think the kernel can not or should not keep a
copy of the bounds directory location around.  I understand that.  Bear
with me for a moment, and please just assume for a moment that we need it.

It's far from a random userspace address.  When you make a syscall, we
put the arguments in registers.  The register we're putting it in here
just happens to be used by the hardware.

Right now, we do (ignoring the actual xsave/xrstr):

bndcfgu = bnd_dir_ptr | ENABLE_BIT;
prctl(ENABLE_MPX_BT_MANAGEMENT); // kernel grabs from xsave buf

We could pass it explicitly in %rdi as a syscall argument and not have
the prctl() code fetch it from the xsave buffer.  I'm just not sure what
this buys us:

bndcfgu = bnd_dir_ptr | ENABLE_BIT;
prctl(ENABLE_MPX_BT_MANAGEMENT, bndcfgu);

Also, the random cleanup just happens to correspond with memory
deallocation, which is something we want to go fast.  I'd _prefer_ to
keep xsaves out of the unmap path if possible.  It's not a strict
requirement, but it does seem prudent as an xsave eats a dozen or so
cachelines.

It's also not sampled.  I can't imagine a situation where the register
will change values during the execution of any sane program.  It really
is essentially fixed.  It's probably one of the reasons it is so
expensive to access: there's *no* reason to do it frequently.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-12 Thread Dave Hansen

On 09/12/2014 11:42 AM, Thomas Gleixner wrote:
 On Fri, 12 Sep 2014, Thomas Gleixner wrote:
 On Fri, 12 Sep 2014, Dave Hansen wrote:
 The proper solution to this problem is:

 do_bounds()
  bd_addr = get_bd_addr_from_xsave();
  bd_entry = bndstatus  ADDR_MASK:
 
 Just for clarification. You CANNOT avoid the xsave here because it's
 the only way to access BNDSTATUS according to the manual.
 
 The BNDCFGU and BNDSTATUS registers are accessible only with
  XSAVE/XRSTOR family of instructions
 
 So there is no point to cache BNDCFGU as you get it anyway when you
 need to retrieve the invalid BD entry.

Agreed.  It serves no purpose during a bounds fault.

However, it does keep you from having to do an xsave during the bounds
table free operations, like at unmap() time.  That is actually a much
more critical path than bounds faults.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Ren, Qiaowei



On 2014-09-11, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +
>> +return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
>> +MPX_BNDCFG_ADDR_MASK);
>> +}
> 
> I don't think casting a u64 to a ulong, then to a pointer is useful.
> Just take the '(unsigned long)' out.

If so, this will spits out a warning on 32-bit:

arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir':
arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]

Thanks,
Qiaowei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Dave Hansen

On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
>> commands. These commands can be used to register and unregister MPX
>> related resource on the x86 platform.
> 
> I cant see anything which is registered/unregistered.

This registers the location of the bounds directory with the kernel.

>From the app's perspective, it says "I'm using MPX, and here is where I
put the root data structure".

Without this, the kernel would have to do an (expensive) xsave operation
every time it wanted to see if MPX was in use.  This also makes the
user/kernel interaction more explicit.  We would be in a world of hurt
if userspace was allowed to move the bounds directory around.  With this
interface, it's a bit more obvious that userspace can't just move it
around willy-nilly.

>> The base of the bounds directory is set into mm_struct during
>> PR_MPX_REGISTER command execution. This member can be used to
>> check whether one application is mpx enabled.
> 
> This changelog is completely useless.

Yeah, it's pretty bare-bones.  Let me know if the explanation above
makes sense, and we'll get it updated.

>> +/*
>> + * This should only be called when cpuid has been checked
>> + * and we are sure that MPX is available.
> 
> Groan. Why can't you put that cpuid check into that function right
> away instead of adding a worthless comment?

Sounds reasonable to me.  We should just move the cpuid check in to
task_get_bounds_dir().

>> + */
>> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
>> +{
>> +struct xsave_struct *xsave_buf;
>> +
>> +fpu_xsave(>thread.fpu);
>> +xsave_buf = &(tsk->thread.fpu.state->xsave);
>> +if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
>> +return NULL;
> 
> Now this might be understandable with a proper comment. Right now it's
> a magic check for something uncomprehensible.

It's a bit ugly to access, but it seems pretty blatantly obvious that
this is a check for "Is the enable flag in a hardware register set?"

Yes, the registers have names only a mother could love.  But that is
what they're really called.

I guess we could add some comments about why we need to do the xsave.

>> +int mpx_register(struct task_struct *tsk)
>> +{
>> +struct mm_struct *mm = tsk->mm;
>> +
>> +if (!cpu_has_mpx)
>> +return -EINVAL;
>> +
>> +/*
>> + * runtime in the userspace will be responsible for allocation of
>> + * the bounds directory. Then, it will save the base of the bounds
>> + * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> + * XRSTOR instruction.
>> + *
>> + * fpu_xsave() is expected to be very expensive. In order to do
>> + * performance optimization, here we get the base of the bounds
>> + * directory and then save it into mm_struct to be used in future.
>> + */
> 
> Ah. Now we get some information what this might do. But that does not
> make any sense at all.
> 
> So all it does is:
> 
> tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> 
> or:
> 
> tsk->mm.bd_addr = NULL;
> 
> So we use that information to check, whether we need to tear down a
> VM_MPX flagged region with mpx_unmap(), right?

Well, we use it to figure out whether we _potentially_ need to tear down
an VM_MPX-flagged area.  There's no guarantee that there will be one.

>> + /*
>> +  * Check whether this vma comes from MPX-enabled application.
>> +  * If so, release this vma related bound tables.
>> +  */
>> + if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
>> + mpx_unmap(mm, start, end);
> 
> You really must be kidding. The application maps that table and never
> calls that prctl so do_unmap() will happily ignore it?

Yes.  The only other way the kernel can possibly know that it needs to
go tearing things down is with a potentially frequent and expensive xsave.

Either we change mmap to say "this mmap() is for a bounds directory", or
we have some other interface that says "the mmap() for the bounds
directory is at $foo".  We could also record the bounds directory the
first time that we catch userspace using it.  I'd rather have an
explicit interface than an implicit one like that, though I don't feel
that strongly about it.

> The design to support this feature makes no sense at all to me. We
> have a special mmap interface, some magic kernel side mapping
> functionality and then on top of it a prctl telling the kernel to
> ignore/respect it.

That's a good point.  We don't seem to have anything in the
allocate_bt() side of things to tell the kernel to refuse to create
things if the prctl() hasn't been called.  That needs to get added.

> All I have seen so far is the hint to read some intel feature
> documentation, but no coherent explanation how this patch set makes
> use of that very feature. The last patch in the series

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Thomas Gleixner

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> commands. These commands can be used to register and unregister MPX
> related resource on the x86 platform.

I cant see anything which is registered/unregistered.

> The base of the bounds directory is set into mm_struct during
> PR_MPX_REGISTER command execution. This member can be used to
> check whether one application is mpx enabled.

This changelog is completely useless.

What's the actual point of this prctl?

> +/*
> + * This should only be called when cpuid has been checked
> + * and we are sure that MPX is available.

Groan. Why can't you put that cpuid check into that function right
away instead of adding a worthless comment?

It's obviously more important to have a comment about somthing which
is obvious than explaining what the function is actually doing, right?

> + */
> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> +{
> + struct xsave_struct *xsave_buf;
> +
> + fpu_xsave(>thread.fpu);
> + xsave_buf = &(tsk->thread.fpu.state->xsave);
> + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> + return NULL;

Now this might be understandable with a proper comment. Right now it's
a magic check for something uncomprehensible.

> + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> + MPX_BNDCFG_ADDR_MASK);
> +}
> +
> +int mpx_register(struct task_struct *tsk)
> +{
> + struct mm_struct *mm = tsk->mm;
> +
> + if (!cpu_has_mpx)
> + return -EINVAL;
> +
> + /*
> +  * runtime in the userspace will be responsible for allocation of
> +  * the bounds directory. Then, it will save the base of the bounds
> +  * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +  * XRSTOR instruction.
> +  *
> +  * fpu_xsave() is expected to be very expensive. In order to do
> +  * performance optimization, here we get the base of the bounds
> +  * directory and then save it into mm_struct to be used in future.
> +  */

Ah. Now we get some information what this might do. But that does not
make any sense at all.

So all it does is:

tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;

or:

tsk->mm.bd_addr = NULL;

So we use that information to check, whether we need to tear down a
VM_MPX flagged region with mpx_unmap(), right?

> + /*
> +  * Check whether this vma comes from MPX-enabled application.
> +  * If so, release this vma related bound tables.
> +  */
> + if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> + mpx_unmap(mm, start, end);

You really must be kidding. The application maps that table and never
calls that prctl so do_unmap() will happily ignore it?

The design to support this feature makes no sense at all to me. We
have a special mmap interface, some magic kernel side mapping
functionality and then on top of it a prctl telling the kernel to
ignore/respect it.

All I have seen so far is the hint to read some intel feature
documentation, but no coherent explanation how this patch set makes
use of that very feature. The last patch in the series does not count
as coherent explanation. It merily documents parts of the
implementation details which are required to make use of it but
completely lacks of a coherent description how all of this is supposed
to work.

Despite the fact that this is V8, I can't suppress the feeling that
this is just cobbled together to make it work somehow and we'll deal
with the fallout later. I wouldn't be surprised if some of the fallout
is going to be security related. I have a pretty good idea how to
exploit it even without understanding the non-malicious intent of the
whole thing.

So: NAK to the whole series for now until someone comes up with a
coherent explanation.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Dave Hansen

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +
> + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> + MPX_BNDCFG_ADDR_MASK);
> +}

I don't think casting a u64 to a ulong, then to a pointer is useful.
Just take the '(unsigned long)' out.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Thomas Gleixner

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

 This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
 commands. These commands can be used to register and unregister MPX
 related resource on the x86 platform.

I cant see anything which is registered/unregistered.
 
 The base of the bounds directory is set into mm_struct during
 PR_MPX_REGISTER command execution. This member can be used to
 check whether one application is mpx enabled.

This changelog is completely useless.

What's the actual point of this prctl?

 +/*
 + * This should only be called when cpuid has been checked
 + * and we are sure that MPX is available.

Groan. Why can't you put that cpuid check into that function right
away instead of adding a worthless comment?

It's obviously more important to have a comment about somthing which
is obvious than explaining what the function is actually doing, right?

 + */
 +static __user void *task_get_bounds_dir(struct task_struct *tsk)
 +{
 + struct xsave_struct *xsave_buf;
 +
 + fpu_xsave(tsk-thread.fpu);
 + xsave_buf = (tsk-thread.fpu.state-xsave);
 + if (!(xsave_buf-bndcsr.cfg_reg_u  MPX_BNDCFG_ENABLE_FLAG))
 + return NULL;

Now this might be understandable with a proper comment. Right now it's
a magic check for something uncomprehensible.

 + return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u 
 + MPX_BNDCFG_ADDR_MASK);
 +}
 +
 +int mpx_register(struct task_struct *tsk)
 +{
 + struct mm_struct *mm = tsk-mm;
 +
 + if (!cpu_has_mpx)
 + return -EINVAL;
 +
 + /*
 +  * runtime in the userspace will be responsible for allocation of
 +  * the bounds directory. Then, it will save the base of the bounds
 +  * directory into XSAVE/XRSTOR Save Area and enable MPX through
 +  * XRSTOR instruction.
 +  *
 +  * fpu_xsave() is expected to be very expensive. In order to do
 +  * performance optimization, here we get the base of the bounds
 +  * directory and then save it into mm_struct to be used in future.
 +  */

Ah. Now we get some information what this might do. But that does not
make any sense at all.

So all it does is:

tsk-mm.bd_addr = xsave_buf-bndcsr.cfg_reg_u  MPX_BNDCFG_ADDR_MASK;

or:

tsk-mm.bd_addr = NULL;

So we use that information to check, whether we need to tear down a
VM_MPX flagged region with mpx_unmap(), right?

 + /*
 +  * Check whether this vma comes from MPX-enabled application.
 +  * If so, release this vma related bound tables.
 +  */
 + if (mm-bd_addr  !(vma-vm_flags  VM_MPX))
 + mpx_unmap(mm, start, end);

You really must be kidding. The application maps that table and never
calls that prctl so do_unmap() will happily ignore it?

The design to support this feature makes no sense at all to me. We
have a special mmap interface, some magic kernel side mapping
functionality and then on top of it a prctl telling the kernel to
ignore/respect it.

All I have seen so far is the hint to read some intel feature
documentation, but no coherent explanation how this patch set makes
use of that very feature. The last patch in the series does not count
as coherent explanation. It merily documents parts of the
implementation details which are required to make use of it but
completely lacks of a coherent description how all of this is supposed
to work.

Despite the fact that this is V8, I can't suppress the feeling that
this is just cobbled together to make it work somehow and we'll deal
with the fallout later. I wouldn't be surprised if some of the fallout
is going to be security related. I have a pretty good idea how to
exploit it even without understanding the non-malicious intent of the
whole thing.

So: NAK to the whole series for now until someone comes up with a
coherent explanation.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Dave Hansen

On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
 On Thu, 11 Sep 2014, Qiaowei Ren wrote:
 This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
 commands. These commands can be used to register and unregister MPX
 related resource on the x86 platform.
 
 I cant see anything which is registered/unregistered.

This registers the location of the bounds directory with the kernel.

From the app's perspective, it says I'm using MPX, and here is where I
put the root data structure.

Without this, the kernel would have to do an (expensive) xsave operation
every time it wanted to see if MPX was in use.  This also makes the
user/kernel interaction more explicit.  We would be in a world of hurt
if userspace was allowed to move the bounds directory around.  With this
interface, it's a bit more obvious that userspace can't just move it
around willy-nilly.

 The base of the bounds directory is set into mm_struct during
 PR_MPX_REGISTER command execution. This member can be used to
 check whether one application is mpx enabled.
 
 This changelog is completely useless.

Yeah, it's pretty bare-bones.  Let me know if the explanation above
makes sense, and we'll get it updated.

 +/*
 + * This should only be called when cpuid has been checked
 + * and we are sure that MPX is available.
 
 Groan. Why can't you put that cpuid check into that function right
 away instead of adding a worthless comment?

Sounds reasonable to me.  We should just move the cpuid check in to
task_get_bounds_dir().

 + */
 +static __user void *task_get_bounds_dir(struct task_struct *tsk)
 +{
 +struct xsave_struct *xsave_buf;
 +
 +fpu_xsave(tsk-thread.fpu);
 +xsave_buf = (tsk-thread.fpu.state-xsave);
 +if (!(xsave_buf-bndcsr.cfg_reg_u  MPX_BNDCFG_ENABLE_FLAG))
 +return NULL;
 
 Now this might be understandable with a proper comment. Right now it's
 a magic check for something uncomprehensible.

It's a bit ugly to access, but it seems pretty blatantly obvious that
this is a check for Is the enable flag in a hardware register set?

Yes, the registers have names only a mother could love.  But that is
what they're really called.

I guess we could add some comments about why we need to do the xsave.

 +int mpx_register(struct task_struct *tsk)
 +{
 +struct mm_struct *mm = tsk-mm;
 +
 +if (!cpu_has_mpx)
 +return -EINVAL;
 +
 +/*
 + * runtime in the userspace will be responsible for allocation of
 + * the bounds directory. Then, it will save the base of the bounds
 + * directory into XSAVE/XRSTOR Save Area and enable MPX through
 + * XRSTOR instruction.
 + *
 + * fpu_xsave() is expected to be very expensive. In order to do
 + * performance optimization, here we get the base of the bounds
 + * directory and then save it into mm_struct to be used in future.
 + */
 
 Ah. Now we get some information what this might do. But that does not
 make any sense at all.
 
 So all it does is:
 
 tsk-mm.bd_addr = xsave_buf-bndcsr.cfg_reg_u  MPX_BNDCFG_ADDR_MASK;
 
 or:
 
 tsk-mm.bd_addr = NULL;
 
 So we use that information to check, whether we need to tear down a
 VM_MPX flagged region with mpx_unmap(), right?

Well, we use it to figure out whether we _potentially_ need to tear down
an VM_MPX-flagged area.  There's no guarantee that there will be one.

 + /*
 +  * Check whether this vma comes from MPX-enabled application.
 +  * If so, release this vma related bound tables.
 +  */
 + if (mm-bd_addr  !(vma-vm_flags  VM_MPX))
 + mpx_unmap(mm, start, end);
 
 You really must be kidding. The application maps that table and never
 calls that prctl so do_unmap() will happily ignore it?

Yes.  The only other way the kernel can possibly know that it needs to
go tearing things down is with a potentially frequent and expensive xsave.

Either we change mmap to say this mmap() is for a bounds directory, or
we have some other interface that says the mmap() for the bounds
directory is at $foo.  We could also record the bounds directory the
first time that we catch userspace using it.  I'd rather have an
explicit interface than an implicit one like that, though I don't feel
that strongly about it.

 The design to support this feature makes no sense at all to me. We
 have a special mmap interface, some magic kernel side mapping
 functionality and then on top of it a prctl telling the kernel to
 ignore/respect it.

That's a good point.  We don't seem to have anything in the
allocate_bt() side of things to tell the kernel to refuse to create
things if the prctl() hasn't been called.  That needs to get added.

 All I have seen so far is the hint to read some intel feature
 documentation, but no coherent explanation how this patch set makes
 use of that very feature. The last patch in the series does not count
 as coherent explanation. It merily documents parts of the
 implementation details which are required to make use of it but

RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Ren, Qiaowei



On 2014-09-11, Hansen, Dave wrote:
 On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
 +
 +return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u 
 +MPX_BNDCFG_ADDR_MASK);
 +}
 
 I don't think casting a u64 to a ulong, then to a pointer is useful.
 Just take the '(unsigned long)' out.

If so, this will spits out a warning on 32-bit:

arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir':
arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]

Thanks,
Qiaowei

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Dave Hansen

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
 +
 + return (void __user *)(unsigned long)(xsave_buf-bndcsr.cfg_reg_u 
 + MPX_BNDCFG_ADDR_MASK);
 +}

I don't think casting a u64 to a ulong, then to a pointer is useful.
Just take the '(unsigned long)' out.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

50 matches

Mail list logo