Re: futex(2) man page update help request

2015-02-16 Thread Cyril Hrubis
Hi!
> I'll follow up with you in a couple weeks most likely. I have some urgent
> things that will be taking all my time and then some until then. Feel free
> to poke me though if I lose track of it :-)

FYI I've started to work on futex testcases for LTP. The first batch has
been committed in:

https://github.com/linux-test-project/ltp/commit/6270ba2ebe999ffdb1364e5e814d7e56070a0198

Some of these are loosely based on futextest; some are written from
scratch. The requeue operation, PI futexes and the bitset operations are
not covered yet.
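For readers unfamiliar with what such a testcase checks, here is a minimal sketch of two basic FUTEX_WAIT properties: a value mismatch fails with EAGAIN, and a timed wait that nobody wakes fails with ETIMEDOUT. It is written from scratch and is not taken from the LTP commit above. The helper names (futex_wait, check_value_mismatch, check_timeout) are hypothetical; glibc provides no futex() wrapper, so the sketch calls syscall(SYS_futex, ...) directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw system call. */
static long futex_wait(uint32_t *uaddr, uint32_t val, const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, val, timeout, NULL, 0);
}

/* Value mismatch: the kernel refuses to sleep and fails with EAGAIN
 * (EWOULDBLOCK has the same value on Linux). Returns the observed errno. */
static int check_value_mismatch(void)
{
    uint32_t futex_word = 1;
    long ret = futex_wait(&futex_word, 0, NULL);   /* expects 0, finds 1 */
    return ret == -1 ? errno : 0;
}

/* Matching value and nobody to wake us: the relative timeout expires. */
static int check_timeout(void)
{
    uint32_t futex_word = 0;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 }; /* 10 ms */
    long ret = futex_wait(&futex_word, 0, &ts);
    return ret == -1 ? errno : 0;
}
```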

-- 
Cyril Hrubis
chru...@suse.cz
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2015-02-05 Thread Darren Hart
On 1/24/15, 3:35 AM, "Thomas Gleixner"  wrote:

>On Fri, 23 Jan 2015, Torvald Riegel wrote:
>> Second, the current documentation for EINTR is that it can happen due to
>> receiving a signal *or* due to a spurious wake-up.  This is difficult to
>
>I don't think so. I went through all callchains again with a fine comb.
>
>futex_wait()
>retry:
>    ret = futex_wait_setup();
>    if (ret) {
>        /*
>         * Possible return codes related to uaddr:
>         * -EINVAL:      Not u32 aligned uaddr
>         * -EFAULT:      No mapping, no RW
>         * -ENOMEM:      Paging ran out of memory
>         * -EHWPOISON:   Memory hardware error
>         *
>         * Others:
>         * -EWOULDBLOCK: value at uaddr has changed
>         */
>        return ret;
>    }
>
>    futex_wait_queue_me();
>
>    if (woken by futex_wake/requeue)
>        return 0;
>
>    if (timeout)
>        return -ETIMEDOUT;
>
>    /*
>     * Spurious wakeup, i.e. no signal pending
>     */
>    if (!signal_pending())
>        goto retry;
>
>    /* Handled in the low level syscall exit code */
>    if (!timed_wait)
>        return -ERESTARTSYS;
>    else
>        return -ERESTART_RESTARTBLOCK;
>
>Now in the low level syscall exit we try to deliver the signal:
>
>    if (!signal_delivered())
>        restart_syscall();
>
>    if (sigaction->flags & SA_RESTART)
>        restart_syscall();
>
>    ret_to_userspace -EINTR;
>
>So we should never see -EINTR in the case of a spurious wakeup here.
>
>But, here is the not so good news:
>
> I did some archaeology. The restart handling of futex_wait() got
> introduced in kernel 2.6.22, so anything older than that will have
> the spurious -EINTR issues.
>
>futex_wait_pi() always had the restart handling and glibc folks back
>then (2006) requested that it should never return -EINTR, so it
>unconditionally restarts the syscall whether a signal had been
>delivered or not.
>
>So kernels >= 2.6.22 should never return -EINTR spuriously. If that
>happens it's a bug and needs to be fixed.
>
>> Third, I think it would be useful to -- somewhere -- explain which
>> behavior the futex operations would have conceptually when expressed by
>> C11 code.  We currently say that they wake up, sleep, etc, and which
>> values they return.  But we never say how to properly synchronize with
>> them on the userspace side.  The C11 memory model is probably the best
>> model to use on the userspace side, so that's why I'm arguing for this.
>> Basically, I think we need to (1) tell people that they should use
>> memory_order_relaxed accesses to the futex variable (i.e., the memory
>> location associated with the whole futex construct on the kernel side --
>> or do we have another name for this?), and (2) give some conceptual
>> guarantees for the kernel-side synchronization so that one can use this to
>> derive how to use them correctly in userspace.
>> 
>> The man pages might not be the right place for this, and maybe we just
>> need a revision of "Futexes are tricky".  If you have other suggestions
>> for where to document this, or on the content, let me know.  (I'm also
>> willing to spend time on this :) ).
>
>The current futex code in the kernel has gained documentation about
>the required memory ordering recently. That should be a good starting
>point.

Lots of paging in here... If I recall correctly there was something about
not being able to return to userspace in these events without owning the
lock (waiters but no owner, breaking pi chains and promotion, etc.), so
restarting was the preferable path.
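As a rough illustration of the PI protocol being recalled here, the sketch below follows the user-space side described in futex(2): take the lock with a compare-and-swap from 0 to the caller's TID, and otherwise fall back to FUTEX_LOCK_PI, which returns 0 only once the kernel has made the caller the owner. The names (pi_lock, pi_unlock, pi_demo, current_tid) are hypothetical; this is not glibc's implementation, and it assumes a kernel built with PI futex support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, otherwise the owner's TID, possibly ORed with
 * FUTEX_WAITERS by the kernel. */
static _Atomic uint32_t pi_word;

static uint32_t current_tid(void)
{
    static _Thread_local uint32_t cached;
    if (cached == 0)
        cached = (uint32_t)syscall(SYS_gettid);
    return cached;
}

static void pi_lock(void)
{
    for (;;) {
        uint32_t expected = 0;
        /* Uncontended: 0 -> TID without entering the kernel. */
        if (atomic_compare_exchange_strong(&pi_word, &expected, current_tid()))
            return;
        /* Contended: the kernel queues us, boosts the owner, and returns 0
         * only after it has made us the owner. */
        if (syscall(SYS_futex, (uint32_t *)&pi_word, FUTEX_LOCK_PI_PRIVATE,
                    0, NULL, NULL, 0) == 0)
            return;
        /* Retry on transient errors; anything else (e.g. ENOSYS when PI
         * futexes are unavailable, or EDEADLK) is fatal in this sketch. */
        if (errno != EAGAIN && errno != EINTR)
            abort();
    }
}

static void pi_unlock(void)
{
    uint32_t expected = current_tid();
    /* No FUTEX_WAITERS bit: TID -> 0 entirely in user space. */
    if (atomic_compare_exchange_strong(&pi_word, &expected, 0))
        return;
    /* Waiters present: the kernel hands the lock to the top waiter. */
    syscall(SYS_futex, (uint32_t *)&pi_word, FUTEX_UNLOCK_PI_PRIVATE,
            0, NULL, NULL, 0);
}

static long pi_counter;

static void *pi_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 20000; i++) {
        pi_lock();
        pi_counter++;            /* protected by the PI lock */
        pi_unlock();
    }
    return NULL;
}

/* Four contending threads; returns the final counter value. */
static long pi_demo(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, pi_worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return pi_counter;
}
```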

-- 
Darren Hart
Intel Open Source Technology Center





Re: futex(2) man page update help request

2015-01-26 Thread Michael Kerrisk (man-pages)
Hello Torvald,

On 01/24/2015 02:12 PM, Torvald Riegel wrote:
> On Sat, 2015-01-24 at 12:35 +0100, Thomas Gleixner wrote:
>> So we should never see -EINTR in the case of a spurious wakeup here.
>>
>> But, here is the not so good news:
>>
>>  I did some archaeology. The restart handling of futex_wait() got
>>  introduced in kernel 2.6.22, so anything older than that will have
>>  the spurious -EINTR issues.
>>
>> futex_wait_pi() always had the restart handling and glibc folks back
>> then (2006) requested that it should never return -EINTR, so it
>> unconditionally restarts the syscall whether a signal had been
>> delivered or not.
>>
>> So kernels >= 2.6.22 should never return -EINTR spuriously. If that
>> happens it's a bug and needs to be fixed.
> 
> Thanks for looking into this.
> 
> Michael, can you include the above in the documentation please?  This is
> useful for userspace code like glibc that expects a minimum kernel
> version.  Thanks!

I've added some text to my draft to cover this point.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2015-01-24 Thread Thomas Gleixner
On Sat, 24 Jan 2015, Torvald Riegel wrote:
> On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
> > On Fri, 23 Jan 2015, Torvald Riegel wrote:
> > 
> > > On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > > > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > > >  wrote:
> > > > 
> > > > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > > > >check that is "independent of the requeue type (normal/pi)"?
> > > > >
> > > > >When I look through futex_requeue(), all the likely looking sources
> > > > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > > > 
> > > > 
> > > > Right, in the non-PI case, I believe there are valid use cases: move to
> > > > the back of the FIFO, for example (OK, maybe the only example?).
> > > 
> > > But we never guarantee a futex is a FIFO, or do we?  If we don't, then
> > > such a requeue could be implemented as a no-op by the kernel, which
> > > would sort of invalidate the use case.
> > > 
> > > (And I guess we don't want to guarantee FIFO behavior for futexes.)
> > 
> > The (current) behaviour is:
> > 
> > real-time threads:   FIFO per priority level
> > sched-other threads: FIFO independent of nice level
> > 
> > The wakeup is priority ordered. Highest priority level first.
> 
> OK.
> 
> But, just to be clear, do I correctly understand that you do not want to
> guarantee FIFO behavior in the specified futex semantics?  I think there
> are cases where being able to *rely* on FIFO (now and on all future
> kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
> assume in other ordered-wakeup cases too) -- but at the same time, this
> would constrain future futex implementations.

It would be a constraint, but I don't think it would be a horrible
one. Though I have my doubts that we can actually guarantee it under
all circumstances.

One thing comes to my mind right away: spurious wakeups. There is no
way that we can guarantee FIFO ordering in the context of those. And
we cannot prevent them either.

Thanks,

tglx




Re: futex(2) man page update help request

2015-01-24 Thread Torvald Riegel
On Sat, 2015-01-24 at 12:35 +0100, Thomas Gleixner wrote:
> So we should never see -EINTR in the case of a spurious wakeup here.
> 
> But, here is the not so good news:
> 
>  I did some archaeology. The restart handling of futex_wait() got
>  introduced in kernel 2.6.22, so anything older than that will have
>  the spurious -EINTR issues.
> 
> futex_wait_pi() always had the restart handling and glibc folks back
> then (2006) requested that it should never return -EINTR, so it
> unconditionally restarts the syscall whether a signal had been
> delivered or not.
> 
> So kernels >= 2.6.22 should never return -EINTR spuriously. If that
> happens it's a bug and needs to be fixed.

Thanks for looking into this.

Michael, can you include the above in the documentation please?  This is
useful for userspace code like glibc that expects a minimum kernel
version.  Thanks!



Re: futex(2) man page update help request

2015-01-24 Thread Torvald Riegel
On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
> On Fri, 23 Jan 2015, Torvald Riegel wrote:
> 
> > On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > >  wrote:
> > > 
> > > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > > >check that is "independent of the requeue type (normal/pi)"?
> > > >
> > > >When I look through futex_requeue(), all the likely looking sources
> > > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > > 
> > > 
> > > Right, in the non-PI case, I believe there are valid use cases: move to
> > > the back of the FIFO, for example (OK, maybe the only example?).
> > 
> > But we never guarantee a futex is a FIFO, or do we?  If we don't, then
> > such a requeue could be implemented as a no-op by the kernel, which
> > would sort of invalidate the use case.
> > 
> > (And I guess we don't want to guarantee FIFO behavior for futexes.)
> 
> The (current) behaviour is:
> 
> real-time threads:   FIFO per priority level
> sched-other threads: FIFO independent of nice level
> 
> The wakeup is priority ordered. Highest priority level first.

OK.

But, just to be clear, do I correctly understand that you do not want to
guarantee FIFO behavior in the specified futex semantics?  I think there
are cases where being able to *rely* on FIFO (now and on all future
kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
assume in other ordered-wakeup cases too) -- but at the same time, this
would constrain future futex implementations.



Re: futex(2) man page update help request

2015-01-24 Thread Thomas Gleixner
On Fri, 23 Jan 2015, Torvald Riegel wrote:
> Second, the current documentation for EINTR is that it can happen due to
> receiving a signal *or* due to a spurious wake-up.  This is difficult to

I don't think so. I went through all callchains again with a fine comb.

futex_wait()
retry:
    ret = futex_wait_setup();
    if (ret) {
        /*
         * Possible return codes related to uaddr:
         * -EINVAL:      Not u32 aligned uaddr
         * -EFAULT:      No mapping, no RW
         * -ENOMEM:      Paging ran out of memory
         * -EHWPOISON:   Memory hardware error
         *
         * Others:
         * -EWOULDBLOCK: value at uaddr has changed
         */
        return ret;
    }

    futex_wait_queue_me();

    if (woken by futex_wake/requeue)
        return 0;

    if (timeout)
        return -ETIMEDOUT;

    /*
     * Spurious wakeup, i.e. no signal pending
     */
    if (!signal_pending())
        goto retry;

    /* Handled in the low level syscall exit code */
    if (!timed_wait)
        return -ERESTARTSYS;
    else
        return -ERESTART_RESTARTBLOCK;

Now in the low level syscall exit we try to deliver the signal:

    if (!signal_delivered())
        restart_syscall();

    if (sigaction->flags & SA_RESTART)
        restart_syscall();

    ret_to_userspace -EINTR;

So we should never see -EINTR in the case of a spurious wakeup here.

But, here is the not so good news:

 I did some archaeology. The restart handling of futex_wait() got
 introduced in kernel 2.6.22, so anything older than that will have
 the spurious -EINTR issues.

futex_wait_pi() always had the restart handling and glibc folks back
then (2006) requested that it should never return -EINTR, so it
unconditionally restarts the syscall whether a signal had been
delivered or not.

So kernels >= 2.6.22 should never return -EINTR spuriously. If that
happens it's a bug and needs to be fixed.
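From user space, the signal path above can be observed by installing a handler without SA_RESTART: a FUTEX_WAIT that is blocked when the handler runs should fail with EINTR. The following is a minimal sketch under those assumptions (a 50 ms one-shot ITIMER_REAL timer, a private futex word, and a hypothetical wait_until_signal() helper). It only demonstrates the signal-driven case; it cannot by itself prove that no spurious EINTR occurs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran;

static void on_alarm(int sig)
{
    (void)sig;
    handler_ran = 1;
}

/* Block in FUTEX_WAIT until a SIGALRM handler runs and report the errno.
 * No SA_RESTART is set, so the syscall is not restarted after delivery. */
static int wait_until_signal(void)
{
    uint32_t futex_word = 0;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                        /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* One-shot timer; it_interval stays zero. */
    struct itimerval it = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    /* Generous timeout so a lost signal cannot hang the program forever. */
    struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 };
    long ret = syscall(SYS_futex, &futex_word, FUTEX_WAIT_PRIVATE, 0, &ts, NULL, 0);
    return ret == -1 ? errno : 0;
}
```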

> Third, I think it would be useful to -- somewhere -- explain which
> behavior the futex operations would have conceptually when expressed by
> C11 code.  We currently say that they wake up, sleep, etc, and which
> values they return.  But we never say how to properly synchronize with
> them on the userspace side.  The C11 memory model is probably the best
> model to use on the userspace side, so that's why I'm arguing for this.
> Basically, I think we need to (1) tell people that they should use
> memory_order_relaxed accesses to the futex variable (i.e., the memory
> location associated with the whole futex construct on the kernel side --
> or do we have another name for this?), and (2) give some conceptual
> guarantees for the kernel-side synchronization so that one can use this to
> derive how to use them correctly in userspace.
> 
> The man pages might not be the right place for this, and maybe we just
> need a revision of "Futexes are tricky".  If you have other suggestions
> for where to document this, or on the content, let me know.  (I'm also
> willing to spend time on this :) ).

The current futex code in the kernel has gained documentation about
the required memory ordering recently. That should be a good starting
point.
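To make the C11 angle concrete, here is a sketch of a variant of the classic three-state futex mutex (0 unlocked, 1 locked, 2 locked with possible waiters, as in "Futexes are tricky") written with C11 atomics. All ordering comes from the acquire/release read-modify-write operations on the futex word; the futex calls are used only to sleep and wake, and any return from FUTEX_WAIT just leads to a re-check. The names are hypothetical, and the cast from atomic_uint * to uint32_t * assumes both share a representation, which holds on common Linux ABIs but is not promised by C11.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* States: 0 = unlocked, 1 = locked/no waiters, 2 = locked/maybe waiters. */
typedef struct { atomic_uint word; } futex_mutex;

static void sys_futex(atomic_uint *addr, int op, unsigned val)
{
    /* Assumes atomic_uint is laid out like a plain 32-bit unsigned. */
    syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex *m)
{
    unsigned c = 0;
    /* Fast path 0 -> 1; acquire orders the critical section after it. */
    if (atomic_compare_exchange_strong_explicit(&m->word, &c, 1,
            memory_order_acquire, memory_order_relaxed))
        return;
    /* Slow path: mark the lock contended and sleep while it stays held. */
    if (c != 2)
        c = atomic_exchange_explicit(&m->word, 2, memory_order_acquire);
    while (c != 0) {
        /* Sleeps only if the word is still 2; EAGAIN, EINTR and spurious
         * returns are all handled by the exchange below. */
        sys_futex(&m->word, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange_explicit(&m->word, 2, memory_order_acquire);
    }
}

static void futex_mutex_unlock(futex_mutex *m)
{
    /* Release publishes the critical section; old value 2 means someone
     * may be asleep, so wake one waiter. */
    if (atomic_exchange_explicit(&m->word, 0, memory_order_release) == 2)
        sys_futex(&m->word, FUTEX_WAKE_PRIVATE, 1);
}

static futex_mutex demo_lock = { 0 };
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        futex_mutex_lock(&demo_lock);
        demo_counter++;              /* plain access, protected by the lock */
        futex_mutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Four contending threads; returns the final counter value. */
static long demo_run(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```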

Thanks,

tglx


Re: futex(2) man page update help request

2015-01-24 Thread Thomas Gleixner
On Fri, 23 Jan 2015, Torvald Riegel wrote:

> On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> >  wrote:
> > 
> > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > >check that is "independent of the requeue type (normal/pi)"?
> > >
> > >When I look through futex_requeue(), all the likely looking sources
> > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > 
> > 
> > Right, in the non-PI case, I believe there are valid use cases: move to
> > the back of the FIFO, for example (OK, maybe the only example?).
> 
> But we never guarantee a futex is a FIFO, or do we?  If we don't, then
> such a requeue could be implemented as a no-op by the kernel, which
> would sort of invalidate the use case.
> 
> (And I guess we don't want to guarantee FIFO behavior for futexes.)

The (current) behaviour is:

real-time threads:   FIFO per priority level
sched-other threads: FIFO independent of nice level

The wakeup is priority ordered. Highest priority level first.
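The non-PI requeue under discussion can be sketched in user space with FUTEX_CMP_REQUEUE. This hypothetical example requeues onto a second futex (the condition-variable-broadcast pattern) rather than onto the same futex as in the "move to the back of the FIFO" case, and it deliberately does not depend on which waiter is woken first, since only the current ordering behaviour has been described, not a guarantee. All names are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NWAITERS 4

static _Atomic uint32_t futex_a;   /* waiters initially sleep here */
static _Atomic uint32_t futex_b;   /* requeue target */
static atomic_int exited;

static long futex_call(_Atomic uint32_t *uaddr, int op, uint32_t val,
                       uintptr_t val2, _Atomic uint32_t *uaddr2, uint32_t val3)
{
    /* For FUTEX_CMP_REQUEUE the 4th argument carries a count, not a timeout. */
    return syscall(SYS_futex, (uint32_t *)uaddr, op, val, (void *)val2,
                   (uint32_t *)uaddr2, val3);
}

static void *waiter(void *arg)
{
    (void)arg;
    /* Any return (woken on A, requeued then woken on B, EAGAIN, EINTR)
     * leads back to this check. */
    while (atomic_load(&futex_a) == 0)
        futex_call(&futex_a, FUTEX_WAIT_PRIVATE, 0, 0, NULL, 0);
    atomic_fetch_add(&exited, 1);
    return NULL;
}

static int requeue_demo(void)
{
    pthread_t t[NWAITERS];
    for (int i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    usleep(100 * 1000);   /* best effort only; correctness does not rely on it */

    atomic_store(&futex_a, 1);   /* late waiters now get EAGAIN on A */
    /* Wake at most 1 waiter on A and move all others onto B, provided A
     * still holds 1 (val3); otherwise the call fails with EAGAIN. */
    futex_call(&futex_a, FUTEX_CMP_REQUEUE_PRIVATE, 1, INT_MAX, &futex_b, 1);
    /* Wake everything that was moved to B. */
    futex_call(&futex_b, FUTEX_WAKE_PRIVATE, INT_MAX, 0, NULL, 0);

    for (int i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&exited);
}
```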

Thanks,

tglx





Re: futex(2) man page update help request

2015-01-24 Thread Thomas Gleixner
On Sat, 24 Jan 2015, Torvald Riegel wrote:
 On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
  On Fri, 23 Jan 2015, Torvald Riegel wrote:
  
   On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
On 1/16/15, 12:54 PM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:

Color me stupid, but I can't see this in futex_requeue(). Where is that
check that is independent of the requeue type (normal/pi)?

When I look through futex_requeue(), all the likely looking sources
of EINVAL are governed by a check on the 'requeue_pi' argument.


Right, in the non-PI case, I believe there are valid use cases: move to
the back of the FIFO, for example (OK, maybe the only example?).
   
   But we never guarantee a futex is a FIFO, or do we?  If we don't, then
   such a requeue could be implemented as a no-op by the kernel, which
   would sort of invalidate the use case.
   
   (And I guess we don't want to guarantee FIFO behavior for futexes.)
  
  The (current) behaviour is:
  
  real-time threads:   FIFO per priority level
  sched-other threads: FIFO independent of nice level
  
  The wakeup is priority ordered. Highest priority level first.
 
 OK.
 
 But, just to be clear, do I correctly understand that you do not want to
 guarantee FIFO behavior in the specified futex semantics?  I think there
 are cases where being able to *rely* on FIFO (now and on all future
 kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
 assume in other ordered-wakeup cases too) -- but at the same time, this
 would constrain future futex implementations.

It would be a constraint, but I don't think it would be a horrible
one. Though I have my doubts, that we can actually guarantee it under
all circumstances.

One thing comes to my mind right away: spurious wakeups. There is no
way that we can guarantee FIFO ordering in the context of those. And
we cannot prevent them either.

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2015-01-24 Thread Thomas Gleixner
On Fri, 23 Jan 2015, Torvald Riegel wrote:

 On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
  On 1/16/15, 12:54 PM, Michael Kerrisk (man-pages)
  mtk.manpa...@gmail.com wrote:
  
  Color me stupid, but I can't see this in futex_requeue(). Where is that
  check that is independent of the requeue type (normal/pi)?
  
  When I look through futex_requeue(), all the likely looking sources
  of EINVAL are governed by a check on the 'requeue_pi' argument.
  
  
  Right, in the non-PI case, I believe there are valid use cases: move to
  the back of the FIFO, for example (OK, maybe the only example?).
 
 But we never guarantee a futex is a FIFO, or do we?  If we don't, then
 such a requeue could be implemented as a no-op by the kernel, which
 would sort of invalidate the use case.
 
 (And I guess we don't want to guarantee FIFO behavior for futexes.)

The (current) behaviour is:

real-time threads:   FIFO per priority level
sched-other threads: FIFO independent of nice level

The wakeup is priority ordered. Highest priority level first.

Thanks,

tglx



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2015-01-24 Thread Thomas Gleixner
On Fri, 23 Jan 2015, Torvald Riegel wrote:
 Second, the current documentation for EINTR is that it can happen due to
 receiving a signal *or* due to a spurious wake-up.  This is difficult to

I don't think so. I went through all callchains again with a fine comb.

futex_wait()
retry:
ret = futex_wait_setup();
if (ret) {
 /*
  * Possible return codes related to uaddr:
  * -EINVAL:Not u32 aligned uaddr
  * -EFAULT:No mapping, no RW
  * -ENOMEM:Paging ran out of memory
  * -EHWPOISON: Memory hardware error
  *
  * Others:
  * -EWOULDBLOCK: value at uaddr has changed
  */
return ret;
}

futex_wait_queue_me();

if (woken by futex_wake/requeue)
return 0;

if (timeout)
return -ETIMEOUT;

/*
 * Spurious wakeup, i.e. no signal pending
 */
if (!signal_pending())
goto retry;

/* Handled in the low level syscall exit code */
if (!timed_wait)
return -ERESTARTSYS;
else
return -ERESTARTBLOCK;

Now in the low level syscall exit we try to deliver the signal

if (!signal_delivered())
  restart_syscall();

if (sigaction-flags  SA_RESTART)
  restart_syscall();

ret_to_userspace -EINTR;

So we should never see -EINTR in the case of a spurious wakeup here.

But, here is the not so good news:

 I did some archaeology. The restart handling of futex_wait() got
 introduced in kernel 2.6.22, so anything older than that will have
 the spurious -EINTR issues.

futex_wait_pi() always had the restart handling and glibc folks back
then (2006) requested that it should never return -EINTR, so it
unconditionally restarts the syscall whether a signal had been
delivered or not.

So kernels = 2.6.22 should never return -EINTR spuriously. If that
happens it's a bug and needs to be fixed.

 Third, I think it would be useful to -- somewhere -- explain which
 behavior the futex operations would have conceptually when expressed by
 C11 code.  We currently say that they wake up, sleep, etc, and which
 values they return.  But we never say how to properly synchronize with
 them on the userspace side.  The C11 memory model is probably the best
 model to use on the userspace side, so that's why I'm arguing for this.
 Basically, I think we need to (1) tell people that they should use
 memory_order_relaxed accesses to the futex variable (ie, the memory
 location associated with the whole futex construct on the kernel side --
 or do we have another name for this?), and (2) give some conceptual
 guarantees for the kernel-side synchronization so that one use this to
 derive how to use them correctly in userspace.
 
 The man pages might not be the right place for this, and maybe we just
 need a revision of Futexes are tricky.  If you have other suggestions
 for where to document this, or on the content, let me know.  (I'm also
 willing to spend time on this :) ).

The current futex code in the kernel has gained documentation about
the required memory ordering recently. That should be a good starting
point.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2015-01-24 Thread Torvald Riegel
On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
 On Fri, 23 Jan 2015, Torvald Riegel wrote:
 
  On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
   On 1/16/15, 12:54 PM, Michael Kerrisk (man-pages)
   mtk.manpa...@gmail.com wrote:
   
   Color me stupid, but I can't see this in futex_requeue(). Where is that
   check that is independent of the requeue type (normal/pi)?
   
   When I look through futex_requeue(), all the likely looking sources
   of EINVAL are governed by a check on the 'requeue_pi' argument.
   
   
   Right, in the non-PI case, I believe there are valid use cases: move to
   the back of the FIFO, for example (OK, maybe the only example?).
  
  But we never guarantee a futex is a FIFO, or do we?  If we don't, then
  such a requeue could be implemented as a no-op by the kernel, which
  would sort of invalidate the use case.
  
  (And I guess we don't want to guarantee FIFO behavior for futexes.)
 
 The (current) behaviour is:
 
 real-time threads:   FIFO per priority level
 sched-other threads: FIFO independent of nice level
 
 The wakeup is priority ordered. Highest priority level first.

OK.

But, just to be clear, do I correctly understand that you do not want to
guarantee FIFO behavior in the specified futex semantics?  I think there
are cases where being able to *rely* on FIFO (now and on all future
kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
assume in other ordered-wakeup cases too) -- but at the same time, this
would constrain future futex implementations.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2015-01-24 Thread Torvald Riegel
On Sat, 2015-01-24 at 12:35 +0100, Thomas Gleixner wrote:
 So we should never see -EINTR in the case of a spurious wakeup here.
 
 But, here is the not so good news:
 
  I did some archaeology. The restart handling of futex_wait() got
  introduced in kernel 2.6.22, so anything older than that will have
  the spurious -EINTR issues.
 
 futex_wait_pi() always had the restart handling and glibc folks back
 then (2006) requested that it should never return -EINTR, so it
 unconditionally restarts the syscall whether a signal had been
 delivered or not.
 
 So kernels = 2.6.22 should never return -EINTR spuriously. If that
 happens it's a bug and needs to be fixed.

Thanks for looking into this.

Michael, can you include the above in the documentation please?  This is
useful for userspace code like glibc that expects a minimum kernel
version.  Thanks!
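For userspace that wants to rely on the 2.6.22 behaviour, one possible approach is a runtime check of the running kernel's release string. The helpers `release_at_least()` and `kernel_has_futex_wait_restart()` below are hypothetical names; a library may just as well fix its minimum kernel version at build time instead of checking at runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Return 1 if the "major.minor.patch" prefix of release is at least
 * maj.min.pat, 0 if it is lower, -1 if it cannot be parsed. Parsing stops
 * at the first unexpected character, so suffixes such as "-generic" are
 * ignored and a missing patch level (e.g. "3.0") counts as 0. */
static int release_at_least(const char *release, int maj, int min, int pat)
{
    assert(release != NULL);
    int a = 0, b = 0, c = 0;
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return -1;
    if (a != maj)
        return a > maj;
    if (b != min)
        return b > min;
    return c >= pat;
}

/* Per the thread, futex_wait() restarts after signals since 2.6.22, so a
 * spurious EINTR from FUTEX_WAIT on such a kernel would be a kernel bug. */
static int kernel_has_futex_wait_restart(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, 2, 6, 22);
}
```

Such a check only sees the version string, so fixes backported into an older distribution kernel are invisible to it.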


Re: futex(2) man page update help request

2015-01-23 Thread Torvald Riegel
On Thu, 2015-01-15 at 16:10 +0100, Michael Kerrisk (man-pages) wrote:
> [Adding a few people to CC that have expressed interest in the 
> progress of the updates of this page, or who may be able to
> provide review feedback. Eventually, you'll all get CCed on
> the new draft of the page.]
> 
> Hello Thomas,
> 
> On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> > On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
> >> And that universe would love to have your documentation of 
> >> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
> > 
> > I give you almost the full treatment, but I leave REQUEUE_PI to
> > Darren and FUTEX_WAKE_OP to Jakub. :)
> 
> Thank you for the great effort you put into compiling the
> text below, and apologies for my long delay in following up.
> 
> I've integrated almost all of your suggestions into the 
> manual page. I will shortly send out a new draft of the
> page that contains various FIXMEs for points that remain 
> unclear.

Michael, thanks for working on the draft!  I'll review the draft closely
once you've sent it (or have I missed it?).

There are a few things that I'd like to see covered.

First, we should discuss that users, unless they control all code in the
respective process, need to expect futexes to be affected by spurious
futex_wake calls; see https://lkml.org/lkml/2014/11/27/472 for
background and Linus' choice (AFAIU) to just document this.
So, for example, a return code of 0 for FUTEX_WAIT can mean either being
woken up by a FUTEX_WAKE intended for this futex, or a stale one
intended for another futex used by, for example, glibc internally.
(Note that as explained in this thread, this isn't just a glibc
artifact, but a result of the general futex design mixed with
destruction requirements for mutexes and other constructs in C++11 and
POSIX.)
It might also be necessary to further consider this when documenting the
errors, because it does affect how to handle them. See this for a glibc
perspective:
https://sourceware.org/ml/libc-alpha/2014-09/msg00381.html
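The practical consequence of stale wake-ups is that a FUTEX_WAIT return value says nothing about the state the waiter actually cares about. A minimal sketch, assuming a one-shot flag stored in a C11 `atomic_int` (the names `futex_call()`, `wait_for_flag()` and `stray_then_real_wake()` are hypothetical), re-checks the flag after every return:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so call the system call directly.
 * Assumes atomic_int has the size and layout of int (true on Linux). */
static long futex_call(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Wait until *flag becomes nonzero. A 0 return from FUTEX_WAIT only says
 * that someone issued FUTEX_WAKE on this address (possibly a stale wake-up
 * meant for an earlier user of the memory), so the condition is re-checked
 * after every return. EAGAIN and EINTR are treated the same way; other
 * errors are not handled in this sketch. */
static void wait_for_flag(atomic_int *flag)
{
    while (atomic_load_explicit(flag, memory_order_acquire) == 0)
        futex_call(flag, FUTEX_WAIT_PRIVATE, 0);
}

/* Demo thread: first sends a stray wake-up while the flag is still 0,
 * standing in for a stale FUTEX_WAKE from unrelated code, then sets the
 * flag and wakes the waiter for real. */
static void *stray_then_real_wake(void *arg)
{
    atomic_int *flag = arg;
    futex_call(flag, FUTEX_WAKE_PRIVATE, INT_MAX);
    atomic_store_explicit(flag, 1, memory_order_release);
    futex_call(flag, FUTEX_WAKE_PRIVATE, INT_MAX);
    return NULL;
}
```

Whether the first FUTEX_WAKE finds nobody or wakes the waiter while the flag is still 0, the loop only exits once the flag is 1.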

Second, the current documentation for EINTR is that it can happen due to
receiving a signal *or* due to a spurious wake-up.  This is difficult to
handle when implementing POSIX semaphores, because they require that
EINTR is returned from sem_wait if and only if the interruption was due
to a signal.  Thus, if FUTEX_WAIT returns EINTR, the semaphore
implementation can't return EINTR from sem_wait; see this for more
comments, including some discussion of why use cases relying on the POSIX
requirement around EINTR are borderline timing-dependent:
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/sem_waitcommon.c;h=96848d7ac5b6f8f1f3099b422deacc09323c796a;hb=HEAD#l282
Others have commented that aio_suspend has a similar issue; if EINTR
were never returned spuriously, the POSIX implementation side
would get easier.
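To illustrate the semaphore problem, here is a hypothetical minimal counting semaphore on a single futex word (`struct tiny_sem`, `tiny_sem_wait()`, `tiny_sem_post()` and `post_once()` are made-up names, not the glibc implementation linked above). Because an EINTR from FUTEX_WAIT cannot be attributed to a signal with certainty, the sketch never passes it on to its caller; it is also simplified in other ways, e.g. `tiny_sem_post()` always issues FUTEX_WAKE even when nobody is blocked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical minimal counting semaphore on a single futex word. */
struct tiny_sem {
    atomic_int value;
};

static long futex_call(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void tiny_sem_post(struct tiny_sem *s)
{
    atomic_fetch_add_explicit(&s->value, 1, memory_order_release);
    /* Simplification: always wake one waiter, even if none is blocked. */
    futex_call(&s->value, FUTEX_WAKE_PRIVATE, 1);
}

/* Returns 0 on success or an errno value for unexpected failures. EINTR
 * from FUTEX_WAIT is never passed on: it is handled like a wake-up and the
 * count is simply re-checked. */
static int tiny_sem_wait(struct tiny_sem *s)
{
    for (;;) {
        int v = atomic_load_explicit(&s->value, memory_order_relaxed);
        while (v > 0) {
            /* On failure, v is reloaded with the current count. */
            if (atomic_compare_exchange_weak_explicit(&s->value, &v, v - 1,
                    memory_order_acquire, memory_order_relaxed))
                return 0;
        }
        if (futex_call(&s->value, FUTEX_WAIT_PRIVATE, 0) == -1 &&
            errno != EAGAIN && errno != EINTR)
            return errno;
    }
}

/* Demo helper for the usage example: post once from another thread. */
static void *post_once(void *arg)
{
    tiny_sem_post(arg);
    return NULL;
}
```

Never returning EINTR trivially satisfies the "only if" half of the POSIX requirement, at the cost of never reporting a genuine interruption by a signal.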

Third, I think it would be useful to -- somewhere -- explain which
behavior the futex operations would have conceptually when expressed by
C11 code.  We currently say that they wake up, sleep, etc, and which
values they return.  But we never say how to properly synchronize with
them on the userspace side.  The C11 memory model is probably the best
model to use on the userspace side, so that's why I'm arguing for this.
Basically, I think we need to (1) tell people that they should use
memory_order_relaxed accesses to the futex variable (i.e., the memory
location associated with the whole futex construct on the kernel side --
or do we have another name for this?), and (2) give some conceptual
guarantees for the kernel-side synchronization so that one can use these to
derive how to use the futex operations correctly in userspace.

The man pages might not be the right place for this, and maybe we just
need a revision of "Futexes are tricky".  If you have other suggestions
for where to document this, or on the content, let me know.  (I'm also
willing to spend time on this :) ).
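As a concrete starting point for such documentation, here is a sketch of the three-state mutex from "Futexes are tricky" (0 = unlocked, 1 = locked, 2 = locked with possible waiters), written with C11 atomics. It follows point (1) in that every access to the futex word is atomic; it uses acquire/release on the word rather than plain memory_order_relaxed because it assumes the futex system calls provide no ordering of their own, so the userspace atomics must order the data the lock protects. `tricky_lock()`, `tricky_unlock()` and the demo names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
struct tricky_mutex {
    atomic_int word;
};

static long futex_call(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void tricky_lock(struct tricky_mutex *m)
{
    int c = 0;
    /* Uncontended fast path: 0 -> 1, no system call, no kernel state. */
    if (atomic_compare_exchange_strong_explicit(&m->word, &c, 1,
            memory_order_acquire, memory_order_relaxed))
        return;
    /* Contended: mark the word 2 so the unlocker knows to wake someone. */
    if (c != 2)
        c = atomic_exchange_explicit(&m->word, 2, memory_order_acquire);
    while (c != 0) {
        /* Returns at once (EAGAIN) if the word is no longer 2. */
        futex_call(&m->word, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange_explicit(&m->word, 2, memory_order_acquire);
    }
}

static void tricky_unlock(struct tricky_mutex *m)
{
    /* The release ordering for the protected data comes from this atomic
     * exchange; the FUTEX_WAKE call is only used to wake a sleeper. */
    if (atomic_exchange_explicit(&m->word, 0, memory_order_release) == 2)
        futex_call(&m->word, FUTEX_WAKE_PRIVATE, 1);
}

/* Demo state for the usage example. */
static struct tricky_mutex demo_mutex;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        tricky_lock(&demo_mutex);
        demo_counter++;
        tricky_unlock(&demo_mutex);
    }
    return NULL;
}
```

In the demo, two threads each increment a plain `long` 100000 times under the lock; reaching 200000 relies on the acquire/release pairs in `tricky_lock()` and `tricky_unlock()`.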


Torvald


Re: futex(2) man page update help request

2015-01-23 Thread Torvald Riegel
On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
>  wrote:
> 
> >Color me stupid, but I can't see this in futex_requeue(). Where is that
> >check that is "independent of the requeue type (normal/pi)"?
> >
> >When I look through futex_requeue(), all the likely looking sources
> >of EINVAL are governed by a check on the 'requeue_pi' argument.
> 
> 
> Right, in the non-PI case, I believe there are valid use cases: move to
> the back of the FIFO, for example (OK, maybe the only example?).

But we never guarantee a futex is a FIFO, or do we?  If we don't, then
such a requeue could be implemented as a no-op by the kernel, which
would sort of invalidate the use case.

(And I guess we don't want to guarantee FIFO behavior for futexes.)


Re: futex(2) man page update help request

2015-01-19 Thread Michael Kerrisk (man-pages)
On 01/19/2015 11:45 AM, Thomas Gleixner wrote:
> On Fri, 16 Jan 2015, Darren Hart wrote:
>> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
>>  wrote:
>>
>>> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>>>> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>
>>>>> Hello Thomas,
>>>>>
>>>>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>>>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>>>>
>>>>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>>>>
>>>>>> It's equally wrong for normal futexes. And it's actually the same code
>>>>>> checking for this for all variants.
>>>>>
>>>>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>>>>> mean:
>>>>>
>>>>> a) This error text should be there for both normal and PI requeues
>>>>
>>>> It is there for both. The requeue code has that check independent of
>>>> the requeue type (normal/pi). It never makes sense to requeue
>>>> something to itself whether normal or pi futex. We added this for PI,
>>>> because there it is harmful, but we did not special case it. So normal
>>>> futexes get the same treatment.
>>>
>>> Hello Thomas, 
>>>
>>> Color me stupid, but I can't see this in futex_requeue(). Where is that
>>> check that is "independent of the requeue type (normal/pi)"?
>>>
>>> When I look through futex_requeue(), all the likely looking sources
>>> of EINVAL are governed by a check on the 'requeue_pi' argument.
>>
>>
>> Right, in the non-PI case, I believe there are valid use cases: move to
>> the back of the FIFO, for example (OK, maybe the only example?). Both
>> tests ensuring uaddr1 != uaddr2 are under the requeue_pi conditional
>> block. The second compares the keys in case they are not FUTEX_PRIVATE
>> (uaddrs would be different, but still the same backing store).
>>
>> Thomas, am I missing a test for this someplace else?
> 
> No, I had a short look at the code and misread it. So, yes, it's a valid
> operation for the non PI case. Sorry for the confusion.

Thanks for the confirmation, Thomas. Page updated to remove 
FUTEX_CMP_REQUEUE from that error case.
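In other words, FUTEX_CMP_REQUEUE from a futex to itself is accepted for non-PI futexes, while FUTEX_CMP_REQUEUE_PI rejects it with EINVAL. A small probe, using a hypothetical wrapper `cmp_requeue()` that passes the requeue count in the timeout argument slot as futex(2) specifies, shows the difference when no waiters are queued:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper. If *uaddr still equals expected, wake up to nr_wake
 * waiters on uaddr and move up to nr_requeue further waiters to uaddr2.
 * futex(2) passes the requeue count in the timeout argument slot.
 * Returns the number of waiters woken or requeued, or -1 with errno set. */
static long cmp_requeue(uint32_t *uaddr, uint32_t *uaddr2, int nr_wake,
                        int nr_requeue, uint32_t expected, int pi)
{
    int op = (pi ? FUTEX_CMP_REQUEUE_PI : FUTEX_CMP_REQUEUE)
             | FUTEX_PRIVATE_FLAG;
    return syscall(SYS_futex, uaddr, op, nr_wake,
                   (void *)(uintptr_t)nr_requeue, uaddr2, expected);
}
```

With nothing queued, the non-PI call on the same address returns 0 (nothing woken or requeued) and fails with EAGAIN only if the expected value does not match; the PI variant fails with EINVAL, the case that remains documented.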

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2015-01-19 Thread Thomas Gleixner
On Fri, 16 Jan 2015, Darren Hart wrote:
> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
>  wrote:
> 
> >On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
> >> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >> 
> >>> Hello Thomas,
> >>>
> >>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> >>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> >>>>>
> >>>>> ??? I added this, but does this error not occur only for PI requeues?
> >>>>
> >>>> It's equally wrong for normal futexes. And it's actually the same code
> >>>> checking for this for all variants.
> >>>
> >>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
> >>> mean:
> >>>
> >>> a) This error text should be there for both normal and PI requeues
> >> 
> >> It is there for both. The requeue code has that check independent of
> >> the requeue type (normal/pi). It never makes sense to requeue
> >> something to itself whether normal or pi futex. We added this for PI,
> >> because there it is harmful, but we did not special case it. So normal
> >> futexes get the same treatment.
> >
> >Hello Thomas, 
> >
> >Color me stupid, but I can't see this in futex_requeue(). Where is that
> >check that is "independent of the requeue type (normal/pi)"?
> >
> >When I look through futex_requeue(), all the likely looking sources
> >of EINVAL are governed by a check on the 'requeue_pi' argument.
> 
> 
> Right, in the non-PI case, I believe there are valid use cases: move to
> the back of the FIFO, for example (OK, maybe the only example?). Both
> tests ensuring uaddr1 != uaddr2 are under the requeue_pi conditional
> block. The second compares the keys in case they are not FUTEX_PRIVATE
> (uaddrs would be different, but still the same backing store).
> 
> Thomas, am I missing a test for this someplace else?

No, I had a short look at the code and misread it. So, yes, it's a valid
operation for the non PI case. Sorry for the confusion.

Thanks,

tglx


Re: futex(2) man page update help request

2015-01-18 Thread Michael Kerrisk (man-pages)
Hello Darren,

On 01/17/2015 08:26 PM, Darren Hart wrote:
> 
> On 1/17/15, 1:16 AM, "Michael Kerrisk (man-pages)"
>  wrote:

[...]

>>>> In the meantime, I have a couple of questions, which, if
>>>> you could answer them, I would work some changes into the
>>>> page before sending.
>>>>
>>>> 1. In various places, distinction is made between non-PI
>>>>   futexes and PI futexes. But what determines that distinction?
>>>>   From the kernel's perspective, what makes a futex one type
>>>>   or another? I presume it is to do with the types of blocking
>>>>   waiters on the futex, but it would be good to have a formal
>>>>   definition.
>>>
>>> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no
>>> such thing as a futex", it doesn't exist as any kind of identifiable
>>> object, so these discussions can get rather confusing :-)
>>
>> So, I want to make sure that I am clear on what you mean when you say this.
>> You say "there is no such thing as a futex" because from the kernel's
>> perspective there is no visible entity in the uncontended case
>> (where everything can be dealt with in user space). And from user-space,
>> in the uncontended case all we're doing is memory operations. Right?
>>
>> On the other hand, from a kernel perspective, we could say that a
>> futex "exists" in the contended phases, since the kernel has allocated
>> state associated with the uaddr. Right?
> 
> 
> Sorry, this was more anecdotal, and probably more of a distraction than
> constructive. I just meant that unlike other things which you can point to
> a specific struct for (task, rt_mutex, etc.), a "futex" has its state
> distributed across the backing store (uaddr), the queue (futex_q), the
> pi_state, the rt_mutex, etc, and these span kernel space and userspace.
> Your description above is correct.

Okay. Thanks. I've added a few more words to the page noting that
the kernel maintains no state for a futex in the uncontended state.

>>> A "futex" becomes a PI futex when it is "created" via a PI futex
>>> op code.
>>
>> Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
>> FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?
> 
> Based on your wording below about taking a user POV on this, I'm going to
> say "yes" here. These opcodes paired with the PI futex value policy
> (described below) define a "futex" as PI-aware. These were created very
> specifically in support of PI pthread_mutexes, so it makes a lot more
> sense to talk about a PI aware pthread_mutex, than a PI aware futex, since
> there is a lot of policy and scaffolding that has to be built up around it
> to use it properly (this is what a PI pthread_mutex is).

See below.

>>> At that point, the syscall will ensure a pi_state is populated for the
>>> futex_q entry. See futex_lock_pi() for example. Before the locks are
>>> taken, there is a call to refill_pi_state_cache() which preps a pi_state
>>> for assignment later in futex_lock_pi_atomic(). This pi_state provides
>>> the necessary linkage to perform the priority boosting in the event of a
>>> priority inversion. This is handled externally from the futexes via the
>>> rt_mutex construct.
>>>
>>> Clear as mud?
>>
>> Not quite that bad, but... The thing is, still, the man page has text
>> such as the following (based on your wording):
>>
>>   FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
>>          This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
>>          It requeues waiters that are blocked via
>>          FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
>>          (uaddr) to a PI target futex (uaddr2).
>>
>> And elsewhere you said
>>
>>     EINVAL is returned if the non-pi to pi or
>>     op pairing semantics are violated.
>>
>> When someone in user-land (e.g., me) reads pieces like that, they then
>> want to find somewhere in the man page a description of what makes a
>> futex a *PI futex* and probably some statements of the distinction
>> between PI and non-PI futexes. And those statements should be from a
>> perspective that is somewhat comprehensible to user-space. I'm not
>> yet confident that I can do that. Do you care to take a shot at it?
> 
> Hrm, tricky indeed. From userspace, what makes a "futex" PI is the policy
> agreement between kernel and userspace (the value of the futex: 0, TID,
> TID|WAITERS, and never just WAITERS) and the use of PI-aware futex op
> codes when making the futex syscalls.

Okay -- I've attempted to capture this in some text that I added to the 
page.
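A sketch of how that value policy is used in practice: the uncontended paths never enter the kernel, and only a failed compare-and-swap falls back to FUTEX_LOCK_PI or FUTEX_UNLOCK_PI, which operate on the same word. This assumes the thread ID is obtained with `syscall(SYS_gettid)`; `pi_word`, `pi_lock()`, `pi_unlock()` and `my_tid()` are hypothetical names, and robust-futex handling (FUTEX_OWNER_DIED) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The futex word: 0 = unlocked, TID = owned by TID, TID | FUTEX_WAITERS =
 * owned with (possible) waiters. FUTEX_WAITERS alone is never valid. */
typedef _Atomic uint32_t pi_word;

static uint32_t my_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/* Returns 0 on success or an errno value from FUTEX_LOCK_PI. */
static int pi_lock(pi_word *w)
{
    uint32_t expected = 0;
    /* Uncontended: 0 -> TID in userspace, no system call. */
    if (atomic_compare_exchange_strong_explicit(w, &expected, my_tid(),
            memory_order_acquire, memory_order_relaxed))
        return 0;
    /* Contended: the kernel queues us, sets FUTEX_WAITERS and applies
     * priority inheritance to the owner found in the word. */
    if (syscall(SYS_futex, w, FUTEX_LOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}

/* Returns 0 on success or an errno value from FUTEX_UNLOCK_PI. */
static int pi_unlock(pi_word *w)
{
    uint32_t expected = my_tid();
    /* No FUTEX_WAITERS bit: TID -> 0 in userspace. */
    if (atomic_compare_exchange_strong_explicit(w, &expected, 0,
            memory_order_release, memory_order_relaxed))
        return 0;
    /* Waiters recorded: the kernel hands the lock to the top waiter. */
    if (syscall(SYS_futex, w, FUTEX_UNLOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}
```

Because the kernel may set FUTEX_WAITERS on the word while the owner runs, the unlock fast path compares against the bare TID and leaves any other value to FUTEX_UNLOCK_PI.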

> For a longer discussion of this policy, see Documentation/pi-futex.txt.

Sad to say, that document doesn't supply that much more detail, in
my reading of it, at least.

> Also note that this policy can be combined with that for robust futexes,
> adding the OWNERDIED component.

Now there's two other stories that have yet to be dealt with ;-). 

I have a FIXME already in the page regarding OWNERDIED, and
get_robust_list(2) is another page that seems like it could do with
a fair bit of work, but that's a story for another day.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2015-01-17 Thread Darren Hart

On 1/17/15, 1:16 AM, "Michael Kerrisk (man-pages)"
 wrote:

>Hello Darren,
>
>On 01/17/2015 02:33 AM, Darren Hart wrote:
>> Corrected Davidlohr's email address.
>
>Thanks!
>
>> On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
>>  wrote:
>> 
>>> Hello Darren,
>>>
>>> I give you the same apology as to Thomas for the
>>> long-delayed response to your mail.
>>>
>>> And I repeat my note to Thomas:
>>> In the next day or two, I hope to send out the new version
>>> of the futex(2) page for review. The new draft is a bit
>>> bigger (okay -- 4 x bigger) than the current page. And there
>>> are quite a number of FIXMEs that I've placed in the page
>>> for various points--some minor, but a few major--that need
>>> to be checked or fixed. Would you have some time to review
>>> that page?
>> 
>> I'll make the time for that. I've wanted to see this for a while, so
>> thank you for working on it!
>
>Great!
>
>>> In the meantime, I have a couple of questions, which, if
>>> you could answer them, I would work some changes into the
>>> page before sending.
>>>
>>> 1. In various places, distinction is made between non-PI
>>>   futexes and PI futexes. But what determines that distinction?
>>>   From the kernel's perspective, what makes a futex one type
>>>   or another? I presume it is to do with the types of blocking
>>>   waiters on the futex, but it would be good to have a formal
>>>   definition.
>> 
>> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
>> thing as a futex", it doesn't exist as any kind of identifiable object, so
>> these discussions can get rather confusing :-)
>
>So, I want to make sure that I am clear on what you mean when you say this.
>You say "there is no such thing as a futex" because from the kernel's
>perspective there is no visible entity in the uncontended case
>(where everything can be dealt with in user space). And from user-space,
>in the uncontended case all we're doing is memory operations. Right?
>
>On the other hand, from a kernel perspective, we could say that a
>futex "exists" in the contended phases, since the kernel has allocated
>state associated with the uaddr. Right?


Sorry, this was more anecdotal, and probably more of a distraction than
constructive. I just meant that unlike other things which you can point to
a specific struct for (task, rt_mutex, etc.), a "futex" has its state
distributed across the backing store (uaddr), the queue (futex_q), the
pi_state, the rt_mutex, etc, and these span kernel space and userspace.
Your description above is correct.

>
>> A "futex" becomes a PI futex when it is "created" via a PI futex op code.
>
>Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
>FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?

Based on your wording below about taking a user POV on this, I'm going to
say "yes" here. These opcodes paired with the PI futex value policy
(described below) define a "futex" as PI aware. These were created very
specifically in support of PI pthread_mutexes, so it makes a lot more
sense to talk about a PI aware pthread_mutex, than a PI aware futex, since
there is a lot of policy and scaffolding that has to be built up around it
to use it properly (this is what a PI pthread_mutex is).
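The point that a PI futex is really defined by the scaffolding around it (the value policy plus the PI op codes) can be sketched as a minimal PI mutex. This is a hypothetical illustration, not glibc's implementation: `pi_mutex`, `pi_mutex_lock()` and `pi_mutex_unlock()` are invented names, there is no trylock, timeout, or robust-list handling, and it assumes a Linux system where `syscall(SYS_futex, ...)` is available. The fast paths are plain compare-and-swap operations on the futex word (0 to TID to lock, TID to 0 to unlock); only when those fail does the caller enter the kernel with FUTEX_LOCK_PI or FUTEX_UNLOCK_PI.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Hypothetical PI mutex: the word follows the PI value policy
 * (0 = unowned, TID = owned, TID | FUTEX_WAITERS = owned and contended). */
typedef struct {
    uint32_t word;
} pi_mutex;

static uint32_t current_tid(void)
{
    return (uint32_t) syscall(SYS_gettid);
}

/* FUTEX_LOCK_PI / FUTEX_UNLOCK_PI ignore 'val'; a NULL timeout blocks forever. */
static long futex_pi_op(uint32_t *uaddr, int op)
{
    return syscall(SYS_futex, uaddr, op, 0, NULL, NULL, 0);
}

static int pi_mutex_lock(pi_mutex *m)
{
    uint32_t expected = 0;

    /* Fast path: unowned (0) -> our TID, entirely in user space. */
    if (__atomic_compare_exchange_n(&m->word, &expected, current_tid(), 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: a non-zero value forces us into the kernel, which blocks us
     * on an rt_mutex (boosting the owner if needed) and stores our TID in
     * the word before returning. */
    while (futex_pi_op(&m->word, FUTEX_LOCK_PI) == -1) {
        if (errno != EINTR)
            return -errno;
    }
    return 0;
}

static int pi_mutex_unlock(pi_mutex *m)
{
    uint32_t expected = current_tid();

    /* Fast path: TID -> 0 succeeds only while FUTEX_WAITERS is clear. */
    if (__atomic_compare_exchange_n(&m->word, &expected, 0, 0,
                                    __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        return 0;

    /* FUTEX_WAITERS is set (or we are not the owner): let the kernel hand
     * the lock to the top waiter, or fail with EPERM. */
    if (futex_pi_op(&m->word, FUTEX_UNLOCK_PI) == -1)
        return -errno;
    return 0;
}
```

In the uncontended case neither function makes a futex system call, which is the property the value policy is meant to provide: a non-zero word forces a locker into the kernel, and FUTEX_WAITERS forces the owner in to unlock.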

>> At that point, the syscall will ensure a pi_state is populated for the
>> futex_q entry. See futex_lock_pi() for example. Before the locks are
>> taken, there is a call to refill_pi_state_cache() which preps a pi_state
>> for assignment later in futex_lock_pi_atomic(). This pi_state provides the
>> necessary linkage to perform the priority boosting in the event of a
>> priority inversion. This is handled externally from the futexes via the
>> rt_mutex construct.
>> 
>> Clear as mud?
>
>Not quite that bad, but... The thing is, still, the man page has text
>such as the following (based on your wording):
>
>   FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
>  This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
>  It requeues waiters that are blocked via
>  FUTEX_WAIT_REQUEUE_PI  on uaddr from a non-PI source futex
>  (uaddr) to a PI target futex (uaddr2).
>
>And elsewhere you said
>
>EINVAL is returned if the non-pi to pi or
>op pairing semantics are violated.
>
>When someone in user-land (e.g., me) reads pieces like that, they then
>want to find somewhere in the man page a description of what makes a
>futex a *PI futex* and probably some statements of the distinction
>between PI and non-PI futexes. And those statements should be from a
>perspective that is somewhat comprehensible to user-space. I'm not
>yet confident that I can do that. Do you care to take a shot at it?

Hrm, tricky indeed. From userspace, what makes a "futex" PI is the policy
agreement between kernel and userspace: the value of the futex (0, TID,
TID|WAITERS, and never just WAITERS) and the use of PI-aware futex op
codes when making the futex syscalls.

Re: futex(2) man page update help request

2015-01-17 Thread Michael Kerrisk (man-pages)
Hello Darren,

On 01/17/2015 02:33 AM, Darren Hart wrote:
> Corrected Davidlohr's email address.

Thanks!

> On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
>  wrote:
> 
>> Hello Darren,
>>
>> I give you the same apology as to Thomas for the
>> long-delayed response to your mail.
>>
>> And I repeat my note to Thomas:
>> In the next day or two, I hope to send out the new version
>> of the futex(2) page for review. The new draft is a bit
>> bigger (okay -- 4 x bigger) than the current page. And there
>> are quite a number of FIXMEs that I've placed in the page
>> for various points--some minor, but a few major--that need
>> to be checked or fixed. Would you have some time to review
>> that page?
> 
> I'll make the time for that. I've wanted to see this for a while, so thank
> you for working on it!

Great!

>> In the meantime, I have a couple of questions, which, if
>> you could answer them, I would work some changes into the
>> page before sending.
>>
>> 1. In various places, distinction is made between non-PI
>>   futexes and PI futexes. But what determines that distinction?
>>   From the kernel's perspective, what makes a futex one type
>>   or another? I presume it is to do with the types of blocking
>>   waiters on the futex, but it would be good to have a formal
>>   definition.
> 
> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
> thing as a futex", it doesn't exist as any kind of identifiable object, so
> these discussions can get rather confusing :-)

So, I want to make sure that I am clear on what you mean when you say this.
You say "there is no such thing as a futex" because from the kernel's
perspective there is no visible entity in the uncontended case
(where everything can be dealt with in user space). And from user-space,
in the uncontended case all we're doing is memory operations. Right?

On the other hand, from a kernel perspective, we could say that a 
futex "exists" in the contended phases, since the kernel has allocated
state associated with the uaddr. Right?

> A "futex" becomes a PI futex when it is "created" via a PI futex op code.

Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?

> At that point, the syscall will ensure a pi_state is populated for the
> futex_q entry. See futex_lock_pi() for example. Before the locks are
> taken, there is a call to refill_pi_state_cache() which preps a pi_state
> for assignment later in futex_lock_pi_atomic(). This pi_state provides the
> necessary linkage to perform the priority boosting in the event of a
> priority inversion. This is handled externally from the futexes via the
> rt_mutex construct.
> 
> Clear as mud?

Not quite that bad, but... The thing is, still, the man page has text
such as the following (based on your wording):

   FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
  This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
  It requeues waiters that are blocked via
  FUTEX_WAIT_REQUEUE_PI  on uaddr from a non-PI source futex
  (uaddr) to a PI target futex (uaddr2).

And elsewhere you said

EINVAL is returned if the non-pi to pi or 
op pairing semantics are violated.

When someone in user-land (e.g., me) reads pieces like that, they then 
want to find somewhere in the man page a description of what makes a 
futex a *PI futex* and probably some statements of the distinction 
between PI and non-PI futexes. And those statements should be from a 
perspective that is somewhat comprehensible to user-space. I'm not
yet confident that I can do that. Do you care to take a shot at it?

>> 2. Can you say something about the pairing requirements of
>>   FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
>>   What is the requirement and why do we need it?
> 
> Briefly, these op codes exist to support a fairly specific use case:
> support for PI aware pthread condvars (glibc patch acceptance STILL
> PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?!)

Yes, Jan Kiszka recently alerted me to the existence of 
https://sourceware.org/bugzilla/show_bug.cgi?id=11588
and I still have some text that you proposed quite a long time ago
(mail titled "Pthread Condition Variables and Priority Inversion")
for the pthread_cond_timedwait() page.
One day, when that page exists, I'll try to remember to add it.

> But it is shipped with various
> PREEMPT_RT enabled Linux systems. Because these calls are paired, and more
> of the logic happens on the kernel side (to preserve ownership of an
> rt_mutex with waiters), we pre-specify the target of the requeue in
> futex_wait_requeue_pi so that userspace and kernelspace remain in sync.
> This also limits the attack surface by only supporting exactly what it
> was meant to do. The corner cases get insane otherwise.

Thanks. I've added some text on pairing, based on your text above.

> We could walk through the various 

Re: futex(2) man page update help request

2015-01-16 Thread Darren Hart
Corrected Davidlohr's email address.

On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
 wrote:

>Hello Darren,
>
>I give you the same apology as to Thomas for the
>long-delayed response to your mail.
>
>And I repeat my note to Thomas:
>In the next day or two, I hope to send out the new version
>of the futex(2) page for review. The new draft is a bit
>bigger (okay -- 4 x bigger) than the current page. And there
>are quite a number of FIXMEs that I've placed in the page
>for various points--some minor, but a few major--that need
>to be checked or fixed. Would you have some time to review
>that page?

I'll make the time for that. I've wanted to see this for a while, so thank
you for working on it!

> 
>
>In the meantime, I have a couple of questions, which, if
>you could answer them, I would work some changes into the
>page before sending.
>
>1. In various places, distinction is made between non-PI
>   futexes and PI futexes. But what determines that distinction?
>   From the kernel's perspective, what makes a futex one type
>   or another? I presume it is to do with the types of blocking
>   waiters on the futex, but it would be good to have a formal
>   definition.

You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
thing as a futex", it doesn't exist as any kind of identifiable object, so
these discussions can get rather confusing :-)

A "futex" becomes a PI futex when it is "created" via a PI futex op code.
At that point, the syscall will ensure a pi_state is populated for the
futex_q entry. See futex_lock_pi() for example. Before the locks are
taken, there is a call to refill_pi_state_cache() which preps a pi_state
for assignment later in futex_lock_pi_atomic(). This pi_state provides the
necessary linkage to perform the priority boosting in the event of a
priority inversion. This is handled externally from the futexes via the
rt_mutex construct.

Clear as mud?


>
>2. Can you say something about the pairing requirements of
>   FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
>   What is the requirement and why do we need it?

Briefly, these op codes exist to support a fairly specific use case:
support for PI aware pthread condvars (glibc patch acceptance STILL
PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?!), which nonetheless ships
with various PREEMPT_RT enabled Linux systems. Because these calls are
paired, and more of the logic happens on the kernel side (to preserve
ownership of an rt_mutex with waiters), we pre-specify the target of the
requeue in futex_wait_requeue_pi so that userspace and kernelspace remain
in sync. This also limits the attack surface by only supporting exactly
what it was meant to do. The corner cases get insane otherwise.

We could walk through the various ways in which it would break if these
pairing restrictions were not in place, but I'll have to take some serious
time to page all those into working memory. Let me know if we need more
detail here and I will.
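The pairing described above shows up directly in the calling convention: the waiter names the PI mutex word (uaddr2) up front in FUTEX_WAIT_REQUEUE_PI, and the signaller must pass that same word to FUTEX_CMP_REQUEUE_PI; if the two sides disagree, the requeue fails with EINVAL (the "op pairing" violation mentioned earlier in the thread). The wrappers below are a hypothetical sketch (`cond_wait_requeue_pi()` and `cond_requeue_pi()` are invented names, not glibc functions). They assume the argument layout documented for futex(2): for FUTEX_CMP_REQUEUE_PI the wake count `val` must be 1, the maximum number to requeue travels in the timeout slot, and `val3` is compared with *uaddr. As an added assumption for illustration, each wrapper also rejects uaddr == uaddr2 locally, mirroring the kernel's own EINVAL check discussed later in this thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Waiter side (hypothetical wrapper): sleep on the non-PI condvar word
 * 'cond' if it still holds 'seen', naming 'mutex' as the only PI futex the
 * kernel may requeue us to. On success the caller owns 'mutex'. */
static int cond_wait_requeue_pi(uint32_t *cond, uint32_t seen, uint32_t *mutex)
{
    if (cond == mutex)
        return -EINVAL;   /* requeue to self is rejected by the kernel too */
    if (syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI, seen,
                NULL, mutex, 0) == -1)
        return -errno;
    return 0;
}

/* Signaller side (hypothetical wrapper): if 'cond' still holds 'expected',
 * wake at most one waiter (nr_wake must be 1) and requeue up to
 * 'nr_requeue' more onto 'mutex'. Returns the number woken plus requeued. */
static long cond_requeue_pi(uint32_t *cond, uint32_t *mutex,
                            int nr_requeue, uint32_t expected)
{
    long ret;

    if (cond == mutex)
        return -EINVAL;
    /* nr_requeue travels in the 'timeout' argument slot. */
    ret = syscall(SYS_futex, cond, FUTEX_CMP_REQUEUE_PI, 1,
                  (void *) (uintptr_t) nr_requeue, mutex, expected);
    return ret == -1 ? -errno : ret;
}
```

A real PI-aware condvar also needs the surrounding sequence-counter logic; the sketch covers only the shapes of the two paired system calls.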

>
>Most of the rest of this mail is just a checklist noting
>what I did with your comments. No response is needed
>in most cases, but there is one that I have marked with
>"???". If you could reply to that, I'd be grateful.

...

>> For all the PI opcodes, we should probably mention something about the
>> futex value scheme (TID), whereas the other opcodes do not require any
>> specific value scheme.
>> 
>> No owner: 0
>> Owner:    TID
>> Waiters:  TID | FUTEX_WAITERS
>> 
>> This is the relevant section from the referenced paper:
>>  
>> The PI futex operations diverge from the others in that they impose a
>> policy describing how the futex value is to be used. If the lock is
>> unowned, the futex value shall be 0. If owned, it shall be the thread id
>> (tid) of the owning thread. If there are threads contending for the
>> lock, then the FUTEX_WAITERS flag is set. With this policy in place,
>> userspace can atomically acquire an unowned lock or release an
>> uncontended lock using an atomic instruction and their own tid. A
>> non-zero futex value will force waiters into the kernel to lock. The
>> FUTEX_WAITERS flag forces the owner into the kernel to unlock. If the
>> callers are forced into the kernel, they then deal directly with an
>> underlying rt_mutex which implements the priority inheritance semantics.
>> After the rt_mutex is acquired, the futex value is updated accordingly,
>> before the calling thread returns to userspace.
>>
>> It is important to note that the kernel will update the futex value prior
>> to returning to userspace. Unlike other futex op codes,
>> FUTEX_CMP_REQUEUE_PI, FUTEX_WAIT_REQUEUE_PI, and FUTEX_LOCK_PI are designed
>> for the implementation of very specific IPC mechanisms.
>
>??? Great text. May I presume that I can take this text
>and freely adapt it for the man page? (Actually, this is a
>request for forgiveness, rather than permission :-).)

Thanks, and no objection from me.

--
Darren 

Re: futex(2) man page update help request

2015-01-16 Thread Darren Hart
On 1/16/15, 4:56 PM, "Davidlohr Bueso"  wrote:


>On Fri, 2015-01-16 at 21:54 +0100, Michael Kerrisk (man-pages) wrote:
>> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>> > On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>> > 
>> >> Hello Thomas,
>> >>
>> >> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>> >>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>> >>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>> >>>>
>> >>>> ??? I added this, but does this error not occur only for PI requeues?
>> >>>
>> >>> It's equally wrong for normal futexes. And it's actually the same code
>> >>> checking for this for all variants.
>> >>
>> >> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>> >> mean:
>> >>
>> >> a) This error text should be there for both normal and PI requeues
>> > 
>> > It is there for both. The requeue code has that check independent of
>> > the requeue type (normal/pi). It never makes sense to requeue
>> > something to itself whether normal or pi futex. We added this for PI,
>> > because there it is harmful, but we did not special case it. So normal
>> > futexes get the same treatment.
>> 
>> Hello Thomas, 
>> 
>> Color me stupid, but I can't see this in futex_requeue(). Where is that
>> check that is "independent of the requeue type (normal/pi)"?
>> 
>> When I look through futex_requeue(), all the likely looking sources
>> of EINVAL are governed by a check on the 'requeue_pi' argument.
>
>Yeah, it's not very straightforward; I was also scratching my head. First
>we do:
>
>        if (requeue_pi) {
>                /*
>                 * Requeue PI only works on two distinct uaddrs. This
>                 * check is only valid for private futexes. See below.
>                 */
>                if (uaddr1 == uaddr2)
>                        return -EINVAL;

We check here to abort as early as possible for the usual security reasons.

>
>Then:
>
>        /*
>         * The check above which compares uaddrs is not sufficient for
>         * shared futexes. We need to compare the keys:
>         */
>        if (requeue_pi && match_futex(, )) {
>                ret = -EINVAL;
>                goto out_put_keys;
>        }
>
>I wonder why we're checking for requeue_pi again, when, at least
>according to the comments, it should be for shared. I guess it would
>make sense depending on the mappings as the keys are the only true way
>of determining if both futexes are the same, so perhaps:
>
>   if ((requeue_pi || (flags & FLAGS_SHARED)) && match_futex())

No, the rule only applies to requeue_pi. This check is the for-sure
version of the uaddr comparison above. We could also test (flags &
FLAGS_SHARED), but I'm not sure it's worth it.

--
Darren Hart
Intel Open Source Technology Center



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2015-01-16 Thread Davidlohr Bueso
On Fri, 2015-01-16 at 21:54 +0100, Michael Kerrisk (man-pages) wrote:
> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
> > On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
> > 
> >> Hello Thomas,
> >>
> >> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> >>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> >>>>
> >>>> ??? I added this, but does this error not occur only for PI requeues?
> >>>
> >>> It's equally wrong for normal futexes. And it's actually the same code
> >>> checking for this for all variants.
> >>
> >> I don't understand "equally wrong" in your reply, I'm sorry. Do you
> >> mean:
> >>
> >> a) This error text should be there for both normal and PI requeues
> > 
> > It is there for both. The requeue code has that check independent of
> > the requeue type (normal/pi). It never makes sense to requeue
> > something to itself whether normal or pi futex. We added this for PI,
> > because there it is harmful, but we did not special case it. So normal
> > futexes get the same treatment.
> 
> Hello Thomas, 
> 
> Color me stupid, but I can't see this in futex_requeue(). Where is that
> check that is "independent of the requeue type (normal/pi)"?
> 
> When I look through futex_requeue(), all the likely looking sources
> of EINVAL are governed by a check on the 'requeue_pi' argument.

Yeah, it's not very straightforward; I was also scratching my head. First
we do:

        if (requeue_pi) {
                /*
                 * Requeue PI only works on two distinct uaddrs. This
                 * check is only valid for private futexes. See below.
                 */
                if (uaddr1 == uaddr2)
                        return -EINVAL;

Then:

        /*
         * The check above which compares uaddrs is not sufficient for
         * shared futexes. We need to compare the keys:
         */
        if (requeue_pi && match_futex(, )) {
                ret = -EINVAL;
                goto out_put_keys;
        }

I wonder why we're checking for requeue_pi again, when, at least
according to the comments, it should be for shared. I guess it would
make sense depending on the mappings as the keys are the only true way
of determining if both futexes are the same, so perhaps:

if ((requeue_pi || (flags & FLAGS_SHARED)) && match_futex())

That would also align with the retry labels.

Thanks,
Davidlohr



Re: futex(2) man page update help request

2015-01-16 Thread Darren Hart




On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
 wrote:

>On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>> 
>>> Hello Thomas,
>>>
>>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>>
>>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>>
>>>> It's equally wrong for normal futexes. And it's actually the same code
>>>> checking for this for all variants.
>>>
>>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>>> mean:
>>>
>>> a) This error text should be there for both normal and PI requeues
>> 
>> It is there for both. The requeue code has that check independent of
>> the requeue type (normal/pi). It never makes sense to requeue
>> something to itself whether normal or pi futex. We added this for PI,
>> because there it is harmful, but we did not special case it. So normal
>> futexes get the same treatment.
>
>Hello Thomas, 
>
>Color me stupid, but I can't see this in futex_requeue(). Where is that
>check that is "independent of the requeue type (normal/pi)"?
>
>When I look through futex_requeue(), all the likely looking sources
>of EINVAL are governed by a check on the 'requeue_pi' argument.


Right, in the non-PI case, I believe there are valid use cases: move to
the back of the FIFO, for example (OK, maybe the only example?). Both
tests ensuring uaddr1 != uaddr2 are under the requeue_pi conditional
block. The second compares the keys in case they are not FUTEX_PRIVATE
(uaddrs would be different, but still the same backing store).
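The remark that two different uaddrs can share the same backing store is easy to demonstrate. In the sketch below (the helper `map_twice()` is hypothetical, and it assumes a writable temporary directory for tmpfile(3)), one page of a temporary file is mapped twice with MAP_SHARED: the two pointers differ, so a plain `uaddr1 == uaddr2` comparison would not treat them as the same word, yet a store through one is visible through the other. Per the futex_requeue() code quoted earlier in the thread, this is the case the second, key-based match_futex() check exists to catch for shared futexes.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: map the same file-backed page at two addresses.
 * Returns 0 on success, -1 on failure. */
static int map_twice(uint32_t **a, uint32_t **b)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    int fd = fileno(f);
    void *p = MAP_FAILED, *q = MAP_FAILED;

    if (ftruncate(fd, page) == 0) {
        p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        q = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    fclose(f);  /* existing mappings keep the file pages alive */

    if (p == MAP_FAILED || q == MAP_FAILED) {
        if (p != MAP_FAILED)
            munmap(p, page);
        if (q != MAP_FAILED)
            munmap(q, page);
        return -1;
    }
    *a = p;
    *b = q;
    return 0;
}
```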

Thomas, am I missing a test for this someplace else?


-- 
Darren Hart
Intel Open Source Technology Center





Re: futex(2) man page update help request

2015-01-16 Thread Michael Kerrisk (man-pages)
On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
> 
>> Hello Thomas,
>>
>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>
>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>
>>> It's equally wrong for normal futexes. And it's actually the same code
>>> checking for this for all variants.
>>
>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>> mean:
>>
>> a) This error text should be there for both normal and PI requeues
> 
> It is there for both. The requeue code has that check independent of
> the requeue type (normal/pi). It never makes sense to requeue
> something to itself whether normal or pi futex. We added this for PI,
> because there it is harmful, but we did not special case it. So normal
> futexes get the same treatment.

Hello Thomas, 

Color me stupid, but I can't see this in futex_requeue(). Where is that
check that is "independent of the requeue type (normal/pi)"?

When I look through futex_requeue(), all the likely looking sources
of EINVAL are governed by a check on the 'requeue_pi' argument.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2015-01-16 Thread Thomas Gleixner
On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:

> Hello Thomas,
> 
> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> > On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> >>
> >> ??? I added this, but does this error not occur only for PI requeues?
> > 
> > It's equally wrong for normal futexes. And it's actually the same code
> > checking for this for all variants.
> 
> I don't understand "equally wrong" in your reply, I'm sorry. Do you
> mean:
> 
> a) This error text should be there for both normal and PI requeues

It is there for both. The requeue code has that check independent of
the requeue type (normal/pi). It never makes sense to requeue
something to itself whether normal or pi futex. We added this for PI,
because there it is harmful, but we did not special case it. So normal
futexes get the same treatment.

Thanks,

tglx



Re: futex(2) man page update help request

2015-01-16 Thread Michael Kerrisk (man-pages)
Hello Thomas,

On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>
>> ??? I added this, but does this error not occur only for PI requeues?
> 
> It's equally wrong for normal futexes. And it's actually the same code
> checking for this for all variants.

I don't understand "equally wrong" in your reply, I'm sorry. Do you
mean:

a) This error text should be there for both normal and PI requeues
OR
b) This error text should be there for neither normal nor PI requeues

>>> [EDEADLOCK] The futex is already locked by the caller or the kernel 
>>> detected a deadlock scenario in a nested lock chain
>>
>> Added.
> 
> It's actually EDEADLK

Yes, sorry -- I should have said that I already found and fixed 
that problem.
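The EDEADLK case is easy to provoke deterministically. Below is a minimal sketch; the helper names are hypothetical.

The futex word is preloaded with the caller's own TID. By the PI value scheme, the word therefore already reads as "owned by me". A FUTEX_TRYLOCK_PI on it is then expected to fail with EDEADLK, rather than succeeding or blocking.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val,
                      const struct timespec *timeout, uint32_t *uaddr2,
                      uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* Hypothetical helper: returns the errno from re-locking a PI futex that
 * the caller already owns, or 0 if the kernel accepted the request. */
static int relock_own_pi_futex(void)
{
    /* PI value scheme: the owner's TID in the word means "owned by me". */
    uint32_t word = (uint32_t)syscall(SYS_gettid);
    if (sys_futex(&word, FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG, 0, NULL, NULL, 0) == 0)
        return 0;
    return errno;
}
```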

>>> [EOWNERDIED] The owner of the futex died and the kernel made the 
>>> caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
>>> futex userspace value. Caller is responsible for cleanup
>>
>> There is no such thing as an EOWNERDIED error. I had a look
>> through the kernel source for the FUTEX_OWNER_DIED cases and didn't 
>> see an obvious error associated with them. Can you clarify? (I think 
>> the point is that this condition, which is described in
>> Documentation/robust-futexes.txt, is not an error as such. However, I'm
>> not yet sure of how to describe it in the man page.)
>> I will add this point as a FIXME in the new draft man page.
> 
> Oops. My bad. That's not what the kernel does. The kernel merely
> marks it in the futex itself with FUTEX_OWNER_DIED. User space needs
> to deal with that and the posix users return EOWNERDEAD (not
> EOWNERDIED), so it's not part of the futex call itself.
> 
> We had discussions about returning EOWNERDEAD in that case, but then
> glibc with its sophisticated error handling prevented that.

Okay. I'll add a FIXME to the draft page, to see if we get some good 
text together to describe FUTEX_OWNER_DIED and how it is used.
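FUTEX_OWNER_DIED is a bit in the futex word, not an error return, so user space has to look for it. Below is a small sketch that assumes only the bit definitions exported by <linux/futex.h>: FUTEX_TID_MASK, FUTEX_WAITERS and FUTEX_OWNER_DIED.

The struct and both functions are hypothetical. robust_lock_result() merely mimics how a POSIX robust mutex implementation turns the bit into EOWNERDEAD at the library level.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a PI/robust futex word. */
struct pi_word {
    uint32_t owner_tid;   /* FUTEX_TID_MASK bits; 0 means unowned */
    bool     has_waiters; /* FUTEX_WAITERS: unlock must go through the kernel */
    bool     owner_died;  /* FUTEX_OWNER_DIED: set by the kernel, not an errno */
};

static struct pi_word decode_pi_word(uint32_t v)
{
    struct pi_word w = {
        .owner_tid   = v & FUTEX_TID_MASK,
        .has_waiters = (v & FUTEX_WAITERS) != 0,
        .owner_died  = (v & FUTEX_OWNER_DIED) != 0,
    };
    return w;
}

/* Hypothetical library step after the lock has been acquired: the futex
 * operation itself succeeded, but if the previous owner died the library
 * reports EOWNERDEAD so the caller can repair the protected state. */
static int robust_lock_result(uint32_t word_after_lock)
{
    return decode_pi_word(word_after_lock).owner_died ? EOWNERDEAD : 0;
}
```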

>>> FUTEX_TRYLOCK_PI
>>>
>>> This operation tries to acquire the futex at uaddr. It deals with the
>>> situation where the TID value at uaddr is 0, but the FUTEX_WAITERS
>>> bit is set. User space cannot handle this race free.
>>
>> Added.
>>
>>> The arguments uaddr2, val, timeout and val3 are ignored.
>>
>> ??? But the code reads:
>>
>> case FUTEX_TRYLOCK_PI:
>> return futex_lock_pi(uaddr, flags, 0, timeout, 1);
>>  
>> which momentarily misleads one into thinking that 'timeout' is used.
>> And: it's not quite ignored, since in futex_lock_pi() a non-NULL
>> 'timeout' is unconditionally dereferenced (meaning you could get
>> an EFAULT error for a bad 'timeout' pointer).
>> I'm confused
> 
> Indeed. That's just wrong.
>  
>> Maybe the above code should be
>>
>> case FUTEX_TRYLOCK_PI:
>> return futex_lock_pi(uaddr, flags, 0, NULL, 1);
>> ?
> 
> Care to send a patch?

Will do.
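Whatever the kernel ends up doing with the argument, a caller can sidestep the question by passing NULL for timeout, since FUTEX_TRYLOCK_PI never sleeps. Below is a minimal sketch; the wrapper names are hypothetical.

It takes an unowned private PI futex with FUTEX_TRYLOCK_PI and checks that the kernel stored the caller's TID in the word. It then releases the futex with FUTEX_UNLOCK_PI, which should leave the word at 0 when there are no waiters.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val,
                      const struct timespec *timeout, uint32_t *uaddr2,
                      uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* Hypothetical wrapper: 0 on success, otherwise errno. */
static int pi_trylock(uint32_t *word)
{
    /* timeout is NULL: a trylock never sleeps, so a timeout is meaningless. */
    if (sys_futex(word, FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG, 0, NULL, NULL, 0) == 0)
        return 0;
    return errno;
}

/* Hypothetical wrapper: 0 on success, otherwise errno. */
static int pi_unlock(uint32_t *word)
{
    if (sys_futex(word, FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG, 0, NULL, NULL, 0) == 0)
        return 0;
    return errno;
}
```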
  
[...]

>> ??? I don't believe this can happen. 'val3' is internally set to
>> FUTEX_BITSET_MATCH_ANY. Can you confirm?
> 
> Right. We don't support that bitset stuff in requeue_pi ATM.

Thanks for the confirmation.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2015-01-16 Thread Davidlohr Bueso
On Fri, 2015-01-16 at 21:54 +0100, Michael Kerrisk (man-pages) wrote:
> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>
>>> Hello Thomas,
>>>
>>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>>
>>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>>
>>>> It's equally wrong for normal futexes. And it's actually the same code
>>>> checking for this for all variants.
>>>
>>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>>> mean:
>>>
>>> a) This error text should be there for both normal and PI requeues
>>
>> It is there for both. The requeue code has that check independent of
>> the requeue type (normal/pi). It never makes sense to requeue
>> something to itself whether normal or pi futex. We added this for PI,
>> because there it is harmful, but we did not special case it. So normal
>> futexes get the same treatment.
>
> Hello Thomas,
>
> Color me stupid, but I can't see this in futex_requeue(). Where is that
> check that is "independent of the requeue type (normal/pi)"?
>
> When I look through futex_requeue(), all the likely looking sources
> of EINVAL are governed by a check on the 'requeue_pi' argument.

Yeah, it's not very straightforward, I was also scratching my head. First
we do:

    if (requeue_pi) {
        /*
         * Requeue PI only works on two distinct uaddrs. This
         * check is only valid for private futexes. See below.
         */
        if (uaddr1 == uaddr2)
            return -EINVAL;

Then:

    /*
     * The check above which compares uaddrs is not sufficient for
     * shared futexes. We need to compare the keys:
     */
    if (requeue_pi && match_futex(&key1, &key2)) {
        ret = -EINVAL;
        goto out_put_keys;
    }

I wonder why we're checking for requeue_pi again, when, at least
according to the comments, it should be for shared. I guess it would
make sense depending on the mappings as the keys are the only true way
of determining if both futexes are the same, so perhaps:

    if ((requeue_pi || (flags & FLAGS_SHARED)) && match_futex())

That would also align with the retry labels.

Thanks,
Davidlohr


Re: futex(2) man page update help request

2015-01-16 Thread Darren Hart
On 1/16/15, 4:56 PM, Davidlohr Bueso d...@stgolabs.net wrote:


>On Fri, 2015-01-16 at 21:54 +0100, Michael Kerrisk (man-pages) wrote:
>> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>>> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>
>>>> Hello Thomas,
>>>>
>>>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>>>
>>>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>>>
>>>>> It's equally wrong for normal futexes. And it's actually the same code
>>>>> checking for this for all variants.
>>>>
>>>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>>>> mean:
>>>>
>>>> a) This error text should be there for both normal and PI requeues
>>>
>>> It is there for both. The requeue code has that check independent of
>>> the requeue type (normal/pi). It never makes sense to requeue
>>> something to itself whether normal or pi futex. We added this for PI,
>>> because there it is harmful, but we did not special case it. So normal
>>> futexes get the same treatment.
>>
>> Hello Thomas,
>>
>> Color me stupid, but I can't see this in futex_requeue(). Where is that
>> check that is "independent of the requeue type (normal/pi)"?
>>
>> When I look through futex_requeue(), all the likely looking sources
>> of EINVAL are governed by a check on the 'requeue_pi' argument.
>
>Yeah, it's not very straightforward, I was also scratching my head. First
>we do:
>
>    if (requeue_pi) {
>        /*
>         * Requeue PI only works on two distinct uaddrs. This
>         * check is only valid for private futexes. See below.
>         */
>        if (uaddr1 == uaddr2)
>            return -EINVAL;

We check here to abort as early as possible for the usual security reasons.


>Then:
>
>    /*
>     * The check above which compares uaddrs is not sufficient for
>     * shared futexes. We need to compare the keys:
>     */
>    if (requeue_pi && match_futex(&key1, &key2)) {
>        ret = -EINVAL;
>        goto out_put_keys;
>    }
>
>I wonder why we're checking for requeue_pi again, when, at least
>according to the comments, it should be for shared. I guess it would
>make sense depending on the mappings as the keys are the only true way
>of determining if both futexes are the same, so perhaps:
>
>    if ((requeue_pi || (flags & FLAGS_SHARED)) && match_futex())

No, the rule only applies to requeue_pi. This check is the for-sure
version of the uaddr comparison above. We could add "if (flags &
FLAGS_SHARED)", but I'm not sure it's worth it.
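The asymmetry Darren describes can be checked from user space. Below is a minimal sketch, with hypothetical helper names, that requeues a private futex onto itself.

The non-PI FUTEX_CMP_REQUEUE is accepted and reports 0 tasks woken or requeued, because nobody is waiting. FUTEX_CMP_REQUEUE_PI with uaddr == uaddr2 is expected to fail with EINVAL from the early check quoted above. For both requeue operations, the kernel takes nr_requeue from the timeout argument slot as a plain integer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val,
                      const struct timespec *timeout, uint32_t *uaddr2,
                      uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* Hypothetical helper: requeue from 'from' to 'to'. Returns the number of
 * tasks woken plus requeued, or -1 with errno set. nr_requeue travels in
 * the timeout slot as an integer; val3 is the value *from must still hold
 * (otherwise the call fails with EAGAIN). */
static long cmp_requeue(uint32_t *from, uint32_t *to, int pi,
                        uint32_t nr_wake, uint32_t nr_requeue)
{
    int op = (pi ? FUTEX_CMP_REQUEUE_PI : FUTEX_CMP_REQUEUE) | FUTEX_PRIVATE_FLAG;
    return sys_futex(from, op, nr_wake,
                     (const struct timespec *)(uintptr_t)nr_requeue, to, *from);
}
```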

--
Darren Hart
Intel Open Source Technology Center



Re: futex(2) man page update help request

2015-01-16 Thread Darren Hart
Corrected Davidlohr's email address.

On 1/15/15, 7:12 AM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:

>Hello Darren,
>
>I give you the same apology as to Thomas for the
>long-delayed response to your mail.
>
>And I repeat my note to Thomas:
>In the next day or two, I hope to send out the new version
>of the futex(2) page for review. The new draft is a bit
>bigger (okay -- 4 x bigger) than the current page. And there
>are quite a number of FIXMEs that I've placed in the page
>for various points--some minor, but a few major--that need
>to be checked or fixed. Would you have some time to review
>that page?

I'll make the time for that. I've wanted to see this for a while, so thank
you for working on it!

 

>In the meantime, I have a couple of questions, which, if
>you could answer them, I would work some changes into the
>page before sending.
>
>1. In various places, distinction is made between non-PI
>   futexes and PI futexes. But what determines that distinction?
>   From the kernel's perspective, what makes a futex one type
>   or another? I presume it is to do with the types of blocking
>   waiters on the futex, but it would be good to have a formal
>   definition.

You're right in that a uaddr is a uaddr is a uaddr. Also there is no such
thing as a futex, it doesn't exist as any kind of identifiable object, so
these discussions can get rather confusing :-)

A futex becomes a PI futex when it is created via a PI futex op code.
At that point, the syscall will ensure a pi_state is populated for the
futex_q entry. See futex_lock_pi() for example. Before the locks are
taken, there is a call to refill_pi_state_cache() which preps a pi_state
for assignment later in futex_lock_pi_atomic(). This pi_state provides the
necessary linkage to perform the priority boosting in the event of a
priority inversion. This is handled externally from the futexes via the
rt_mutex construct.

Clear as mud?
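One user-visible consequence of "the op code decides" is that the same 32-bit word can be passed to a non-PI operation and to a PI operation, and only the PI operation applies the TID value scheme. Below is a minimal sketch; the helper names are hypothetical.

FUTEX_WAKE on a zero word just reports that nobody was woken. FUTEX_UNLOCK_PI on that same word fails with EPERM, because the word does not name the caller as owner.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val,
                      const struct timespec *timeout, uint32_t *uaddr2,
                      uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* Non-PI operation: the kernel does not interpret the futex value. */
static long plain_wake(uint32_t *word)
{
    return sys_futex(word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1, NULL, NULL, 0);
}

/* PI operation: the kernel requires the word to hold the caller's TID.
 * Hypothetical helper: returns 0 on success, otherwise errno. */
static int pi_unlock_errno(uint32_t *word)
{
    if (sys_futex(word, FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG, 0, NULL, NULL, 0) == 0)
        return 0;
    return errno;
}
```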



>2. Can you say something about the pairing requirements of
>   FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
>   What is the requirement and why do we need it?

Briefly, these op codes exist to support a fairly specific use case:
support for PI aware pthread condvars (glibc patch acceptance STILL
PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?!), but the patch is shipped
with various PREEMPT_RT enabled Linux systems. Because these calls are
paired, and more of the logic happens on the kernel side (to preserve
ownership of an rt_mutex with waiters), we pre-specify the target of the
requeue in futex_wait_requeue_pi in order to ensure userspace and
kernelspace remain in sync. This also limits the attack surface by only
supporting exactly what it was meant to do. The corner cases get insane
otherwise.

We could walk through the various ways in which it would break if these
pairing restrictions were not in place, but I'll have to take some serious
time to page all those into working memory. Let me know if we need more
detail here and I will.


>Most of the rest of this mail is just a checklist noting
>what I did with your comments. No response is needed
>in most cases, but there is one that I have marked with
>"???". If you could reply to that, I'd be grateful.

...

>> For all the PI opcodes, we should probably mention something about the
>> futex value scheme (TID), whereas the other opcodes do not require any
>> specific value scheme.
>>
>> No Owner: 0
>> Owner:    TID
>> Waiters:  TID | FUTEX_WAITERS
>>
>> This is the relevant section from the referenced paper:
>>
>> The PI futex operations diverge from the others in that they impose
>> a policy describing how the futex value is to be used. If the lock is
>> unowned, the futex value shall be 0. If owned, it shall be the thread
>> id (tid) of the owning thread. If there are threads contending for the
>> lock, then the FUTEX_WAITERS flag is set. With this policy in place,
>> userspace can atomically acquire an unowned lock or release an
>> uncontended lock using an atomic instruction and their own tid. A
>> non-zero futex value will force waiters into the kernel to lock. The
>> FUTEX_WAITERS flag forces the owner into the kernel to unlock. If the
>> callers are forced into the kernel, they then deal directly with an
>> underlying rt_mutex which implements the priority inheritance
>> semantics. After the rt_mutex is acquired, the futex value is updated
>> accordingly, before the calling thread returns to userspace.
>>
>> It is important to note that the kernel will update the futex value
>> prior to returning to userspace. Unlike other futex op codes,
>> FUTEX_CMP_REQUEUE_PI, FUTEX_WAIT_REQUEUE_PI and FUTEX_LOCK_PI are
>> designed for the implementation of very specific IPC mechanisms.
>
>??? Great text. May I presume that I can take this text
>and freely adapt it for the man page? (Actually, this is a
>request for forgiveness, rather than permission :-).)

Thanks, and no objection from me.

--
Darren Hart
Intel Open Source Technology Center


Re: futex(2) man page update help request

2015-01-15 Thread Thomas Gleixner
On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> > [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> 
> ??? I added this, but does this error not occur only for PI requeues?

It's equally wrong for normal futexes. And it's actually the same code
checking for this for all variants.

> > [EDEADLOCK] The futex is already locked by the caller or the kernel 
> > detected a deadlock scenario in a nested lock chain
>
> Added.

It's actually EDEADLK

> 
> > [EOWNERDIED] The owner of the futex died and the kernel made the 
> > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> > futex userspace value. Caller is responsible for cleanup
> 
> There is no such thing as an EOWNERDIED error. I had a look
> through the kernel source for the FUTEX_OWNER_DIED cases and didn't 
> see an obvious error associated with them. Can you clarify? (I think 
> the point is that this condition, which is described in
> Documentation/robust-futexes.txt, is not an error as such. However, I'm
> not yet sure of how to describe it in the man page.)
> I will add this point as a FIXME in the new draft man page.

Oops. My bad. That's not what the kernel does. The kernel merely
marks it in the futex itself with FUTEX_OWNER_DIED. User space needs
to deal with that and the posix users return EOWNERDEAD (not
EOWNERDIED), so it's not part of the futex call itself.

We had discussions about returning EOWNERDEAD in that case, but then
glibc with its sophisticated error handling prevented that.
 
> > FUTEX_TRYLOCK_PI
> > 
> > This operation tries to acquire the futex at uaddr. It deals with the
> > situation where the TID value at uaddr is 0, but the FUTEX_WAITERS
> > bit is set. User space cannot handle this race free.
> 
> Added.
> 
> > The arguments uaddr2, val, timeout and val3 are ignored.
> 
> ??? But the code reads:
> 
> case FUTEX_TRYLOCK_PI:
> return futex_lock_pi(uaddr, flags, 0, timeout, 1);
>  
> which momentarily misleads one into thinking that 'timeout' is used.
> And: it's not quite ignored, since in futex_lock_pi() a non-NULL
> 'timeout' is unconditionally dereferenced (meaning you could get
> an EFAULT error for a bad 'timeout' pointer).
> I'm confused

Indeed. That's just wrong.
 
> Maybe the above code should be
> 
> case FUTEX_TRYLOCK_PI:
> return futex_lock_pi(uaddr, flags, 0, NULL, 1);
> ?

Care to send a patch?
 
> > FUTEX_WAIT_REQUEUE_PI
> > 
> > Wait operation to wait on a non pi futex at uaddr and potentially be
> > requeued on a pi futex at uaddr2. The wait operation on uaddr is the
> > same as FUTEX_WAIT. The waiter can be removed from the wait on uaddr
> > via FUTEX_WAKE without requeuing on uaddr2.
> 
> Added.
> 
> > The timeout argument is handled as described in FUTEX_WAIT.
> 
> The above seems not to be correct. I've written the discussion of
> 'timeout' up as I understand it, and added a FIXME to the draft page.
> 
> > Darren, can you fill in the missing details?
> 
> > Return values:
> > 
> > [EFAULT] Kernel was unable to access the futex value at uaddr or
> > uaddr2
> 
> Already covered.
> 
> > [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> > valid object, i.e. pointer is not 4 byte aligned
> 
> Already covered.
> 
> > [EINVAL] The supplied timeout argument is not normalized.
> 
> Already covered.
> 
> > [EINVAL] The supplied bitset is zero.
> 
> ??? I don't believe this can happen. 'val3' is internally set to
> FUTEX_BITSET_MATCH_ANY. Can you confirm?

Right. We don't support that bitset stuff in requeue_pi ATM.
 
Thanks,

tglx
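Thomas's answer can be observed from user space. Below is a minimal sketch; the helper name is hypothetical.

FUTEX_WAIT_REQUEUE_PI is called with val3 = 0, on a futex word that no longer holds the expected value. FUTEX_WAIT_BITSET would reject that val3 with EINVAL. Here, however, val3 is replaced internally by FUTEX_BITSET_MATCH_ANY, so the call is expected to get as far as the value check and fail with EAGAIN. Passing the same address as uaddr and uaddr2 is still rejected with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val,
                      const struct timespec *timeout, uint32_t *uaddr2,
                      uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* Hypothetical helper: returns errno from FUTEX_WAIT_REQUEUE_PI, or 0 if
 * the call returned success. A NULL timeout means "no timeout"; callers
 * here pass a stale 'expected' value so the call returns immediately. */
static int wait_requeue_pi_errno(uint32_t *from, uint32_t expected,
                                 uint32_t *to, uint32_t val3)
{
    long r = sys_futex(from, FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG,
                       expected, NULL, to, val3);
    return r == -1 ? errno : 0;
}
```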


Re: futex(2) man page update help request

2015-01-15 Thread Michael Kerrisk (man-pages)
Hello Darren,

I give you the same apology as to Thomas for the 
long-delayed response to your mail.

And I repeat my note to Thomas:
In the next day or two, I hope to send out the new version
of the futex(2) page for review. The new draft is a bit
bigger (okay -- 4 x bigger) than the current page. And there 
are quite a number of FIXMEs that I've placed in the page 
for various points--some minor, but a few major--that need
to be checked or fixed. Would you have some time to review
that page? 

In the meantime, I have a couple of questions, which, if 
you could answer them, I would work some changes into the 
page before sending.

1. In various places, distinction is made between non-PI 
   futexes and PI futexes. But what determines that distinction?
   From the kernel's perspective, what makes a futex one type
   or another? I presume it is to do with the types of blocking
   waiters on the futex, but it would be good to have a formal
   definition.

2. Can you say something about the pairing requirements of
   FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI. 
   What is the requirement and why do we need it?

Most of the rest of this mail is just a checklist noting
what I did with your comments. No response is needed 
in most cases, but there is one that I have marked with
"???". If you could reply to that, I'd be grateful.

On 05/15/2014 10:35 PM, Darren Hart wrote:
> On 5/15/14, 7:14, "Thomas Gleixner"  wrote:
> 
> Wow Thomas, I planned to do exactly this and you beat me to it. Again.
> Thanks for getting this started.
> 
> Michael, I imagine you want something more condensed, and I'll add to what
> tglx posted (inline below) to try and get you that, but if you have
> questions and need to fill in the gap, the paper I presented at RTLWS11 in
> '09 covers this particularly nasty OPCODE in detail:
> 
> http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
> 
> I believe Michael is looking for some higher level documentation, like how
> to use these and what they are intended for. 

Yes, that would be good.

> Probably something more like
> Ulrich's Futexes are Tricky paper - but let's start with getting the op
> codes, arguments, and return codes fleshed out.

Okay.

> For all the PI opcodes, we should probably mention something about the
> futex value scheme (TID), whereas the other opcodes do not require any
> specific value scheme.
> 
> No Owner: 0
> Owner:    TID
> Waiters:  TID | FUTEX_WAITERS
> 
> This is the relevant section from the referenced paper:
> 
> The PI futex operations diverge from the others in that they impose
> a policy describing how the futex value is to be used. If the lock is
> unowned, the futex value shall be 0. If owned, it shall be the thread
> id (tid) of the owning thread. If there are threads contending for the
> lock, then the FUTEX_WAITERS flag is set. With this policy in place,
> userspace can atomically acquire an unowned lock or release an
> uncontended lock using an atomic instruction and their own tid. A
> non-zero futex value will force waiters into the kernel to lock. The
> FUTEX_WAITERS flag forces the owner into the kernel to unlock. If the
> callers are forced into the kernel, they then deal directly with an
> underlying rt_mutex which implements the priority inheritance
> semantics. After the rt_mutex is acquired, the futex value is updated
> accordingly, before the calling thread returns to userspace.
>
> It is important to note that the kernel will update the futex value
> prior to returning to userspace. Unlike other futex op codes,
> FUTEX_CMP_REQUEUE_PI, FUTEX_WAIT_REQUEUE_PI and FUTEX_LOCK_PI are
> designed for the implementation of very specific IPC mechanisms.

??? Great text. May I presume that I can take this text 
and freely adapt it for the man page? (Actually, this is a 
request for forgiveness, rather than permission :-).)
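To make the value policy quoted above concrete (0 when unowned, the owner's TID when owned, FUTEX_WAITERS when contended), here is a minimal sketch of the user-space fast path. It assumes Linux, <linux/futex.h>, and raw syscall(2) calls, since glibc has traditionally provided no futex() wrapper; pi_lock(), pi_unlock() and my_tid() are hypothetical helper names, not part of glibc or of the kernel API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thread id of the caller, i.e. the value an owned PI futex word holds. */
static uint32_t my_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/* Acquire: one CAS 0 -> TID if the lock is unowned. Otherwise let the
 * kernel block us on the rt_mutex (FUTEX_LOCK_PI); it writes our TID
 * into the word before returning. Returns 0 or an errno value. */
static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;
    if (atomic_compare_exchange_strong(word, &expected, my_tid()))
        return 0;                              /* uncontended fast path */
    if (syscall(SYS_futex, word, FUTEX_LOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}

/* Release: one CAS TID -> 0 if the word is exactly our TID. If
 * FUTEX_WAITERS (or any other bit) is set, the CAS fails and the kernel
 * must hand the lock over (FUTEX_UNLOCK_PI). Returns 0 or an errno value. */
static int pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = my_tid();
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;                              /* no waiters: stay in user space */
    if (syscall(SYS_futex, word, FUTEX_UNLOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}
```

The design point is that the kernel is entered only when the single compare-and-swap fails: when the word is non-zero on lock, or is not exactly the caller's TID on unlock.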

>> FUTEX_CMP_REQUEUE_PI
>>
>>  PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is
>>  a non PI futex. Outer futex to which is requeued is a PI futex
>>  at uaddr2.
> 
> Inner/outer terminology applies specifically to the glibc pthread
> condition variable and mutex use case, but is overly specific for the man
> page. Consider:
> 
> PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
> FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
> futex (uaddr2).

Thanks for that text. It is easier to grasp.

>>
>>  The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
>>
>>  The argument val contains the number of waiters on uaddr
>>  which are immediately woken up. Must be 1 for this opcode.
> 
> Because the point is to avoid the thundering herd in the first place, and
> other nasty little races and faulting corner cases...

I added the piece about "thundering herd".
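A small sketch of the "val must be 1" rule for FUTEX_CMP_REQUEUE_PI, assuming Linux and raw syscall(2); cmp_requeue_pi() is a hypothetical wrapper. The requeue count travels in the timeout slot, and val3 carries the expected value of *uaddr. On the kernels we are aware of, a call with val != 1, or with uaddr equal to uaddr2, is rejected with EINVAL before any waiter is examined.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* FUTEX_CMP_REQUEUE_PI: wake nr_wake (must be 1) waiters blocked in
 * FUTEX_WAIT_REQUEUE_PI on uaddr and requeue up to nr_requeue more onto
 * the PI futex uaddr2, provided *uaddr == expected.
 * Returns the number woken plus requeued, or -errno. */
static long cmp_requeue_pi(uint32_t *uaddr, uint32_t *uaddr2, int nr_wake,
                           int nr_requeue, uint32_t expected)
{
    long r = syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE_PI_PRIVATE, nr_wake,
                     (unsigned long)nr_requeue, /* count, not a pointer */
                     uaddr2, expected);
    return r == -1 ? -errno : r;
}
```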

>>  The timeout argument is abused to transport the number of
>>  waiters which are requeued on to the futex at uaddr2. The
>>  pointer is typecasted to u32.

Re: futex(2) man page update help request

2015-01-15 Thread Michael Kerrisk (man-pages)
[Adding a few people to CC that have expressed interest in the 
progress of the updates of this page, or who may be able to
provide review feedback. Eventually, you'll all get CCed on
the new draft of the page.]

Hello Thomas,

On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>> And that universe would love to have your documentation of 
>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
> 
> I give you almost the full treatment, but I leave REQUEUE_PI to
> Darren and FUTEX_WAKE_OP to Jakub. :)

Thank you for the great effort you put into compiling the
text below, and apologies for my long delay in following up.

I've integrated almost all of your suggestions into the 
manual page. I will shortly send out a new draft of the
page that contains various FIXMEs for points that remain 
unclear.

Most of the rest of this mail is just a checklist noting
what I did with your comments. No response is needed 
in most cases, but there are a very few open questions in 
this mail that, to help you find them, I have marked with
"???". If you (or someone else) could reply to those, I 
would be grateful.

In the next day or two, I hope to send out the new version
of the futex(2) page for review. The new draft is a bit
bigger (okay -- 4 x bigger) than the current page. And there 
are quite a number of FIXMEs that I've placed in the page 
for various points--some minor, but a few major--that need
to be checked or fixed. Would you have some time to review
that page? 

For that matter, if anyone else would have time for
reviewing the page, could they shout out now. It's perhaps
unlikely, but I am worried about getting a thundering herd
of comments, and bringing the page to the state where I have 
it now has already been a fairly demanding task.

==

> FUTEX_WAIT
> 
> < Existing blurb seems ok >
> 
> Related return values
> 
> [EFAULT] Kernel was unable to access the futex value at uaddr.

Added/reworked.

> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned

Added.

> [EINVAL] The supplied timeout argument is not normalized.

Added, but with more detail.

> [EWOULDBLOCK] The atomic enqueueing failed. 

Added.

Note, however, that for consistency, I'll use EAGAIN throughout 
the page.

> User space value at uaddr
> is not equal to the val argument.

Was already present.

> [ETIMEDOUT] timeout expired

Was present, but I have now added more detail.
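The FUTEX_WAIT error returns listed above can all be triggered from a single thread. A minimal sketch, assuming Linux and raw syscall(2); futex_wait_err() is a hypothetical helper that returns the errno value (or 0). Note that the FUTEX_WAIT timeout is relative, and that EWOULDBLOCK and EAGAIN are the same value on Linux.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* FUTEX_WAIT with an optional relative timeout (NULL = wait forever).
 * Returns 0 if woken, otherwise the errno value of the failed call. */
static int futex_wait_err(const void *uaddr, uint32_t val,
                          const struct timespec *rel_timeout)
{
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, val,
                rel_timeout, NULL, 0) == -1)
        return errno;
    return 0;
}
```

Expected results: a stale val gives EAGAIN, a matching val with a short timeout gives ETIMEDOUT, tv_nsec >= 1000000000 gives EINVAL, and an address that is not 4-byte aligned gives EINVAL.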

==

> FUTEX_WAKE
> 
> < Existing blurb seems ok >
> 
> Related return values
> 
> [EFAULT] Kernel was unable to access the futex value at uaddr.

Added/reworked.

> [EINVAL] The supplied uaddr argument does not point to a valid 
> object, i.e. pointer is not 4 byte aligned

Added.

> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI

Added.
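FUTEX_WAKE returns the number of waiters it woke (0 if there were none). The sketch below pairs it with FUTEX_WAIT in the usual way: the waiter re-checks the word in a loop, so an EAGAIN (the value already changed) or a spurious return is harmless. It assumes Linux and raw syscall(2), and uses fork() with a MAP_SHARED mapping, hence the non-private operations; futex_call(), wake_nobody() and handshake() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thin wrapper for ops that use only uaddr and val. */
static long futex_call(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* FUTEX_WAKE on a word nobody waits on: returns 0 woken. */
static long wake_nobody(void)
{
    _Atomic uint32_t word = 0;
    return futex_call(&word, FUTEX_WAKE_PRIVATE, 1);
}

/* Parent sleeps until a child stores 1 in a shared word and wakes it.
 * Returns the value the parent finally observes, or -1 on error. */
static int handshake(void)
{
    _Atomic uint32_t *word = mmap(NULL, sizeof *word, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (word == MAP_FAILED)
        return -1;
    atomic_store(word, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)word, sizeof *word);
        return -1;
    }
    if (pid == 0) {                        /* child: publish, then wake */
        atomic_store(word, 1);
        futex_call(word, FUTEX_WAKE, 1);   /* shared op, not _PRIVATE */
        _exit(0);
    }

    /* Block only while the word is still 0. */
    while (atomic_load(word) == 0) {
        if (futex_call(word, FUTEX_WAIT, 0) == -1 &&
            errno != EAGAIN && errno != EINTR)
            return -1;
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    int seen = (int)atomic_load(word);
    munmap((void *)word, sizeof *word);
    return seen;
}
```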

==

> FUTEX_REQUEUE
> 
> Existing blurb seems ok , except for this:
> 
> The argument val contains the number of waiters on uaddr which are
> immediately woken up.
> The timeout argument is abused to transport the number of waiters
> which are requeued to the futex at uaddr2. The pointer is typecasted
> to u32.

What I've actually done with the main text for FUTEX_REQUEUE is defer 
to the (now-expanded) discussion of FUTEX_CMP_REQUEUE. 

> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2

Added/reworked.

> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned

Added.

> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr

Added.

> [EINVAL] uaddr equals uaddr2. Requeue to same futex.

??? I added this, but does this error not occur only for PI requeues?

==

> FUTEX_CMP_REQUEUE
> 
> Existing blurb seems ok , except for this:

[[
> The argument val contains the number of waiters on uaddr which are
> immediately woken up.
> 
> The timeout argument is abused to transport the number of waiters
> which are requeued to the futex at uaddr2. The pointer is typecasted
> to u32.
]]

Covered now (in more detail).
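Because the timeout argument is "abused" as a u32 for the requeue ops, the requeue count is passed by value in the pointer slot. A hedged sketch of a wrapper for FUTEX_CMP_REQUEUE, assuming Linux and raw syscall(2); cmp_requeue() is a hypothetical name. The return value is the total number of waiters woken plus requeued, and a mismatch between *uaddr and val3 yields EAGAIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* If *uaddr still equals `expected` (val3), wake up to nr_wake waiters on
 * uaddr and move up to nr_requeue of the rest onto uaddr2.
 * Returns the number woken plus requeued, or -errno. */
static long cmp_requeue(uint32_t *uaddr, uint32_t *uaddr2, int nr_wake,
                        int nr_requeue, uint32_t expected)
{
    long r = syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE_PRIVATE, nr_wake,
                     (unsigned long)nr_requeue, /* count in the timeout slot */
                     uaddr2, expected);
    return r == -1 ? -errno : r;
}
```

A condition-variable broadcast would typically call this with nr_wake = 1 and a large nr_requeue, so that one waiter contends for the mutex word at uaddr2 while the others are moved onto it instead of all waking at once.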

> Related return values
> 
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2

Added/reworked.

> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned

Added.

> [EINVAL] uaddr equals uaddr2. Requeue to same futex.

Added.

> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr

Added

> [EAGAIN] The value read from uaddr is not equal to the compare value in
> argument val3

Was already present.

==

> FUTEX_WAKE_OP
> 
> 
> Jakub, can you please explain it? I'm lost :)

I had a read of Ulrich Drepper's Futexes are Tricky, and the source
code, and took a shot at it. I'd

Re: futex(2) man page update help request

2015-01-15 Thread Thomas Gleixner
On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> > [EINVAL] uaddr equals uaddr2. Requeue to same futex.
>
> ??? I added this, but does this error not occur only for PI requeues?

It's equally wrong for normal futexes. And it's actually the same code
checking for this for all variants.

> > [EDEADLOCK] The futex is already locked by the caller or the kernel
> > detected a deadlock scenario in a nested lock chain
>
> Added.

It's actually EDEADLK
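The EDEADLK case can be reproduced by making the caller the owner of a PI futex word (writing its TID, as a successful user-space fast path would) and then asking the kernel to lock the same word again. A minimal sketch, assuming Linux and raw syscall(2); lock_pi_err() and relock_own_futex() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* FUTEX_LOCK_PI with no timeout; returns 0 or the errno value. */
static int lock_pi_err(uint32_t *word)
{
    if (syscall(SYS_futex, word, FUTEX_LOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}

/* The word already names the caller as owner, so the kernel refuses
 * to queue the caller behind itself. */
static int relock_own_futex(void)
{
    uint32_t word = (uint32_t)syscall(SYS_gettid);
    return lock_pi_err(&word);
}
```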

 
> > [EOWNERDIED] The owner of the futex died and the kernel made the
> > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> > futex userspace value. Caller is responsible for cleanup
>
> There is no such thing as an EOWNERDIED error. I had a look
> through the kernel source for the FUTEX_OWNER_DIED cases and didn't
> see an obvious error associated with them. Can you clarify? (I think
> the point is that this condition, which is described in
> Documentation/robust-futexes.txt, is not an error as such. However, I'm
> not yet sure of how to describe it in the man page.)
> I will add this point as a FIXME in the new draft man page.

Oops. My bad. That's not what the kernel does. The kernel merely
marks it in the futex itself with FUTEX_OWNER_DIED. User space needs
to deal with that and the posix users return EOWNERDEAD (not
EOWNERDIED), so it's not part of the futex call itself.

We had discussions about returning EOWNERDEAD in that case, but then
glibc with its sophisticated error handling prevented that.
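Since the kernel reports a dead owner only by setting FUTEX_OWNER_DIED in the futex word, turning that into EOWNERDEAD is left to the user-space locking code. A hypothetical sketch of that post-acquire check, assuming the value layout discussed in this thread (owner TID in the bits covered by FUTEX_TID_MASK, FUTEX_OWNER_DIED and FUTEX_WAITERS in the top two bits); classify_acquired_word() is an illustrative name, not a glibc function.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>

/* Interpret the futex word right after the caller acquired the lock. */
static int classify_acquired_word(uint32_t word, uint32_t self_tid)
{
    if ((word & FUTEX_TID_MASK) != self_tid)
        return EPERM;       /* the caller does not actually own the lock */
    if (word & FUTEX_OWNER_DIED)
        return EOWNERDEAD;  /* previous owner died; protected state may need recovery */
    return 0;               /* normal acquisition; FUTEX_WAITERS does not matter here */
}
```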
 
> > FUTEX_TRYLOCK_PI
> >
> > This operation tries to acquire the futex at uaddr. It deals with the
> > situation where the TID value at uaddr is 0, but the FUTEX_WAITERS
> > bit is set. User space cannot handle this race free.
>
> Added.
>
> > The arguments uaddr2, val, timeout and val3 are ignored.
>
> ??? But the code reads:
>
>     case FUTEX_TRYLOCK_PI:
>             return futex_lock_pi(uaddr, flags, 0, timeout, 1);
>
> which momentarily misleads one into thinking that 'timeout' is used.
> And: it's not quite ignored, since in futex_lock_pi() a non-NULL
> 'timeout' is unconditionally dereferenced (meaning you could get
> an EFAULT error for a bad 'timeout' pointer).
> I'm confused.

Indeed. That's just wrong.
 
> Maybe the above code should be
>
>     case FUTEX_TRYLOCK_PI:
>             return futex_lock_pi(uaddr, flags, 0, NULL, 1);
> ?

Care to send a patch?
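Independently of that patch, a caller can avoid depending on how timeout is treated by passing NULL with FUTEX_TRYLOCK_PI, which never blocks anyway. A sketch, assuming Linux and raw syscall(2); trylock_pi() and unlock_pi() are hypothetical helpers. On an unowned word the kernel stores the caller's TID, and FUTEX_UNLOCK_PI by a caller that does not own the word fails with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Non-blocking PI acquire; the timeout slot is explicitly NULL.
 * Returns 0 or the errno value. */
static int trylock_pi(uint32_t *word)
{
    if (syscall(SYS_futex, word, FUTEX_TRYLOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}

/* Kernel-side PI release; returns 0 or the errno value. */
static int unlock_pi(uint32_t *word)
{
    if (syscall(SYS_futex, word, FUTEX_UNLOCK_PI_PRIVATE, 0, NULL, NULL, 0) == -1)
        return errno;
    return 0;
}
```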
 
> > FUTEX_WAIT_REQUEUE_PI
> >
> > Wait operation to wait on a non pi futex at uaddr and potentially be
> > requeued on a pi futex at uaddr2. The wait operation on uaddr is the
> > same as FUTEX_WAIT. The waiter can be removed from the wait on uaddr
> > via FUTEX_WAKE without requeuing on uaddr2.
>
> Added.
>
> > The timeout argument is handled as described in FUTEX_WAIT.
>
> The above seems not to be correct. I've written the discussion of
> 'timeout' up as I understand it, and added a FIXME to the draft page.
>
> > Darren, can you fill in the missing details?
> >
> > Return values:
> >
> > [EFAULT] Kernel was unable to access the futex value at uaddr or
> > uaddr2
>
> Already covered.
>
> > [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> > valid object, i.e. pointer is not 4 byte aligned
>
> Already covered.
>
> > [EINVAL] The supplied timeout argument is not normalized.
>
> Already covered.
>
> > [EINVAL] The supplied bitset is zero.
>
> ??? I don't believe this can happen. 'val3' is internally set to
> FUTEX_BITSET_MATCH_ANY. Can you confirm?

Right. We don't support that bitset stuff in requeue_pi ATM.
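For contrast, the bitset operations themselves do reject an empty mask: FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET fail with EINVAL when val3 is 0, while FUTEX_BITSET_MATCH_ANY (all bits set) matches every waiter, which is what plain FUTEX_WAIT and FUTEX_WAKE use internally. A minimal sketch, assuming Linux and raw syscall(2); wait_bitset_err() and wake_bitset() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* FUTEX_WAIT_BITSET with no timeout; val3 carries the bitset.
 * Returns 0 or the errno value. */
static int wait_bitset_err(uint32_t *uaddr, uint32_t val, uint32_t bitset)
{
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET_PRIVATE, val,
                NULL, NULL, bitset) == -1)
        return errno;
    return 0;
}

/* FUTEX_WAKE_BITSET: wake up to nr waiters whose bitset intersects
 * `bitset`. Returns the number woken, or -errno. */
static long wake_bitset(uint32_t *uaddr, int nr, uint32_t bitset)
{
    long r = syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET_PRIVATE, nr,
                     NULL, NULL, bitset);
    return r == -1 ? -errno : r;
}
```

The wait calls in the checks below pass a stale val on purpose, so that none of them can block.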
 
Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2014-11-26 Thread Cyril Hrubis
Hi!
> >For this complexity of tests you would just need to call the tst_resm()
> >interface to report success/failure and, at the end of the test,
> >tst_exit() to return the stored overall test status.
> >
> >And ideally call the standard option parsing code and call the test in
> >standard loop so that the test can take advantage of standard options as
> >number of iterations to run, etc.
> >
> >Have a look at:
> >
> >https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines
> >
> >there is a simple test example as well as a description of the interfaces.
> 
> 
> Thanks Cyril,
> 
> I'll follow up with you in a couple weeks most likely. I have some urgent
> things that will be taking all my time and then some until then. Feel free
> to poke me though if I lose track of it :-)

Do you still plan to work on this?

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-08-11 Thread chrubis
Hi!
> >> How much LTP harness type code needs to be used?
> >
> >Not much.
> >
> >For this complexity of tests you would just need to call the tst_resm()
> >interface to report success/failure and, at the end of the test,
> >tst_exit() to return the stored overall test status.
> >
> >And ideally call the standard option parsing code and call the test in
> >the standard loop so that the test can take advantage of standard options
> >such as the number of iterations to run, etc.
> >
> >Have a look at:
> >
> >https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines
> >
> >there is a simple test example as well as a description of the interfaces.
> 
> 
> Thanks Cyril,
> 
> I'll follow up with you in a couple weeks most likely. I have some urgent
> things that will be taking all my time and then some until then. Feel free
> to poke me though if I lose track of it :-)

Ping :)

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-08-04 Thread Carlos O'Donell
On 05/15/2014 04:19 PM, Michael Kerrisk (man-pages) wrote:
> On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
>> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>>> And that universe would love to have your documentation of
>>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
>>
>> I give you almost the full treatment, but I leave REQUEUE_PI to Darren
>> and FUTEX_WAKE_OP to Jakub. :)
> 
> Thanks Thomas--that's fantastic! Hopefully, Darren and Jakub fill in those
> missing pieces...

Michael,

Do you need any help getting these additional futex error codes
into the Linux kernel man pages project? Thomas provided the
missing bits and Darren commented... what else do we need?

I'm asking because I want to point other Red Hat engineers at
these pages to say: "these are the canonical error codes." 

We're trying to clean up the userspace side of things.

Cheers,
Carlos.



Re: futex(2) man page update help request

2014-05-15 Thread Darren Hart
On 5/15/14, 7:14, "Thomas Gleixner"  wrote:

Wow Thomas, I planned to do exactly this and you beat me to it. Again.
Thanks for getting this started.

Michael, I imagine you want something more condensed, and I'll add to what
tglx posted (inline below) to try and get you that, but if you have
questions and need to fill in the gap, the paper I presented at RTLWS11 in
'09 covers this particularly nasty OPCODE in detail:

http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf

I believe Michael is looking for some higher level documentation, like how
to use these and what they are intended for. Probably something more like
Ulrich's Futexes are Tricky paper - but let's start with getting the op
codes, arguments, and return codes fleshed out.



For all the PI opcodes, we should probably mention something about the
futex value scheme (TID), whereas the other opcodes do not require any
specific value scheme.

No Owner:  0
Owner:     TID
Waiters:   TID | FUTEX_WAITERS

This is the relevant section from the referenced paper:

The PI futex operations diverge from the others in that they impose a
policy describing how the futex value is to be used. If the lock is
unowned, the futex value shall be 0. If owned, it shall be the thread
id (tid) of the owning thread. If there are threads contending for the
lock, then the FUTEX_WAITERS flag is set. With this policy in place,
userspace can atomically acquire an unowned lock or release an
uncontended lock using an atomic instruction and their own tid. A
non-zero futex value will force waiters into the kernel to lock. The
FUTEX_WAITERS flag forces the owner into the kernel to unlock. If the
callers are forced into the kernel, they then deal directly with an
underlying rt_mutex which implements the priority inheritance
semantics. After the rt_mutex is acquired, the futex value is updated
accordingly, before the calling thread returns to userspace.

It is important to note that the kernel will update the futex value prior
to returning to userspace. Unlike other futex op codes,
FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI and FUTEX_LOCK_PI) are
designed for the implementation of very specific IPC mechanisms.


>FUTEX_CMP_REQUEUE_PI
>
>   PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is
>   a non PI futex. Outer futex to which is requeued is a PI futex
>   at uaddr2.

Inner/outer terminology applies specifically to the glibc pthread
condition variable and mutex use case, but is overly specific for the man
page. Consider:

PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
futex (uaddr2).

>
>   The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
>
>   The argument val contains the number of waiters on uaddr
>   which are immediately woken up. Must be 1 for this opcode.

Because the point is to avoid the thundering herd in the first place, and
other nasty little races and faulting corner cases...

>
>   The timeout argument is abused to transport the number of
>   waiters which are requeued on to the futex at uaddr2. The
>   pointer is typecasted to u32.


  val3 contains the expected value of uaddr (same as
FUTEX_CMP_REQUEUE)


>
>Darren, can you fill in the missing details?

Yup...

>
>   [EFAULT] Kernel was unable to access the futex value at uaddr
>or uaddr2
>
>   [ENOMEM] Kernel could not allocate state
>
>   [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
>valid object, i.e. pointer is not 4 byte aligned
>
>   [EINVAL] uaddr equals uaddr2. Requeue to same futex.
>
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_LOCK_PI on uaddr

   instead of FUTEX_WAIT_REQUEUE_PI.

>
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_WAIT[_BITSET] on uaddr
>
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr2 and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_WAIT on uaddr2.

  [EINVAL] The kernel detected the FUTEX_CMP_REQUEUE_PI call is
   attempting to requeue a task to a futex other than that
   specified by the matching FUTEX_WAIT_REQUEUE_PI call for
   that task.

A number of these 

Re: futex(2) man page update help request

2014-05-15 Thread Michael Kerrisk (man-pages)
On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>> And that universe would love to have your documentation of
>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
> 
> I give you almost the full treatment, but I leave REQUEUE_PI to Darren
> and FUTEX_WAKE_OP to Jakub. :)

Thanks Thomas--that's fantastic! Hopefully Darren and Jakub will fill in those
missing pieces...

Cheers,

Michael


> FUTEX_WAIT
> 
>   < Existing blurb seems ok >
> 
>   Related return values
> 
>   [EFAULT] Kernel was unable to access the futex value at uaddr.
> 
>   [EINVAL] The supplied uaddr argument does not point to a valid
>object, i.e. the pointer is not 4-byte aligned
> 
>   [EINVAL] The supplied timeout argument is not normalized.
> 
>   [EWOULDBLOCK] The atomic enqueueing failed. The user space value
> at uaddr is not equal to the val argument.
> 
>   [ETIMEDOUT] timeout expired 
> 
> 
> FUTEX_WAKE
> 
>   < Existing blurb seems ok >
> 
>   Related return values
> 
>   [EFAULT] Kernel was unable to access the futex value at uaddr.
> 
>   [EINVAL] The supplied uaddr argument does not point to a valid
>object, i.e. the pointer is not 4-byte aligned
> 
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_LOCK_PI
> 
> FUTEX_REQUEUE
> 
>   Existing blurb seems ok, except for this:
> 
>   The argument val contains the number of waiters on uaddr which
>   are immediately woken up.
> 
>   The timeout argument is abused to transport the number of
>   waiters which are requeued to the futex at uaddr2. The pointer
>   is cast to u32.
> 
> 
>   [EFAULT] Kernel was unable to access the futex value at uaddr or uaddr2
> 
>   [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
>valid object, i.e. the pointer is not 4-byte aligned
> 
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_LOCK_PI on uaddr
> 
>   [EINVAL] uaddr equals uaddr2, i.e. a requeue to the same futex.
> 
> FUTEX_CMP_REQUEUE
> 
>   Existing blurb seems ok, except for this:
> 
>   The argument val contains the number of waiters on uaddr
>   which are immediately woken up.
> 
>   The timeout argument is abused to transport the number of
>   waiters which are requeued to the futex at uaddr2. The pointer
>   is cast to u32.
> 
>   Related return values
> 
>   [EFAULT] Kernel was unable to access the futex value at uaddr or uaddr2
> 
>   [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
>valid object, i.e. the pointer is not 4-byte aligned
> 
>   [EINVAL] uaddr equals uaddr2, i.e. a requeue to the same futex.
> 
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_LOCK_PI on uaddr
> 
>   [EAGAIN] The value read from uaddr is not equal to the compare
>value in argument val3
> 
> FUTEX_WAKE_OP
> 
> 
> Jakub, can you please explain it? I'm lost :)
> 
> 
>   The argument val contains the maximum number of waiters on
>   uaddr which are immediately woken up.
> 
>   The timeout argument is abused to transport the maximum
>   number of waiters on uaddr2 which are woken up. The pointer
>   is cast to u32.
> 
>   Related return values
> 
>   [EFAULT] Kernel was unable to access the futex values at uaddr
>or uaddr2
> 
>   [EINVAL] The supplied uaddr or uaddr2 argument does not point
>to a valid object, i.e. the pointer is not 4-byte aligned
> 
>   [EINVAL] The kernel detected inconsistent state between the
>user space state at uaddr and the kernel state,
>i.e. it detected a waiter which waits in
>FUTEX_LOCK_PI on uaddr
> 
> 
> FUTEX_WAIT_BITSET
> 
>   The same as FUTEX_WAIT except that val3 is used to provide a
>   32-bit bitset to the kernel. This bitset is stored in the
>   kernel internal state of the waiter.
> 
>   This futex op also allows the option bit
>   FUTEX_CLOCK_REALTIME to be set.
> 
>   Related return values
> 
>   [EFAULT] Kernel was unable to access the futex value at uaddr.
> 
>   [EINVAL] The supplied uaddr argument does not point to a valid
>object, i.e. the pointer is not 4-byte aligned
> 
>   [EINVAL] The supplied bitset is zero.
> 
>   [EINVAL] The supplied timeout argument is not normalized.
> 
>   [ETIMEDOUT] timeout expired 
> 
> 
> FUTEX_WAKE_BITSET
> 
>   The same as 

Re: futex(2) man page update help request

2014-05-15 Thread Darren Hart
On 5/15/14, 12:05, "chru...@suse.cz"  wrote:

>Hi!
>> >> I've used LTP in the past (quite a bit), and I felt there was some
>> >> advantage to keeping futextest independent.
>> >
>> >What advantages did you have in mind?
>> 
>> Not CVS was a big one at the time ;-)
>> 
>> OK, I don't mean to be disparaging here... But since you asked, back in
>> '09 LTP had some test quality issues and I felt I could maintain
>> futextest to a higher bar independently.
>
>To be honest LTP was one of the messiest codebases I've seen and it was
>hacked up by mostly clueless people (there were even tests with race
>conditions that were carefully disabled in a way that was not easy to
>see). It took me months to get to a state where it compiled fine on
>major distributions.
>
>Today we still have quite a bit of legacy code that needs to be cleaned
>up, however that gets better every day.
>
>And most of the testcases are pretty stable, etc. Unfortunately LTP has
>a bad reputation, which is a lot harder to fix than the code itself.
>
>> >> Perhaps things have changed enough since then (~2009 era) that we
>> >> should reconsider.
>> >
>> >I've been working on LTP for about three years now and we have done
>> >quite a lot in that time. The most visible changes would be more proper
>> >development practices (git, proper build system, code review, LKML
>> >coding style, documentation, ...) and also a huge number of fixes. Now we
>> >are trying to catch up in coverage too.
>> >
>> >> We can discuss the pros/cons there if you like.
>> >
>> >I would love to :).
>> 
>> Does LTP need to own the code, or can it incorporate existing projects
>> and act as a sort of aggregator?
>
>That is possible as well but not optimal. This approach would need a
>wrapper script to convert the test exit values to be LTP compatible.
>
>> How much LTP harness type code needs to be used?
>
>Not much.
>
>For tests of this complexity you would just need to call the tst_resm()
>interface to report success/failure and, at the end of the test,
>tst_exit() to return the stored overall test status.
>
>And ideally call the standard option parsing code and run the test in
>the standard loop so that it can take advantage of standard options such
>as the number of iterations to run, etc.
>
>Have a look at:
>
>https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines
>
>there is a simple test example as well as a description of the interfaces.


Thanks Cyril,

I'll follow up with you in a couple weeks most likely. I have some urgent
things that will be taking all my time and then some until then. Feel free
to poke me though if I lose track of it :-)

-- 
Darren Hart Open Source Technology Center
darren.h...@intel.com   Intel Corporation



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(2) man page update help request

2014-05-15 Thread Carlos O'Donell
On 05/14/2014 08:28 PM, H. Peter Anvin wrote:
> On 05/14/2014 01:56 PM, Davidlohr Bueso wrote:
>>>
>>>> However, unless I'm sorely mistaken, the larger problem is that glibc
>>>> removed the futex() call entirely, so these man pages don't describe
>>>
>>> I don't think futex() ever was in glibc--that's by design, and
>>> completely understandable: no user-space application would want to
>>> directly use futex(). 
>>
>> That's actually not quite true. There are plenty of software efforts out
>> there that use futex calls directly to implement userspace serialization
>> mechanisms as an alternative to the bulky sysv semaphores. I worked
>> closely with an in-memory DB project that makes heavy use of them. Not
>> everyone can simply rely on pthreads.
>>
> 
> More fundamentally, futex(2) and clone(2) are things that can be
> used legitimately by user space without automatically breaking all of glibc.
>  There are some other things where that is *not* true, because glibc
> relies on being able to mediate all accesses to a kernel facility, but
> not here.

Careful there. There is *some* danger in using clone(2) because of the
coordination required to implement thread-local storage. I'm sure you're
aware of this, but I'd like the record to show that we're going to need
clear documentation of what's considered safe given the known
implementations.

Cheers,
Carlos.



Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> >> I've used LTP in the past (quite a bit), and I felt there was some
> >> advantage to keeping futextest independent.
> >
> >What advantages did you have in mind?
> 
> Not CVS was a big one at the time ;-)
> 
> OK, I don't mean to be disparaging here... But since you asked, back in
> '09 LTP had some test quality issues and I felt I could maintain futextest
> to a higher bar independently.

To be honest LTP was one of the messiest codebases I've seen and it was
hacked up by mostly clueless people (there were even tests with race
conditions that were carefully disabled in a way that was not easy to
see). It took me months to get to a state where it compiled fine on
major distributions.

Today we still have quite a bit of legacy code that needs to be cleaned
up, however that gets better every day.

And most of the testcases are pretty stable, etc. Unfortunately LTP has
a bad reputation, which is a lot harder to fix than the code itself.

> >> Perhaps things have changed enough since then (~2009 era) that we
> >> should reconsider.
> >
> >I've been working on LTP for about three years now and we have done
> >quite a lot in that time. The most visible changes would be more proper
> >development practices (git, proper build system, code review, LKML
> >coding style, documentation, ...) and also a huge number of fixes. Now we
> >are trying to catch up in coverage too.
> >
> >> We can discuss the pros/cons there if you like.
> >
> >I would love to :).
> 
> Does LTP need to own the code, or can it incorporate existing projects and
> act as a sort of aggregator?

That is possible as well but not optimal. This approach would need a
wrapper script to convert the test exit values to be LTP compatible.

> How much LTP harness type code needs to be used?

Not much.

For tests of this complexity you would just need to call the tst_resm()
interface to report success/failure and, at the end of the test,
tst_exit() to return the stored overall test status.

And ideally call the standard option parsing code and run the test in
the standard loop so that it can take advantage of standard options such
as the number of iterations to run, etc.

Have a look at:

https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines

there is a simple test example as well as a description of the interfaces.

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread Darren Hart
On 5/15/14, 9:30, "chru...@suse.cz"  wrote:

>Hi!
>> I've used LTP in the past (quite a bit), and I felt there was some
>> advantage to keeping futextest independent.
>
>What advantages did you have in mind?

Not CVS was a big one at the time ;-)

OK, I don't mean to be disparaging here... But since you asked, back in
'09 LTP had some test quality issues and I felt I could maintain futextest
to a higher bar independently.

>
>> Perhaps things have changed enough since then (~2009 era) that we
>> should reconsider.
>
>I've been working on LTP for about three years now and we have done
>quite a lot in that time. The most visible changes would be more proper
>development practices (git, proper build system, code review, LKML
>coding style, documentation, ...) and also a huge number of fixes. Now we
>are trying to catch up in coverage too.
>
>> We can discuss the pros/cons there if you like.
>
>I would love to :).

Does LTP need to own the code, or can it incorporate existing projects and
act as a sort of aggregator?

How much LTP harness type code needs to be used?

-- 
Darren Hart Open Source Technology Center
darren.h...@intel.com   Intel Corporation





Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> > That is not the main concern here. If I extract the code I would have to
> > watch for any changes manually. If it was in a library or a separate
> > repository all that would be needed is to add it as dependency/git
> > submodule and I would get all updates automatically.
> > 
> 
> Yes, and for that to happen someone needs to do the work to extract it.
>  I don't have the cycles myself at the moment.

If that is the only problem, I should be able to allocate some time in
order to have a look at it.

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread H. Peter Anvin
On 05/15/2014 09:17 AM, chru...@suse.cz wrote:
>>
>> It should be quite easy to extract from klibc.
> 
> That is not the main concern here. If I extract the code I would have to
> watch for any changes manually. If it was in a library or a separate
> repository all that would be needed is to add it as dependency/git
> submodule and I would get all updates automatically.
> 

Yes, and for that to happen someone needs to do the work to extract it.
 I don't have the cycles myself at the moment.

-hpa



Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> I've used LTP in the past (quite a bit), and I felt there was some
> advantage to keeping futextest independent.

What advantages did you have in mind?

> Perhaps things have changed enough since then (~2009 era) that we
> should reconsider.

I've been working on LTP for about three years now and we have done
quite a lot in that time. The most visible changes would be more proper
development practices (git, proper build system, code review, LKML
coding style, documentation, ...) and also a huge number of fixes. Now we
are trying to catch up in coverage too.

> We can discuss the pros/cons there if you like.

I would love to :).

> I have agreed to move the performance related tests over to perf, and
> Davidlohr has added some other such tests to perf. Trinity now covers
> the planned fuzz testing for futexes (very well... Obviously) so that
> idea will be dropped, leaving pure functional tests in futextest.

Well LTP mostly consists of functional tests, so that would fit the
purpose very well.

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> >> I really believe the proper fix is to use assembly syscall stubs.  In
> >> klibc I build a fairly elaborate machinery to autogenerate such syscall
> >> stubs for a variety of architectures.
> > 
> > Then it would be nice to share these between klibc and LTP (and possibly
> > everybody else).
> > 
> 
> It should be quite easy to extract from klibc.

That is not the main concern here. If I extract the code I would have to
watch for any changes manually. If it was in a library or a separate
repository all that would be needed is to add it as dependency/git
submodule and I would get all updates automatically.

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread Darren Hart
On 5/15/14, 8:28, "chru...@suse.cz"  wrote:

>Hi!
>> 
>> However, unless I'm sorely mistaken, the larger problem is that glibc
>> removed the futex() call entirely, so these man pages don't describe
>> something users even have access to anymore. I had to revert to calling
>> the syscalls directly in the futextest test suite because of this:
>> 
>> 
>> http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/inclu
>
>So there actually exist some tests for futexes; as an LTP[1] maintainer,
>I've been asked several times whether we have these.
>
>Are these tests executed regularly as part of some automated framework?
>If not it would make sense to port them to LTP (looking at the code that
>would be quite easy task) and get them executed by several QA
>departments for free. What do you think?
>
>[1] http://linux-test-project.github.io/

I've used LTP in the past (quite a bit), and I felt there was some
advantage to keeping futextest independent. Perhaps things have changed
enough since then (~2009 era) that we should reconsider. We can discuss
the pros/cons there if you like. I have agreed to move the performance
related tests over to perf, and Davidlohr has added some other such tests
to perf. Trinity now covers the planned fuzz testing for futexes (very
well... Obviously) so that idea will be dropped, leaving pure functional
tests in futextest.

-- 
Darren Hart Open Source Technology Center
darren.h...@intel.com   Intel Corporation





Re: futex(2) man page update help request

2014-05-15 Thread H. Peter Anvin
On 05/15/2014 09:01 AM, chru...@suse.cz wrote:
> 
>> I really believe the proper fix is to use assembly syscall stubs.  In
>> klibc I build a fairly elaborate machinery to autogenerate such syscall
>> stubs for a variety of architectures.
> 
> Then it would be nice to share these between klibc and LTP (and possibly
> everybody else).
> 

It should be quite easy to extract from klibc.

-hpa



Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> > Have a look at this commit that tries to deal with passing 64-bit
> > numbers to syscalls. On 32-bit ABIs (but not on X32) these need to be
> > split up (according to machine endianness).
> > 
> > https://github.com/linux-test-project/ltp/commit/04afb02b4280a20c262054e8f99a3fad4ad54916
> > 
> 
> That is wrong, too.  That assumes that there will never be padding
> words, which isn't true in the general case, either.

Well, it's still far better than the mess we had previously and it works
in most of the cases. However, I would love to fix these correctly once
and for all.

> I really believe the proper fix is to use assembly syscall stubs.  In
> klibc I build a fairly elaborate machinery to autogenerate such syscall
> stubs for a variety of architectures.

Then it would be nice to share these between klibc and LTP (and possibly
everybody else).

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread H. Peter Anvin
On 05/15/2014 08:42 AM, chru...@suse.cz wrote:
> Hi!
>> People have a number of times noted that there are problems
>> with syscall(), but I'm not knowledgeable on the details.
>> I'd happily take a patch to the man page (which, for historical
>> reasons, is actually syscall(2)) that explains the problems
>> (and ideally notes those platforms where there are no problems).
> 
> Have a look at this commit that tries to deal with passing 64-bit
> numbers to syscalls. On 32-bit ABIs (but not on X32) these need to be
> split up (according to machine endianness).
> 
> https://github.com/linux-test-project/ltp/commit/04afb02b4280a20c262054e8f99a3fad4ad54916
> 

That is wrong, too.  That assumes that there will never be padding
words, which isn't true in the general case, either.

I really believe the proper fix is to use assembly syscall stubs.  In
klibc I build a fairly elaborate machinery to autogenerate such syscall
stubs for a variety of architectures.

-hpa



Re: futex(2) man page update help request

2014-05-15 Thread Darren Hart
On 5/15/14, 1:13, "Peter Zijlstra"  wrote:

>On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
>> On 05/14/2014 03:03 PM, Michael Kerrisk (man-pages) wrote:
>> >> However, unless I'm sorely mistaken, the larger problem is that glibc
>> >> removed the futex() call entirely, so these man pages don't describe
>> > 
>> > I don't think futex() ever was in glibc--that's by design, and
>> > completely understandable: no user-space application would want to
>> > directly use futex(). (BTW, I misspoke in my earlier mail when I said I
>> > wanted documentation suitable for "writers of library functions" -- I
>> > meant suitable for "writers of *C library*".)
>> 
>> I fully agree with Michael here.
>> 
>> The futex() syscall was never exposed to userspace specifically because
>> it was an interface we did not want to support forever with a stable ABI.
>> The futex() syscall is an implementation detail that is shared between
>> the kernel and the writers of core runtimes for Linux.
>
>That ship has sailed... For one, we must always support old glibc, which
>uses the futex() syscall, and secondly there are known other programs
>that actually use the futex syscall.
>
>So that's really a non-argument, we're hard tied to the ABI.

Indeed. This is specifically why FUTEX_REQUEUE still exists (despite its
bugs) when only FUTEX_CMP_REQUEUE should ever be used in new programs.


-- 
Darren Hart Open Source Technology Center
darren.h...@intel.com   Intel Corporation





Re: futex(2) man page update help request

2014-05-15 Thread Darren Hart
On 5/15/14, 6:46, "Michael Kerrisk (man-pages)" 
wrote:

>On 05/15/2014 07:21 AM, Darren Hart wrote:
>> On 5/14/14, 17:18, "H. Peter Anvin"  wrote:
>> 
>>> On 05/14/2014 09:18 AM, Darren Hart wrote:
>>>>
>>>> However, unless I'm sorely mistaken, the larger problem is that glibc
>>>> removed the futex() call entirely, so these man pages don't describe
>>>> something users even have access to anymore. I had to revert to calling
>>>> the syscalls directly in the futextest test suite because of this:
>>>>
>>>> http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/include/futextest.h#n67
>>>>
>>>
>>> This really comes down to the fact that we should have a libinux which
>>> contains the basic system call wrapper machinery for Linux specific
>>> things and nothing else.
>>>
>>> syscall(3) is toxic and breaks randomly on some platforms.
>> 
>> Peter Z and I have had a good time discussing this in the past. And
>> here it is again. :-)
>
>People have a number of times noted that there are problems
>with syscall(), but I'm not knowledgeable on the details.
>I'd happily take a patch to the man page (which, for historical
>reasons, is actually syscall(2)) that explains the problems
>(and ideally notes those platforms where there are no problems).


From my perspective, a named interface with a specific, documented signature
is far more usable than a varargs direct syscall. That just leaves all kinds
of room for error - which of course is why we all write our own wrappers
in our apps rather than use it directly... If we all do it, it seems to me
that is a strong indicator we should provide it in some kind of common
library. Maybe that's libc... Maybe that's libnix...
-- 
Darren Hart Open Source Technology Center
darren.h...@intel.com   Intel Corporation





Re: futex(2) man page update help request

2014-05-15 Thread Steven Rostedt
On Thu, 15 May 2014 17:28:35 +0200
chru...@suse.cz wrote:

> Hi!
> > 
> > However, unless I'm sorely mistaken, the larger problem is that glibc
> > removed the futex() call entirely, so these man pages don't describe
> > something users even have access to anymore. I had to revert to calling
> > the syscalls directly in the futextest test suite because of this:
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/inclu
> 
> So there actually exist some tests for futexes; as an LTP[1] maintainer,
> I've been asked several times whether we have these.
> 
> Are these tests executed regularly as part of some automated framework?
> If not it would make sense to port them to LTP (looking at the code that
> would be quite easy task) and get them executed by several QA
> departments for free. What do you think?
> 
> [1] http://linux-test-project.github.io/
> 

I think Thomas may be working on one. If not, I'd be happy to start
writing one as well.

-- Steve


Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> People have a number of times noted that there are problems
> with syscall(), but I'm not knowledgeable on the details.
> I'd happily take a patch to the man page (which, for historical
> reasons, is actually syscall(2)) that explains the problems
> (and ideally notes those platforms where there are no problems).

Have a look at this commit that tries to deal with passing 64-bit
numbers to syscalls. On 32-bit ABIs (but not on X32) these need to be
split up (according to machine endianness).

https://github.com/linux-test-project/ltp/commit/04afb02b4280a20c262054e8f99a3fad4ad54916

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> > However, unless I'm sorely mistaken, the larger problem is that glibc
> > removed the futex() call entirely, so these man pages don't describe
> > something users even have access to anymore. I had to revert to calling
> > the syscalls directly in the futextest test suite because of this:
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/include/futextest.h#n67
> > 
> 
> This really comes down to the fact that we should have a libinux which
> contains the basic system call wrapper machinery for Linux specific
> things and nothing else.
> 
> syscall(3) is toxic and breaks randomly on some platforms.

+1

And while cleaning the LTP[1] testcases, we are slowly extracting the
special cases into common code.

[1] http://linux-test-project.github.io/

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread chrubis
Hi!
> 
> However, unless I'm sorely mistaken, the larger problem is that glibc
> removed the futex() call entirely, so these man pages don't describe
> something users even have access to anymore. I had to revert to calling
> the syscalls directly in the futextest test suite because of this:
> 
> http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/inclu

So there actually exist some tests for futexes; as an LTP[1] maintainer,
I've been asked several times whether we have these.

Are these tests executed regularly as part of some automated framework?
If not it would make sense to port them to LTP (looking at the code that
would be quite easy task) and get them executed by several QA
departments for free. What do you think?

[1] http://linux-test-project.github.io/

-- 
Cyril Hrubis
chru...@suse.cz


Re: futex(2) man page update help request

2014-05-15 Thread Joseph S. Myers
On Wed, 14 May 2014, Davidlohr Bueso wrote:

> > If I'm wrong, or we can restore the futex() call, great. If not... Should
> > we keep the man-pages and document it as syscall(SYS_futex, ..., op, ...) ?
> 
> +1, is there anything preventing adding a futex wrapper... glibc folks?

See what I said at 
 (with references 
to previous discussions).  Someone needs to take the lead on pushing to 
consensus the question of what syscalls should have wrappers in glibc, and 
then implement the conclusions.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: futex(2) man page update help request

2014-05-15 Thread Peter Zijlstra
On Thu, May 15, 2014 at 10:39:09AM -0400, Carlos O'Donell wrote:
> For example does gettid *really* return a pid_t as considered by
> userspace? It's not a full out process...

Yeah, PIDs and TIDs share the same namespace in the kernel. All we have
are tasks and each task has an id. gettid() actually returns the id of
the current task.

getpid() returns the id of the thread group leader, so for that task
gettid() and getpid() return the same id.




Re: futex(2) man page update help request

2014-05-15 Thread H. Peter Anvin
On 05/15/2014 06:46 AM, Michael Kerrisk (man-pages) wrote:
> 
> People have a number of times noted that there are problems
> with syscall(), but I'm not knowledgeable on the details.
> I'd happily take a patch to the man page (which, for historical
> reasons, is actually syscall(2)) that explains the problems
> (and ideally notes those platforms where there are no problems).
> 

It has to do with how ABIs deal with doublewidth arguments.

There is a reason why Linux syscall ABIs generally have a 1:1 mapping
with the user space ABIs, and why the system call number is passed not
in the first argument but in a different place (usually a separately
clobbered register, e.g. %eax on x86-64).

On some platforms, doublewidth arguments have to be aligned in register
pairs.  On some other platforms, enough arguments mean some will be
passed in memory, where they are forced to be aligned, or they are not
allowed to straddle the register-memory boundary.  All of this means
that padding words might be introduced, and they will be introduced in
the wrong place because of the additional argument introduced at the
beginning of the argument sequence.

On the other hand, the old SYSCALL user-space macros just plain didn't
handle doubleword arguments.
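The padding problem can be shown with a toy model. It assumes a 32-bit convention in which a 64-bit argument must start in an even-numbered 32-bit slot (as with register pairs on 32-bit ARM EABI); the function layout_args() and the fadvise64_64-like argument list are illustrative, not taken from any real libc.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: widths[i] is 32 or 64.  A 64-bit argument that would start
 * in an odd slot is bumped to the next even slot, leaving a padding
 * slot behind.  slots[i] receives the starting slot of argument i; the
 * return value is the total number of slots used. */
static size_t layout_args(const int *widths, size_t n, size_t *slots)
{
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        if (widths[i] == 64 && next % 2 != 0)
            next++;                       /* padding slot */
        slots[i] = next;
        next += (widths[i] == 64) ? 2 : 1;
    }
    return next;
}
```

For (int fd, 64-bit offset, 64-bit len, int advice) called directly, a padding slot separates fd from offset. Once syscall() prepends the system call number, fd moves to an odd slot and the padding disappears, so every later argument sits one slot away from where the kernel-side ABI expects it.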

-hpa



Re: futex(2) man page update help request

2014-05-15 Thread Carlos O'Donell
On 05/15/2014 09:49 AM, Michael Kerrisk (man-pages) wrote:
> On Thu, May 15, 2014 at 3:22 PM, Peter Zijlstra  wrote:
>> On Thu, May 15, 2014 at 09:18:22AM -0400, Carlos O'Donell wrote:
>>> On 05/15/2014 04:14 AM, Peter Zijlstra wrote:
>>>> On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
>>>>> There are other syscalls like gettid() that have a:
>>>>> NOTE: There is no glibc wrapper for this system call; see NOTES.
>>>>
>>>> Yes, can we finally fix that please? It gets tedious having to endlessly
>>>> copy/paste that thing around.
>>>
>>> What exactly would you like fixed?
>>
>> Not having gettid() in glibc.
> 
> Get in the line ;-).
> http://sourceware.org/bugzilla/show_bug.cgi?id=6399

I have no objections to this, but I absolutely object to this without
someone documenting and gathering consensus for consistent terminology
to be used between the kernel and glibc.

The relevant comment is here:
https://sourceware.org/bugzilla/show_bug.cgi?id=6399#c26

I'd like to see a glibc manual patch for the threads.texi file, which
can be completely linux-specific, to document gettid() and nomenclature.
It should talk about the nomenclature used to discuss these interfaces
and explain when it is or isn't valid to use a task id and with what 
functions.

For example does gettid *really* return a pid_t as considered by
userspace? It's not a full out process...

Cheers,
Carlos.


Re: futex(2) man page update help request

2014-05-15 Thread Thomas Gleixner
On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
> And that universe would love to have your documentation of
> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),

I give you almost the full treatment, but I leave REQUEUE_PI to Darren
and FUTEX_WAKE_OP to Jakub. :)


FUTEX_WAIT

< Existing blurb seems ok >

Related return values

[EFAULT] The kernel was unable to access the futex value at uaddr.

[EINVAL] The supplied uaddr argument does not point to a valid
 object, i.e. the pointer is not 4-byte aligned.

[EINVAL] The supplied timeout argument is not normalized (tv_sec is
 negative, or tv_nsec is outside the range 0 to 999999999).

[EWOULDBLOCK] The atomic enqueueing failed: the user space value
  at uaddr is not equal to the val argument.

[ETIMEDOUT] The timeout expired.
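Each FUTEX_WAIT return value above can be provoked from user space. A minimal sketch, assuming Linux and SYS_futex; futex_wait_err() is a hypothetical helper that takes void * so that a deliberately misaligned address can be passed. Note that on Linux EWOULDBLOCK has the same value as EAGAIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_WAIT */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <time.h>          /* struct timespec */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: FUTEX_WAIT with a *relative* timeout.
 * Returns 0 on wakeup, otherwise the errno value of the failure. */
static int futex_wait_err(void *uaddr, uint32_t expected,
                          const struct timespec *rel_timeout)
{
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                rel_timeout, NULL, 0) == -1)
        return errno;
    return 0;
}
```

With a word holding 7: expecting 8 fails at once with EAGAIN; expecting 7 with a 1 ms timeout fails with ETIMEDOUT; a timeout whose tv_nsec is 1000000000 fails with EINVAL, as does an address one byte past the aligned word.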


FUTEX_WAKE

< Existing blurb seems ok >

Related return values

[EFAULT] The kernel was unable to access the futex value at uaddr.

[EINVAL] The supplied uaddr argument does not point to a valid
 object, i.e. the pointer is not 4-byte aligned.

[EINVAL] The kernel detected inconsistent state between the
 user space state at uaddr and the kernel state,
 i.e. it detected a waiter which waits in
 FUTEX_LOCK_PI on uaddr.
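FUTEX_WAIT and FUTEX_WAKE are normally paired around a shared word. The following is a minimal sketch of a one-shot event, assuming Linux, C11 atomics and SYS_futex; the names event_word, event_wait(), event_signal() and waiter() are hypothetical. The waiter re-checks the word after every return because FUTEX_WAIT can come back early (EAGAIN if the word already changed, EINTR on a signal).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>        /* INT_MAX */
#include <linux/futex.h>   /* FUTEX_WAIT, FUTEX_WAKE */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall() */

/* 0 = not yet signalled, 1 = signalled. */
static _Atomic uint32_t event_word = 0;

/* Block until the event has been signalled. */
static void event_wait(void)
{
    while (atomic_load(&event_word) == 0)
        syscall(SYS_futex, &event_word, FUTEX_WAIT, 0, NULL, NULL, 0);
}

/* Set the event, then wake every waiter.  Returns the number of
 * waiters FUTEX_WAKE reports as woken (0 if none was blocked yet). */
static long event_signal(void)
{
    atomic_store(&event_word, 1);
    return syscall(SYS_futex, &event_word, FUTEX_WAKE, INT_MAX,
                   NULL, NULL, 0);
}

static void *waiter(void *arg)
{
    (void)arg;
    event_wait();
    return NULL;
}
```

Storing the new value before calling FUTEX_WAKE is what makes the race benign: a waiter that has not yet blocked sees 1 and never sleeps, and one that tries to block on the stale value 0 gets EAGAIN.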

FUTEX_REQUEUE

Existing blurb seems ok, except for this:

The argument val contains the maximum number of waiters on uaddr
which are immediately woken up.

The timeout argument is abused to transport the maximum number of
waiters which are requeued to the futex at uaddr2. The pointer
is cast to u32.

Related return values

[EFAULT] The kernel was unable to access the futex value at uaddr or uaddr2.

[EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
 valid object, i.e. the pointer is not 4-byte aligned.

[EINVAL] The kernel detected inconsistent state between the
 user space state at uaddr and the kernel state,
 i.e. it detected a waiter which waits in
 FUTEX_LOCK_PI on uaddr.

[EINVAL] uaddr is equal to uaddr2 (requeue to the same futex).
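The "timeout argument is abused" convention means the requeue count travels in the pointer slot. A minimal sketch of how a caller passes it, assuming Linux and SYS_futex; futex_requeue() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>   /* FUTEX_REQUEUE */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <time.h>          /* struct timespec */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: wake up to nr_wake waiters on uaddr and move up
 * to nr_requeue of the remaining waiters to uaddr2.  The requeue count
 * is cast into the timeout pointer slot; the kernel truncates it back
 * to a u32.  Returns the number of waiters woken plus requeued. */
static long futex_requeue(uint32_t *uaddr, int nr_wake,
                          uint32_t *uaddr2, int nr_requeue)
{
    return syscall(SYS_futex, uaddr, FUTEX_REQUEUE, nr_wake,
                   (const struct timespec *)(uintptr_t)nr_requeue,
                   uaddr2, 0);
}
```

With no waiters on either word, the call succeeds and reports 0.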

FUTEX_CMP_REQUEUE

Existing blurb seems ok, except for this:

The argument val contains the maximum number of waiters on uaddr
which are immediately woken up.

The timeout argument is abused to transport the maximum number of
waiters which are requeued to the futex at uaddr2. The pointer
is cast to u32.

Related return values

[EFAULT] The kernel was unable to access the futex value at uaddr or uaddr2.

[EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
 valid object, i.e. the pointer is not 4-byte aligned.

[EINVAL] uaddr is equal to uaddr2 (requeue to the same futex).

[EINVAL] The kernel detected inconsistent state between the
 user space state at uaddr and the kernel state,
 i.e. it detected a waiter which waits in
 FUTEX_LOCK_PI on uaddr.

[EAGAIN] The value read from uaddr is not equal to the compare
 value in argument val3.
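FUTEX_CMP_REQUEUE adds a compare step: the kernel reads uaddr and only proceeds if it still equals val3. A minimal sketch, assuming Linux and SYS_futex; futex_cmp_requeue() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_CMP_REQUEUE */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <time.h>          /* struct timespec */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: like FUTEX_REQUEUE, but fails with EAGAIN when
 * *uaddr no longer equals expected (passed as val3).  The requeue count
 * again travels in the timeout pointer slot. */
static long futex_cmp_requeue(uint32_t *uaddr, int nr_wake,
                              uint32_t *uaddr2, int nr_requeue,
                              uint32_t expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, nr_wake,
                   (const struct timespec *)(uintptr_t)nr_requeue,
                   uaddr2, expected);
}
```

If the word at uaddr holds 5, passing 5 as the compare value succeeds (returning 0 when there are no waiters), while passing 6 fails with EAGAIN.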

FUTEX_WAKE_OP


Jakub, can you please explain it? I'm lost :)


The argument val contains the maximum number of waiters on
uaddr which are immediately woken up.

The timeout argument is abused to transport the maximum
number of waiters on uaddr2 which are woken up. The pointer
is cast to u32.

Related return values

[EFAULT] The kernel was unable to access the futex values at uaddr
 or uaddr2.

[EINVAL] The supplied uaddr or uaddr2 argument does not point
 to a valid object, i.e. the pointer is not 4-byte aligned.

[EINVAL] The kernel detected inconsistent state between the
 user space state at uaddr and the kernel state,
 i.e. it detected a waiter which waits in
 FUTEX_LOCK_PI on uaddr.
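The part of FUTEX_WAKE_OP not covered above is val3, which packs an operation and a comparison; <linux/futex.h> provides the FUTEX_OP() macro to build it. A minimal sketch, assuming Linux and SYS_futex; futex_wake_op() is a hypothetical helper name, and the comment is a summary of the kernel's behaviour rather than a specification.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>   /* FUTEX_WAKE_OP, FUTEX_OP(), FUTEX_OP_* */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <time.h>          /* struct timespec */
#include <unistd.h>        /* syscall() */

/* Roughly what the kernel does for FUTEX_WAKE_OP:
 *   oldval = *uaddr2; *uaddr2 = oldval OP oparg;   (atomically)
 *   wake up to nr_wake waiters on uaddr;
 *   if (oldval CMP cmparg) wake up to nr_wake2 waiters on uaddr2;
 * op is FUTEX_OP(OP, oparg, CMP, cmparg).  Returns the total number
 * of waiters woken on both words. */
static long futex_wake_op(uint32_t *uaddr, int nr_wake,
                          uint32_t *uaddr2, int nr_wake2, uint32_t op)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, nr_wake,
                   (const struct timespec *)(uintptr_t)nr_wake2,
                   uaddr2, op);
}
```

For example, FUTEX_OP(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_EQ, 5) increments *uaddr2 and wakes its waiters only if the old value was 5; with no waiters the call returns 0 but the increment still happens.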


FUTEX_WAIT_BITSET

The same as FUTEX_WAIT except that val3 is used to provide a
32-bit bitset to the kernel. This bitset is stored in the
kernel internal state of the waiter.

This futex operation also allows the option bit
FUTEX_CLOCK_REALTIME to be set.

Related return values

[EFAULT] The kernel was unable to access the futex value at uaddr.

[EINVAL] The supplied uaddr argument does not point to a valid
 object, i.e. the pointer is not 4-byte aligned.

[EINVAL] The supplied bitset is zero.

[EINVAL] The supplied timeout argument is not normalized.

[ETIMEDOUT] The timeout expired.
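One practical difference from FUTEX_WAIT is that the FUTEX_WAIT_BITSET timeout is an absolute time, measured against CLOCK_MONOTONIC unless FUTEX_CLOCK_REALTIME is set. A minimal sketch, assuming Linux and SYS_futex; futex_wait_bitset_err() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_WAIT_BITSET, FUTEX_BITSET_MATCH_ANY */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <time.h>          /* struct timespec, clock_gettime() */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: FUTEX_WAIT_BITSET with an *absolute*
 * CLOCK_MONOTONIC deadline.  Returns 0 on wakeup, otherwise errno. */
static int futex_wait_bitset_err(uint32_t *uaddr, uint32_t expected,
                                 const struct timespec *abs_deadline,
                                 uint32_t bitset)
{
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                abs_deadline, NULL, bitset) == -1)
        return errno;
    return 0;
}
```

A deadline taken from clock_gettime(CLOCK_MONOTONIC, ...) just before the call has already passed, so the wait ends with ETIMEDOUT, whereas a zero bitset is rejected with EINVAL.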


FUTEX_WAKE_BITSET

The same as FUTEX_WAKE except that val3 is used to provide a
32-bit bitset to the kernel. This bitset is used to select
waiters on the futex. The selection is done by a bitwise AND
of the wake side supplied bitset and the bitset which is
stored in the kernel internal state of 
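
The AND-based selection rule for FUTEX_WAKE_BITSET can be illustrated as follows. A minimal sketch, assuming Linux and SYS_futex: bitset_selects() is a hypothetical model of the rule, not a kernel function, and futex_wake_bitset() is a hypothetical wrapper around the real operation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_WAKE_BITSET, FUTEX_BITSET_MATCH_ANY */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall() */

/* Model: a waiter that registered wait_mask is eligible for a wakeup
 * carrying wake_mask only if the masks share at least one set bit. */
static int bitset_selects(uint32_t wait_mask, uint32_t wake_mask)
{
    return (wait_mask & wake_mask) != 0;
}

/* Real call: wake up to nr_wake eligible waiters on uaddr.  A zero
 * wake_mask is rejected with EINVAL. */
static long futex_wake_bitset(uint32_t *uaddr, int nr_wake,
                              uint32_t wake_mask)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, nr_wake,
                   NULL, NULL, wake_mask);
}
```

FUTEX_BITSET_MATCH_ANY (all bits set) selects every waiter, which is how plain FUTEX_WAIT and FUTEX_WAKE behave.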

Re: futex(2) man page update help request

2014-05-15 Thread Peter Zijlstra
On Thu, May 15, 2014 at 03:49:10PM +0200, Michael Kerrisk (man-pages) wrote:
> On Thu, May 15, 2014 at 3:22 PM, Peter Zijlstra  wrote:
> > On Thu, May 15, 2014 at 09:18:22AM -0400, Carlos O'Donell wrote:
> >> On 05/15/2014 04:14 AM, Peter Zijlstra wrote:
> >> > On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
> >> >> There are other syscalls like gettid() that have a:
> >> >> NOTE: There is no glibc wrapper for this system call; see NOTES.
> >> >
> >> > Yes, can we finally fix that please? It gets tedious having to endlessly
> >> > copy/paste that thing around.
> >>
> >> What exactly would you like fixed?
> >
> > Not having gettid() in glibc.
> 
> Get in the line ;-).
> http://sourceware.org/bugzilla/show_bug.cgi?id=6399

Oh hey, it moved.. :-) Hadn't seen the comments since the 2008
time-frame.




Re: futex(2) man page update help request

2014-05-15 Thread Michael Kerrisk (man-pages)
On Thu, May 15, 2014 at 3:22 PM, Peter Zijlstra  wrote:
> On Thu, May 15, 2014 at 09:18:22AM -0400, Carlos O'Donell wrote:
>> On 05/15/2014 04:14 AM, Peter Zijlstra wrote:
>> > On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
>> >> There are other syscalls like gettid() that have a:
>> >> NOTE: There is no glibc wrapper for this system call; see NOTES.
>> >
>> > Yes, can we finally fix that please? It gets tedious having to endlessly
>> > copy/paste that thing around.
>>
>> What exactly would you like fixed?
>
> Not having gettid() in glibc.

Get in the line ;-).
http://sourceware.org/bugzilla/show_bug.cgi?id=6399

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2014-05-15 Thread Michael Kerrisk (man-pages)
On 05/15/2014 07:21 AM, Darren Hart wrote:
> On 5/14/14, 17:18, "H. Peter Anvin"  wrote:
> 
>> On 05/14/2014 09:18 AM, Darren Hart wrote:
>>>
>>> However, unless I'm sorely mistaken, the larger problem is that glibc
>>> removed the futex() call entirely, so these man pages don't describe
>>> something users even have access to anymore. I had to revert to calling
>>> the syscalls directly in the futextest test suite because of this:
>>>
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/include/futextest.h#n67
>>>
>>
>> This really comes down to the fact that we should have a liblinux which
>> contains the basic system call wrapper machinery for Linux specific
>> things and nothing else.
>>
>> syscall(3) is toxic and breaks randomly on some platforms.
> 
> Peter Z and I have had a good time discussing this in the past. And
> here it is again. :-)

People have a number of times noted that there are problems
with syscall(), but I'm not knowledgeable on the details.
I'd happily take a patch to the man page (which, for historical
reasons, is actually syscall(2)) that explains the problems
(and ideally notes those platforms where there are no problems).

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(2) man page update help request

2014-05-15 Thread Peter Zijlstra
On Thu, May 15, 2014 at 09:18:22AM -0400, Carlos O'Donell wrote:
> On 05/15/2014 04:14 AM, Peter Zijlstra wrote:
> > On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
> >> There are other syscalls like gettid() that have a:
> >> NOTE: There is no glibc wrapper for this system call; see NOTES.
> > 
> > Yes, can we finally fix that please? It gets tedious having to endlessly
> > copy/paste that thing around.
> 
> What exactly would you like fixed?

Not having gettid() in glibc.




Re: futex(2) man page update help request

2014-05-15 Thread Carlos O'Donell
On 05/15/2014 04:14 AM, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
>> There are other syscalls like gettid() that have a:
>> NOTE: There is no glibc wrapper for this system call; see NOTES.
> 
> Yes, can we finally fix that please? It gets tedious having to endlessly
> copy/paste that thing around.

What exactly would you like fixed?

Cheers,
Carlos.



Re: futex(2) man page update help request

2014-05-15 Thread Peter Zijlstra
On Wed, May 14, 2014 at 10:21:52PM -0700, Darren Hart wrote:
> On 5/14/14, 17:18, "H. Peter Anvin"  wrote:
> 
> >On 05/14/2014 09:18 AM, Darren Hart wrote:
> >> 
> >> However, unless I'm sorely mistaken, the larger problem is that glibc
> >> removed the futex() call entirely, so these man pages don't describe
> >> something users even have access to anymore. I had to revert to calling
> >> the syscalls directly in the futextest test suite because of this:
> >> 
> >> 
> >> http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/tree/include/futextest.h#n67
> >> 
> >
> >This really comes down to the fact that we should have a liblinux which
> >contains the basic system call wrapper machinery for Linux specific
> >things and nothing else.
> >
> >syscall(3) is toxic and breaks randomly on some platforms.
> 
> Peter Z and I have had a good time discussing this in the past. And
> here it is again. :-)

Oh but we wanted _way_ more than bare syscalls in there ;-)

For a start we wanted to make the vDSO a proper DSO that gets included
in the (dynamic) link chain.

Something like /sys/lib/libdso{32,64}.so.

That would also allow all those archs that expose raw dso function
pointers for things like cmpxchg or memory barriers to just provide
platform functions instead, far more usable.
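For context, the kernel already maps the vDSO into every process as a complete ELF image and reports its address in the auxiliary vector; turning it into a "proper DSO" is about getting it into the normal link chain. A minimal sketch of locating it, assuming glibc's getauxval() (glibc 2.16 and later); the helper names are hypothetical.

```c
#include <assert.h>
#include <elf.h>           /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <stdint.h>
#include <string.h>        /* memcmp() */
#include <sys/auxv.h>      /* getauxval() */

/* Hypothetical helper: address of the vDSO's ELF header, or NULL if
 * the auxiliary vector does not report one. */
static const void *vdso_base(void)
{
    return (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* The vDSO is an ordinary ELF shared object in memory, so it begins
 * with the standard ELF magic bytes. */
static int vdso_looks_like_elf(void)
{
    const void *base = vdso_base();
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```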

And yes, we wanted to hijack libpthread in order to finally fix the
futex mess :-)





Re: futex(2) man page update help request

2014-05-15 Thread Peter Zijlstra
On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
> There are other syscalls like gettid() that have a:
> NOTE: There is no glibc wrapper for this system call; see NOTES.

Yes, can we finally fix that please? It gets tedious having to endlessly
copy/paste that thing around.




Re: futex(2) man page update help request

2014-05-15 Thread Peter Zijlstra
On Wed, May 14, 2014 at 04:23:38PM -0400, Carlos O'Donell wrote:
> On 05/14/2014 03:03 PM, Michael Kerrisk (man-pages) wrote:
> >> However, unless I'm sorely mistaken, the larger problem is that glibc
> >> removed the futex() call entirely, so these man pages don't describe
> > 
> > I don't think futex() ever was in glibc--that's by design, and
> > completely understandable: no user-space application would want to
> > directly use futex(). (BTW, I misspoke in my earlier mail when I said I
> > wanted documentation suitable for "writers of library functions" -- I
> > meant suitable for "writers of *C library*".)
> 
> I fully agree with Michael here.
> 
> The futex() syscall was never exposed to userspace specifically because
> it was an interface we did not want to support forever with a stable ABI.
> The futex() syscall is an implementation detail that is shared between
> the kernel and the writers of core runtimes for Linux.

That ship has sailed. For one, we must always support old glibc, which
uses the futex() syscall, and secondly there are known other programs
that actually use the futex syscall.

So that's really a non-argument, we're hard tied to the ABI.



