Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Michael Kerrisk (man-pages)
On 12/18/2015 12:21 PM, Torvald Riegel wrote:
> On Tue, 2015-12-15 at 13:18 -0800, Darren Hart wrote:
>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>>
>>>     When executing a futex operation that requests to block a thread,
>>>     the kernel will block only if the futex word has the value that
>>>     the calling thread supplied (as one of the arguments of the
>>>     futex() call) as the expected value of the futex word.  The loading
>>>     of the futex word's value, the comparison of that value with
>>>     the expected value, and the actual blocking will happen atomically
>>>
>>> FIXME: for next line, it would be good to have an explanation of
>>> "totally ordered" somewhere around here.
>>>
>>>     and totally ordered with respect to concurrently executing
>>
>> Totally ordered with respect to futex operations refers to semantics of the
>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>> writes. The kernel futex operations are protected by spinlocks, which ensure
>> that all operations are serialized with respect to one another.
>>
>> This is a lot to attempt to define in this document. Perhaps a reference to
>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
>> perhaps for this manual, "serialized" would be sufficient, with a footnote
>> regarding "totally ordered" and a pointer to the memory-barrier
>> documentation?
> 
> I'd strongly prefer to document the semantics for users here.  

Yes, please.

> And I
> don't think users use the kernel's memory model -- instead, if we assume
> that most users will call futex ops from C or C++, then the best we have
> is the C11 / C++11 memory model.  

Agreed.

> Therefore, if we want to expand that,

I think we should. And by we, I mean you ;-)

> we should specify semantics in terms of as-if equivalence to C11 pseudo
> code.  I had proposed that in the past but, IIRC, Michael didn't want to
> add a C11 "dependency" in the semantics back then, at least for the
> initial release.

I'd like to avoid it if possible, since many of us don't understand
all the details of those C11 semantics--and by us, I mean
me :-/. But maybe I'll be forced to educate myself better.

> Here's what I wrote back then (atomic_*_relaxed() is like C11
> atomic_*(..., memory_order_relaxed), lock/unlock have normal C11 mutex
> semantics):
> 
> 
> 
> For example, we could say that futex_wait is, in terms of
> synchronization semantics, *as if* we'd execute a piece of C11 code.
> Here's a part of the docs for a glibc-internal futex wrapper that I'm
> working on; this is futex_wait ... :
> 
> /* Atomically wrt other futex operations, this blocks iff the value at
>    *FUTEX matches the expected value.  This is semantically equivalent to:
>      l = <get lock associated with futex> (FUTEX);
>      wait_flag = <get wait_flag associated with futex> (FUTEX);
>      lock (l);
>      val = atomic_load_relaxed (FUTEX);
>      if (val != expected) { unlock (l); return EAGAIN; }
>      atomic_store_relaxed (wait_flag, 1);
>      unlock (l);
>      // Now block; can time out in futex_time_wait (see below)
>      while (atomic_load_relaxed(wait_flag));
>
>    Note that no guarantee of a happens-before relation between a woken
>    futex_wait and a futex_wake is documented; however, this does not matter
>    in practice because we have to consider spurious wake-ups (see below),
>    and thus would not be able to reason which futex_wake woke us anyway.
> 
> 
> ... and this is futex_wake:
> 
> /* Atomically wrt other futex operations, this unblocks the specified
>    number of processes, or all processes blocked on this futex if there are
>    fewer than the specified number.  Semantically, this is equivalent to:
>      l = <get lock associated with futex> (futex);
>      lock (l);
>      for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
>        if (<no process blocked on futex>) break;
>        wf = <get wait_flag of a process blocked on futex> (futex);
>        // No happens-before guarantee with woken futex_wait (see above)
>        atomic_store_relaxed (wf, 0);
>      }
>      return res;
> 
> This allows a programmer to really infer the guarantees he/she can get
> from a futex in terms of synchronization, without the docs having to use
> prose to describe that.  This should also not constrain the kernel in
> terms of how to implement it, because it is a conceptual as-if relation
> (e.g., the kernel won't spin-wait the whole time, and we might want to
> make this clear for the PI case).
> 
> Of course, there are several as-if representations we could use, and we
> might want to be a bit more pseudo-code-ish to make this also easy to
> understand for people not familiar with C11 (e.g., using mutex + condvar
> with some relaxation of condvar guarantees).

Okay -- I'm open to all of the above.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Michael Kerrisk (man-pages)
On 12/18/2015 12:11 PM, Torvald Riegel wrote:
> On Wed, 2015-12-16 at 16:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hello Darren,
>>
>> On 12/15/2015 10:18 PM, Darren Hart wrote:
>>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>
>> [...]
>>
>>>>     When executing a futex operation that requests to block a thread,
>>>>     the kernel will block only if the futex word has the value that
>>>>     the calling thread supplied (as one of the arguments of the
>>>>     futex() call) as the expected value of the futex word.  The loading
>>>>     of the futex word's value, the comparison of that value with
>>>>     the expected value, and the actual blocking will happen atomically
>>>>
>>>> FIXME: for next line, it would be good to have an explanation of
>>>> "totally ordered" somewhere around here.
>>>>
>>>>     and totally ordered with respect to concurrently executing
>>>
>>> Totally ordered with respect to futex operations refers to semantics of the
>>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>>> writes. The kernel futex operations are protected by spinlocks, which ensure
>>> that all operations are serialized with respect to one another.
>>>
>>> This is a lot to attempt to define in this document. Perhaps a reference to
>>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient?
>>> Or perhaps for this manual, "serialized" would be sufficient, with a footnote
>>> regarding "totally ordered" and a pointer to the memory-barrier
>>> documentation?
>>
>> I think I'll just settle for writing serialized in the man page, and be 
>> done with it :-).
> 
> I'd prefer if you'd not just use "serialized" :)  

Sigh :-). Okay--removed.

> Eventually, I'd prefer
> if we can explain the semantics for the user in terms of the terminology
> and semantics of the memory model of the programming language that users
> will likely use to call futex ops (ie, C11 / C++11).

And I'd be really happy to see such an explanation land in the page.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Jonathan Wakely

On 18/12/15 12:11 +0100, Torvald Riegel wrote:

> On Wed, 2015-12-16 at 16:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hello Darren,
>>
>> On 12/15/2015 10:18 PM, Darren Hart wrote:
>>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>
>> [...]
>>
>>>>     When executing a futex operation that requests to block a thread,
>>>>     the kernel will block only if the futex word has the value that
>>>>     the calling thread supplied (as one of the arguments of the
>>>>     futex() call) as the expected value of the futex word.  The loading
>>>>     of the futex word's value, the comparison of that value with
>>>>     the expected value, and the actual blocking will happen atomically
>>>>
>>>> FIXME: for next line, it would be good to have an explanation of
>>>> "totally ordered" somewhere around here.
>>>>
>>>>     and totally ordered with respect to concurrently executing
>>>
>>> Totally ordered with respect to futex operations refers to semantics of the
>>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>>> writes. The kernel futex operations are protected by spinlocks, which ensure
>>> that all operations are serialized with respect to one another.
>>>
>>> This is a lot to attempt to define in this document. Perhaps a reference to
>>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
>>> perhaps for this manual, "serialized" would be sufficient, with a footnote
>>> regarding "totally ordered" and a pointer to the memory-barrier documentation?
>>
>> I think I'll just settle for writing serialized in the man page, and be
>> done with it :-).
>
> I'd prefer if you'd not just use "serialized" :)  Eventually, I'd prefer
> if we can explain the semantics for the user in terms of the terminology
> and semantics of the memory model of the programming language that users
> will likely use to call futex ops (ie, C11 / C++11).


FWIW a couple of uses of "serialized" were replaced in the C++11 final
draft due to comments pointing out that term is not defined in the
standard, see http://wg21.link/lwg1494 and http://wg21.link/lwg1504

That's not quite the same, because an ISO standard is supposed to
define all terms it uses, even for something like "serialized" where
the meaning is commonly understood by those in the field.

But I do like Torvald's suggestion to describe the semantics in
similar terms to C11, because that's the user-space model that
non-kernel folks (like me) are more likely to be familiar with.

Overall I like the new page a lot, I found it clear and readable. Nice
work.



Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Torvald Riegel
On Tue, 2015-12-15 at 14:41 -0800, Davidlohr Bueso wrote:
> On Tue, 15 Dec 2015, Michael Kerrisk (man-pages) wrote:
> 
> >   When executing a futex operation that requests to block a thread,
> >   the kernel will block only if the futex word has the value that
> >   the calling thread supplied (as one of the arguments of the
> >   futex() call) as the expected value of the futex word.  The loading
> >   of the futex word's value, the comparison of that value with
> >   the expected value, and the actual blocking will happen atomically
> >
> > FIXME: for next line, it would be good to have an explanation of
> > "totally ordered" somewhere around here.
> >
> >   and totally ordered with respect to concurrently executing
> >   futex operations on the same futex word.
> 
> So there are two things here regarding ordering. One is the most obvious
> which is ordered due to the taking/dropping the hb spinlock.

I suppose that this means what is described in the manpage already?
That is, that futex operations (ie, the syscalls) are atomic wrt each
other and in a strict total order?

> Secondly, it's
> the cases which Peter brought up a while ago that involve atomic futex ops
> futex_atomic_*(), which do not have clearly defined semantics, and you get
> inconsistencies with certain archs (tile being the worst iirc).

OK.  So, from a user's POV, this is about the semantics of the kernel's
accesses to the futex word.  I agree that specifying this more clearly
would be helpful.

First, there are the comparisons of the futex words used in, for
example, FUTEX_WAIT.  They should use an atomic load within the
conceptual critical sections that make up futex operations.  This load
itself doesn't need to establish any ordering, so it can be equivalent
to a C11 memory_order_relaxed load.  Are there any objections to that?
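
The value comparison itself is observable from userspace. Here is a minimal sketch, assuming Linux and the raw futex system call; the helper names futex_raw, futex_wait_if_equal and futex_wake_n are hypothetical and not part of any library. It shows only that FUTEX_WAIT refuses to block when the futex word does not hold the expected value. Whether the kernel's load is documented as memory_order_relaxed is the proposal under discussion here, not something this code demonstrates.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the raw futex system call
   (no timeout, no second address). */
static long futex_raw(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Blocks only if *word == expected at the time the kernel checks it.
   Returns 0 when woken (possibly spuriously), or -1 with errno set;
   errno == EAGAIN means the value did not match and we never blocked. */
static int futex_wait_if_equal(uint32_t *word, uint32_t expected)
{
    return (int) futex_raw(word, FUTEX_WAIT_PRIVATE, expected);
}

/* Wakes at most n waiters blocked on word; returns how many were woken. */
static int futex_wake_n(uint32_t *word, int n)
{
    return (int) futex_raw(word, FUTEX_WAKE_PRIVATE, (uint32_t) n);
}
```

A caller would normally re-check its own predicate in a loop around futex_wait_if_equal(), because a return value of 0 does not say which wake-up, if any, caused it.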

Second, we have the write accesses in FUTEX_[TRY]LOCK_PI and
FUTEX_UNLOCK_PI.  We already specify those as atomic and within the
conceptual critical sections of the futex operation.  In addition, they
should establish ordering themselves, so they should have C11
memory_order_acquire / memory_order_release semantics.  Specifying this
would be good.  Any objections to these semantics?
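
To make the proposed ordering concrete, here is a small C11 sketch of the word updates only. It assumes the PI convention that the futex word is 0 when unlocked and holds the owner's TID when locked; the function names are hypothetical, and the sketch ignores waiters, the FUTEX_WAITERS bit and priority inheritance entirely. It is an as-if model of the ordering, not a description of what the kernel executes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the lock-side write: 0 -> tid with acquire ordering on success.
   A failed attempt establishes no ordering (relaxed). */
static bool model_lock_pi_try(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        word, &expected, tid, memory_order_acquire, memory_order_relaxed);
}

/* Model of the unlock-side write: tid -> 0 with release ordering, so that
   everything the owner did in its critical section happens-before a later
   successful model_lock_pi_try() that observes the 0. */
static bool model_unlock_pi(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = tid;
    return atomic_compare_exchange_strong_explicit(
        word, &expected, 0, memory_order_release, memory_order_relaxed);
}
```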

Third, we have the atomic read-modify-write operation that is part of
FUTEX_WAKE_OP (ie, AFAIU, the case you pointed at specifically).  I
don't have a strong opinion on what it should be, because I think
userspace can enforce the orderings it needs on its own (eg, if I
interpret Peter Zijlstra's example correctly, userspace can add
appropriate fences before the CPU0/futex_unlock and after the
CPU2/futex_load calls).  FUTEX_WAKE_OP accesses no other userspace
memory location, so there's no ordering relation to other accesses to
userspace memory that userspace cannot affect.
OTOH, legacy userspace may have assumed strong semantics, so making the
read-modify-write have memory_order_seq_cst semantics is probably a safe
bet.  Futex operations typically shouldn't be on the fast paths anyway.
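
As a sketch of what "memory_order_seq_cst semantics" would mean for that read-modify-write, the following C11 model applies one operation to the second futex word and returns the old value, which FUTEX_WAKE_OP then compares against its comparison argument. The enum and function names are hypothetical and deliberately do not reuse the kernel's FUTEX_OP_* constants; only the five operations (set, add, or, and-not, xor) are taken from the futex interface.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum model_op {
    MODEL_OP_SET,   /* *uaddr2 = oparg   */
    MODEL_OP_ADD,   /* *uaddr2 += oparg  */
    MODEL_OP_OR,    /* *uaddr2 |= oparg  */
    MODEL_OP_ANDN,  /* *uaddr2 &= ~oparg */
    MODEL_OP_XOR    /* *uaddr2 ^= oparg  */
};

/* As-if model of the FUTEX_WAKE_OP read-modify-write with seq_cst ordering.
   Returns the value the word held before the update. */
static uint32_t model_wake_op_rmw(_Atomic uint32_t *uaddr2,
                                  enum model_op op, uint32_t oparg)
{
    switch (op) {
    case MODEL_OP_SET:
        return atomic_exchange_explicit(uaddr2, oparg, memory_order_seq_cst);
    case MODEL_OP_ADD:
        return atomic_fetch_add_explicit(uaddr2, oparg, memory_order_seq_cst);
    case MODEL_OP_OR:
        return atomic_fetch_or_explicit(uaddr2, oparg, memory_order_seq_cst);
    case MODEL_OP_ANDN:
        return atomic_fetch_and_explicit(uaddr2, ~oparg, memory_order_seq_cst);
    case MODEL_OP_XOR:
        return atomic_fetch_xor_explicit(uaddr2, oparg, memory_order_seq_cst);
    }
    /* Not reached for a valid op; just read the word without changing it. */
    return atomic_load_explicit(uaddr2, memory_order_seq_cst);
}
```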

> But anyway, the important thing users need to know about is that the atomic
> futex operation must be totally ordered wrt any other user tasks that are
> trying to access that address.

I'm not sure what you mean precisely.  One can't totally order whole
futex operations wrt memory accesses by userspace, because the two would
need to synchronize to do that, and thus userspace would have to either
hook into the kernel's synchronization or use HTM or such.

> This is not necessarily the case for kernel ops. Peter
> illustrates this nicely with lock stealing example; 
> (see https://lkml.org/lkml/2015/8/26/596).
> 
> Internally, I believe we decided on making it fully ordered (as opposed to
> making use of implicit barriers for ACQUIRE/RELEASE), so you'd end up having
> an MB ll/sc MB kind of setup.

OK.  So, any objections to documenting that the read-modify-write op in
FUTEX_WAKE_OP has memory_order_seq_cst semantics?



Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Torvald Riegel
On Tue, 2015-12-15 at 13:18 -0800, Darren Hart wrote:
> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
> > 
> >     When executing a futex operation that requests to block a thread,
> >     the kernel will block only if the futex word has the value that
> >     the calling thread supplied (as one of the arguments of the
> >     futex() call) as the expected value of the futex word.  The loading
> >     of the futex word's value, the comparison of that value with
> >     the expected value, and the actual blocking will happen atomically
> >
> > FIXME: for next line, it would be good to have an explanation of
> > "totally ordered" somewhere around here.
> >
> >     and totally ordered with respect to concurrently executing
> 
> Totally ordered with respect to futex operations refers to semantics of the
> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
> writes. The kernel futex operations are protected by spinlocks, which ensure
> that all operations are serialized with respect to one another.
> 
> This is a lot to attempt to define in this document. Perhaps a reference to
> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
> perhaps for this manual, "serialized" would be sufficient, with a footnote
> regarding "totally ordered" and a pointer to the memory-barrier documentation?

I'd strongly prefer to document the semantics for users here.  And I
don't think users use the kernel's memory model -- instead, if we assume
that most users will call futex ops from C or C++, then the best we have
is the C11 / C++11 memory model.  Therefore, if we want to expand that,
we should specify semantics in terms of as-if equivalence to C11 pseudo
code.  I had proposed that in the past but, IIRC, Michael didn't want to
add a C11 "dependency" in the semantics back then, at least for the
initial release.

Here's what I wrote back then (atomic_*_relaxed() is like C11
atomic_*(..., memory_order_relaxed), lock/unlock have normal C11 mutex
semantics):



For example, we could say that futex_wait is, in terms of
synchronization semantics, *as if* we'd execute a piece of C11 code.
Here's a part of the docs for a glibc-internal futex wrapper that I'm
working on; this is futex_wait ... :

/* Atomically wrt other futex operations, this blocks iff the value at
   *FUTEX matches the expected value.  This is semantically equivalent to:
     l = <get lock associated with futex> (FUTEX);
     wait_flag = <get wait_flag associated with futex> (FUTEX);
     lock (l);
     val = atomic_load_relaxed (FUTEX);
     if (val != expected) { unlock (l); return EAGAIN; }
     atomic_store_relaxed (wait_flag, 1);
     unlock (l);
     // Now block; can time out in futex_time_wait (see below)
     while (atomic_load_relaxed(wait_flag));

   Note that no guarantee of a happens-before relation between a woken
   futex_wait and a futex_wake is documented; however, this does not matter
   in practice because we have to consider spurious wake-ups (see below),
   and thus would not be able to reason which futex_wake woke us anyway.


... and this is futex_wake:

/* Atomically wrt other futex operations, this unblocks the specified
   number of processes, or all processes blocked on this futex if there are
   fewer than the specified number.  Semantically, this is equivalent to:
     l = <get lock associated with futex> (futex);
     lock (l);
     for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
       if (<no process blocked on futex>) break;
       wf = <get wait_flag of a process blocked on futex> (futex);
       // No happens-before guarantee with woken futex_wait (see above)
       atomic_store_relaxed (wf, 0);
     }
     return res;

This allows a programmer to really infer the guarantees he/she can get
from a futex in terms of synchronization, without the docs having to use
prose to describe that.  This should also not constrain the kernel in
terms of how to implement it, because it is a conceptual as-if relation
(e.g., the kernel won't spin-wait the whole time, and we might want to
make this clear for the PI case).
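
For readers who want to experiment with the as-if description, here is a compilable C11 sketch of it. Everything in it is an assumption made for illustration: the struct layout, the model_* names, the per-waiter flag kept on the waiter's stack, and the use of a pthread mutex for the conceptual lock. It also adds the unlock at the end of the wake side, which the pseudo-code leaves implicit, and it spins with sched_yield() purely to keep the model short.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* One blocked thread; lives on the stack of the waiting thread. */
struct model_waiter {
    atomic_int wait_flag;
    struct model_waiter *next;
};

/* The futex word plus the conceptual lock and waiter list tied to it. */
struct model_futex {
    pthread_mutex_t l;
    atomic_uint word;
    struct model_waiter *waiters;   /* protected by l */
};

/* As-if futex_wait: returns EAGAIN without blocking on a value mismatch. */
static int model_futex_wait(struct model_futex *f, unsigned expected)
{
    struct model_waiter self;

    pthread_mutex_lock(&f->l);
    if (atomic_load_explicit(&f->word, memory_order_relaxed) != expected) {
        pthread_mutex_unlock(&f->l);
        return EAGAIN;
    }
    atomic_init(&self.wait_flag, 1);
    self.next = f->waiters;
    f->waiters = &self;
    pthread_mutex_unlock(&f->l);

    /* Relaxed load: no happens-before edge from the waker is implied. */
    while (atomic_load_explicit(&self.wait_flag, memory_order_relaxed))
        sched_yield();
    return 0;
}

/* As-if futex_wake: releases up to n waiters, returns how many. */
static int model_futex_wake(struct model_futex *f, int n)
{
    int res = 0;

    pthread_mutex_lock(&f->l);
    for (; n > 0 && f->waiters != NULL; n--, res++) {
        struct model_waiter *w = f->waiters;
        f->waiters = w->next;   /* read next before releasing the waiter */
        atomic_store_explicit(&w->wait_flag, 0, memory_order_relaxed);
    }
    pthread_mutex_unlock(&f->l);
    return res;
}
```

A model_futex can be initialized with { PTHREAD_MUTEX_INITIALIZER, 0, NULL }. Because the flag accesses are relaxed, a caller must still use its own acquire/release or stronger operations on the data the futex protects.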

Of course, there are several as-if representations we could use, and we
might want to be a bit more pseudo-code-ish to make this also easy to
understand for people not familiar with C11 (e.g., using mutex + condvar
with some relaxation of condvar guarantees).
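
One possible shape for such a mutex + condvar representation is sketched below. The cv_futex type, its ticket counters and the function names are all hypothetical. Note the caveat that motivates the "relaxation": a real mutex and condvar give the woken thread a happens-before edge from the waker, which is more than the futex documentation would promise, so this model is stronger than what callers should rely on.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

struct cv_futex {
    pthread_mutex_t l;
    pthread_cond_t cv;
    atomic_uint word;
    unsigned long next_ticket;  /* ticket given to the next waiter */
    unsigned long wake_upto;    /* tickets below this value are released */
};

/* Blocks iff word == expected; returns EAGAIN on mismatch, 0 when woken. */
static int cv_futex_wait(struct cv_futex *f, unsigned expected)
{
    pthread_mutex_lock(&f->l);
    if (atomic_load_explicit(&f->word, memory_order_relaxed) != expected) {
        pthread_mutex_unlock(&f->l);
        return EAGAIN;
    }
    unsigned long my = f->next_ticket++;
    /* Loop so that condvar spurious wake-ups do not release us early;
       a futex would be allowed to return here spuriously as well. */
    while (f->wake_upto <= my)
        pthread_cond_wait(&f->cv, &f->l);
    pthread_mutex_unlock(&f->l);
    return 0;
}

/* Releases up to n of the currently blocked waiters; returns the count. */
static int cv_futex_wake(struct cv_futex *f, int n)
{
    pthread_mutex_lock(&f->l);
    unsigned long blocked = f->next_ticket - f->wake_upto;
    int res = (n > 0 && (unsigned long) n < blocked) ? n
            : (n > 0 ? (int) blocked : 0);
    if (res > 0) {
        f->wake_upto += (unsigned long) res;
        pthread_cond_broadcast(&f->cv);  /* released waiters pass the check */
    }
    pthread_mutex_unlock(&f->l);
    return res;
}
```

Tickets make the model wake waiters in arrival order, which is a simplification chosen to keep the count returned by cv_futex_wake() exact; it is not a claim about the order in which the kernel wakes waiters.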

=

I will go through the discussion pointed out by Davidlohr next.



Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Torvald Riegel
On Wed, 2015-12-16 at 16:54 +0100, Michael Kerrisk (man-pages) wrote:
> Hello Darren,
> 
> On 12/15/2015 10:18 PM, Darren Hart wrote:
> > On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
> 
> [...]
> 
> >>     When executing a futex operation that requests to block a thread,
> >>     the kernel will block only if the futex word has the value that
> >>     the calling thread supplied (as one of the arguments of the
> >>     futex() call) as the expected value of the futex word.  The loading
> >>     of the futex word's value, the comparison of that value with
> >>     the expected value, and the actual blocking will happen atomically
> >>
> >> FIXME: for next line, it would be good to have an explanation of
> >> "totally ordered" somewhere around here.
> >>
> >>     and totally ordered with respect to concurrently executing
> > 
> > Totally ordered with respect to futex operations refers to semantics of the
> > ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
> > writes. The kernel futex operations are protected by spinlocks, which ensure
> > that all operations are serialized with respect to one another.
> >
> > This is a lot to attempt to define in this document. Perhaps a reference to
> > linux/Documentation/memory-barriers.txt as a footnote would be sufficient?
> > Or perhaps for this manual, "serialized" would be sufficient, with a footnote
> > regarding "totally ordered" and a pointer to the memory-barrier
> > documentation?
> 
> I think I'll just settle for writing serialized in the man page, and be 
> done with it :-).

I'd prefer if you'd not just use "serialized" :)  Eventually, I'd prefer
if we can explain the semantics for the user in terms of the terminology
and semantics of the memory model of the programming language that users
will likely use to call futex ops (ie, C11 / C++11).



Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Jonathan Wakely

On 18/12/15 12:11 +0100, Torvald Riegel wrote:

> On Wed, 2015-12-16 at 16:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hello Darren,
>>
>> On 12/15/2015 10:18 PM, Darren Hart wrote:
>>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>
>> [...]
>>
>>>>When executing a futex operation that requests to block a thread,
>>>>the kernel will block only if the futex word has the  value  that
>>>>the  calling  thread  supplied  (as  one  of the arguments of the
>>>>futex() call) as the expected value of the futex word.  The load‐
>>>>ing  of the futex word's value, the comparison of that value with
>>>>the expected value, and the actual blocking  will  happen  atomi‐
>>>>
>>>> FIXME: for next line, it would be good to have an explanation of
>>>> "totally ordered" somewhere around here.
>>>>
>>>>cally  and totally ordered with respect to concurrently executing
>>>
>>> Totally ordered with respect to futex operations refers to semantics of the
>>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>>> writes. The kernel futex operations are protected by spinlocks, which ensure
>>> that all operations are serialized with respect to one another.
>>>
>>> This is a lot to attempt to define in this document. Perhaps a reference to
>>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
>>> perhaps for this manual, "serialized" would be sufficient, with a footnote
>>> regarding "totally ordered" and a pointer to the memory-barrier documentation?
>>
>> I think I'll just settle for writing serialized in the man page, and be
>> done with it :-).
>
> I'd prefer if you'd not just use "serialized" :)  Eventually, I'd prefer
> if we can explain the semantics for the user in terms of the terminology
> and semantics of the memory model of the programming language that users
> will likely use to call futex ops (ie, C11 / C++11).


FWIW a couple of uses of "serialized" were replaced in the C++11 final
draft due to comments pointing out that term is not defined in the
standard, see http://wg21.link/lwg1494 and http://wg21.link/lwg1504

That's not quite the same, because an ISO standard is supposed to
define all terms it uses, even for something like "serialized" where
the meaning is commonly understood by those in the field.

But I do like Torvald's suggestion to describe the semantics in
similar terms to C11, because that's the user-space model that
non-kernel folks (like me) are more likely to be familiar with.

Overall I like the new page a lot, I found it clear and readable. Nice
work.



Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Michael Kerrisk (man-pages)
On 12/18/2015 12:11 PM, Torvald Riegel wrote:
> On Wed, 2015-12-16 at 16:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hello Darren,
>>
>> On 12/15/2015 10:18 PM, Darren Hart wrote:
>>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>
>> [...]
>>
>>>>When executing a futex operation that requests to block a thread,
>>>>the kernel will block only if the futex word has the  value  that
>>>>the  calling  thread  supplied  (as  one  of the arguments of the
>>>>futex() call) as the expected value of the futex word.  The load‐
>>>>ing  of the futex word's value, the comparison of that value with
>>>>the expected value, and the actual blocking  will  happen  atomi‐
>>>>
>>>> FIXME: for next line, it would be good to have an explanation of
>>>> "totally ordered" somewhere around here.
>>>>
>>>>cally  and totally ordered with respect to concurrently executing
>>>
>>> Totally ordered with respect to futex operations refers to semantics of the
>>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>>> writes. The kernel futex operations are protected by spinlocks, which ensure
>>> that all operations are serialized with respect to one another.
>>>
>>> This is a lot to attempt to define in this document. Perhaps a reference to
>>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
>>> perhaps for this manual, "serialized" would be sufficient, with a footnote
>>> regarding "totally ordered" and a pointer to the memory-barrier documentation?
>>
>> I think I'll just settle for writing serialized in the man page, and be 
>> done with it :-).
> 
> I'd prefer if you'd not just use "serialized" :)  

Sigh :-). Okay--removed.

> Eventually, I'd prefer
> if we can explain the semantics for the user in terms of the terminology
> and semantics of the memory model of the programming language that users
> will likely use to call futex ops (ie, C11 / C++11).

And I'd be really happy to see such an explanation land in the page.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(3) man page, final draft for pre-release review

2015-12-18 Thread Michael Kerrisk (man-pages)
On 12/18/2015 12:21 PM, Torvald Riegel wrote:
> On Tue, 2015-12-15 at 13:18 -0800, Darren Hart wrote:
>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>>
>>>When executing a futex operation that requests to block a thread,
>>>the kernel will block only if the futex word has the  value  that
>>>the  calling  thread  supplied  (as  one  of the arguments of the
>>>futex() call) as the expected value of the futex word.  The load‐
>>>ing  of the futex word's value, the comparison of that value with
>>>the expected value, and the actual blocking  will  happen  atomi‐
>>>
>>> FIXME: for next line, it would be good to have an explanation of
>>> "totally ordered" somewhere around here.
>>>
>>>cally  and totally ordered with respect to concurrently executing
>>
>> Totally ordered with respect to futex operations refers to semantics of the
>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>> writes. The kernel futex operations are protected by spinlocks, which ensure
>> that all operations are serialized with respect to one another.
>>
>> This is a lot to attempt to define in this document. Perhaps a reference to
>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
>> perhaps for this manual, "serialized" would be sufficient, with a footnote
>> regarding "totally ordered" and a pointer to the memory-barrier documentation?
> 
> I'd strongly prefer to document the semantics for users here.  

Yes, please.

> And I
> don't think users use the kernel's memory model -- instead, if we assume
> that most users will call futex ops from C or C++, then the best we have
> is the C11 / C++11 memory model.  

Agreed.

> Therefore, if we want to expand that,

I think we should. And by we, I mean you ;-)

> we should specify semantics in terms of as-if equivalence to C11 pseudo
> code.  I had proposed that in the past but, IIRC, Michael didn't want to
> add a C11 "dependency" in the semantics back then, at least for the
> initial release.

I'd like to avoid it if possible, since many of us don't understand
all the details of those C11 semantics--and by us, I mean
me :-/. But maybe I'll be forced to educate myself better.

> Here's what I wrote back then (atomic_*_relaxed() is like C11
> atomic_*(..., memory_order_relaxed), lock/unlock have normal C11 mutex
> semantics):
> 
> 
> 
> For example, we could say that futex_wait is, in terms of
> synchronization semantics, *as if* we'd execute a piece of C11 code.
> Here's a part of the docs for a glibc-internal futex wrapper that I'm
> working on; this is futex_wait ... :
> 
> /* Atomically wrt other futex operations, this blocks iff the value at
>    *FUTEX matches the expected value.  This is semantically equivalent to:
>      l = <get lock associated with futex> (FUTEX);
>      wait_flag = <get wait_flag associated with futex> (FUTEX);
>      lock (l);
>      val = atomic_load_relaxed (FUTEX);
>      if (val != expected) { unlock (l); return EAGAIN; }
>      atomic_store_relaxed (wait_flag, 1);
>      unlock (l);
>      // Now block; can time out in futex_time_wait (see below)
>      while (atomic_load_relaxed(wait_flag));
>
>    Note that no guarantee of a happens-before relation between a woken
>    futex_wait and a futex_wake is documented; however, this does not matter
>    in practice because we have to consider spurious wake-ups (see below),
>    and thus would not be able to reason which futex_wake woke us anyway.
> 
> 
> ... and this is futex_wake:
> 
> /* Atomically wrt other futex operations, this unblocks the specified
>    number of processes, or all processes blocked on this futex if there are
>    fewer than the specified number.  Semantically, this is equivalent to:
>      l = <get lock associated with futex> (futex);
>      lock (l);
>      for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
>        if (<no process blocked on futex>) break;
>        wf = <get wait_flag of a process blocked on futex> (futex);
>        // No happens-before guarantee with woken futex_wait (see above)
>        atomic_store_relaxed (wf, 0);
>      }
>      return res;
> 
> This allows a programmer to really infer the guarantees he/she can get
> from a futex in terms of synchronization, without the docs having to use
> prose to describe that.  This should also not constrain the kernel in
> terms of how to implement it, because it is a conceptual as-if relation
> (e.g., the kernel won't spin-wait the whole time, and we might want to
> make this clear for the PI case).
> 
> Of course, there are several as-if representations we could use, and we
> might want to be a bit more pseudo-code-ish to make this also easy to
> understand for people not familiar with C11 (e.g., using mutex + condvar
> with some relaxation of condvar guarantees).

Okay -- I'm open to all of the above.
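
To make the mutex + condvar variant concrete, here is one possible as-if
model in C, offered only as a sketch.  All names (struct model_futex,
model_futex_wait(), model_futex_wake()) are hypothetical.  It assumes FIFO
wake-up order, models neither spurious wake-ups nor timeouts, and, because
it uses a mutex, it provides a stronger happens-before relation between
waker and waiter than the pseudo-code above promises.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical as-if model of one futex; none of these names exist in the
   kernel or in glibc.  The mutex plays the role of 'l' in the pseudo-code
   and the ticket counters replace the per-waiter wait_flag. */
struct model_futex {
    atomic_int word;            /* the futex word */
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned long next_ticket;  /* tickets handed to waiters so far */
    unsigned long wake_limit;   /* tickets below this value are woken */
};

#define MODEL_FUTEX_INIT(v) \
    { (v), PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Blocks iff the word still holds 'expected'; the check and the decision
   to block happen under the same lock that model_futex_wake() takes. */
static int model_futex_wait(struct model_futex *f, int expected)
{
    pthread_mutex_lock(&f->lock);
    if (atomic_load_explicit(&f->word, memory_order_relaxed) != expected) {
        pthread_mutex_unlock(&f->lock);
        return EAGAIN;
    }
    unsigned long ticket = f->next_ticket++;
    while (ticket >= f->wake_limit)     /* no spurious wake-ups modelled */
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
    return 0;
}

/* Wakes up to nr_wake blocked waiters, oldest first; returns how many. */
static int model_futex_wake(struct model_futex *f, int nr_wake)
{
    int res = 0;

    pthread_mutex_lock(&f->lock);
    while (res < nr_wake && f->wake_limit < f->next_ticket) {
        f->wake_limit++;
        res++;
    }
    if (res > 0)
        pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->lock);
    return res;
}
```

A caller that changes the word and then calls model_futex_wake() cannot lose
a wake-up, because a concurrent model_futex_wait() either sees the new value
and returns EAGAIN or has already taken its ticket under the lock.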

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Re: futex(3) man page, final draft for pre-release review

2015-12-16 Thread Michael Kerrisk (man-pages)
Hello Darren,

On 12/15/2015 10:18 PM, Darren Hart wrote:
> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:

[...]

>>When executing a futex operation that requests to block a thread,
>>the kernel will block only if the futex word has the  value  that
>>the  calling  thread  supplied  (as  one  of the arguments of the
>>futex() call) as the expected value of the futex word.  The load‐
>>ing  of the futex word's value, the comparison of that value with
>>the expected value, and the actual blocking  will  happen  atomi‐
>>
>> FIXME: for next line, it would be good to have an explanation of
>> "totally ordered" somewhere around here.
>>
>>cally  and totally ordered with respect to concurrently executing
> 
> Totally ordered with respect to futex operations refers to semantics of the
> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
> writes. The kernel futex operations are protected by spinlocks, which ensure
> that all operations are serialized with respect to one another.
> 
> This is a lot to attempt to define in this document. Perhaps a reference to
> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
> perhaps for this manual, "serialized" would be sufficient, with a footnote
> regarding "totally ordered" and a pointer to the memory-barrier documentation?

I think I'll just settle for writing serialized in the man page, and be 
done with it :-).

>>futex operations on the same futex word.  Thus, the futex word is
>>used to connect the synchronization in user space with the imple‐
>>mentation of blocking by the kernel.  Analogously  to  an  atomic
>>compare-and-exchange  operation  that  potentially changes shared
>>memory, blocking via a futex is an atomic compare-and-block oper‐
>>ation.
> 
> ...
> 
>>Futex operations
>>The futex_op argument consists of two parts: a command that spec‐
>>ifies  the  operation to be performed, bit-wise ORed with zero or
>>more options that modify the behaviour of the operation.   The
>>options that may be included in futex_op are as follows:
> 
> ...
> 
>>
>>FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
>>   This   option   bit   can   be   employed  only  with  the
>>   FUTEX_WAIT_BITSET and FUTEX_WAIT_REQUEUE_PI operations.
> 
> That caught me by surprise, but it's true. We reject FUTEX_WAIT |
> FUTEX_CLOCK_REALTIME, even though FUTEX_WAIT is treated as FUTEX_WAIT_BITSET
> with val3=FUTEX_BITSET_MATCH_ANY.

You uncover all sorts of interesting stuff when you document APIs ;-).

> 
> Thomas, this looks like an oversight to me - do you recall if we intentionally
> disallow FUTEX_CLOCK_REALTIME with FUTEX_WAIT?
> 
>>   If this option is set, the kernel  treats  timeout  as  an
>>   absolute time based on CLOCK_REALTIME.
>>
>>   If  this  option  is not set, the kernel treats timeout as
>>   relative time, measured against the CLOCK_MONOTONIC clock.
> 
> ...
> 
>>Priority-inheritance futexes
> 
> ...
> 
>>*  If  the lock is owned and there are threads contending for the
>>   lock, then the FUTEX_WAITERS bit shall be  set  in  the  futex
>>   word's value; in other words, this value is:
>>
>>   FUTEX_WAITERS | TID
>>
>>
>>   (Note that is invalid for a PI futex word to have no owner and
> 
>   ^ it
> 
>>   FUTEX_WAITERS set.)
> ...
> 
>>FUTEX_TRYLOCK_PI (since Linux 2.6.18)
>>   This operation tries to acquire the futex at uaddr.  It is
>>   invoked when a user-space atomic acquire did  not  succeed
>>   because the futex word was not 0.
>>
>>
>> FIXME(Next sentence) The wording "The trylock in kernel" below 
>> needs clarification. Suggestions?
>>
>>   The trylock in kernel might succeed because the futex word
> 
> The lock acquisition might succeed in the kernel because the futex word

Already did some rewording here which I think makes things better.

>>   contains stale state (FUTEX_WAITERS and/or
>>   FUTEX_OWNER_DIED).   This can happen when the owner of the
>>   futex died.  User space cannot handle this condition in  a
>>   race-free  manner,  but  the  kernel  can  fix this up and
>>   acquire the futex.
>>
>>   The uaddr2, val, timeout, and val3 arguments are ignored.
> 
> ...
> 
>>EXAMPLE
>>
>> FIXME I think it would be helpful here to say a few more words about
>>   the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
>>   Can someone propose something?
> 
> Hrm. It seems pretty straightforward to me. I guess I'm too close to it. What
> about it seems unclear and needs clarification?

On reflection, I agree that the difference is 

Re: futex(3) man page, final draft for pre-release review

2015-12-16 Thread Michael Kerrisk (man-pages)
Hi David,

On 12/15/2015 11:41 PM, Davidlohr Bueso wrote:
> On Tue, 15 Dec 2015, Michael Kerrisk (man-pages) wrote:
> 
>>   When executing a futex operation that requests to block a thread,
>>   the kernel will block only if the futex word has the  value  that
>>   the  calling  thread  supplied  (as  one  of the arguments of the
>>   futex() call) as the expected value of the futex word.  The load‐
>>   ing  of the futex word's value, the comparison of that value with
>>   the expected value, and the actual blocking  will  happen  atomi‐
>>
>> FIXME: for next line, it would be good to have an explanation of
>> "totally ordered" somewhere around here.
>>
>>   cally  and totally ordered with respect to concurrently executing
>>   futex operations on the same futex word.
> 
> So there are two things here regarding ordering. One is the most obvious,
> which is the ordering due to taking/dropping the hb spinlock. Secondly, there
> are the cases which Peter brought up a while ago that involve atomic futex ops
> futex_atomic_*(), which do not have clearly defined semantics, and you get
> inconsistencies with certain archs (tile being the worst iirc).
> 
> But anyway, the important thing users need to know about is that the atomic
> futex operation must be totally ordered wrt any other user tasks that are
> trying to access that address. This is not necessarily the case for kernel
> ops. Peter illustrates this nicely with a lock stealing example
> (see https://lkml.org/lkml/2015/8/26/596).

Thanks. I reworded things here a little.

> Internally, I believe we decided on making it fully ordered (as opposed to
> making use of implicit barriers for ACQUIRE/RELEASE), so you'd end up having
> an MB ll/sc MB kind of setup.
> 
> [...]
> 
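>>   #include <stdio.h>
>>   #include <errno.h>
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>   #include <sys/wait.h>
>>   #include <sys/mman.h>
>>   #include <sys/syscall.h>
>>   #include <linux/futex.h>
>>   #include <sys/time.h>
>>
>>   #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>>                           } while (0)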
> 
> Nit, but for this we have err(3).

I don't much like them though (not in POSIX).

Thanks for the help David.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: futex(3) man page, final draft for pre-release review

2015-12-15 Thread Davidlohr Bueso

On Tue, 15 Dec 2015, Michael Kerrisk (man-pages) wrote:


  When executing a futex operation that requests to block a thread,
  the kernel will block only if the futex word has the  value  that
  the  calling  thread  supplied  (as  one  of the arguments of the
  futex() call) as the expected value of the futex word.  The load‐
  ing  of the futex word's value, the comparison of that value with
  the expected value, and the actual blocking  will  happen  atomi‐

FIXME: for next line, it would be good to have an explanation of
"totally ordered" somewhere around here.

  cally  and totally ordered with respect to concurrently executing
  futex operations on the same futex word.


So there are two things here regarding ordering. One is the most obvious,
which is the ordering due to taking/dropping the hb spinlock. Secondly, there
are the cases which Peter brought up a while ago that involve atomic futex ops
futex_atomic_*(), which do not have clearly defined semantics, and you get
inconsistencies with certain archs (tile being the worst iirc).

But anyway, the important thing users need to know about is that the atomic
futex operation must be totally ordered wrt any other user tasks that are trying
to access that address. This is not necessarily the case for kernel ops. Peter
illustrates this nicely with a lock stealing example
(see https://lkml.org/lkml/2015/8/26/596).


Internally, I believe we decided on making it fully ordered (as opposed to
making use of implicit barriers for ACQUIRE/RELEASE), so you'd end up having
an MB ll/sc MB kind of setup.

[...]


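  #include <stdio.h>
  #include <errno.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <linux/futex.h>
  #include <sys/time.h>

  #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                          } while (0)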


Nit, but for this we have err(3).



  static int *futex1, *futex2, *iaddr;

  static int
  futex(int *uaddr, int futex_op, int val,
        const struct timespec *timeout, int *uaddr2, int val3)
  {
      return syscall(SYS_futex, uaddr, futex_op, val,
                     timeout, uaddr2, val3);
  }

  /* Acquire the futex pointed to by 'futexp': wait for its value to
     become 1, and then set the value to 0. */

  static void
  fwait(int *futexp)
  {
      int s;

      /* __sync_bool_compare_and_swap(ptr, oldval, newval) is a gcc
         built-in function.  It atomically performs the equivalent of:

             if (*ptr == oldval)
                 *ptr = newval;

         It returns true if the test yielded true and *ptr was updated.
         The alternative here would be to employ the equivalent atomic
         machine-language instructions.  For further information, see
         the GCC Manual. */

      while (1) {

          /* Is the futex available? */

          if (__sync_bool_compare_and_swap(futexp, 1, 0))
              break;      /* Yes */

          /* Futex is not available; wait */

          s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
          if (s == -1 && errno != EAGAIN)
              errExit("futex-FUTEX_WAIT");
      }
  }

  /* Release the futex pointed to by 'futexp': if the futex currently
     has the value 0, set its value to 1 and then wake any futex waiters,
     so that if the peer is blocked in fwait(), it can proceed. */

  static void
  fpost(int *futexp)
  {
      int s;

      /* __sync_bool_compare_and_swap() was described in comments above */

      if (__sync_bool_compare_and_swap(futexp, 0, 1)) {

          s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
          if (s == -1)
              errExit("futex-FUTEX_WAKE");
      }
  }

  int
  main(int argc, char *argv[])
  {
      pid_t childPid;
      int j, nloops;

      setbuf(stdout, NULL);

      nloops = (argc > 1) ? atoi(argv[1]) : 5;

      /* Create a shared anonymous mapping that will hold the futexes.
         Since the futexes are being shared between processes, we
         subsequently use the "shared" futex operations (i.e., not the
         ones suffixed "_PRIVATE") */

      iaddr = mmap(NULL, sizeof(int) * 2, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
      if (iaddr == MAP_FAILED)
          errExit("mmap");

      futex1 = &iaddr[0];
      futex2 = &iaddr[1];

      *futex1 = 0;        /* State: unavailable */
      *futex2 = 1;        /* State: available */

      /* Create a child process that inherits the shared anonymous
         mapping */

      childPid = fork();
      if (childPid == -1)
          errExit("fork");

      if (childPid == 0) {        /* Child */
          for (j = 0; j < nloops; j++) {
              fwait(futex1);
  

Re: futex(3) man page, final draft for pre-release review

2015-12-15 Thread Darren Hart
On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello all,
> 
> After much too long a time, the revised futex man page *will*
> go out in the next man pages release (it has been merged
> into master).
> 
> There are various places where the page could still be improved,
> but it is much better (and more than 5 times longer) than the
> existing page.
> 
> The rendered version of the page is shown below, so that people
> can make any final comments/suggestions for improvements
> before the release (but of course I'll also take any
> improvements after release as well). The page source is
> available from the Git repo 
> (http://git.kernel.org/cgit/docs/man-pages/man-pages.git).
> 
> As I mention above, there are various places where the page
> could still be better, so the rendered text below is annotated
> with some FIXMEs, in case anyone wants to address these before
> release.
> 
> Thanks
> 
> Michael

Fantastic! A few comments below.

...

> 
>When executing a futex operation that requests to block a thread,
>the kernel will block only if the futex word has the  value  that
>the  calling  thread  supplied  (as  one  of the arguments of the
>futex() call) as the expected value of the futex word.  The load‐
>ing  of the futex word's value, the comparison of that value with
>the expected value, and the actual blocking  will  happen  atomi‐
> 
> FIXME: for next line, it would be good to have an explanation of
> "totally ordered" somewhere around here.
> 
>cally  and totally ordered with respect to concurrently executing

Totally ordered with respect to futex operations refers to semantics of the
ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
writes. The kernel futex operations are protected by spinlocks, which ensure
that all operations are serialized with respect to one another.

This is a lot to attempt to define in this document. Perhaps a reference to
linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
perhaps for this manual, "serialized" would be sufficient, with a footnote
regarding "totally ordered" and a pointer to the memory-barrier documentation?

>futex operations on the same futex word.  Thus, the futex word is
>used to connect the synchronization in user space with the imple‐
>mentation of blocking by the kernel.  Analogously  to  an  atomic
>compare-and-exchange  operation  that  potentially changes shared
>memory, blocking via a futex is an atomic compare-and-block oper‐
>ation.

...

>Futex operations
>The futex_op argument consists of two parts: a command that
>specifies the operation to be performed, bit-wise ORed with zero
>or more options that modify the behaviour of the operation.  The
>options that may be included in futex_op are as follows:

...

> 
>FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
>   This   option   bit   can   be   employed  only  with  the
>   FUTEX_WAIT_BITSET and FUTEX_WAIT_REQUEUE_PI operations.

That caught me by surprise, but it's true. We reject FUTEX_WAIT |
FUTEX_CLOCK_REALTIME, even though FUTEX_WAIT is treated as FUTEX_WAIT_BITSET
with val3=FUTEX_BITSET_MATCH_ANY.

Thomas, this looks like an oversight to me - do you recall if we intentionally
disallow FUTEX_CLOCK_REALTIME with FUTEX_WAIT?

>   If this option is set, the kernel  treats  timeout  as  an
>   absolute time based on CLOCK_REALTIME.
> 
>   If  this  option  is not set, the kernel treats timeout as
>   relative time, measured against the CLOCK_MONOTONIC clock.

...

>Priority-inheritance futexes

...

>*  If  the lock is owned and there are threads contending for the
>   lock, then the FUTEX_WAITERS bit shall be  set  in  the  futex
>   word's value; in other words, this value is:
> 
>   FUTEX_WAITERS | TID
> 
> 
>   (Note that is invalid for a PI futex word to have no owner and

  ^ it

>   FUTEX_WAITERS set.)
...

>FUTEX_TRYLOCK_PI (since Linux 2.6.18)
>   This operation tries to acquire the futex at uaddr.  It is
>   invoked when a user-space atomic acquire did  not  succeed
>   because the futex word was not 0.
> 
> 
> FIXME(Next sentence) The wording "The trylock in kernel" below 
> needs clarification. Suggestions?
> 
>   The trylock in kernel might succeed because the futex word

The lock acquisition might succeed in the kernel because the futex word

>   contains stale state (FUTEX_WAITERS and/or
>   FUTEX_OWNER_DIED).   This can happen when the owner of the
>   futex died.  User space cannot handle this condition in  a
>   race-free  manner,  but  the  kernel  can  fix this up and
>

Re: futex(3) man page, final draft for pre-release review

2015-12-15 Thread Michael Kerrisk (man-pages)
Hello Torvald,

On 12/15/2015 04:34 PM, Torvald Riegel wrote:
> On Tue, 2015-12-15 at 14:43 +0100, Michael Kerrisk (man-pages) wrote:
>> Hello all,
>>
>> After much too long a time, the revised futex man page *will*
>> go out in the next man pages release (it has been merged
>> into master).
>>
>> There are various places where the page could still be improved,
>> but it is much better (and more than 5 times longer) than the
>> existing page.
> 
> This looks good to me; I just saw minor things (see below).  Thank you
> for all the work you put into this (and to everybody who contributed)!

Hey Torvald, you were one of the biggest contributors, so, thanks!

>>When executing a futex operation that requests to block a thread,
>>the kernel will block only if the futex word has the  value  that
>>the  calling  thread  supplied  (as  one  of the arguments of the
>>futex() call) as the expected value of the futex word.  The load‐
>>ing  of the futex word's value, the comparison of that value with
>>the expected value, and the actual blocking  will  happen  atomi‐
>>
>> FIXME: for next line, it would be good to have an explanation of
>> "totally ordered" somewhere around here.
>>
>>cally  and totally ordered with respect to concurrently executing
>>futex operations on the same futex word.  Thus, the futex word is
>>used to connect the synchronization in user space with the imple‐
>>mentation of blocking by the kernel.  Analogously  to  an  atomic
>>compare-and-exchange  operation  that  potentially changes shared
>>memory, blocking via a futex is an atomic compare-and-block oper‐
>>ation.
> 
> Maybe -- should we just say that it refers to the mathematical notion of
> a total order (or, technically, a strict total order in this case)?

I added a sentence along those lines.

> Though I would hope that everyone using futexes is roughly aware of the
> differences between partial and total orders.

>>FUTEX_TRYLOCK_PI (since Linux 2.6.18)
>>   This operation tries to acquire the futex at uaddr.  It is
> 
> s/futex/lock/ to make it consistent with FUTEX_LOCK.

Done.

>>   invoked when a user-space atomic acquire did  not  succeed
>>   because the futex word was not 0.
>>
>>
>> FIXME(Next sentence) The wording "The trylock in kernel" below 
>> needs clarification. Suggestions?
>>
>>   The trylock in kernel might succeed because the futex word
>>   contains stale state (FUTEX_WAITERS and/or
>>   FUTEX_OWNER_DIED).   This can happen when the owner of the
>>   futex died.  User space cannot handle this condition in  a
>>   race-free  manner,  but  the  kernel  can  fix this up and
>>   acquire the futex.
>>
>>   The uaddr2, val, timeout, and val3 arguments are ignored.
> 
> What about "The acquisition of the lock might succeed if performed by the
> kernel in cases when the futex word contains stale state...".

Sounds good to me. Changed.

>>FUTEX_WAIT_REQUEUE_PI (since Linux 2.6.31)
>>   Wait  on  a  non-PI  futex  at  uaddr  and  potentially be
>>   requeued (via a FUTEX_CMP_REQUEUE_PI operation in  another
>>   task)  onto  a  PI futex at uaddr2.  The wait operation on
>>   uaddr is the same as for FUTEX_WAIT.
>>
>>   The waiter can be removed from the wait on  uaddr  without
>>   requeueing on uaddr2 via a FUTEX_WAKE operation in another
>>   task.  In this case, the  FUTEX_WAIT_REQUEUE_PI  operation
>>   returns with the error EWOULDBLOCK.
> 
> This should be EAGAIN, I suppose, or the enumeration of errors should
> include EWOULDBLOCK.

Changed. BTW, under ERRORS there is already this text:

  Note:  on Linux, the symbolic names EAGAIN and EWOULDBLOCK
  (both of which appear in different  parts  of  the  kernel
  futex code) have the same value.

Thanks for the comments, Torvald!

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex(3) man page, final draft for pre-release review

2015-12-15 Thread Torvald Riegel
On Tue, 2015-12-15 at 14:43 +0100, Michael Kerrisk (man-pages) wrote:
> Hello all,
> 
> After much too long a time, the revised futex man page *will*
> go out in the next man pages release (it has been merged
> into master).
> 
> There are various places where the page could still be improved,
> but it is much better (and more than 5 times longer) than the
> existing page.

This looks good to me; I just saw minor things (see below).  Thank you
for all the work you put into this (and to everybody who contributed)!

>When executing a futex operation that requests to block a thread,
>the kernel will block only if the futex word has the  value  that
>the  calling  thread  supplied  (as  one  of the arguments of the
>futex() call) as the expected value of the futex word.  The load‐
>ing  of the futex word's value, the comparison of that value with
>the expected value, and the actual blocking  will  happen  atomi‐
> 
> FIXME: for next line, it would be good to have an explanation of
> "totally ordered" somewhere around here.
> 
>cally  and totally ordered with respect to concurrently executing
>futex operations on the same futex word.  Thus, the futex word is
>used to connect the synchronization in user space with the imple‐
>mentation of blocking by the kernel.  Analogously  to  an  atomic
>compare-and-exchange  operation  that  potentially changes shared
>memory, blocking via a futex is an atomic compare-and-block oper‐
>ation.

Maybe -- should we just say that it refers to the mathematical notion of
a total order (or, technically, a strict total order in this case)?
Though I would hope that everyone using futexes is roughly aware of the
differences between partial and total orders.
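
For reference, one way to spell out the mathematical notion (the symbols
$F$ and $\prec$ are chosen here for illustration and do not come from the
page):

```latex
Let $F$ be the set of futex operations executed on one futex word.
A relation $\prec$ on $F$ is a strict total order if, for all
$a, b, c \in F$:
\begin{align*}
  &\neg (a \prec a)                                  && \text{(irreflexive)} \\
  &(a \prec b) \land (b \prec c) \implies a \prec c  && \text{(transitive)} \\
  &a \neq b \implies (a \prec b) \lor (b \prec a)    && \text{(total)}
\end{align*}
A strict partial order requires only the first two properties, so it
may leave some pairs of operations unordered.
```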

>FUTEX_TRYLOCK_PI (since Linux 2.6.18)
>   This operation tries to acquire the futex at uaddr.  It is

s/futex/lock/ to make it consistent with FUTEX_LOCK.

>   invoked when a user-space atomic acquire did  not  succeed
>   because the futex word was not 0.
> 
> 
> FIXME(Next sentence) The wording "The trylock in kernel" below 
> needs clarification. Suggestions?
> 
>   The trylock in kernel might succeed because the futex word
>   contains stale state (FUTEX_WAITERS and/or
>   FUTEX_OWNER_DIED).   This can happen when the owner of the
>   futex died.  User space cannot handle this condition in  a
>   race-free  manner,  but  the  kernel  can  fix this up and
>   acquire the futex.
> 
>   The uaddr2, val, timeout, and val3 arguments are ignored.

What about "The acquisition of the lock might succeed if performed by the
kernel in cases when the futex word contains stale state...".

>FUTEX_WAIT_REQUEUE_PI (since Linux 2.6.31)
>   Wait  on  a  non-PI  futex  at  uaddr  and  potentially be
>   requeued (via a FUTEX_CMP_REQUEUE_PI operation in  another
>   task)  onto  a  PI futex at uaddr2.  The wait operation on
>   uaddr is the same as for FUTEX_WAIT.
> 
>   The waiter can be removed from the wait on  uaddr  without
>   requeueing on uaddr2 via a FUTEX_WAKE operation in another
>   task.  In this case, the  FUTEX_WAIT_REQUEUE_PI  operation
>   returns with the error EWOULDBLOCK.

This should be EAGAIN, I suppose, or the enumeration of errors should
include EWOULDBLOCK.

Torvald



futex(3) man page, final draft for pre-release review

2015-12-15 Thread Michael Kerrisk (man-pages)
Hello all,

After much too long a time, the revised futex man page *will*
go out in the next man pages release (it has been merged
into master).

There are various places where the page could still be improved,
but it is much better (and more than 5 times longer) than the
existing page.

The rendered version of the page is shown below, so that people
can make any final comments/suggestions for improvements
before the release (but of course I'll also take any
improvements after release as well). The page source is
available from the Git repo 
(http://git.kernel.org/cgit/docs/man-pages/man-pages.git).

As I mention above, there are various places where the page
could still be better, so the rendered text below is annotated
with some FIXMEs, in case anyone wants to address these before
release.

Thanks

Michael


   NAME
   futex - fast user-space locking

   SYNOPSIS
   #include <linux/futex.h>
   #include <sys/time.h>

   int futex(int *uaddr, int futex_op, int val,
 const struct timespec *timeout,   /* or: uint32_t val2 */
 int *uaddr2, int val3);

   Note: There is no glibc wrapper for this system call; see NOTES.

   DESCRIPTION
   The  futex()  system  call  provides a method for waiting until a
   certain condition becomes true.  It is typically used as a block‐
   ing  construct  in  the context of shared-memory synchronization.
   When using futexes, the majority of  the  synchronization  opera‐
   tions  are  performed  in  user  space.   The  user-space program
   employs the futex() system call only when it is likely  that  the
   program  has  to  block  for  a  longer  time until the condition
   becomes true.  The program uses another futex() operation to wake
   anyone waiting for a particular condition.

   A futex is a 32-bit value—referred to below as a futex word—whose
   address is supplied to the futex() system call.  (Futexes are  32
   bits  in  size  on all platforms, including 64-bit systems.)  All
   futex operations are governed by this value.  In order to share a
   futex  between  processes,  the  futex  is  placed in a region of
   shared memory, created using (for example) mmap(2)  or  shmat(2).
   (Thus,  the  futex  word  may have different virtual addresses in
   different processes, but these addresses all refer  to  the  same
   location  in physical memory.)  In a multithreaded program, it is
   sufficient to place the futex word in a global variable shared by
   all threads.

   When executing a futex operation that requests to block a thread,
   the kernel will block only if the futex word has the  value  that
   the  calling  thread  supplied  (as  one  of the arguments of the
   futex() call) as the expected value of the futex word.  The load‐
   ing  of the futex word's value, the comparison of that value with
   the expected value, and the actual blocking  will  happen  atomi‐

FIXME: for next line, it would be good to have an explanation of
"totally ordered" somewhere around here.

   cally  and totally ordered with respect to concurrently executing
   futex operations on the same futex word.  Thus, the futex word is
   used to connect the synchronization in user space with the imple‐
   mentation of blocking by the kernel.  Analogously  to  an  atomic
   compare-and-exchange  operation  that  potentially changes shared
   memory, blocking via a futex is an atomic compare-and-block oper‐
   ation.

   One  use  of futexes is for implementing locks.  The state of the
   lock (i.e., acquired or not acquired) can be  represented  as  an
   atomically  accessed  flag  in shared memory.  In the uncontended
   case, a thread can access or modify the lock  state  with  atomic
   instructions,   for  example  atomically  changing  it  from  not
   acquired  to  acquired  using  an   atomic   compare-and-exchange
   instruction.   (Such  instructions are performed entirely in user
   mode, and the kernel maintains  no  information  about  the  lock
   state.)   On  the other hand, a thread may be unable to acquire a
   lock because it is already acquired by another thread.   It  then
   may pass the lock's flag as a futex word and the value represent‐
   ing the acquired state as the expected value to  a  futex()  wait
   operation.   This futex() call will block if and only if the lock
   is still acquired.  When releasing the  lock,  a  thread  has  to
   first  reset  the  lock  state to not acquired and then execute a
   futex operation that wakes threads blocked on the lock flag  used
   as a futex word (this can be further optimized to avoid
   unnecessary wake-ups).  See futex(7) for more detail on how to
   use futexes.

   Besides the basic wait and wake-up futex functionality, there are
   further futex operations 

Re: futex(3) man page, final draft for pre-release review

2015-12-15 Thread Davidlohr Bueso

On Tue, 15 Dec 2015, Michael Kerrisk (man-pages) wrote:


  When executing a futex operation that requests to block a thread,
  the kernel will block only if the futex word has the  value  that
  the  calling  thread  supplied  (as  one  of the arguments of the
  futex() call) as the expected value of the futex word.  The load‐
  ing  of the futex word's value, the comparison of that value with
  the expected value, and the actual blocking  will  happen  atomi‐

FIXME: for next line, it would be good to have an explanation of
"totally ordered" somewhere around here.

  cally  and totally ordered with respect to concurrently executing
  futex operations on the same futex word.


So there are two things here regarding ordering. The first, and most obvious,
is the ordering that results from taking and dropping the hb spinlock. The
second is the cases Peter brought up a while ago involving the atomic futex
ops, futex_atomic_*(), which do not have clearly defined semantics, so you get
inconsistencies on certain archs (tile being the worst, iirc).

But anyway, the important thing users need to know about is that the atomic
futex operation must be totally ordered wrt any other user tasks that are trying
to access that address. This is not necessarily the case for kernel ops. Peter
illustrates this nicely with a lock-stealing example
(see https://lkml.org/lkml/2015/8/26/596).


Internally, I believe we decided on making it fully ordered (as opposed to
making use of implicit barriers for ACQUIRE/RELEASE), so you'd end up having
an MB ll/sc MB kind of setup.

[...]


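  #include <stdio.h>
  #include <errno.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <linux/futex.h>
  #include <sys/time.h>

  #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                          } while (0)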


Nit, but for this we have err(3).



  static int *futex1, *futex2, *iaddr;

  static int
  futex(int *uaddr, int futex_op, int val,
        const struct timespec *timeout, int *uaddr2, int val3)
  {
      return syscall(SYS_futex, uaddr, futex_op, val,
                     timeout, uaddr2, val3);
  }

  /* Acquire the futex pointed to by 'futexp': wait for its value to
     become 1, and then set the value to 0. */

  static void
  fwait(int *futexp)
  {
      int s;

      /* __sync_bool_compare_and_swap(ptr, oldval, newval) is a gcc
         built-in function.  It atomically performs the equivalent of:

             if (*ptr == oldval)
                 *ptr = newval;

         It returns true if the test yielded true and *ptr was updated.
         The alternative here would be to employ the equivalent atomic
         machine-language instructions.  For further information, see
         the GCC Manual. */

      while (1) {

          /* Is the futex available? */

          if (__sync_bool_compare_and_swap(futexp, 1, 0))
              break;      /* Yes */

          /* Futex is not available; wait */

          s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
          if (s == -1 && errno != EAGAIN)
              errExit("futex-FUTEX_WAIT");
      }
  }

  /* Release the futex pointed to by 'futexp': if the futex currently
     has the value 0, set its value to 1 and then wake any futex waiters,
     so that if the peer is blocked in fwait(), it can proceed. */

  static void
  fpost(int *futexp)
  {
      int s;

      /* __sync_bool_compare_and_swap() was described in comments above */

      if (__sync_bool_compare_and_swap(futexp, 0, 1)) {

          s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
          if (s == -1)
              errExit("futex-FUTEX_WAKE");
      }
  }

  int
  main(int argc, char *argv[])
  {
      pid_t childPid;
      int j, nloops;

      setbuf(stdout, NULL);

      nloops = (argc > 1) ? atoi(argv[1]) : 5;

      /* Create a shared anonymous mapping that will hold the futexes.
         Since the futexes are being shared between processes, we
         subsequently use the "shared" futex operations (i.e., not the
         ones suffixed "_PRIVATE") */

      iaddr = mmap(NULL, sizeof(int) * 2, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
      if (iaddr == MAP_FAILED)
          errExit("mmap");

      futex1 = &iaddr[0];
      futex2 = &iaddr[1];

      *futex1 = 0;        /* State: unavailable */
      *futex2 = 1;        /* State: available */

      /* Create a child process that inherits the shared anonymous
         mapping */

      childPid = fork();
      if (childPid == -1)
          errExit("fork");

      if (childPid == 0) {        /* Child */
          for (j = 0; j < nloops; j++) {
              fwait(futex1);
  

futex(3) man page, final draft for pre-release review

2015-12-15 Thread Michael Kerrisk (man-pages)
Hello all,

After much too long a time, the revised futex man page *will*
go out in the next man pages release (it has been merged
into master).

There are various places where the page could still be improved,
but it is much better (and more than 5 times longer) than the
existing page.

The rendered version of the page is shown below, so that people
can make any final comments/suggestions for improvements
before the release (but of course I'll also take any
improvements after release as well). The page source is
available from the Git repo 
(http://git.kernel.org/cgit/docs/man-pages/man-pages.git).

As I mention above, there are various places where the page
could still be better, so the rendered text below is annotated
with some FIXMEs, in case anyone wants to address these before
release.

Thanks

Michael


   NAME
   futex - fast user-space locking

   SYNOPSIS
   #include 
   #include 

   int futex(int *uaddr, int futex_op, int val,
 const struct timespec *timeout,   /* or: uint32_t val2 */
 int *uaddr2, int val3);

   Note: There is no glibc wrapper for this system call; see NOTES.

   DESCRIPTION
   The  futex()  system  call  provides a method for waiting until a
   certain condition becomes true.  It is typically used as a block‐
   ing  construct  in  the context of shared-memory synchronization.
   When using futexes, the majority of  the  synchronization  opera‐
   tions  are  performed  in  user  space.   The  user-space program
   employs the futex() system call only when it is likely  that  the
   program  has  to  block  for  a  longer  time until the condition
   becomes true.  The program uses another futex() operation to wake
   anyone waiting for a particular condition.

   A futex is a 32-bit value—referred to below as a futex word—whose
   address is supplied to the futex() system call.  (Futexes are  32
   bits  in  size  on all platforms, including 64-bit systems.)  All
   futex operations are governed by this value.  In order to share a
   futex  between  processes,  the  futex  is  placed in a region of
   shared memory, created using (for example) mmap(2)  or  shmat(2).
   (Thus,  the  futex  word  may have different virtual addresses in
   different processes, but these addresses all refer  to  the  same
   location  in physical memory.)  In a multithreaded program, it is
   sufficient to place the futex word in a global variable shared by
   all threads.

   When executing a futex operation that requests to block a thread,
   the kernel will block only if the futex word has the  value  that
   the  calling  thread  supplied  (as  one  of the arguments of the
   futex() call) as the expected value of the futex word.  The load‐
   ing  of the futex word's value, the comparison of that value with
   the expected value, and the actual blocking  will  happen  atomi‐

FIXME: for next line, it would be good to have an explanation of
"totally ordered" somewhere around here.

   cally  and totally ordered with respect to concurrently executing
   futex operations on the same futex word.  Thus, the futex word is
   used to connect the synchronization in user space with the imple‐
   mentation of blocking by the kernel.  Analogously  to  an  atomic
   compare-and-exchange  operation  that  potentially changes shared
   memory, blocking via a futex is an atomic compare-and-block oper‐
   ation.

   One  use  of futexes is for implementing locks.  The state of the
   lock (i.e., acquired or not acquired) can be  represented  as  an
   atomically  accessed  flag  in shared memory.  In the uncontended
   case, a thread can access or modify the lock  state  with  atomic
   instructions,   for  example  atomically  changing  it  from  not
   acquired  to  acquired  using  an   atomic   compare-and-exchange
   instruction.   (Such  instructions are performed entirely in user
   mode, and the kernel maintains  no  information  about  the  lock
   state.)   On  the other hand, a thread may be unable to acquire a
   lock because it is already acquired by another thread.   It  then
   may pass the lock's flag as a futex word and the value represent‐
   ing the acquired state as the expected value to  a  futex()  wait
   operation.   This futex() call will block if and only if the lock
   is still acquired.  When releasing the  lock,  a  thread  has  to
   first  reset  the  lock  state to not acquired and then execute a
   futex operation that wakes threads blocked on the lock flag  used
   as a futex word (this can be further optimized to avoid unnec‐
   essary wake-ups).  See futex(7) for more detail  on  how  to  use
   futexes.
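The lock protocol in the preceding paragraph might look roughly like the following sketch. It is an unoptimized illustration, not the page's reference implementation: the names futex_op, lock_acquire, and lock_release are hypothetical, and it assumes that _Atomic uint32_t has the same size and representation as uint32_t, which holds on Linux with GCC and Clang.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { UNLOCKED = 0, LOCKED = 1 };

/* Hypothetical thin wrapper around the raw system call. */
static long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void lock_acquire(_Atomic uint32_t *flag)
{
    for (;;) {
        uint32_t expected = UNLOCKED;

        /* Uncontended case: entirely in user space. */
        if (atomic_compare_exchange_strong(flag, &expected, LOCKED))
            return;

        /* Contended case: block only if the flag is still LOCKED.
         * If it changed in the meantime the call fails with EAGAIN.
         * Either way, loop and retry the compare-and-exchange. */
        futex_op(flag, FUTEX_WAIT, LOCKED);
    }
}

static void lock_release(_Atomic uint32_t *flag)
{
    /* First reset the state, then wake one waiter (if any). */
    atomic_store(flag, UNLOCKED);
    futex_op(flag, FUTEX_WAKE, 1);
}
```

This version issues a FUTEX_WAKE on every release, even when nobody is waiting; avoiding that system call is the optimization the text alludes to.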

   Besides the basic wait and wake-up futex functionality, there are
   further futex operations 

Re: futex(3) man page, final draft for pre-release review

2015-12-15 Thread Torvald Riegel
On Tue, 2015-12-15 at 14:43 +0100, Michael Kerrisk (man-pages) wrote:
> Hello all,
> 
> After much too long a time, the revised futex man page *will*
> go out in the next man pages release (it has been merged
> into master).
> 
> There are various places where the page could still be improved,
> but it is much better (and more than 5 times longer) than the
> existing page.

This looks good to me; I just saw minor things (see below).  Thank you
for all the work you put into this (and to everybody who contributed)!

>When executing a futex operation that requests to block a thread,
>the kernel will block only if the futex word has the  value  that
>the  calling  thread  supplied  (as  one  of the arguments of the
>futex() call) as the expected value of the futex word.  The load‐
>ing  of the futex word's value, the comparison of that value with
>the expected value, and the actual blocking  will  happen  atomi‐
> 
> FIXME: for next line, it would be good to have an explanation of
> "totally ordered" somewhere around here.
> 
>cally  and totally ordered with respect to concurrently executing
>futex operations on the same futex word.  Thus, the futex word is
>used to connect the synchronization in user space with the imple‐
>mentation of blocking by the kernel.  Analogously  to  an  atomic
>compare-and-exchange  operation  that  potentially changes shared
>memory, blocking via a futex is an atomic compare-and-block oper‐
>ation.

Maybe -- should we just say that it refers to the mathematical notion of
a total order (or, technically, a strict total order in this case)?
Though I would hope that everyone using futexes is roughly aware of the
differences between partial and total orders.

>FUTEX_TRYLOCK_PI (since Linux 2.6.18)
>   This operation tries to acquire the futex at uaddr.  It is

s/futex/lock/ to make it consistent with FUTEX_LOCK_PI.

>   invoked when a user-space atomic acquire did  not  succeed
>   because the futex word was not 0.
> 
> 
> FIXME(Next sentence) The wording "The trylock in kernel" below 
> needs clarification. Suggestions?
> 
>   The trylock in kernel might succeed because the futex word
>   contains stale state (FUTEX_WAITERS and/or
>   FUTEX_OWNER_DIED).   This can happen when the owner of the
>   futex died.  User space cannot handle this condition in  a
>   race-free  manner,  but  the  kernel  can  fix this up and
>   acquire the futex.
> 
>   The uaddr2, val, timeout, and val3 arguments are ignored.

What about "The acquisition of the lock might succeed if performed by the
kernel in cases when the futex word contains stale state..."?

>FUTEX_WAIT_REQUEUE_PI (since Linux 2.6.31)
>   Wait  on  a  non-PI  futex  at  uaddr  and  potentially be
>   requeued (via a FUTEX_CMP_REQUEUE_PI operation in  another
>   task)  onto  a  PI futex at uaddr2.  The wait operation on
>   uaddr is the same as for FUTEX_WAIT.
> 
>   The waiter can be removed from the wait on  uaddr  without
>   requeueing on uaddr2 via a FUTEX_WAKE operation in another
>   task.  In this case, the  FUTEX_WAIT_REQUEUE_PI  operation
>   returns with the error EWOULDBLOCK.

This should be EAGAIN, I suppose, or the enumeration of errors should
include EWOULDBLOCK.
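For callers the distinction is mostly one of naming: POSIX permits EAGAIN and EWOULDBLOCK to have the same value, and as far as I know they do on Linux. A portable check could be sketched like this (the function name is_would_block is hypothetical):

```c
#include <errno.h>

/* Returns 1 if 'err' is the "operation would block" error under
 * either name, 0 otherwise.  The preprocessor test avoids a
 * redundant comparison where the two constants are identical. */
static int is_would_block(int err)
{
#if EAGAIN == EWOULDBLOCK
    return err == EAGAIN;
#else
    return err == EAGAIN || err == EWOULDBLOCK;
#endif
}
```

So a caller testing errno after FUTEX_WAIT_REQUEUE_PI would behave the same whichever name the page uses, but the page itself should still use one name consistently.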

Torvald
