Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2024-03-09 Thread Jonathan Wakely
On Sat, 9 Mar 2024 at 12:18, Jonathan Wakely  wrote:

>
>
>
> +template
>> +  __wait_result_type
>> +  __wait_for(const __platform_wait_t* __addr, __wait_args __args,
>> +const chrono::duration<_Rep, _Period>& __rtime) noexcept
>> +{
>> +  if (!__rtime.count())
>> +   // no rtime supplied, just spin a bit
>> +   return __detail::__wait_impl(__addr, __args |
>> __wait_flags::__spin_only);
>>
>
> This should set __do_spin | __spin_only if the latter no longer implies
> the former.
>
>
>>
>> +enum class __wait_flags : uint32_t
>>  {
>>
>
>
>> +   __abi_version = 0,
>> +   __proxy_wait = 1,
>> +   __track_contention = 2,
>> +   __do_spin = 4,
>> +   __spin_only = 8 | __do_spin, // implies __do_spin
>>
>
> This should be just 8 and not also imply __do_spin.
>


Alternatively, we could have:

   __spin_only = 4,
   __spin_then_wait = 8,
   __do_spin = __spin_only | __spin_then_wait,

Then testing (flags & do_spin) would be true if either of the others is
set, but (flags & spin_only) would do the right thing.

But if we have spin_then_wait in the default flags, a caller that wants to
use spin_only has to clear the spin_then_wait flag, otherwise there are two
mutually exclusive flags set at once.

I think I prefer:

   __do_spin = 4,
   __spin_only = 8, // Ignored unless __do_spin is also set.

as this is simpler for callers.


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2024-03-09 Thread Jonathan Wakely
On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely  wrote:

> From: Thomas Rodgers 
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.
>
>
> -- >8 --
>
> This represents a major refactoring of the previous atomic::wait
> and atomic::notify implementation detail. The aim of this change
> is to simplify the implementation details and position the resulting
> implementation so that much of the current header-only detail
> can be moved into the shared library, while also accounting for
> anticipated changes to wait/notify functionality for C++26.
>
> The previous implementation implemented spin logic in terms of
> the types __default_spin_policy, __timed_backoff_spin_policy, and
> the free function __atomic_spin. These are replaced in favor of
> two new free functions; __spin_impl and __spin_until_impl. These
> currently inline free functions are expected to be moved into the
> libstdc++ shared library in a future commit.
>
> The previous implementation derived untimed and timed wait
> implementation detail from __detail::__waiter_pool_base. This
> is-a relationship is removed in the new version and the previous
> implementation detail is renamed to reflect this change. The
> static _S_for member has been renamed as well to indicate that it
> returns the __waiter_pool_impl entry in the static 'side table'
> for a given awaited address.
>
> This new implementation replaces all of the non-templated waiting
> detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
> __bare_wait with the __wait_impl free function, and the supporting
> __wait_flags enum and __wait_args struct. This currenly inline free
> function is expected to be moved into the libstdc++ shared library
> in a future commit.
>
> This new implementation replaces all of the non-templated notifying
> detail of __waiter_base, __waiter_pool, and __waiter with the
> __notify_impl free function. This currently inline free function
> is expected to be moved into the libstdc++ shared library in a
> future commit.
>
> The __atomic_wait_address template function is updated to account
> for the above changes and to support the expected C++26 change to
> pass the most recent observed value to the caller supplied predicate.
>
> A new non-templated __atomic_wait_address_v free function is added
> that only works for atomic types that operate only on __platform_wait_t
> and requires the caller to supply a memory order. This is intended
> to be the simplest code path for such types.
>
> The __atomic_wait_address_v template function is now implemented in
> terms of new __atomic_wait_address template and continues to accept
> a user supplied "value function" to retrieve the current value of
> the atomic.
>
> The __atomic_notify_address template function is updated to account
> for the above changes.
>
> The template __platform_wait_until_impl is renamed to
> __wait_clock_t. The previous __platform_wait_until template is deleted
> and the functionality previously provided is moved t the new tempalate
> function __wait_until. A similar change is made to the
> __cond_wait_until_impl/__cond_wait_until implementation.
>
> This new implementation similarly replaces all of the non-templated
> waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
> the new __wait_until_impl free function. This currently inline free
> function is expected to be moved into the libstdc++ shared library
> in a future commit.
>
> This implementation replaces all templated waiting functions that
> manage clock conversion as well as relative waiting (wait_for) with
> the new template functions __wait_until and __wait_for.
>
> Similarly the previous implementation detail for the various
> __atomic_wait_address_Xxx templates is adjusted to account for the
> implementation changes outlined above.
>
> All of the "bare wait" versions of __atomic_wait_Xxx have been removed
> and replaced with a defaulted boolean __bare_wait parameter on the
> new version of these templates.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_timed_wait.h:
> (__detail::__platform_wait_until_impl): Rename to
> __platform_wait_until.
> (__detail::__platform_wait_until): Remove previous
> definition.
> (__detail::__cond_wait_until_impl): Rename to
> __cond_wait_until.
> (__detail::__cond_wait_until): Remove previous
> definition.
> (__detail::__spin_until_impl): New function.
> (__detail::__wait_until_impl): New function.
> (__detail::__wait_until): New function.
> (__detail::__wait_for): New function.
> (__detail::__timed_waiter_pool): Remove type.
> 

Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-14 Thread Nate Eldredge

On Thu, 14 Dec 2023, Thomas Rodgers wrote:


I need to look at this a bit more (and not on my phone, at lunch).
Ultimately, C++26 expects to add predicate waits and returning a
‘tri-state’ result isn’t something that’s been considered or likely to be
approved.


Ok, then that seems to fit best with my second suggestion: the predicate 
should only test the value and do nothing else, and actually trying to 
decrement the semaphore is left up to the caller _M_acquire(), which must 
then retry __atomic_wait_address if the compare-exchange fails.


--
Nate Eldredge
n...@thatsmathematics.com


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-14 Thread Thomas Rodgers
I need to look at this a bit more (and not on my phone, at lunch).
Ultimately, C++26 expects to add predicate waits and returning a
‘tri-state’ result isn’t something that’s been considered or likely to be
approved.

On Mon, Dec 11, 2023 at 12:18 PM Jonathan Wakely 
wrote:

> CCing Tom's current address, as he's not @redhat.com now.
>
> On Mon, 11 Dec 2023, 19:24 Nate Eldredge, 
> wrote:
>
>> On Mon, 11 Dec 2023, Nate Eldredge wrote:
>>
>> > To fix, we need something like `__args._M_old = __val;` inside the loop
>> in
>> > __atomic_wait_address(), so that we always wait on the exact value that
>> the
>> > predicate __pred() rejected.  Again, there are similar instances in
>> > atomic_timed_wait.h.
>>
>> Thinking through this, there's another problem.  The main loop in
>> __atomic_wait_address() would then be:
>>
>>while (!__pred(__val))
>>  {
>>__args._M_old = __val;
>>__detail::__wait_impl(__wait_addr, &__args);
>>__val = __vfn();
>>  }
>>
>> The idea being that we only call __wait_impl to wait on a value that the
>> predicate said was unacceptable.  But looking for instance at its caller
>> __atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate
>> passed in is _S_do_try_acquire(), whose code is:
>>
>>  _S_do_try_acquire(__detail::__platform_wait_t* __counter,
>>__detail::__platform_wait_t __old) noexcept
>>  {
>>if (__old == 0)
>>  return false;
>>
>>return __atomic_impl::compare_exchange_strong(__counter,
>>  __old, __old - 1,
>>
>>  memory_order::acquire,
>>
>>  memory_order::relaxed);
>>  }
>>
>> It returns false if the value passed in was unacceptable (i.e. zero),
>> *or*
>> if it was nonzero (let's say 1) but the compare_exchange failed because
>> another thread swooped in and modified the semaphore counter.  In that
>> latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which
>> is likewise bad.  If the counter is externally changed back to 1 just
>> before we call __platform_wait (that's the futex call), we would go to
>> sleep waiting on a semaphore that is already available: deadlock.
>>
>> I guess there's a couple ways to fix it.
>>
>> You could have the "predicate" callback instead return a tri-state value:
>> "all done, stop waiting" (like current true), "value passed is not
>> acceptable" (like current false), and "value was acceptable but something
>> else went wrong".  Only the second case should result in calling
>> __wait_impl().  In the third case, __atomic_wait_address() should
>> just reload the value (using __vfn()) and loop again.
>>
>> Or, make the callback __pred() a pure predicate that only tests its input
>> value for acceptability, without any other side effects.  Then have
>> __atomic_wait_address() simply return as soon as __pred(__val) returns
>> true.  It would be up to the caller to actually decrement the semaphore
>> or
>> whatever, and to call __atomic_wait_address() again if this fails.  In
>> that case, __atomic_wait_address should probably return the final value
>> that was read, so the caller can immediately do something like a
>> compare-exchange using it, and not have to do an additional load and
>> predicate test.
>>
>> Or, make __pred() a pure predicate as before, and give
>> __atomic_wait_address yet one more callback function argument, call it
>> __taker(), whose job is to acquire the semaphore etc, and have
>> __atomic_wait_address call it after __pred(__val) returns true.
>>
>> --
>> Nate Eldredge
>> n...@thatsmathematics.com
>>
>>


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-11 Thread Jonathan Wakely
CCing Tom's current address, as he's not @redhat.com now.

On Mon, 11 Dec 2023, 19:24 Nate Eldredge,  wrote:

> On Mon, 11 Dec 2023, Nate Eldredge wrote:
>
> > To fix, we need something like `__args._M_old = __val;` inside the loop
> in
> > __atomic_wait_address(), so that we always wait on the exact value that
> the
> > predicate __pred() rejected.  Again, there are similar instances in
> > atomic_timed_wait.h.
>
> Thinking through this, there's another problem.  The main loop in
> __atomic_wait_address() would then be:
>
>while (!__pred(__val))
>  {
>__args._M_old = __val;
>__detail::__wait_impl(__wait_addr, &__args);
>__val = __vfn();
>  }
>
> The idea being that we only call __wait_impl to wait on a value that the
> predicate said was unacceptable.  But looking for instance at its caller
> __atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate
> passed in is _S_do_try_acquire(), whose code is:
>
>  _S_do_try_acquire(__detail::__platform_wait_t* __counter,
>__detail::__platform_wait_t __old) noexcept
>  {
>if (__old == 0)
>  return false;
>
>return __atomic_impl::compare_exchange_strong(__counter,
>  __old, __old - 1,
>  memory_order::acquire,
>
>  memory_order::relaxed);
>  }
>
> It returns false if the value passed in was unacceptable (i.e. zero), *or*
> if it was nonzero (let's say 1) but the compare_exchange failed because
> another thread swooped in and modified the semaphore counter.  In that
> latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which
> is likewise bad.  If the counter is externally changed back to 1 just
> before we call __platform_wait (that's the futex call), we would go to
> sleep waiting on a semaphore that is already available: deadlock.
>
> I guess there's a couple ways to fix it.
>
> You could have the "predicate" callback instead return a tri-state value:
> "all done, stop waiting" (like current true), "value passed is not
> acceptable" (like current false), and "value was acceptable but something
> else went wrong".  Only the second case should result in calling
> __wait_impl().  In the third case, __atomic_wait_address() should
> just reload the value (using __vfn()) and loop again.
>
> Or, make the callback __pred() a pure predicate that only tests its input
> value for acceptability, without any other side effects.  Then have
> __atomic_wait_address() simply return as soon as __pred(__val) returns
> true.  It would be up to the caller to actually decrement the semaphore or
> whatever, and to call __atomic_wait_address() again if this fails.  In
> that case, __atomic_wait_address should probably return the final value
> that was read, so the caller can immediately do something like a
> compare-exchange using it, and not have to do an additional load and
> predicate test.
>
> Or, make __pred() a pure predicate as before, and give
> __atomic_wait_address yet one more callback function argument, call it
> __taker(), whose job is to acquire the semaphore etc, and have
> __atomic_wait_address call it after __pred(__val) returns true.
>
> --
> Nate Eldredge
> n...@thatsmathematics.com
>
>


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-11 Thread Nate Eldredge

On Mon, 11 Dec 2023, Nate Eldredge wrote:

To fix, we need something like `__args._M_old = __val;` inside the loop in 
__atomic_wait_address(), so that we always wait on the exact value that the 
predicate __pred() rejected.  Again, there are similar instances in 
atomic_timed_wait.h.


Thinking through this, there's another problem.  The main loop in 
__atomic_wait_address() would then be:


  while (!__pred(__val))
{
  __args._M_old = __val;
  __detail::__wait_impl(__wait_addr, &__args);
  __val = __vfn();
}

The idea being that we only call __wait_impl to wait on a value that the 
predicate said was unacceptable.  But looking for instance at its caller 
__atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate 
passed in is _S_do_try_acquire(), whose code is:


_S_do_try_acquire(__detail::__platform_wait_t* __counter,
  __detail::__platform_wait_t __old) noexcept
{
  if (__old == 0)
return false;

  return __atomic_impl::compare_exchange_strong(__counter,
__old, __old - 1,
memory_order::acquire,
memory_order::relaxed);
}

It returns false if the value passed in was unacceptable (i.e. zero), *or* 
if it was nonzero (let's say 1) but the compare_exchange failed because 
another thread swooped in and modified the semaphore counter.  In that 
latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which 
is likewise bad.  If the counter is externally changed back to 1 just 
before we call __platform_wait (that's the futex call), we would go to 
sleep waiting on a semaphore that is already available: deadlock.


I guess there's a couple ways to fix it.

You could have the "predicate" callback instead return a tri-state value: 
"all done, stop waiting" (like current true), "value passed is not 
acceptable" (like current false), and "value was acceptable but something 
else went wrong".  Only the second case should result in calling 
__wait_impl().  In the third case, __atomic_wait_address() should 
just reload the value (using __vfn()) and loop again.


Or, make the callback __pred() a pure predicate that only tests its input 
value for acceptability, without any other side effects.  Then have 
__atomic_wait_address() simply return as soon as __pred(__val) returns 
true.  It would be up to the caller to actually decrement the semaphore or 
whatever, and to call __atomic_wait_address() again if this fails.  In 
that case, __atomic_wait_address should probably return the final value 
that was read, so the caller can immediately do something like a 
compare-exchange using it, and not have to do an additional load and 
predicate test.


Or, make __pred() a pure predicate as before, and give 
__atomic_wait_address yet one more callback function argument, call it 
__taker(), whose job is to acquire the semaphore etc, and have 
__atomic_wait_address call it after __pred(__val) returns true.


--
Nate Eldredge
n...@thatsmathematics.com



Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-11 Thread Nate Eldredge
Ref: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636805.html, 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636804.html


I found a couple of bugs in this patch set.

#1: In atomic_wait.h, we have __wait_flags defined to include:

   __do_spin = 4,
   __spin_only = 8 | __do_spin, // implies __do_spin

So __spin_only is defined as two bits, which breaks when we test `__args & 
__wait_flags::__spin_only` in __wait_impl().  The test evaluates true even 
when we have only set __do_spin (bit 2) without bit 3, which is the 
default at least on Linux and which is supposed to request a limited 
number of spins before calling futex() and sleeping.  You can observe this 
by seeing that a thread blocked on std::counting_semaphore::acquire() 
consumes 100% CPU indefinitely.


There is another instance in atomic_timed_wait.h.

Probably __spin_only should just be 8, and any configuration that wants 
it should be responsible for manually setting __do_spin as well.



#2: __atomic_wait_address() does not populate __args._M_old before passing 
to __wait_impl(), so it remains at its default value 0 (set by the 
constructor).  As a result, `__wait_impl` is effectively always waiting on 
the value 0 (i.e. until `*__addr != 0`) rather than whatever value is 
actually wanted.  This is of course wrong, and can deadlock under the 
right race condition.


To fix, we need something like `__args._M_old = __val;` inside the loop in 
__atomic_wait_address(), so that we always wait on the exact value that 
the predicate __pred() rejected.  Again, there are similar instances in 
atomic_timed_wait.h.


(In the current libstdc++ implementation, the predicate function loads the 
value itself and doesn't return it, so we wait instead on a possibly 
different value that was loaded somewhere else in _S_do_spin().  That is 
also wrong, because the latter value might have been acceptable to the 
predicate, and if so then waiting on it may deadlock.  A classic TOCTOU. 
This is Bug #104928, reported in March 2022 and still unfixed: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928.  The patch proposed 
here would fix it, though.)


--
Nate Eldredge
n...@thatsmathematics.com



Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-11-16 Thread Jonathan Wakely
On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely  wrote:
>
> From: Thomas Rodgers 
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.

I'm seeing this on AIX and Solaris:

WARNING: program timed out.
FAIL: 30_threads/semaphore/try_acquire.cc  -std=gnu++20 execution test



[PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-11-16 Thread Jonathan Wakely
From: Thomas Rodgers 

These two patches were written by Tom earlier this year, before he left
Red Hat. We should finish reviewing them for GCC 14 (and probably squash
them into one?)

Tom, you mentioned further work that changes the __platform_wait_t* to
uintptr_t, is that ready, or likely to be ready very soon?

Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.


-- >8 --

This represents a major refactoring of the previous atomic::wait
and atomic::notify implementation detail. The aim of this change
is to simplify the implementation details and position the resulting
implementation so that much of the current header-only detail
can be moved into the shared library, while also accounting for
anticipated changes to wait/notify functionality for C++26.

The previous implementation implemented spin logic in terms of
the types __default_spin_policy, __timed_backoff_spin_policy, and
the free function __atomic_spin. These are replaced in favor of
two new free functions; __spin_impl and __spin_until_impl. These
currently inline free functions are expected to be moved into the
libstdc++ shared library in a future commit.

The previous implementation derived untimed and timed wait
implementation detail from __detail::__waiter_pool_base. This
is-a relationship is removed in the new version and the previous
implementation detail is renamed to reflect this change. The
static _S_for member has been renamed as well to indicate that it
returns the __waiter_pool_impl entry in the static 'side table'
for a given awaited address.

This new implementation replaces all of the non-templated waiting
detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
__bare_wait with the __wait_impl free function, and the supporting
__wait_flags enum and __wait_args struct. This currenly inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This new implementation replaces all of the non-templated notifying
detail of __waiter_base, __waiter_pool, and __waiter with the
__notify_impl free function. This currently inline free function
is expected to be moved into the libstdc++ shared library in a
future commit.

The __atomic_wait_address template function is updated to account
for the above changes and to support the expected C++26 change to
pass the most recent observed value to the caller supplied predicate.

A new non-templated __atomic_wait_address_v free function is added
that only works for atomic types that operate only on __platform_wait_t
and requires the caller to supply a memory order. This is intended
to be the simplest code path for such types.

The __atomic_wait_address_v template function is now implemented in
terms of new __atomic_wait_address template and continues to accept
a user supplied "value function" to retrieve the current value of
the atomic.

The __atomic_notify_address template function is updated to account
for the above changes.

The template __platform_wait_until_impl is renamed to
__wait_clock_t. The previous __platform_wait_until template is deleted
and the functionality previously provided is moved t the new tempalate
function __wait_until. A similar change is made to the
__cond_wait_until_impl/__cond_wait_until implementation.

This new implementation similarly replaces all of the non-templated
waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
the new __wait_until_impl free function. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This implementation replaces all templated waiting functions that
manage clock conversion as well as relative waiting (wait_for) with
the new template functions __wait_until and __wait_for.

Similarly the previous implementation detail for the various
__atomic_wait_address_Xxx templates is adjusted to account for the
implementation changes outlined above.

All of the "bare wait" versions of __atomic_wait_Xxx have been removed
and replaced with a defaulted boolean __bare_wait parameter on the
new version of these templates.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h:
(__detail::__platform_wait_until_impl): Rename to
__platform_wait_until.
(__detail::__platform_wait_until): Remove previous
definition.
(__detail::__cond_wait_until_impl): Rename to
__cond_wait_until.
(__detail::__cond_wait_until): Remove previous
definition.
(__detail::__spin_until_impl): New function.
(__detail::__wait_until_impl): New function.
(__detail::__wait_until): New function.
(__detail::__wait_for): New function.
(__detail::__timed_waiter_pool): Remove type.
(__detail::__timed_backoff_spin_policy): Remove type.
(__detail::__timed_waiter): Remove type.
(__detail::__enters_timed_wait): Remove type alias.
(__detail::__bare_timed_wait): Remove type alias.