[lttng-dev] (Userspace-)RCU based HEAP (priority queue)

2024-02-22 Thread Ondřej Surý via lttng-dev
Hi,

we are just exploring the field, so asking here is probably good idea.

We are currently using bucketed heaps for TTL-based expiration of
DNS records in BIND 9.  This doesn't scale that well and has other
problems, so I was thinking that we might be able to replace this with
something better.

Anyone heard or read about good lock-free / wait-free priority queues
preferably based on URCU? :)

Cheers,
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] TSAN build broken on master branch

2023-09-21 Thread Ondřej Surý via lttng-dev
Hi,

I've updated to the latest HEAD and compilation with clang-16 and clang-17 and

./configure --enable-compiler-atomic-builtins --disable-legacy-mb

It fails with:

rculfhash.c:1189:2: error: address argument to atomic operation must be a 
pointer to integer ('typeof (node_next)' (aka 'struct cds_lfht_node **') 
invalid)
uatomic_or_mo(node_next, REMOVED_FLAG, CMM_RELEASE);
^~~
../include/urcu/uatomic/builtins-generic.h:123:10: note: expanded from macro 
'uatomic_or_mo'
(void) __atomic_or_fetch(cmm_cast_volatile(addr), mask, \
   ^ ~~~
rculfhash.c:1440:3: error: address argument to atomic operation must be a 
pointer to integer ('typeof (fini_bucket_next)' (aka 'struct cds_lfht_node **') 
invalid)
uatomic_or(fini_bucket_next, REMOVED_FLAG);
^~
../include/urcu/uatomic/builtins-generic.h:130:2: note: expanded from macro 
'uatomic_or'
uatomic_or_mo(addr, mask, CMM_RELAXED)
^~
../include/urcu/uatomic/builtins-generic.h:123:10: note: expanded from macro 
'uatomic_or_mo'
(void) __atomic_or_fetch(cmm_cast_volatile(addr), mask, \
   ^ ~~~
2 errors generated.

It works with gcc-12, so it seems that clang has stricter rules here.

Ondřej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] LTTng-UST and SystemTap/DTrace

2023-08-14 Thread Ondřej Surý via lttng-dev
Hi,

we are in process of adding USDT probes to the BIND 9 probes and the obvious 
question is whether it’s possible to use the probes with the LTTng tools.

Alternatively, if there’s an easy way how to use one or another. The USDT 
probes are provided by SystemTap on Linux and by DTrace on BSDs and Solaris, so 
they are more universal for our use, but using perf record feels more 
complicated for us.

Ideally, we would like to give users a choice or at least have a backup plan to 
switch if we pick the wrong technology and it doesn’t work for us or our users.

Any thoughts or experiences?

Ondrej
--
Ondřej Surý  (He/Him)
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH] Avoid calling caa_container_of on NULL pointer in cds_lfhash macros

2023-06-22 Thread Ondřej Surý via lttng-dev
(Sorry, I missed closing brackets in both macros, so resending fixed patch...)

The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros
would call caa_container_of() macro on NULL pointer.  This is not a
problem under normal circumstances as the check in the for loop fails
and the loop-statement is not called with invalid (pos) value.

However AddressSanitizer doesn't like that and complains about this:

runtime error: applying non-zero offset 18446744073709551056 to null pointer

Move the cds_lfht_iter_get_node(iter) != NULL from the cond-expression
of the for loop into both init-clause and iteration-expression as
conditional operator and check for (pos) value in the cond-expression
instead.

Signed-off-by: Ondřej Surý 
---
 include/urcu/rculfhash.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/urcu/rculfhash.h b/include/urcu/rculfhash.h
index fbd33cc..64cc18f 100644
--- a/include/urcu/rculfhash.h
+++ b/include/urcu/rculfhash.h
@@ -546,22 +546,22 @@ void cds_lfht_resize(struct cds_lfht *ht, unsigned long 
new_size);
 
 #define cds_lfht_for_each_entry(ht, iter, pos, member) \
for (cds_lfht_first(ht, iter),  \
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member);\
-   cds_lfht_iter_get_node(iter) != NULL;   \
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL);
\
+   pos != NULL;\
cds_lfht_next(ht, iter),\
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member))
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL))
 
 #define cds_lfht_for_each_entry_duplicate(ht, hash, match, key,
\
iter, pos, member)  \
for (cds_lfht_lookup(ht, hash, match, key, iter),   \
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member);\
-   cds_lfht_iter_get_node(iter) != NULL;   \
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL);
\
+   pos != NULL;\
cds_lfht_next_duplicate(ht, match, key, iter),  \
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member))
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL))
 
 #ifdef __cplusplus
 }
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH] Avoid calling caa_container_of on NULL pointer in cds_lfhash macros

2023-06-22 Thread Ondřej Surý via lttng-dev
The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros
would call caa_container_of() macro on NULL pointer.  This is not a
problem under normal circumstances as the check in the for loop fails
and the loop-statement is not called with invalid (pos) value.

However AddressSanitizer doesn't like that and complains about this:

runtime error: applying non-zero offset 18446744073709551056 to null pointer

Move the cds_lfht_iter_get_node(iter) != NULL from the cond-expression
of the for loop into both init-clause and iteration-expression as
conditional operator and check for (pos) value in the cond-expression
instead.

Signed-off-by: Ondřej Surý 
---
 include/urcu/rculfhash.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/urcu/rculfhash.h b/include/urcu/rculfhash.h
index fbd33cc..aafc455 100644
--- a/include/urcu/rculfhash.h
+++ b/include/urcu/rculfhash.h
@@ -546,22 +546,22 @@ void cds_lfht_resize(struct cds_lfht *ht, unsigned long 
new_size);
 
 #define cds_lfht_for_each_entry(ht, iter, pos, member) \
for (cds_lfht_first(ht, iter),  \
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member);\
-   cds_lfht_iter_get_node(iter) != NULL;   \
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL);
\
+   pos != NULL;\
cds_lfht_next(ht, iter),\
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member))
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL)
 
 #define cds_lfht_for_each_entry_duplicate(ht, hash, match, key,
\
iter, pos, member)  \
for (cds_lfht_lookup(ht, hash, match, key, iter),   \
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member);\
-   cds_lfht_iter_get_node(iter) != NULL;   \
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL);
\
+   pos != NULL;\
cds_lfht_next_duplicate(ht, match, key, iter),  \
-   pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-   __typeof__(*(pos)), member))
+   pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \
+   __typeof__(*(pos)), member) : NULL)
 
 #ifdef __cplusplus
 }
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH v2 00/12] Add support for TSAN to liburcu

2023-06-07 Thread Ondřej Surý via lttng-dev
Olivier,

is this somewhere in Gerrit again, please? It’s much easier for us to pull the 
changes from git than cherry-pick patches from the mailing list.

Ondřej
--
Ondřej Surý  (He/Him)

> On 7. 6. 2023, at 20:54, Olivier Dion  wrote:
> 
> This patch set adds support for TSAN in liburcu.
> 
> * Change since v1
> 
> ** Adding CMM_SEQ_CST_FENCE memory order to the CMM memory model
> 
>   The C11 memory model is incompatible with the memory model used by liburcu,
>   since the semantic of the C11 memory model is based on a happen-before
>   (acquire/release) relationship of memory accesses, while liburcu is based on
>   memory barriers and relaxed memory accesses.
> 
>   To circumvent this, a new memory order called CMM_SEQ_CST_FENCE is
>   introduced.  It implies a CMM_SEQ_CST while emitting a thread fence after 
> the
>   operation.  Operations that were documented as emitting memory barriers
>   before and after the operation are implemented now in term of this new 
> memory
>   order to keep compatibility.
> 
>   However, this model is redundant in some cases and the memory orders were
>   changed internally in liburcu to simply use the CMM_SEQ_CST.
> 
> ** Adding cmm_emit_legacy_smp_mb() to queue/stack APIs
> 
>   The queue/stack APIs document implicit memory barriers before or after
>   operation.  These are now optionally emitted only if
>   CONFIG_RCU_EMIT_LEGACY_MB is defined in urcu/config.h or manually defined by
>   the user before including liburcu.  That way, users can opt-in even if the
>   system headers were configured without this feature.
> 
>   However, users can not opt-out of that feature if configured by the system.
> 
> * v1
> 
> ** Here are the major changes
> 
>  - Usage of compiler atomic builtins is added to the uatomic API.  This is
>required for TSAN to understand atomic memory accesses.  If the compiler
>supports such builtins, they are used by default.  User can opt-out and use
>the legacy implementation of the uatomic API by using the
>`--disable-atomic-builtins' configuration option.
> 
>  - The CMM memory model is introduced but yet formalized. It tries to be as
>close as possible to the C11 memory model while offering primitives such as
>cmm_smp_wmb(), cmm_smp_rmb() and cmm_mb() that can't be expressed in it.
>For example, cmm_mb() can be used for ordering memory accesses to MMIO
>devices, which is out of the scope of the C11 memory model.
> 
>  - The CMM annotation layer is a new public API that is highly experimental 
> and
>not guaranteed to be stable at this stage.  It serves the dual purpose of
>verifying local (intra-thread) relaxed atomic accesses ordering with a
>memory barrier and global (inter-thread) relaxed atomic accesses with a
>shared state.  The second purpose is necessary for TSAN to understand 
> memory
>accesses ordering since it does not fully support thread fence yet.
> 
> ** CMM annotation example
> 
>  Consider the following pseudo-code of writer side in synchronize_rcu().  An
>  acquire group is defined on the stack of the writer.  Annotations are made
>  onto the group to ensure ordering of relaxed memory accesses in 
> reader_state()
>  before the memory barrier at the end of synchronize_rcu().  It also helps 
> TSAN
>  to understand that the relaxed accesses in reader_state() act like acquire
>  accesses because of the memory barrier in synchronize_rcu().
> 
>  In other words, the purpose of this annotation is to convert a group of
>  load-acquire memory operations into load-relaxed memory operations followed 
> by
>  a single memory barrier.  This highly benefits weakly ordered architectures 
> by
>  having a constant number of memory barriers instead of being linearly
>  proportional to the number of loads.  This does not benefit TSO
>  architectures.  
> 
> 
> Olivier Dion (12):
>  configure: Add --disable-atomic-builtins option
>  urcu/compiler: Use atomic builtins if configured
>  urcu/arch/generic: Use atomic builtins if configured
>  urcu/system: Use atomic builtins if configured
>  urcu/uatomic: Add CMM memory model
>  urcu-wait: Fix wait state load/store
>  tests: Use uatomic for accessing global states
>  benchmark: Use uatomic for accessing global states
>  tests/unit/test_build: Quiet unused return value
>  urcu/annotate: Add CMM annotation
>  Add cmm_emit_legacy_smp_mb()
>  tests: Add tests for checking race conditions
> 
> README.md   |  11 ++
> configure.ac|  39 
> doc/uatomic-api.md  |   3 +-
> include/Makefile.am |   3 +
> include/urcu/annotate.h | 174 ++
> include/urcu/arch.h |   6 +
> include/urcu/arch/generic.h |  37 
> include/urcu/compiler.h |  22 ++-
> include/urcu/config.h.in|   6 +
> include/urcu/static/lfstack.h   |  25 ++-
> include/urcu/static/pointer.h 

Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu

2023-05-26 Thread Ondřej Surý via lttng-dev

> On 26. 5. 2023, at 8:08, Dmitry Vyukov  wrote:
> 
> On Fri, 26 May 2023 at 07:33, Ondřej Surý  wrote:
>> 
>> A little bit related question/problem - some of our system test jobs got 
>> significantly slower (3x-5x) with **clang-16** TSAN after we integrated 
>> liburcu to BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.
>> 
>> The liburcu integration in BIND 9 is very light at the moment - we use 
>> liburcu for qptrie and there’s some usage of wfcqueue for queuing 
>> jobs(callbacks) between threads.
>> 
>> We were not able to put a finger on it yet, but saying this aloud might help 
>> diagnose this.
> 
> Interesting. Clang 16 got a completely new tsan runtime
> implementation, which is generally ~2x faster and consumes less
> memory.
> However, of course, it's not possible to optimize for all gazillion of
> programs in the world. I suspect librcu tests are quite extreme in
> terms of synchronization.
> 
> If you collect perf profiles for both, I can look for some low hanging fruit.
> Or is there a reasonable way for me to reproduce?

Looks like it got all resolved by using the latest liburcu patch set with full
TSAN support. I haven't realized that our CI still runs with whatever liburcu
Debian bullseye have. Sorry for the confusion.

And I can no longer reproduce this locally.

FTR here are some numbers from my local system (13th Gen Intel(R) Core(TM) 
i9-13900K)
running TSAN-enabled patched userspace-rcu[1], libuv 1.45.0 and OpenSSL 3.0.8:

gcc-13 baseline

rpz:passed in 198.71s (0:03:18)
rpzrecurse: passed in 310.62s (0:05:10)
rrsetorder: passed in 200.40s (0:03:20)

gcc-13 TSAN

rpz:passed in 305.64s (0:05:05)
rpzrecurse: passed in 427.73s (0:07:07)
rrsetorder: passed in 260.65s (0:04:20)

clang-16 TSAN

rpz:passed in 290.05s (0:04:50)
rpzrecurse: passed in 468.53s (0:07:48)
rrsetorder: passed in 258.97s (0:04:18)

This looks quite reasonable to me.

1. git fetch https://review.lttng.org/userspace-rcu refs/changes/58/10058/1 && 
git checkout FETCH_HEAD

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu

2023-05-25 Thread Ondřej Surý via lttng-dev
A little bit related question/problem - some of our system test jobs got 
significantly slower (3x-5x) with **clang-16** TSAN after we integrated liburcu 
to BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.

The liburcu integration in BIND 9 is very light at the moment - we use liburcu 
for qptrie and there’s some usage of wfcqueue for queuing jobs(callbacks) 
between threads.

We were not able to put a finger on it yet, but saying this aloud might help 
diagnose this.

Ondrej
--
Ondřej Surý  (He/Him)

> On 24. 5. 2023, at 10:15, Dmitry Vyukov via lttng-dev 
>  wrote:
> 
> On Tue, 23 May 2023 at 18:05, Olivier Dion  wrote:
>> 
>> Hi Dmitry,
>> 
>> We do have a new issue and we think it might be a limitation from TSAN.
>> 
>> Find attached a test program that we believe is correct.  You can
>> compile it with `gcc -fsanitize=thread test.c -pthread'.
>> 
>> 
>> TSAN flags a race condition between the atomic store relaxed at line 36
>> and the free at line 81.
>> 
>> We can solve the issue by replacing the atomic store relaxed with a
>> atomic store release.  However, the preceding atomic exchange with
>> sequential consistency already acts as an implicit release (in term of
>> memory barrier) for the following store, making the release semantic
>> redundant.
>> 
>> We've found an alternative to fix this by ignoring the thread during the
>> relaxed store (line 39).  However, what we would like is to annotate the
>> memory stored (line 45).
>> 
>> My theory (I don't know the internal of TSAN much) is that TSAN thinks
>> for some reason that the atomic store relaxed happen at the same epoch
>> as the free, resulting in a false positive.  If so, m
> 
> 
> I don't this this is true in the C/C++ memory model:
> "the preceding atomic exchange with sequential consistency already
> acts as an implicit release (in term of memory barrier) for the
> following store".
> 
> std::atomic_thread_fence does affect all preceding/subsequent
> operations, but an atomic memory operation only affects ordering on
> that variable, it doesn't also serve as a standalone memory fence.
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [RFC] Deprecating RCU signal flavor

2023-05-16 Thread Ondřej Surý via lttng-dev
Hi,

we (ISC) have no intention of using signal based URCU in BIND 9. Our plan 
relies on using the QSBR flavor as BIND 9 is internally event-loop based with 
natural quiescent states.

Ondrej 
--
Ondřej Surý  (He/Him)

> On 10. 5. 2023, at 23:10, Olivier Dion via lttng-dev 
>  wrote:
> 
> Hi all,
> 
> We have the intention of deprecating the urcu-signal flavor in the
> future.  We are asking users of URCU for _feedback_ on this before
> going any further.
> 
> Part of this decision is that we are adding support for TSAN in URCU and
> the signal flavor deadlocks with TSAN.  It is also my understanding that
> the urcu-signal flavor was historically made as a fallback for system
> lacking the membarrier(2) system call.  Nowadays, most systems have
> support for that system call, making the urcu-signal an artifact of the
> past.
> 
> The following is a proposed timeline of events:
> 
>  1. Asking for feedback from users of URCU (this message)
>  2. Disabling the signal flavor by default and adding --enable-flavor-signal
>  3. Removing the signal flavor
> 
> -- 
> Olivier Dion
> EfficiOS Inc.
> https://www.efficios.com
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] RCU API usage from call_rcu callbacks?

2023-03-22 Thread Ondřej Surý via lttng-dev
Hi,

the documentation is pretty silent on this, and asking here is probably going 
to be faster
than me trying to use the source to figure this out.

Is it legal to call_rcu() from within the call_rcu() callback?

What about the other RCU (and CDS) API calls?

How does that interact with create_call_rcu_data()?  I have  event loops and 
I am
initializing  1:1 call_rcu helper threads as I need to do some per-thread 
initialization
as some of the destroy-like functions use random numbers (don't ask).

If it's legal to call_rcu() from call_rcu thread, which thread is going to be 
used?

Thank you,
Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up

2023-03-22 Thread Ondřej Surý via lttng-dev
> On 22. 3. 2023, at 9:02, Ondřej Surý via lttng-dev 
>  wrote:
> 
> That's pretty much weird because the "Write" happens on stack local variable,
> while the "Previous write" happens after futex, which lead me to the fact that
> ThreadSanitizer doesn't intercept futex, but we can annotate the futexes:
> 
> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4

FTR neither annotating the futex with __tsan_acquire(addr) and 
__tsan_release(addr)
nor falling back to compat_futex_async() for ThreadSanitizer has helped.

It seems to me that TSAN still doesn't understand the synchronization between
RCU read-critical sections and call_rcu/synchronize_rcu() as I am also getting
following reports:

  Write of size 8 at 0x7b5409c0 by thread T102:
#0 __tsan_memset  (badcache_test+0x49257d) (BuildId: 
a7c1595d61e3ee411276cf89a536a8daefa959a3)
#1 mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:324:3 
(libisc-9.19.12-dev.so+0x7d136) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
#2 isc__mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:684:2 
(libisc-9.19.12-dev.so+0x7e0c3) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
#3 bcentry_destroy_rcu /home/ondrej/Projects/bind9/lib/dns/badcache.c:163:2 
(libdns-9.19.12-dev.so+0x4e071) (BuildId: 
8a550b795003cd1075ff29590734c806d84e76e6)
#4 call_rcu_thread 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:389:5 
(liburcu-mb.so.8+0x9d6b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)

  Previous atomic write of size 8 at 0x7b5409c0 by main thread (mutexes: 
write M0):
#0 ___cds_wfcq_append 
/home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:202:2 
(liburcu-mb.so.8+0xa8ae) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#1 _cds_wfcq_enqueue 
/home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:223:9 
(liburcu-mb.so.8+0xac09) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#2 _call_rcu 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:719:2 
(liburcu-mb.so.8+0x604f) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#3 urcu_mb_barrier 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:932:3 
(liburcu-mb.so.8+0x4d1b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#4 badcache_flush /home/ondrej/Projects/bind9/lib/dns/badcache.c:329:2 
(libdns-9.19.12-dev.so+0x4d8b3) (BuildId: 
8a550b795003cd1075ff29590734c806d84e76e6)
[...]

E.g. ThreadSanitizer reports a race between a place where bcentry->rcu_head is 
added to call_rcu() queue
and when call_rcu callbacks are called.  Annotating the bcentry with 
acquire/release here helps with this
particular data race, but it does not feel right to me to add annotation at 
this level.

The code is not very complicated there:

static void
bcentry_destroy_rcu(struct rcu_head *rcu_head) {
dns_bcentry_t *bad = caa_container_of(rcu_head, dns_bcentry_t,
  rcu_head);
/* __tsan_release(bad); <-- this helps */
dns_badcache_t *bc = bad->bc;

isc_mem_put(bc->mctx, bad, sizeof(*bad));

dns_badcache_detach();
}

static void
bcentry_evict(struct cds_lfht *ht, dns_bcentry_t *bad) {
/* There can be multiple deleters now */
if (cds_lfht_del(ht, >ht_node) == 0) {
/* __tsan_acquire(bad); <- this helps */
call_rcu(>rcu_head, bcentry_destroy_rcu);
}
}

static void
badcache_flush(dns_badcache_t *bc, struct cds_lfht *ht) {
struct cds_lfht *oldht = rcu_xchg_pointer(>ht, ht);

synchronize_rcu();

rcu_read_lock();
dns_bcentry_t *bad = NULL;
struct cds_lfht_iter iter;
cds_lfht_for_each_entry (oldht, , bad, ht_node) {
bcentry_evict(oldht, bad);
}
rcu_read_unlock();
rcu_barrier();
RUNTIME_CHECK(cds_lfht_destroy(oldht, NULL) == 0);
}

Any ideas?

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up

2023-03-22 Thread Ondřej Surý via lttng-dev
Hi,

this happens with all the patches fully applied and doesn't seem to be caused 
by anything I am doing :)

WARNING: ThreadSanitizer: data race (pid=3995296)
  Write of size 8 at 0x7fb51e5fd048 by thread T296:
#0 __tsan_memset  (badcache_test+0x49257d) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7)
#1 urcu_mb_synchronize_rcu 
/home/ondrej/Projects/userspace-rcu/src/urcu.c:412:2 (liburcu-mb.so.8+0x35e0) 
(BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#2 call_rcu_thread 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 
(liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)

  Previous atomic write of size 4 at 0x7fb51e5fd048 by thread T295:
#0 urcu_adaptative_wake_up 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:138:2 
(liburcu-mb.so.8+0x8cb9) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#1 urcu_wake_all_waiters 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:214:3 
(liburcu-mb.so.8+0x41de) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#2 urcu_mb_synchronize_rcu 
/home/ondrej/Projects/userspace-rcu/src/urcu.c:522:2 (liburcu-mb.so.8+0x3766) 
(BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#3 call_rcu_thread 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 
(liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)

  Location is stack of thread T296.

  Thread T296 (tid=3995606, running) created by thread T272 at:
#0 pthread_create  (badcache_test+0x44d5fb) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7)
#1 call_rcu_data_init 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 
(liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#2 __create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 
(liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#3 urcu_mb_create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 
(liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 
(libisc-9.19.12-dev.so+0x7a0a0) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
#5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 
(libisc-9.19.12-dev.so+0x77890) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
#6 isc__trampoline_run 
/home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 
(libisc-9.19.12-dev.so+0xaa6be) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)

  Thread T295 (tid=3995605, running) created by thread T261 at:
#0 pthread_create  (badcache_test+0x44d5fb) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7)
#1 call_rcu_data_init 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 
(liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#2 __create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 
(liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#3 urcu_mb_create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 
(liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
#4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 
(libisc-9.19.12-dev.so+0x7a0a0) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
#5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 
(libisc-9.19.12-dev.so+0x77890) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
#6 isc__trampoline_run 
/home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 
(libisc-9.19.12-dev.so+0xaa6be) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)

SUMMARY: ThreadSanitizer: data race 
(/home/ondrej/Projects/bind9/tests/dns/.libs/badcache_test+0x49257d) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7) in __tsan_memset

This is between:
- DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
and
- uatomic_or(>state, URCU_WAIT_TEARDOWN);

That's pretty much weird because the "Write" happens on stack local variable,
while the "Previous write" happens after futex, which lead me to the fact that
ThreadSanitizer doesn't intercept futex, but we can annotate the futexes:

https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4

Oh boy...

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] TSAN and the tests

2023-03-21 Thread Ondřej Surý via lttng-dev
> On 21. 3. 2023, at 15:49, Mathieu Desnoyers  
> wrote:
> 
> Agreed, let's see how it holds up to testing under TSAN. :)

The make check is OK, but any of the make regtest, short_bench or long_bench 
produces
lot of TSAN warnings like these below.

Looks like they mostly come from the tests itself and I will start chopping 
them off when I
have little time if nobody beats me to it meanwhile.

Meanwhile I've isolated the use of cds_lfht for a small hashtable with bad 
"names" used
as SERVFAIL cache which doesn't use reference counting nor LRU, and there was
yet-another custom hashtable in BIND 9, so it was perfect for testing.  I'll 
report what
TSAN likes or doesn't like in the real world app soon(ish).

Ondrej


==
WARNING: ThreadSanitizer: data race (pid=1110834)
  Write of size 4 at 0x5612e63ef380 by main thread:
#0 perftestrun 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:217:9 
(rcutorture_urcu_bp+0xd6278) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#1 perftest 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:255:9 
(rcutorture_urcu_bp+0xd55f5) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#2 main 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 
(rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous read of size 4 at 0x5612e63ef380 by thread T24:
#0 rcu_read_perf_test 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:148:9 
(rcutorture_urcu_bp+0xd5f61) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global 'goflag' of size 4 at 0x5612e63ef380 
(rcutorture_urcu_bp+0x147f380)

  Thread T24 (tid=1110872, running) created by main thread at:
#0 pthread_create  (rcutorture_urcu_bp+0x4e5bb) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#1 create_thread 
/home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6
 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#2 perftest 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 
(rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#3 main 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 
(rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:217:9 in 
perftestrun
==
==
WARNING: ThreadSanitizer: data race (pid=1110834)
  Read of size 8 at 0x5612e63ef408 by thread T3:
#0 __smp_thread_id 
/home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7
 (rcutorture_urcu_bp+0xd69d8) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#1 smp_thread_id 
/home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:167:10
 (rcutorture_urcu_bp+0xd67be) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#2 rcu_read_perf_test 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:161:2 
(rcutorture_urcu_bp+0xd6007) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous write of size 8 at 0x5612e63ef408 by main thread:
#0 create_thread 
/home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:192:21
 (rcutorture_urcu_bp+0xd5eb7) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#1 perftest 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 
(rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#2 main 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 
(rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global '__thread_id_map' of size 32768 at 0x5612e63ef3f0 
(rcutorture_urcu_bp+0x147f408)

  Thread T3 (tid=1110851, running) created by main thread at:
#0 pthread_create  (rcutorture_urcu_bp+0x4e5bb) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#1 create_thread 
/home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6
 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 
7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#2 perftest 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 
(rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
#3 main 
/home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 
(rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race 
/home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7
 in __smp_thread_id
==
==
WARNING: ThreadSanitizer: data race (pid=1110834)
  Write of size 8 at 0x5612e63ef3f8 by main thread:
#0 wait_thread 

[lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-21 Thread Ondřej Surý via lttng-dev
When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to unsigned long to fix the
following compiler error:

rculfhash.c:1201:2: error: address argument to atomic operation must be a 
pointer to integer ('struct cds_lfht_node **' invalid)
   uatomic_or(>next, REMOVED_FLAG);
   ^
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
   (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
 ^ ~~
rculfhash.c:1444:3: error: address argument to atomic operation must be a 
pointer to integer ('struct cds_lfht_node **' invalid)
   uatomic_or(_bucket->next, REMOVED_FLAG);
   ^~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
   (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
 ^ ~~

This was not a problem before because the way the uatomic_or was
implemented, but now we directly pass the addr to __atomic_or_fetch()
and the compiler doesn't like the implicit conversion from pointer to
pointer to integer.

Signed-off-by: Ondřej Surý 
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..5292725 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 * Knowing which wins the race will be known after the garbage
 * collection phase, stay tuned!
 */
-   uatomic_or(>next, REMOVED_FLAG);
+   uatomic_or((unsigned long *)>next, REMOVED_FLAG);
/* We performed the (logical) deletion. */
 
/*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned 
long i,
dbg_printf("remove entry: order %lu index %lu hash %lu\n",
   i, j, j);
/* Set the REMOVED_FLAG to freeze the ->next for gc */
-   uatomic_or(_bucket->next, REMOVED_FLAG);
+   uatomic_or((unsigned long *)_bucket->next, REMOVED_FLAG);
_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
}
ht->flavor->read_unlock();
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-21 Thread Ondřej Surý via lttng-dev
> On 21. 3. 2023, at 15:46, Mathieu Desnoyers  
> wrote:
> 
> On 2023-03-21 06:21, Ondřej Surý wrote:
>>> On 20. 3. 2023, at 19:37, Mathieu Desnoyers 
>>>  wrote:
>>> 
>>> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>>>> FIXME: This is experiment that adds explicit memory barrier in the
>>>> free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
>>>> free the resources.
>>>> Signed-off-by: Ondřej Surý 
>>>> ---
>>>>  src/workqueue.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>> diff --git a/src/workqueue.c b/src/workqueue.c
>>>> index 1039d72..f21907f 100644
>>>> --- a/src/workqueue.c
>>>> +++ b/src/workqueue.c
>>>> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>>>>   struct urcu_workqueue_completion *completion;
>>>> completion = caa_container_of(ref, struct urcu_workqueue_completion, 
>>>> ref);
>>>> + assert(!urcu_ref_get_unless_zero(>ref));
>>> 
>>> Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of 
>>> some sort ?
>> I guess?
>> My experience with TSAN tells me, that you need some kind of memory barrier 
>> when using acquire-release
>> semantics and you do:
>> if (__atomic_sub_fetch(obj->ref, __ATOMIC_RELEASE) == 0) {
>>   /* __ATOMIC_ACQUIRE needed here */
>>free(obj);
>> }
>> we end up using following code in BIND 9:
>> if (__atomic_sub_fetch(obj->ref, __ATOMIC_ACQ_REL) == 0) {
>>free(obj);
>> }
>> So, I am guessing after the change of uatomic_sub_return() to 
>> __ATOMIC_ACQ_REL,
>> this patch should no longer be needed.
> 
> Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL.

Yeah, I think I already did that, but wrote the email before that. 
Nevertheless, my main
point was that it should not be needed anymore.

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 1/7] Require __atomic builtins to build

2023-03-21 Thread Ondřej Surý via lttng-dev
Add autoconf checks for all __atomic builtins that urcu require, and
adjust the gcc and clang versions in the README.md.

Signed-off-by: Ondřej Surý 
---
 README.md| 33 +
 configure.ac | 15 +++
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index ba5bb08..a65a07a 100644
--- a/README.md
+++ b/README.md
@@ -68,30 +68,15 @@ Should also work on:
 
 (more testing needed before claiming support for these OS).
 
-Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or
-better.
-
-The C compiler used needs to support at least C99. The C++ compiler used
-needs to support at least C++11.
-
-The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
-supported, with the following exceptions:
-
-  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
-accesses to offsets in a TLS structure on 32-bit x86. These versions are
-therefore not compatible with `liburcu` on x86 32-bit
-(i386, i486, i586, i686).
-The problem has been reported to the GCC community:
-
-  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
-See 
-  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins
-support. For ARM this was introduced with GCC 4.4:
-.
-  - Linux aarch64 depends on GCC 5.1 or better because prior versions
-perform unsafe access to deallocated stack.
-
-Clang version 3.0 (based on LLVM 3.0) is supported.
+Linux ARM depends on running a Linux kernel 2.6.15 or better.
+
+The C compiler used needs to support at least C99 and __atomic
+builtins. The C++ compiler used needs to support at least C++11
+and __atomic builtins.
+
+The GCC compiler versions 4.7 or better are supported.
+
+Clang version 3.1 (based on LLVM 3.1) is supported.
 
 Glibc >= 2.4 should work but the older version we test against is
 currently 2.17.
diff --git a/configure.ac b/configure.ac
index 909cf1d..cb7ba18 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [
   AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is 
detected.])
 ])
 
+# Require __atomic builtins
+AC_COMPILE_IFELSE(
+   [AC_LANG_PROGRAM(
+   [[int x, y;]],
+   [[__atomic_store_n(, 0, __ATOMIC_RELEASE);
+ __atomic_load_n(, __ATOMIC_CONSUME);
+ y = __atomic_exchange_n(, 1, __ATOMIC_ACQ_REL);
+ __atomic_compare_exchange_n(, , 0, 0, __ATOMIC_ACQ_REL, 
__ATOMIC_CONSUME);
+ __atomic_add_fetch(, 1, __ATOMIC_ACQ_REL);
+ __atomic_sub_fetch(, 1, __ATOMIC_ACQ_REL);
+ __atomic_and_fetch(, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_or_fetch(, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_thread_fence(__ATOMIC_ACQ_REL)]])],
+   [],
+   [AC_MSG_ERROR([The compiler does not support __atomic builtins])])
 
 ## ##
 ## Optional features selection ##
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED

2023-03-21 Thread Ondřej Surý via lttng-dev
Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic
builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and
CMM_STORE_SHARED().

Signed-off-by: Ondřej Surý 
---
 include/urcu/system.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..4302253 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -26,7 +26,7 @@
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
  */
-#define _CMM_LOAD_SHARED(p)   CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)   __atomic_load_n(&(p), __ATOMIC_RELAXED)
 
 /*
  * Load a data from shared memory, doing a cache flush if required.
@@ -42,7 +42,7 @@
  * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
  * follow the store.
  */
-#define _CMM_STORE_SHARED(x, v)__extension__ ({ CMM_ACCESS_ONCE(x) = 
(v); })
+#define _CMM_STORE_SHARED(x, v)__extension__ ({ __atomic_store_n(&(x), 
v, __ATOMIC_RELAXED); v; })
 
 /*
  * Store v into x, where x is located in shared memory. Performs the
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers with __atomic builtins

2023-03-21 Thread Ondřej Surý via lttng-dev
Instead of a custom code, use the __atomic_thread_fence() builtin to
implement the cmm_mb(), cmm_rmb(), cmm_wmb(), cmm_smp_mb(),
cmm_smp_rmb(), and cmm_smp_wmb() on all architectures, and
cmm_read_barrier_depends() on alpha (otherwise it's still no-op).

family of functions

Signed-off-by: Ondřej Surý 
---
 include/urcu/arch/alpha.h   |  6 +++---
 include/urcu/arch/arm.h | 14 -
 include/urcu/arch/generic.h |  6 +++---
 include/urcu/arch/mips.h|  6 --
 include/urcu/arch/nios2.h   |  2 --
 include/urcu/arch/ppc.h | 25 --
 include/urcu/arch/s390.h|  2 --
 include/urcu/arch/sparc64.h | 13 
 include/urcu/arch/x86.h | 42 +++--
 9 files changed, 9 insertions(+), 107 deletions(-)

diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
index dc33e28..61687c7 100644
--- a/include/urcu/arch/alpha.h
+++ b/include/urcu/arch/alpha.h
@@ -29,9 +29,9 @@
 extern "C" {
 #endif
 
-#define cmm_mb()   __asm__ __volatile__ ("mb":::"memory")
-#define cmm_wmb()  __asm__ __volatile__ ("wmb":::"memory")
-#define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory")
+#ifndef cmm_read_barrier_depends
+#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_CONSUME)
+#endif
 
 /*
  * On Linux, define the membarrier system call number if not yet available in
diff --git a/include/urcu/arch/arm.h b/include/urcu/arch/arm.h
index 54ca4fa..b3671dc 100644
--- a/include/urcu/arch/arm.h
+++ b/include/urcu/arch/arm.h
@@ -39,20 +39,6 @@ extern "C" {
 /* For backwards compat. */
 #define CONFIG_RCU_ARM_HAVE_DMB 1
 
-/*
- * Issues full system DMB operation.
- */
-#define cmm_mb()   __asm__ __volatile__ ("dmb sy":::"memory")
-#define cmm_rmb()  __asm__ __volatile__ ("dmb sy":::"memory")
-#define cmm_wmb()  __asm__ __volatile__ ("dmb sy":::"memory")
-
-/*
- * Issues DMB operation only to the inner shareable domain.
- */
-#define cmm_smp_mb()   __asm__ __volatile__ ("dmb ish":::"memory")
-#define cmm_smp_rmb()  __asm__ __volatile__ ("dmb ish":::"memory")
-#define cmm_smp_wmb()  __asm__ __volatile__ ("dmb ish":::"memory")
-
 #endif /* URCU_ARCH_ARMV7 */
 
 #include 
diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
index be6e41e..2715162 100644
--- a/include/urcu/arch/generic.h
+++ b/include/urcu/arch/generic.h
@@ -44,15 +44,15 @@ extern "C" {
  */
 
 #ifndef cmm_mb
-#define cmm_mb()__sync_synchronize()
+#define cmm_mb()   __atomic_thread_fence(__ATOMIC_SEQ_CST)
 #endif
 
 #ifndef cmm_rmb
-#define cmm_rmb()  cmm_mb()
+#define cmm_rmb()  __atomic_thread_fence(__ATOMIC_ACQUIRE)
 #endif
 
 #ifndef cmm_wmb
-#define cmm_wmb()  cmm_mb()
+#define cmm_wmb()  __atomic_thread_fence(__ATOMIC_RELEASE)
 #endif
 
 #define cmm_mc()   cmm_barrier()
diff --git a/include/urcu/arch/mips.h b/include/urcu/arch/mips.h
index ea5b7e9..ffe65c0 100644
--- a/include/urcu/arch/mips.h
+++ b/include/urcu/arch/mips.h
@@ -30,12 +30,6 @@
 extern "C" {
 #endif
 
-#define cmm_mb()   __asm__ __volatile__ (  \
-   "   .setmips2   \n" \
-   "   sync\n" \
-   "   .setmips0   \n" \
-   :::"memory")
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/include/urcu/arch/nios2.h b/include/urcu/arch/nios2.h
index b4f3e50..cd6bdb8 100644
--- a/include/urcu/arch/nios2.h
+++ b/include/urcu/arch/nios2.h
@@ -29,8 +29,6 @@
 extern "C" {
 #endif
 
-#define cmm_mb()   cmm_barrier()
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h
index 791529e..618f79c 100644
--- a/include/urcu/arch/ppc.h
+++ b/include/urcu/arch/ppc.h
@@ -34,31 +34,6 @@ extern "C" {
 /* Include size of POWER5+ L3 cache lines: 256 bytes */
 #define CAA_CACHE_LINE_SIZE256
 
-#ifdef __NO_LWSYNC__
-#define LWSYNC_OPCODE  "sync\n"
-#else
-#define LWSYNC_OPCODE  "lwsync\n"
-#endif
-
-/*
- * Use sync for all cmm_mb/rmb/wmb barriers because lwsync does not
- * preserve ordering of cacheable vs. non-cacheable accesses, so it
- * should not be used to order with respect to MMIO operations.  An
- * eieio+lwsync pair is also not enough for cmm_rmb, because it will
- * order cacheable and non-cacheable memory operations separately---i.e.
- * not the latter against the former.
- */
-#define cmm_mb() __asm__ __volatile__ ("sync":::"memory")
-
-/*
- * lwsync orders loads in cacheable memory with respect to other loads,
- * and stores in cacheable memory with respect to other stores.
- * Therefore, use it for barriers ordering accesses to cacheable memory
- * only.
- */
-#define cmm_smp_rmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
-#define cmm_smp_wmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
-
 #define 

[lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-21 Thread Ondřej Surý via lttng-dev
When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to uintptr_t to fix the
compiler error.

Signed-off-by: Ondřej Surý 
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..863387e 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 * Knowing which wins the race will be known after the garbage
 * collection phase, stay tuned!
 */
-   uatomic_or(>next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)>next, REMOVED_FLAG);
/* We performed the (logical) deletion. */
 
/*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned 
long i,
dbg_printf("remove entry: order %lu index %lu hash %lu\n",
   i, j, j);
/* Set the REMOVED_FLAG to freeze the ->next for gc */
-   uatomic_or(_bucket->next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)_bucket->next, REMOVED_FLAG);
_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
}
ht->flavor->read_unlock();
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier()

2023-03-21 Thread Ondřej Surý via lttng-dev
Use __atomic_signal_fence(__ATOMIC_SEQ_CST) for cmm_barrier(), so
ThreadSanitizer can understand the memory synchronization.

Signed-off-by: Ondřej Surý 
---
 include/urcu/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..5763378 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,7 @@
 #define caa_likely(x)  __builtin_expect(!!(x), 1)
 #define caa_unlikely(x)__builtin_expect(!!(x), 0)
 
-#definecmm_barrier()   __asm__ __volatile__ ("" : : : "memory")
+#definecmm_barrier()   __atomic_signal_fence(__ATOMIC_SEQ_CST)
 
 /*
  * Instruct the compiler to perform only a single access to a variable
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-21 Thread Ondřej Surý via lttng-dev
Replace the custom assembly code in include/urcu/uatomic/ with __atomic
builtins provided by C11-compatible compiler.

Signed-off-by: Ondřej Surý 
---
 include/Makefile.am|  16 -
 include/urcu/uatomic.h |  84 +++--
 include/urcu/uatomic/aarch64.h |  41 ---
 include/urcu/uatomic/alpha.h   |  32 --
 include/urcu/uatomic/arm.h |  57 ---
 include/urcu/uatomic/gcc.h |  46 ---
 include/urcu/uatomic/generic.h | 613 ---
 include/urcu/uatomic/hppa.h|  10 -
 include/urcu/uatomic/ia64.h|  41 ---
 include/urcu/uatomic/m68k.h|  44 ---
 include/urcu/uatomic/mips.h|  32 --
 include/urcu/uatomic/nios2.h   |  32 --
 include/urcu/uatomic/ppc.h | 237 
 include/urcu/uatomic/riscv.h   |  44 ---
 include/urcu/uatomic/s390.h| 170 -
 include/urcu/uatomic/sparc64.h |  81 -
 include/urcu/uatomic/tile.h|  41 ---
 include/urcu/uatomic/x86.h | 646 -
 18 files changed, 53 insertions(+), 2214 deletions(-)
 delete mode 100644 include/urcu/uatomic/aarch64.h
 delete mode 100644 include/urcu/uatomic/alpha.h
 delete mode 100644 include/urcu/uatomic/arm.h
 delete mode 100644 include/urcu/uatomic/gcc.h
 delete mode 100644 include/urcu/uatomic/generic.h
 delete mode 100644 include/urcu/uatomic/hppa.h
 delete mode 100644 include/urcu/uatomic/ia64.h
 delete mode 100644 include/urcu/uatomic/m68k.h
 delete mode 100644 include/urcu/uatomic/mips.h
 delete mode 100644 include/urcu/uatomic/nios2.h
 delete mode 100644 include/urcu/uatomic/ppc.h
 delete mode 100644 include/urcu/uatomic/riscv.h
 delete mode 100644 include/urcu/uatomic/s390.h
 delete mode 100644 include/urcu/uatomic/sparc64.h
 delete mode 100644 include/urcu/uatomic/tile.h
 delete mode 100644 include/urcu/uatomic/x86.h

diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..53a28fd 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -59,24 +59,8 @@ nobase_include_HEADERS = \
urcu/syscall-compat.h \
urcu/system.h \
urcu/tls-compat.h \
-   urcu/uatomic/aarch64.h \
-   urcu/uatomic/alpha.h \
urcu/uatomic_arch.h \
-   urcu/uatomic/arm.h \
-   urcu/uatomic/gcc.h \
-   urcu/uatomic/generic.h \
urcu/uatomic.h \
-   urcu/uatomic/hppa.h \
-   urcu/uatomic/ia64.h \
-   urcu/uatomic/m68k.h \
-   urcu/uatomic/mips.h \
-   urcu/uatomic/nios2.h \
-   urcu/uatomic/ppc.h \
-   urcu/uatomic/riscv.h \
-   urcu/uatomic/s390.h \
-   urcu/uatomic/sparc64.h \
-   urcu/uatomic/tile.h \
-   urcu/uatomic/x86.h \
urcu/urcu-bp.h \
urcu/urcu-futex.h \
urcu/urcu.h \
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..0327810 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -22,37 +22,59 @@
 #define _URCU_UATOMIC_H
 
 #include 
+#include 
 
-#if defined(URCU_ARCH_X86)
-#include 
-#elif defined(URCU_ARCH_PPC)
-#include 
-#elif defined(URCU_ARCH_S390)
-#include 
-#elif defined(URCU_ARCH_SPARC64)
-#include 
-#elif defined(URCU_ARCH_ALPHA)
-#include 
-#elif defined(URCU_ARCH_IA64)
-#include 
-#elif defined(URCU_ARCH_ARM)
-#include 
-#elif defined(URCU_ARCH_AARCH64)
-#include 
-#elif defined(URCU_ARCH_MIPS)
-#include 
-#elif defined(URCU_ARCH_NIOS2)
-#include 
-#elif defined(URCU_ARCH_TILE)
-#include 
-#elif defined(URCU_ARCH_HPPA)
-#include 
-#elif defined(URCU_ARCH_M68K)
-#include 
-#elif defined(URCU_ARCH_RISCV)
-#include 
-#else
-#error "Cannot build: unrecognized architecture, see ."
-#endif
+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), 
__ATOMIC_SEQ_CST)
+
+#define uatomic_cmpxchg(addr, old, new) \
+   ({  
\
+   __typeof__(*(addr)) __old = old;
\
+   __atomic_compare_exchange_n(addr, &__old, new, 0,   
\
+   __ATOMIC_SEQ_CST, 
__ATOMIC_SEQ_CST);\
+   __old;  
\
+   })
+
+#define uatomic_add_return(addr, v) \
+   __atomic_add_fetch((addr), (v), __ATOMIC_SEQ_CST)
+
+#define uatomic_add(addr, v) \
+   (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+   __atomic_sub_fetch((addr), (v), __ATOMIC_SEQ_CST)
+
+#define uatomic_sub(addr, v) \
+   (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+   (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask) \
+   (void)__atomic_or_fetch((addr), (mask), 

[lttng-dev] (no subject)

2023-03-21 Thread Ondřej Surý via lttng-dev
This is a second round of the patches after implementing the requested changes
from the first round.

Ondrej

[PATCH 1/7] Require __atomic builtins to build
- no changes

[PATCH 2/7] Use gcc __atomic builtis for 
- the non return macros are now __ATOMIC_RELAXED
- the return macros are now __ATOMIC_SEQ_CST
- the memory barriers are now

[PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier()
- this now uses __atomic_signal_fence() instead of __atomic_thread_fence()

[PATCH 4/7] Replace the internal pointer manipulation with __atomic
- changed the memory ordering to __ATOMIC_SEQ_CST for xchg and cmpxchg 

[PATCH 5/7] Replace the arch-specific memory barriers with __atomic
- dropped the changes to urcu/arch.h
- removed all custom cmm_*() macros from urcu/arch/*.h
- added the generic __atomic implementation to urcu/arch/generic.h

This was it's still possible to override the generics with arch specific macros.

[PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD,STORE}_SHARED
- _CMM_STORE_SHARED and CMM_STORE_SHARED now returns the stored value

[PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in
- no changes

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins

2023-03-21 Thread Ondřej Surý via lttng-dev
Instead of custom code, use the __atomic builtins to implement the
rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and
rcu_assign_pointer().

The rcu_dereference() now relies on CONSUME memory order.

Signed-off-by: Ondřej Surý 
---
 include/urcu/static/pointer.h | 77 +++
 1 file changed, 14 insertions(+), 63 deletions(-)

diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
index 9e46a57..a549483 100644
--- a/include/urcu/static/pointer.h
+++ b/include/urcu/static/pointer.h
@@ -38,6 +38,8 @@
 extern "C" {
 #endif
 
+#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME)
+
 /**
  * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable
  * into a RCU read-side critical section. The pointer can later be safely
@@ -49,14 +51,6 @@ extern "C" {
  * Inserts memory barriers on architectures that require them (currently only
  * Alpha) and documents which pointers are protected by RCU.
  *
- * With C standards prior to C11/C++11, the compiler memory barrier in
- * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g.
- * VSS: Value Speculation Scheduling) does not perform the data read
- * before the pointer read by speculating the value of the pointer.
- * Correct ordering is ensured because the pointer is read as a volatile
- * access. This acts as a global side-effect operation, which forbids
- * reordering of dependent memory operations.
- *
  * With C standards C11/C++11, concerns about dependency-breaking
  * optimizations are taken care of by the "memory_order_consume" atomic
  * load.
@@ -65,10 +59,6 @@ extern "C" {
  * explicit because the pointer used as input argument is a pointer,
  * not an _Atomic type as required by C11/C++11.
  *
- * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of
- * volatile access to implement rcu_dereference rather than
- * memory_order_consume load from the C11/C++11 standards.
- *
  * This may improve performance on weakly-ordered architectures where
  * the compiler implements memory_order_consume as a
  * memory_order_acquire, which is stricter than required by the
@@ -83,35 +73,7 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-
-#if !defined (URCU_DEREFERENCE_USE_VOLATILE) &&\
-   ((defined (__cplusplus) && __cplusplus >= 201103L) ||   \
-   (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L))
-# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
-#endif
-
-/*
- * If p is const (the pointer itself, not what it points to), using
- * __typeof__(p) would declare a const variable, leading to
- * -Wincompatible-pointer-types errors.  Using the statement expression
- * makes it an rvalue and gets rid of the const-ness.
- */
-#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
-# define _rcu_dereference(p) __extension__ ({  
\
-   __typeof__(__extension__ ({ 
\
-   __typeof__(p) __attribute__((unused)) 
_p0 = { 0 }; \
-   _p0;
\
-   })) _p1;
\
-   __atomic_load(&(p), &_p1, 
__ATOMIC_CONSUME);\
-   (_p1);  
\
-   })
-#else
-# define _rcu_dereference(p) __extension__ ({  
\
-   __typeof__(p) _p1 = CMM_LOAD_SHARED(p); 
\
-   cmm_smp_read_barrier_depends(); 
\
-   (_p1);  
\
-   })
-#endif
+#define _rcu_dereference(p) _rcu_get_pointer(&(p))
 
 /**
  * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer
@@ -126,12 +88,12 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-#define _rcu_cmpxchg_pointer(p, old, _new) \
-   __extension__   \
-   ({  \
-   __typeof__(*p) _pold = (old);   \
-   __typeof__(*p) _pnew = (_new);  \
-   uatomic_cmpxchg(p, _pold, _pnew);   \
+#define _rcu_cmpxchg_pointer(p, old, _new) 
\
+   ({  
\
+   __typeof__(*(p)) __old = old;   
\
+   

Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-21 Thread Ondřej Surý via lttng-dev
> On 20. 3. 2023, at 19:37, Mathieu Desnoyers  
> wrote:
> 
> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>> FIXME: This is experiment that adds explicit memory barrier in the
>> free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
>> free the resources.
>> Signed-off-by: Ondřej Surý 
>> ---
>>  src/workqueue.c | 1 +
>>  1 file changed, 1 insertion(+)
>> diff --git a/src/workqueue.c b/src/workqueue.c
>> index 1039d72..f21907f 100644
>> --- a/src/workqueue.c
>> +++ b/src/workqueue.c
>> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>>   struct urcu_workqueue_completion *completion;
>> completion = caa_container_of(ref, struct urcu_workqueue_completion, 
>> ref);
>> + assert(!urcu_ref_get_unless_zero(>ref));
> 
> Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of 
> some sort ?

I guess?

My experience with TSAN tells me, that you need some kind of memory barrier 
when using acquire-release
semantics and you do:

if (__atomic_sub_fetch(obj->ref, __ATOMIC_RELEASE) == 0) {
  /* __ATOMIC_ACQUIRE needed here */
   free(obj);
}

we end up using following code in BIND 9:

if (__atomic_sub_fetch(obj->ref, __ATOMIC_ACQ_REL) == 0) {
   free(obj);
}

So, I am guessing after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
this patch should no longer be needed.

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-21 Thread Ondřej Surý via lttng-dev

> On 20. 3. 2023, at 19:31, Mathieu Desnoyers  
> wrote:
> 
> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>> When adding REMOVED_FLAG to the pointers in the rculfhash
>> implementation, retype the generic pointer to uintptr_t to fix the
>> compiler error.
> 
> What is the compiler error ? I'm wondering whether the expected choice
> to match the rest of this file's content would be to use "uintptr_t *" or 
> "unsigned long *" ?

This is the error:

rculfhash.c:1201:2: error: address argument to atomic operation must be a 
pointer to integer ('struct cds_lfht_node **' invalid)
uatomic_or(>next, REMOVED_FLAG);
^
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
(void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
  ^ ~~
rculfhash.c:1444:3: error: address argument to atomic operation must be a 
pointer to integer ('struct cds_lfht_node **' invalid)
uatomic_or(_bucket->next, REMOVED_FLAG);
^~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
(void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
  ^ ~~

uintptr_t is defined as "unsigned integer type capable of holding a pointer to 
void" while unsigned long is at least 32-bit;

I guess that works in a practise, but using unsigned long to retype the 
pointers might blow up (thinking of x32 which I know
little about, but it's kind of hybrid architecture, isn't it?)

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-20 Thread Ondřej Surý via lttng-dev

> On 20. 3. 2023, at 19:03, Mathieu Desnoyers  
> wrote:
> 
> In doc/uatomic-api.md, we document:
> 
> "```c
> type uatomic_cmpxchg(type *addr, type old, type new);
> ```
> 
> An atomic read-modify-write operation that performs this
> sequence of operations atomically: check if `addr` contains `old`.
> If true, then replace the content of `addr` by `new`. Return the
> value previously contained by `addr`. This function implies a full
> memory barrier before and after the atomic operation."
> 
> This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
> rather than __ATOMIC_CONSUME".


From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

> If desired is written into *ptr then true is returned and memory is affected 
> according to the memory order specified by success_memorder. There are no 
> restrictions on what memory order can be used here.
> 
> Otherwise, false is returned and memory is affected according to 
> failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor 
> __ATOMIC_ACQ_REL. It also cannot be a stronger order than that specified by 
> success_memorder.

I think it makes sense that the failure_memorder has the same memorder as 
uatomic_read(), but it definitelly cannot be __ATOMIC_ACQ_REL - it's same as 
with __atomic_load_n, only following are permitted:

> The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, 
> __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED

2023-03-17 Thread Ondřej Surý via lttng-dev
Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic
builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and
CMM_STORE_SHARED().

Signed-off-by: Ondřej Surý 
---
 include/urcu/system.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..99e7443 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -26,7 +26,7 @@
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
  */
-#define _CMM_LOAD_SHARED(p)   CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)   __atomic_load_n(&(p), __ATOMIC_RELAXED)
 
 /*
  * Load a data from shared memory, doing a cache flush if required.
@@ -42,7 +42,7 @@
  * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
  * follow the store.
  */
-#define _CMM_STORE_SHARED(x, v)__extension__ ({ CMM_ACCESS_ONCE(x) = 
(v); })
+#define _CMM_STORE_SHARED(x, v)__atomic_store_n(&(x), (v), 
__ATOMIC_RELAXED)
 
 /*
  * Store v into x, where x is located in shared memory. Performs the
@@ -51,9 +51,8 @@
 #define CMM_STORE_SHARED(x, v) \
__extension__   \
({  \
-   __typeof__(x) _v = _CMM_STORE_SHARED(x, v); \
+   _CMM_STORE_SHARED(x, v);\
cmm_smp_wmc();  \
-   _v = _v;/* Work around clang "unused result" */ \
})
 
 #endif /* _URCU_SYSTEM_H */
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-17 Thread Ondřej Surý via lttng-dev
FIXME: This is experiment that adds explicit memory barrier in the
free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
free the resources.

Signed-off-by: Ondřej Surý 
---
 src/workqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
struct urcu_workqueue_completion *completion;
 
completion = caa_container_of(ref, struct urcu_workqueue_completion, 
ref);
+   assert(!urcu_ref_get_unless_zero(>ref));
free(completion);
 }
 
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-17 Thread Ondřej Surý via lttng-dev
When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to uintptr_t to fix the
compiler error.

Signed-off-by: Ondřej Surý 
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..863387e 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 * Knowing which wins the race will be known after the garbage
 * collection phase, stay tuned!
 */
-   uatomic_or(>next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)>next, REMOVED_FLAG);
/* We performed the (logical) deletion. */
 
/*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned 
long i,
dbg_printf("remove entry: order %lu index %lu hash %lu\n",
   i, j, j);
/* Set the REMOVED_FLAG to freeze the ->next for gc */
-   uatomic_or(_bucket->next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)_bucket->next, REMOVED_FLAG);
_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
}
ht->flavor->read_unlock();
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins

2023-03-17 Thread Ondřej Surý via lttng-dev
Instead of custom code, use the __atomic builtins to implement the
rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and
rcu_assign_pointer().

Signed-off-by: Ondřej Surý 
---
 include/urcu/arch.h   | 20 +
 include/urcu/arch/alpha.h |  6 +++
 include/urcu/arch/arm.h   | 12 ++
 include/urcu/arch/mips.h  |  2 +
 include/urcu/arch/nios2.h |  2 +
 include/urcu/arch/ppc.h   |  6 +++
 include/urcu/arch/s390.h  |  2 +
 include/urcu/arch/sparc64.h   |  6 +++
 include/urcu/arch/x86.h   | 20 +
 include/urcu/static/pointer.h | 77 +++
 10 files changed, 90 insertions(+), 63 deletions(-)

diff --git a/include/urcu/arch.h b/include/urcu/arch.h
index d3914da..aec6fa1 100644
--- a/include/urcu/arch.h
+++ b/include/urcu/arch.h
@@ -21,6 +21,26 @@
 #ifndef _URCU_ARCH_H
 #define _URCU_ARCH_H
 
+#if !defined(__has_feature)
+#define __has_feature(x) 0
+#endif /* if !defined(__has_feature) */
+
+/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */
+#if __has_feature(address_sanitizer)
+#define __SANITIZE_ADDRESS__ 1
+#endif /* if __has_feature(address_sanitizer) */
+
+#ifdef __SANITIZE_THREAD__
+/* FIXME: Somebody who understands the barriers should look into this */
+#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
+#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
+#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#endif
+
 /*
  * Architecture detection using compiler defines.
  *
diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
index dc33e28..84526ef 100644
--- a/include/urcu/arch/alpha.h
+++ b/include/urcu/arch/alpha.h
@@ -29,9 +29,15 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()   __asm__ __volatile__ ("mb":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()  __asm__ __volatile__ ("wmb":::"memory")
+#endif
+#ifndef cmm_read_barrier_depends
 #define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory")
+#endif
 
 /*
  * On Linux, define the membarrier system call number if not yet available in
diff --git a/include/urcu/arch/arm.h b/include/urcu/arch/arm.h
index 54ca4fa..4950e13 100644
--- a/include/urcu/arch/arm.h
+++ b/include/urcu/arch/arm.h
@@ -42,16 +42,28 @@ extern "C" {
 /*
  * Issues full system DMB operation.
  */
+#ifndef cmm_mb
 #define cmm_mb()   __asm__ __volatile__ ("dmb sy":::"memory")
+#endif
+#ifndef cmm_rmb
 #define cmm_rmb()  __asm__ __volatile__ ("dmb sy":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()  __asm__ __volatile__ ("dmb sy":::"memory")
+#endif
 
 /*
  * Issues DMB operation only to the inner shareable domain.
  */
+#ifndef cmm_smp_mb
 #define cmm_smp_mb()   __asm__ __volatile__ ("dmb ish":::"memory")
+#endif
+#ifndef cmm_smp_rmb
 #define cmm_smp_rmb()  __asm__ __volatile__ ("dmb ish":::"memory")
+#endif
+#ifndef cmm_smp_wmb
 #define cmm_smp_wmb()  __asm__ __volatile__ ("dmb ish":::"memory")
+#endif
 
 #endif /* URCU_ARCH_ARMV7 */
 
diff --git a/include/urcu/arch/mips.h b/include/urcu/arch/mips.h
index ea5b7e9..b9ee021 100644
--- a/include/urcu/arch/mips.h
+++ b/include/urcu/arch/mips.h
@@ -30,11 +30,13 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()   __asm__ __volatile__ (  \
"   .setmips2   \n" \
"   sync\n" \
"   .setmips0   \n" \
:::"memory")
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/nios2.h b/include/urcu/arch/nios2.h
index b4f3e50..5def45c 100644
--- a/include/urcu/arch/nios2.h
+++ b/include/urcu/arch/nios2.h
@@ -29,7 +29,9 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()   cmm_barrier()
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h
index 791529e..b8ec40d 100644
--- a/include/urcu/arch/ppc.h
+++ b/include/urcu/arch/ppc.h
@@ -48,7 +48,9 @@ extern "C" {
  * order cacheable and non-cacheable memory operations separately---i.e.
  * not the latter against the former.
  */
+#ifndef cmm_mb
 #define cmm_mb() __asm__ __volatile__ ("sync":::"memory")
+#endif
 
 /*
  * lwsync orders loads in cacheable memory with respect to other loads,
@@ -56,8 +58,12 @@ extern "C" {
  * Therefore, use it for barriers ordering accesses to cacheable memory
  * only.
  */
+#ifndef cmm_smp_rmb
 #define cmm_smp_rmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
+#endif
+#ifndef cmm_smp_rmb
 #define cmm_smp_wmb()__asm__ __volatile__ 

[lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()

2023-03-17 Thread Ondřej Surý via lttng-dev
Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
ThreadSanitizer can understand the memory synchronization.

FIXME: What should be the correct memory ordering here?

Signed-off-by: Ondřej Surý 
---
 include/urcu/compiler.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
 #define caa_likely(x)  __builtin_expect(!!(x), 1)
 #define caa_unlikely(x)__builtin_expect(!!(x), 0)
 
-#definecmm_barrier()   __asm__ __volatile__ ("" : : : "memory")
+/* FIXME: What would be a correct memory ordering here? */
+#definecmm_barrier()   __atomic_signal_fence(__ATOMIC_ACQ_REL)
 
 /*
  * Instruct the compiler to perform only a single access to a variable
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-17 Thread Ondřej Surý via lttng-dev
Replace the custom assembly code in include/urcu/uatomic/ with __atomic
builtins provided by C11-compatible compiler.

Signed-off-by: Ondřej Surý 
---
 include/Makefile.am|  16 -
 include/urcu/uatomic.h |  84 +++--
 include/urcu/uatomic/aarch64.h |  41 ---
 include/urcu/uatomic/alpha.h   |  32 --
 include/urcu/uatomic/arm.h |  57 ---
 include/urcu/uatomic/gcc.h |  46 ---
 include/urcu/uatomic/generic.h | 613 ---
 include/urcu/uatomic/hppa.h|  10 -
 include/urcu/uatomic/ia64.h|  41 ---
 include/urcu/uatomic/m68k.h|  44 ---
 include/urcu/uatomic/mips.h|  32 --
 include/urcu/uatomic/nios2.h   |  32 --
 include/urcu/uatomic/ppc.h | 237 
 include/urcu/uatomic/riscv.h   |  44 ---
 include/urcu/uatomic/s390.h| 170 -
 include/urcu/uatomic/sparc64.h |  81 -
 include/urcu/uatomic/tile.h|  41 ---
 include/urcu/uatomic/x86.h | 646 -
 18 files changed, 53 insertions(+), 2214 deletions(-)
 delete mode 100644 include/urcu/uatomic/aarch64.h
 delete mode 100644 include/urcu/uatomic/alpha.h
 delete mode 100644 include/urcu/uatomic/arm.h
 delete mode 100644 include/urcu/uatomic/gcc.h
 delete mode 100644 include/urcu/uatomic/generic.h
 delete mode 100644 include/urcu/uatomic/hppa.h
 delete mode 100644 include/urcu/uatomic/ia64.h
 delete mode 100644 include/urcu/uatomic/m68k.h
 delete mode 100644 include/urcu/uatomic/mips.h
 delete mode 100644 include/urcu/uatomic/nios2.h
 delete mode 100644 include/urcu/uatomic/ppc.h
 delete mode 100644 include/urcu/uatomic/riscv.h
 delete mode 100644 include/urcu/uatomic/s390.h
 delete mode 100644 include/urcu/uatomic/sparc64.h
 delete mode 100644 include/urcu/uatomic/tile.h
 delete mode 100644 include/urcu/uatomic/x86.h

diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..53a28fd 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -59,24 +59,8 @@ nobase_include_HEADERS = \
urcu/syscall-compat.h \
urcu/system.h \
urcu/tls-compat.h \
-   urcu/uatomic/aarch64.h \
-   urcu/uatomic/alpha.h \
urcu/uatomic_arch.h \
-   urcu/uatomic/arm.h \
-   urcu/uatomic/gcc.h \
-   urcu/uatomic/generic.h \
urcu/uatomic.h \
-   urcu/uatomic/hppa.h \
-   urcu/uatomic/ia64.h \
-   urcu/uatomic/m68k.h \
-   urcu/uatomic/mips.h \
-   urcu/uatomic/nios2.h \
-   urcu/uatomic/ppc.h \
-   urcu/uatomic/riscv.h \
-   urcu/uatomic/s390.h \
-   urcu/uatomic/sparc64.h \
-   urcu/uatomic/tile.h \
-   urcu/uatomic/x86.h \
urcu/urcu-bp.h \
urcu/urcu-futex.h \
urcu/urcu.h \
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..4dbef4c 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -22,37 +22,59 @@
 #define _URCU_UATOMIC_H
 
 #include 
+#include 
 
-#if defined(URCU_ARCH_X86)
-#include 
-#elif defined(URCU_ARCH_PPC)
-#include 
-#elif defined(URCU_ARCH_S390)
-#include 
-#elif defined(URCU_ARCH_SPARC64)
-#include 
-#elif defined(URCU_ARCH_ALPHA)
-#include 
-#elif defined(URCU_ARCH_IA64)
-#include 
-#elif defined(URCU_ARCH_ARM)
-#include 
-#elif defined(URCU_ARCH_AARCH64)
-#include 
-#elif defined(URCU_ARCH_MIPS)
-#include 
-#elif defined(URCU_ARCH_NIOS2)
-#include 
-#elif defined(URCU_ARCH_TILE)
-#include 
-#elif defined(URCU_ARCH_HPPA)
-#include 
-#elif defined(URCU_ARCH_M68K)
-#include 
-#elif defined(URCU_ARCH_RISCV)
-#include 
-#else
-#error "Cannot build: unrecognized architecture, see ."
-#endif
+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), 
__ATOMIC_ACQ_REL)
+
+#define uatomic_cmpxchg(addr, old, new) \
+   ({  
\
+   __typeof__(*(addr)) __old = old;
\
+   __atomic_compare_exchange_n(addr, &__old, new, 0,   
\
+   __ATOMIC_ACQ_REL, 
__ATOMIC_CONSUME);\
+   __old;  
\
+   })
+
+#define uatomic_add_return(addr, v) \
+   __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+   (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+   __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+   (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+   (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask) \
+   (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+

[lttng-dev] [PATCH 1/7] Require __atomic builtins to build

2023-03-17 Thread Ondřej Surý via lttng-dev
Add autoconf checks for all __atomic builtins that urcu require, and
adjust the gcc and clang versions in the README.md.

Signed-off-by: Ondřej Surý 
---
 README.md| 33 +
 configure.ac | 15 +++
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index ba5bb08..a65a07a 100644
--- a/README.md
+++ b/README.md
@@ -68,30 +68,15 @@ Should also work on:
 
 (more testing needed before claiming support for these OS).
 
-Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or
-better.
-
-The C compiler used needs to support at least C99. The C++ compiler used
-needs to support at least C++11.
-
-The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
-supported, with the following exceptions:
-
-  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
-accesses to offsets in a TLS structure on 32-bit x86. These versions are
-therefore not compatible with `liburcu` on x86 32-bit
-(i386, i486, i586, i686).
-The problem has been reported to the GCC community:
-
-  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
-See 
-  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins
-support. For ARM this was introduced with GCC 4.4:
-.
-  - Linux aarch64 depends on GCC 5.1 or better because prior versions
-perform unsafe access to deallocated stack.
-
-Clang version 3.0 (based on LLVM 3.0) is supported.
+Linux ARM depends on running a Linux kernel 2.6.15 or better.
+
+The C compiler used needs to support at least C99 and __atomic
+builtins. The C++ compiler used needs to support at least C++11
+and __atomic builtins.
+
+The GCC compiler versions 4.7 or better are supported.
+
+Clang version 3.1 (based on LLVM 3.1) is supported.
 
 Glibc >= 2.4 should work but the older version we test against is
 currently 2.17.
diff --git a/configure.ac b/configure.ac
index 909cf1d..cb7ba18 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [
   AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is 
detected.])
 ])
 
+# Require __atomic builtins
+AC_COMPILE_IFELSE(
+   [AC_LANG_PROGRAM(
+   [[int x, y;]],
+   [[__atomic_store_n(, 0, __ATOMIC_RELEASE);
+ __atomic_load_n(, __ATOMIC_CONSUME);
+ y = __atomic_exchange_n(, 1, __ATOMIC_ACQ_REL);
+ __atomic_compare_exchange_n(, , 0, 0, __ATOMIC_ACQ_REL, 
__ATOMIC_CONSUME);
+ __atomic_add_fetch(, 1, __ATOMIC_ACQ_REL);
+ __atomic_sub_fetch(, 1, __ATOMIC_ACQ_REL);
+ __atomic_and_fetch(, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_or_fetch(, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_thread_fence(__ATOMIC_ACQ_REL)]])],
+   [],
+   [AC_MSG_ERROR([The compiler does not support __atomic builtins])])
 
 ## ##
 ## Optional features selection ##
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins

2023-03-17 Thread Ondřej Surý via lttng-dev
Hi,

(this is my first time using git send-email, so I apologise in advance if
anything breaks).

Here's my attempt to convert the Userspace RCU to use __atomic builtins whenever
possible instead of custom assembly.

The __atomic builtins were first introduced in gcc 4.7.0 and clang 3.1.0.

Apart from simplifying the code, this should also help ThreadSanitizer to
understand the memory synchronization and report less (or no) warnings.

The code compiles and the tests passed (on amd64).

This is by no means complete, and most probably I missed or misunderstood
something, but it's a solid start, so I am submitting the patch set for
discussion and review.

Thanks,
--
Ondřej Surý 

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Ondřej Surý via lttng-dev
> On 17. 3. 2023, at 14:44, Mathieu Desnoyers  
> wrote:
> 
> I would indeed like to remove all the custom atomics assembly code from 
> liburcu now that there are good atomics support in the major compilers (gcc 
> and clang). 

Here's very preliminary implementation:

https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/2

I just did something wrong somewhere along the path and it doesn't compile now,
but it did for me locally.

I am submitting this now as it's 18:00 Friday evening and my kids are starting 
to
be angry at me :).

This will need some more work - I think some of the cmm_ macros might be dropped
now, and somebody who does that more often than I should take a look at the 
memory
orderings.

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Ondřej Surý via lttng-dev
> On 17. 3. 2023, at 14:44, Mathieu Desnoyers  
> wrote:
> 
> Sure, can you please submit the patch as a separate email with subject/commit 
> message/signed-off-by tag ?


https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/1.patch

Would this work for you?

Or do you need to have the patch attached?

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-14 Thread Ondřej Surý via lttng-dev
Hey,

we use ThreadSanitizer in BIND 9 CI quite extensively and with userspace-rcu
it's lit like an American house on Saturnalia ;).

I have two questions:

1. I've tried to help TSAN by replacing the custom atomics with __atomic gcc
  primitives - that seems to work pretty well.  Note that using C11 stdatomics
  is frankly not possible here because it would require wrapping everything into
  _Atomic().

  Do you want me to contribute this back? And how should I plug this into the
  existing structure?  This touches:

include/urcu/static/pointer.h
include/urcu/system.h
include/urcu/uatomic.h


2. I know there's KTSAN, so it must work somehow, but was there any success
  on using ThreadSanitizer on projects using Userspace-RCU?  It mostly seems
  to highlight the CDS parts of the code.

I can help TSAN to understand some of the code or suppress some of the warnings,
but I do want to prevent the code to be full of stuff like this:

static void
destroy_adbname_rcu_head(struct rcu_head *rcu_head) {
dns_adbname_t *adbname = caa_container_of(rcu_head, dns_adbname_t,
  rcu_head);

#ifdef __SANITIZE_THREAD__
SPINLOCK(>lock);
SPINUNLOCK(>lock);
#endif

destroy_adbname(adbname);
}

I am absolutely sure that the adbname can be destroyed here (because of the
reference counting), but TSAN had a problem with it. Doing the "fake" barrier
with a spinlock here made it stop consider this to be a data race.

I also had to disable the auto_resize of cds_lfht when running under TSAN.

I am also worried that by hiding some code from TSAN, we might miss a legitimate
error.

All I found using Google was this notice from 2014:
https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg05121.html

and perhaps this:
https://github.com/google/sanitizers/issues/1415

(Perhaps, I should look into annotating urcu code with TSAN annotations?)



3. As an extra bonus, this is going to be needed with clang-17 as noreturn is 
now
reserved word:

diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h
index 89d1cfa..c3762b0 100644
--- a/include/urcu/uatomic/generic.h
+++ b/include/urcu/uatomic/generic.h
@@ -38,7 +38,7 @@ extern "C" {
 #endif

 #if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR
-static inline __attribute__((always_inline, noreturn))
+static inline __attribute__((always_inline, __noreturn__))
 void _uatomic_link_error(void)
 {
 #ifdef ILLEGAL_INSTR
diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
index 187727e..cc76f53 100644
--- a/src/urcu-call-rcu-impl.h
+++ b/src/urcu-call-rcu-impl.h
@@ -1055,7 +1055,7 @@ void urcu_register_rculfhash_atfork(struct urcu_atfork 
*atfork)
  * This unregistration function is deprecated, meant only for internal
  * use by rculfhash.
  */
-__attribute__((noreturn))
+__attribute__((__noreturn__))
 void urcu_unregister_rculfhash_atfork(struct urcu_atfork *atfork 
__attribute__((unused)))
 {
urcu_die(EPERM);


Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] urcu/rculist.h clarifications - for implementing LRU

2023-03-14 Thread Ondřej Surý via lttng-dev
Hi Mathieu,


> On 13. 3. 2023, at 21:02, Mathieu Desnoyers  
> wrote:
> 
> On 2023-03-13 11:30, Ondřej Surý wrote:
>> Hi Matthieu,
>> I spent some more time with the userspace-rcu on Friday and over weekend and
>> now I am in much better place.
>>> On 13. 3. 2023, at 15:29, Mathieu Desnoyers 
>>>  wrote:
>>> 
>>> On 2023-03-11 01:04, Ondřej Surý via lttng-dev wrote:
>>>> Hey,
>>>> so, we are integrating userspace-rcu to BIND 9 (yay!) and as experiment,
>>>> I am rewriting the internal address database (keeps the infrastructure
>>>> information about names and addresses).
>>> 
>>> That's indeed very interesting !
>> Thanks for the userspace-rcu! It saves a lot of time - while my colleague 
>> Tony Finch
>> already wrote our internal QSBR implementation from scratch, it would be 
>> waste of
>> time to try to reimplement the CDS part of the library.
>> This is part of larger work to replace the internal BIND 9 database that's 
>> currently
>> implemented as rwlocked RBT with qptrie, if you are interested Tony has good
>> summary here: https://dotat.at/@/2023-02-28-qp-bind.html
> 
> Speaking of tries, I have implemented RCU Judy arrays in liburcu feature 
> branches a
> while back. Those never made it to the liburcu master branch because I had no 
> real-life
> use for those so far, and I did not want to expose a public API that would 
> bitrot without
> real-life user feedback.
> 
> The lookups and ordered traversals (next/prev) are entirely RCU, and updates 
> are
> either single-threaded, or use a strategy where locking is distributed within
> the trie so updates to data spatially discontinuous would not contend with 
> each other.
> 
> My original implementation supported integer keys as well as variable-length 
> string keys.
> 
> The advantage of Judy arrays is that it minimizes the number of cache-lines 
> touched
> on lookup traversal. Let me know if this would be useful for your use-cases, 
> and if
> so I can provide links to prototype branches.

I don't think we are going to use this right away, but I would be happy to look 
at the branch.

I've looked at the Judy arrays before, but the normal implementation alone made 
my brain
hurt, so RCU Judy arrays seems like something I would like to look at even if 
we are not
going to use the Judy arrays right now or ever.

>>> I suspect you use a scheme where you hold the RCU read-side to perform the 
>>> lookup, and
>>> then you use the object with an internal lock held. But expecting the 
>>> object to still
>>> exist after rcu read unlock is incorrect, unless some other reference 
>>> counting scheme
>>> is used.
>> Yeah, I was trying to minimize the sections where we hold the rcu_read 
>> locks, but I gave
>> up and now there's rcu_read lock held for longer periods of time.
> 
> We've used that kind of scheme in LTTng lttng-relayd, where we use RCU for 
> short-term
> existence guarantee, and reference counting for longer-term existence 
> guarantee. An
> example can be found here:
> 
> https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-relayd/viewer-stream.cpp
> 
> viewer_stream_get_by_id() attempts lookup from the hash table, and 
> re-validates that the
> object exists with viewer_stream_get(), which checks if the refcount is 
> already 0 as it
> tries to increment it with urcu_ref_get_unless_zero(). If zero, it does as if 
> the object
> was not found. I recommend this kind of scheme if you intend to use both RCU 
> and reference
> counting.
> 
> Then you can place a mutex within the object, and use that mutex to provide 
> mutual
> exclusion between concurrent accesses to the object that need to be 
> serialized.
> 
> In the destroy handler (called when the reference count reaches 0), you will 
> typically
> want to unlink your object from the various data structures holding 
> references to it
> (hash tables, lists), and then use call_rcu() to invoke reclaim of the object 
> after a
> grace period.

Thanks, the description looks very similar to what we do. Only a special flag 
is used
to mark the name/entry as DEAD - it's possible to force-expire an entry from 
the cache,
so reference counting alone would not be sufficient.

I'll take a look at the lltng-tools code. Thank you.

>>>> if (NAME_DEAD(adbname)) {
>>>> UNLOCK(>lock);
>>>> dns_adbname_detach();
>>>> goto again;
>>>> }
>>>> if (adbname->last_used + ADB_CACHE_MINIMUM <= last_update) {
>>>>

Re: [lttng-dev] urcu/rculist.h clarifications - for implementing LRU

2023-03-13 Thread Ondřej Surý via lttng-dev
Hi Matthieu,

I spent some more time with the userspace-rcu on Friday and over weekend and
now I am in much better place.

> On 13. 3. 2023, at 15:29, Mathieu Desnoyers  
> wrote:
> 
> On 2023-03-11 01:04, Ondřej Surý via lttng-dev wrote:
>> Hey,
>> so, we are integrating userspace-rcu to BIND 9 (yay!) and as experiment,
>> I am rewriting the internal address database (keeps the infrastructure
>> information about names and addresses).
> 
> That's indeed very interesting !

Thanks for the userspace-rcu! It saves a lot of time - while my colleague Tony 
Finch
already wrote our internal QSBR implementation from scratch, it would be waste 
of
time to try to reimplement the CDS part of the library.

This is part of larger work to replace the internal BIND 9 database that's 
currently
implemented as rwlocked RBT with qptrie, if you are interested Tony has good
summary here: https://dotat.at/@/2023-02-28-qp-bind.html

>> There's a hashtable to keep the entries and there's associated LRU list.
>> For the hashtable the cds_lfht seems to work well, but I am kind of 
>> struggling
>> with cds_list (the urcu/rculist.h variant).
>> The names and entries works in pretty much similar way, so I am going to
>> describe just one.
>> The workhorse is get_attached_and_locked_name() function (I am going to
>> skip the parts where we create keys, checks if LRU needs to be updated, etc.)
> 
> It would help if you could share a git branch of your prototype. In order to 
> reason
> about RCU, we typically need to look at both the update-side and the 
> read-side.
> For instance I don't see the read-side of the LRU linked-list in the code 
> snippets
> below. We also need to have a complete picture of the object lifetime, from 
> allocation
> to reclaim/reuse. I don't see where the grace periods (either synchronize_rcu 
> or
> call_rcu) are before reclaim or reuse in the code snippets below.

Sure, happy to, but I didn't really wanted to ask you to look directly at the 
code, and
I don't *expect* direct help (although any help is surely appreciated).

So, please bear in mind this is far from complete and the commits are utter mess
so far: https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/7680

>> So this is part with the hashtable lookup which seems to work well:
>> rcu_read_lock();
>> struct cds_lfht_iter iter;
>> struct cds_lfht_node *ht_node;
>> cds_lfht_lookup(adb->names_ht, hashval, names_match, , );
>> ht_node = cds_lfht_iter_get_node();
>> bool unlink = false;
>> if (ht_node == NULL) {
>> /* Allocate a new name and add it to the hash table. */
>> adbname = new_adbname(adb, name, start_at_zone);
>> ht_node = cds_lfht_add_unique(adb->names_ht, hashval,
>>   names_match, ,
>>   >ht_node);
>> if (ht_node != >ht_node) {
>> /* ISC_R_EXISTS */
>> destroy_adbname(adbname);
>> adbname = NULL;
>> }
>> }
>> if (adbname == NULL) {
>> INSIST(ht_node != NULL);
>> adbname = caa_container_of(ht_node, dns_adbname_t, ht_node);
>> unlink = true;
>> }
>> dns_adbname_ref(adbname);
> 
> What is this dns_adbname_ref() supposed to do ? And is there a reference to 
> adbname
> that is still used after rcu_read_unlock() ? What guarantees the existence of 
> the
> adbname after rcu_read_unlock() ?

This is part of the internal reference counting - there's a macro that expects 
`isc_refcount_t references;`
member on the struct and it creates _ref, _unref, _attach and _detach functions 
for each struct.

The last _detach/_unref calls a destroy function.

>> rcu_read_unlock();
>> and here's the part where LRU gets updated:
>> LOCK(>lock); /* Must be unlocked by the caller */
> 
> I suspect you use a scheme where you hold the RCU read-side to perform the 
> lookup, and
> then you use the object with an internal lock held. But expecting the object 
> to still
> exist after rcu read unlock is incorrect, unless some other reference 
> counting scheme
> is used.

Yeah, I was trying to minimize the sections where we hold the rcu_read locks, 
but I gave
up and now there's rcu_read lock held for longer periods of time.

>> if (NAME_DEAD(adbname)) {
>> UNLOCK(>lock);
>> dns_adbname_detach();
>> goto again;
>> }
>>   

[lttng-dev] urcu/rculist.h clarifications - for implementing LRU

2023-03-10 Thread Ondřej Surý via lttng-dev
Hey,

so, we are integrating userspace-rcu to BIND 9 (yay!) and as experiment,
I am rewriting the internal address database (keeps the infrastructure
information about names and addresses).

There's a hashtable to keep the entries and there's associated LRU list.

For the hashtable the cds_lfht seems to work well, but I am kind of struggling
with cds_list (the urcu/rculist.h variant).

The names and entries works in pretty much similar way, so I am going to
describe just one.

The workhorse is get_attached_and_locked_name() function (I am going to
skip the parts where we create keys, checks if LRU needs to be updated, etc.)

So this is part with the hashtable lookup which seems to work well:

rcu_read_lock();

struct cds_lfht_iter iter;
struct cds_lfht_node *ht_node;

cds_lfht_lookup(adb->names_ht, hashval, names_match, , );

ht_node = cds_lfht_iter_get_node();

bool unlink = false;

if (ht_node == NULL) {
/* Allocate a new name and add it to the hash table. */
adbname = new_adbname(adb, name, start_at_zone);

ht_node = cds_lfht_add_unique(adb->names_ht, hashval,
  names_match, ,
  >ht_node);
if (ht_node != >ht_node) {
/* ISC_R_EXISTS */
destroy_adbname(adbname);
adbname = NULL;
}
}
if (adbname == NULL) {
INSIST(ht_node != NULL);
adbname = caa_container_of(ht_node, dns_adbname_t, ht_node);
unlink = true;
}

dns_adbname_ref(adbname);

rcu_read_unlock();

and here's the part where LRU gets updated:

LOCK(>lock); /* Must be unlocked by the caller */
if (NAME_DEAD(adbname)) {
UNLOCK(>lock);
dns_adbname_detach();
goto again;
}

if (adbname->last_used + ADB_CACHE_MINIMUM <= last_update) {
adbname->last_used = now;

LOCK(>names_lru_lock);
if (unlink) {
cds_list_del_rcu(>list_node);
}
cds_list_add_tail_rcu(>list_node, >names_lru);
UNLOCK(>names_lru_lock);
}

The NAME_DEAD gets updated under the adbname->lock in expire_name():

if (!NAME_DEAD(adbname)) {
adbname->flags |= NAME_IS_DEAD;

/* Remove the adbname from the hashtable... */
(void)cds_lfht_del(adb->names_ht, >ht_node);

/* ... and LRU list */
LOCK(>names_lru_lock);
cds_list_del_rcu(>list_node);
UNLOCK(>names_lru_lock);
}

So, now the problem is that sometimes I get a crash under load:

(gdb) bt
#0  0x7fae87a34c96 in cds_list_del_rcu (elem=0x7fae37e78880) at 
/usr/include/x86_64-linux-gnu/urcu/rculist.h:71
#1  get_attached_and_locked_name (adb=adb@entry=0x7fae830142a0, 
name=name@entry=0x7fae804fc9b0, start_at_zone=true, now=) at 
adb.c:1446
#2  0x7fae87a392bf in dns_adb_createfind (adb=0x7fae830142a0, 
loop=0x7fae842c3a20, cb=cb@entry=0x7fae87b28d9f , 
cbarg=0x7fae7c679000, name=name@entry=0x7fae804fc9b0, qname=0x7fae7c679010, 
qtype=1, options=63, now=, target=0x0, port=53, depth=1, 
qc=0x7fae7c651060, findp=0x7fae804fc698) at adb.c:2149

(gdb) frame 0
#0  0x7fae87a34c96 in cds_list_del_rcu (elem=0x7fae37e78880) at 
/usr/include/x86_64-linux-gnu/urcu/rculist.h:71
71  elem->next->prev = elem->prev;
(gdb) print elem->next
$1 = (struct cds_list_head *) 0x0
(gdb) print elem
$2 = (struct cds_list_head *) 0x7fae37e78880

So, I suspect, I am doing something wrong when updating the position of the the 
name in the LRU list.

There are couple of places where we iterate through the LRU list (overmem 
cleaning can kick-in, the user initiated cleaning can start, shutdown can be 
happening...)

Is there perhaps already some LRU implementation using Userspace-RCU that I can 
take look at?

Thank you!
Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev