I propose to delete current use of __HAVE_ATOMIC_AS_MEMBAR because it is a bad API, and replace it by one of two options:
(a) Add membar_release_preatomic and membar_acquire_postatomic: noop if __HAVE_ATOMIC_AS_MEMBAR, alias for membar_release/acquire otherwise. (Maybe also add membar_sync_atomic or something but this doesn't matter much because membar_sync is so rare.) (b) Change the semantics of membar_release and membar_acquire so they are noops on platforms with __HAVE_ATOMIC_AS_MEMBAR, like x86. For ordering plain loads and stores there are still atomic_load_acquire and atomic_store_release. Thoughts? I'm leaning toward option (b), but posting this for discussion because on its face (a) might seem more reasonable until you look at the implications. Tradeoffs discussed below. BACKGROUND On some architectures, the atomic read/modify/write operations of atomic_ops(3) (but not plain atomic load/store) imply full sequential consistency barriers (membar_sync), so putting membar_* immediately before or after them is redundant -- and possibly expensive. We define __HAVE_ATOMIC_AS_MEMBAR on these platforms so you can do things like this as optimizations: #ifndef _HAVE_ATOMIC_AS_MEMBAR membar_release(); #endif if (atomic_dec_uint_nv(&obj->refcnt) != 0) return; #ifndef __HAVE_ATOMIC_AS_MEMBER membar_acquire(); #endif free(obj); Except -- this code is actually broken because I misspelled the macro both times, and the failure mode is that the membars are quietly omitted on _all_ platforms, not just those where it is safe. We had this bug in practice for two years in subr_pcq.c: https://mail-index.netbsd.org/source-changes/2014/02/06/msg051520.html DISCUSSION Adding membar_release_preatomic and membar_acquire_postatomic makes it a little less of a mouthful to take advantage of the optimization on platforms like x86: if you need an atomic read/modify/write operation like atomic_dec_uint_nv to be a release operation, just precede it by membar_release_preatomic; if you need it to be an acquire operation, just follow it by membar_acquire_preatomic. membar_release_preatomic(); if (atomic_dec_uint_nv(&obj->refcnt) != 0) return; membar_acquire_postatomic(); free(obj); But essentially every use of membar_release and membar_acquire is already in this pattern, adjacent to an atomic r/m/w. Any plain atomic load/store that needs to be an acquire/release operation should already be using atomic_store_release or atomic_load_acquire (or atomic_load_consume in some cases). So if we introduced membar_release_preatomic and membar_acquire_postatomic, there would be essentially no need for membar_release and membar_acquire any more. TRADEOFFS Pro option (a): - generally better to create new names than change existing ones - maybe confusing if membar_release() and then plain store isn't actually a release operation, or if plain load then membar_acquire() isn't actually an acquire operation Pro option (b): - existing names have no use once new names are introduced - existing names haven't gone out in release yet - membar_ops(3) already has a dizzying array of options - changes to critical APIs like this are more painful the longer we wait - better to keep only the really useful operations for something that's already hard to understand - C11 semantics doesn't allow fences to affect plain loads and stores on non-_Atomic objects, so the existing semantics can't really work in C11 anyway