https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103966
Bug ID: 103966
Summary: std::atomic relaxed load, inc, store sub-optimal codegen
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: witold.baryluk+gcc at gmail dot com
Target Milestone: ---

Both functions below should compile to the same assembly on x86:

#include <atomic>
#include <cstdint>

uint64_t x;

void inc_a() {
  x++;
}

std::atomic<uint64_t> y;

void inc_b_non_atomic() {
  y.store(y.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

Clang compiles both to the same assembly. GCC 12 (and earlier) does not.

https://godbolt.org/z/GcM67xz8T

This pattern is very popular in approximate statistical counters / metrics,
where the flow of information is unidirectional (i.e. from one thread that
does the updates to another thread that only reads the counters), and its
performance is critical in many codebases.
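
For context, a minimal sketch of the single-writer counter pattern described above. The names SingleWriterCounter and run_single_writer_demo are hypothetical and do not come from the report or the godbolt link; the sketch assumes exactly one thread ever calls inc(), which is the only case in which the separate relaxed load and store do not lose updates.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical counter type for illustration; not taken from the bug report.
struct SingleWriterCounter {
    std::atomic<std::uint64_t> value{0};

    // Must only be called from the single writer thread. The load and the
    // store are two separate operations, so concurrent writers would lose
    // updates; with one writer this is a plain increment that readers can
    // still observe without a data race.
    void inc() {
        value.store(value.load(std::memory_order_relaxed) + 1,
                    std::memory_order_relaxed);
    }

    // Safe from any thread; may return a slightly stale value.
    std::uint64_t read() const {
        return value.load(std::memory_order_relaxed);
    }
};

// Runs one writer and one reader concurrently and returns the final count.
std::uint64_t run_single_writer_demo(std::uint64_t iterations) {
    SingleWriterCounter counter;
    std::atomic<bool> done{false};

    std::thread reader([&] {
        std::uint64_t last = 0;
        while (!done.load(std::memory_order_acquire)) {
            std::uint64_t now = counter.read();
            // Coherence on a single atomic object: one thread never sees
            // the counter go backwards, even with relaxed loads.
            assert(now >= last);
            last = now;
        }
    });

    std::thread writer([&] {
        for (std::uint64_t i = 0; i < iterations; ++i) {
            counter.inc();
        }
    });

    writer.join();
    done.store(true, std::memory_order_release);
    reader.join();

    // join() synchronizes with the writer, so every increment is visible.
    return counter.read();
}
```

Calling run_single_writer_demo(100000) from main() should return 100000, because only one thread writes. If several threads had to increment the same counter, fetch_add would be needed instead, and that is a different (locked) instruction on x86, which is why this report is about the load/store form specifically.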