https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973
            Bug ID: 70973
           Summary: x86: Can the __atomic_*() operations be made to list the LOCK prefixes?
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com
  Target Milestone: ---

When generating x86 code, the Linux kernel constructs a list of the LOCK
prefixes it applies to inline assembly-coded atomic operations.  This allows
the LOCK prefixes to be NOP'd out if there's only one CPU online.

Would it be possible to duplicate this in gcc?

What the kernel does is this:

    #ifdef CONFIG_SMP
    #define LOCK_PREFIX_HERE \
            ".pushsection .smp_locks,\"a\"\n"   \
            ".balign 4\n"                       \
            ".long 671f - .\n" /* offset */     \
            ".popsection\n"                     \
            "671:"

    #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "

    #else /* ! CONFIG_SMP */
    #define LOCK_PREFIX_HERE ""
    #define LOCK_PREFIX ""
    #endif

placing a 32-bit relative pointer in the .smp_locks section (we're assuming
that .smp_locks isn't going to be more than 2G away from the .text section).

Note, however, some LOCK prefixes apparently cannot be dispensed with, so the
listing cannot be unconditional via a command-line flag.

Would it be possible to provide a constant to OR with the memorder parameter to
flag that this atomic op should be listed in .smp_locks?  E.g.:

    orig = __atomic_fetch_or(ptr, mask,
                             __ATOMIC_ACQUIRE | __ATOMIC_RECORD_LOCK);

This might also be useful in userspace.  So, for example, pthread
initialisation could replace a bunch of NOPs with a bunch of LOCKs so that
single-threaded processes don't get a performance penalty from LOCKed
instructions.
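
For illustration, here is a minimal sketch of how a consumer of such a table
might walk the self-relative entries and swap the prefix byte.  It is a
userspace simulation on a plain byte buffer, not the kernel's code and not
anything gcc provides: struct image, record_lock() and patch_locks() are
hypothetical names, and the use of 0x3E (a DS segment-override prefix) as the
single-CPU replacement byte is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_BYTE 0xF0u /* x86 LOCK prefix */
#define UP_BYTE   0x3Eu /* assumed single-CPU replacement (DS override) */

/* Hypothetical stand-in for a linked image: some "text" plus its lock table. */
struct image {
    uint8_t text[16];
    int32_t smp_locks[2]; /* each entry: target address minus entry address */
};

/* Same arithmetic as ".long 671f - .": store the distance from the entry
 * itself to the byte that holds the LOCK prefix. */
static void record_lock(struct image *img, size_t slot, size_t text_off)
{
    uint8_t *target = &img->text[text_off];
    uint8_t *entry = (uint8_t *)&img->smp_locks[slot];

    img->smp_locks[slot] = (int32_t)(target - entry);
}

/* Walk the first n table entries and replace each prefix byte equal to
 * 'from' with 'to'.  Returns the number of bytes actually changed. */
static int patch_locks(struct image *img, size_t n, uint8_t from, uint8_t to)
{
    int patched = 0;

    for (size_t i = 0; i < n; i++) {
        uint8_t *entry = (uint8_t *)&img->smp_locks[i];
        uint8_t *target = entry + img->smp_locks[i];

        /* The resolved address must land inside the simulated text. */
        assert(target >= img->text &&
               target < img->text + sizeof img->text);

        if (*target == from) {
            *target = to;
            patched++;
        }
    }
    return patched;
}
```

After recording the entries, patch_locks(&img, n, LOCK_BYTE, UP_BYTE) would
drop the prefixes; swapping the last two arguments restores them, which is the
direction a pthread initialisation hook would need.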