https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973

            Bug ID: 70973
           Summary: x86: Can the __atomic_*() operations be made to list
                    the LOCK prefixes?
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com
  Target Milestone: ---

When generating x86 code, the Linux kernel constructs a list of the LOCK
prefixes it applies to inline assembly-coded atomic operations.  This allows
the LOCK prefixes to be NOP'd out if there's only one CPU online.

Would it be possible to duplicate this in gcc?

What the kernel does is this:

   #ifdef CONFIG_SMP
   #define LOCK_PREFIX_HERE \
                ".pushsection .smp_locks,\"a\"\n"       \
                ".balign 4\n"                           \
                ".long 671f - .\n" /* offset */         \
                ".popsection\n"                         \
                "671:"

   #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "

   #else /* ! CONFIG_SMP */
   #define LOCK_PREFIX_HERE ""
   #define LOCK_PREFIX ""
   #endif
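As a rough illustration of how such a prefix gets used, the sketch below copies the SMP variant of the macros into an ordinary C file and uses them in an inline-asm increment, in the style of the kernel's x86 atomic_inc(). `demo_atomic_inc` is a hypothetical name. The non-x86 fallback exists only so the example builds on other targets, and it records nothing.

```c
#include <assert.h>

/* Userspace copy of the SMP variant of the macros above. */
#define LOCK_PREFIX_HERE \
        ".pushsection .smp_locks,\"a\"\n" \
        ".balign 4\n"                     \
        ".long 671f - .\n" /* offset */   \
        ".popsection\n"                   \
        "671:"

#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "

/* On ELF x86 targets the template assembles to "671: lock; incl <mem>" and
 * also emits a 4-byte self-relative offset to label 671 (the LOCK byte)
 * into .smp_locks.  Elsewhere it falls back to the generic builtin. */
static inline void demo_atomic_inc(int *v)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
    __asm__ __volatile__(LOCK_PREFIX "incl %0" : "+m" (*v) : : "memory");
#else
    __atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
#endif
}
```

On x86, one would expect the resulting object file to contain an .smp_locks section with one 4-byte PC-relative relocation per expansion, which `readelf -S` and `objdump -r` can show.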

This places a 32-bit relative pointer to each LOCK prefix in the .smp_locks
section (we're assuming that .smp_locks isn't going to be more than 2G away
from the .text section).
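The entries written by `.long 671f - .` are self-relative: each holds the address of its LOCK byte minus the address of the entry itself, which is why 32 bits suffice while the 2G assumption holds. The sketch below walks such a table over an ordinary byte buffer standing in for .text, because real text is read-only and the kernel uses its own text-patching routine. `smp_locks_record`, `smp_locks_patch` and `struct fake_image` are hypothetical names. The sketch rewrites 0xF0 to 0x3E, a DS-segment override, rather than to a separate NOP, so each patched site stays a single instruction. As far as I know the kernel's alternatives code does the same, but treat that as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_LOCK_PREFIX 0xF0
#define X86_DS_PREFIX   0x3E   /* DS segment override prefix */

/* Store the offset the way ".long 671f - ." does: target address minus
 * the address of the entry itself. */
static void smp_locks_record(int32_t *entry, const uint8_t *target)
{
    intptr_t delta = (intptr_t)((uintptr_t)target - (uintptr_t)entry);
    assert(delta >= INT32_MIN && delta <= INT32_MAX); /* the 2G assumption */
    *entry = (int32_t)delta;
}

/* Walk [start, end) and resolve each self-relative entry.  Where the byte it
 * points at lies inside [text, text_end) and equals `from`, overwrite it with
 * `to`.  Zero entries are skipped.  Returns the number of bytes changed. */
static size_t smp_locks_patch(const int32_t *start, const int32_t *end,
                              uint8_t *text, uint8_t *text_end,
                              uint8_t from, uint8_t to)
{
    size_t changed = 0;

    for (const int32_t *poff = start; poff < end; poff++) {
        uintptr_t addr = (uintptr_t)poff + (uintptr_t)(intptr_t)*poff;

        if (*poff == 0 || addr < (uintptr_t)text || addr >= (uintptr_t)text_end)
            continue;
        if (*(uint8_t *)addr == from) {
            *(uint8_t *)addr = to;
            changed++;
        }
    }
    return changed;
}

/* Stand-in for a loaded image: the "text" bytes and the offset table share
 * one object, so every entry is trivially within +/-2G of its target. */
struct fake_image {
    uint8_t text[8];     /* lock incl (%rdi); nop; lock xadd %eax,(%rdi) */
    int32_t smp_locks[2];
};
```

For example, recording `&img.text[0]` and `&img.text[4]` and then calling `smp_locks_patch(..., X86_LOCK_PREFIX, X86_DS_PREFIX)` disables both LOCK bytes. Swapping the last two arguments turns them back on, which is the direction the userspace pthread idea below would need.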

Note, however, that some LOCK prefixes apparently cannot be dispensed with, so
the listing cannot simply be enabled for every atomic op with a command-line
flag.

Would it be possible to provide a constant to OR with the memorder parameter to
flag that this atomic op should be listed in .smp_locks? E.g.:

   orig = __atomic_fetch_or(ptr, mask,
                            __ATOMIC_ACQUIRE | __ATOMIC_RECORD_LOCK);
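Until something like the proposed flag exists, callers could be written against it with a fallback. `RECORD_LOCK_FLAG` and `set_bits_acquire` are hypothetical names, and `__ATOMIC_RECORD_LOCK` is only the spelling suggested above, not a macro GCC is assumed to define. GCC already accepts x86-specific bits ORed into the memorder argument (`__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE`), so the proposed interface has some precedent.

```c
#include <assert.h>

/* Use the proposed flag if the compiler ever provides it; otherwise OR in 0,
 * which leaves an ordinary, unrecorded acquire operation. */
#ifdef __ATOMIC_RECORD_LOCK
#define RECORD_LOCK_FLAG __ATOMIC_RECORD_LOCK
#else
#define RECORD_LOCK_FLAG 0
#endif

/* Atomically OR mask into *ptr and return the previous value. */
static unsigned long set_bits_acquire(unsigned long *ptr, unsigned long mask)
{
    return __atomic_fetch_or(ptr, mask, __ATOMIC_ACQUIRE | RECORD_LOCK_FLAG);
}
```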

This might also be useful in userspace.  So, for example, pthread
initialisation could replace a bunch of NOPs with a bunch of LOCKs so that
single-threaded processes don't get a performance penalty from LOCKed
instructions.
