On Fri, 15 Feb 2019, Konstantin Belousov wrote:

> On Fri, Feb 15, 2019 at 07:16:04AM +0000, Alexey Dokuchaev wrote:
>> On Thu, Feb 14, 2019 at 01:53:11PM +0000, Konstantin Belousov wrote:
>>> New Revision: 344118
>>> URL: https://svnweb.freebsd.org/changeset/base/344118
>>>
>>>   Provide userspace versions of do_cpuid() and cpuid_count() on i386.
>>>
>>>   Some older compilers, when generating PIC code, cannot handle inline
>>>   asm that clobbers %ebx (because %ebx is used as the GOT offset
>>>   register).  Userspace versions avoid clobbering %ebx by saving it to
>>>   stack before executing the CPUID instruction.
>>>
>>> +static __inline void
>>> +do_cpuid(u_int ax, u_int *p)
>>> +       __asm __volatile(
>>> +           "pushl\t%%ebx\n\t"
>>> +           "cpuid\n\t"
>>> +           "movl\t%%ebx,%1\n\t"
>>> +           "popl\t%%ebx"
>>
>> Is there a reason to prefer pushl+movl+popl instead of movl+xchgl?
>>
>>     "movl %%ebx, %1\n\t"
>>     "xchgl %%ebx, %1"
>
> xchgl seems to be slower even in registers format (where no implicit
> lock is used).  If you can demonstrate that your fragment is better in
> some microbenchmark, I can change it.  But also note that its use is not
> on the critical path.

They should have the same speed on modern x86.  xchgl %reg1,%reg2 is
not slow, but it changes 2 visible registers and needs somewhere to
hold one of the registers while changing it, so on a 14-year-old AthlonXP,
where I know the times in cycles better, register xchgl was twice as slow
as register move (2 cycles latency instead of 1, and throughput ==
latency (?)).  On 2015 Haswell, register movl in a loop is in parallel
with the loop overhead (1 cycle), while xchgl and pushl/popl take 0.5
cycles longer on average.  Latency might be a problem for pushl/popl
in critical paths.  There aren't many of those.
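
For comparison, the movl+xchgl alternative asked about above could be
sketched roughly as follows.  This is an illustration under assumptions,
not the committed code: the names cpuid_xchg and cpuid_plain are
hypothetical, the CPUID instruction is placed between the movl and the
xchgl (the quoted fragment does not show it), the "=&S" constraint is
chosen here to keep the scratch register away from %ebx, and the amd64
branch exists only so that the sketch also builds there.  cpuid_plain
lets the compiler manage %ebx itself and serves as a reference result.

```c
/*
 * Sketch only.  Both functions return 1 if CPUID was executed and 0 on
 * non-x86 targets (where p[] is zeroed instead).
 */
static inline int
cpuid_xchg(unsigned int ax, unsigned int *p)
{
#if defined(__i386__)
	__asm __volatile(
	    "movl\t%%ebx,%1\n\t"	/* save %ebx in the scratch register */
	    "cpuid\n\t"
	    "xchgl\t%%ebx,%1"		/* restore %ebx; %1 keeps CPUID's %ebx */
	    : "=a" (p[0]), "=&S" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
	return (1);
#elif defined(__x86_64__)
	__asm __volatile(
	    "movq\t%%rbx,%q1\n\t"	/* same idea with the 64-bit registers */
	    "cpuid\n\t"
	    "xchgq\t%%rbx,%q1"
	    : "=a" (p[0]), "=&S" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
	return (1);
#else
	(void)ax;
	p[0] = p[1] = p[2] = p[3] = 0;
	return (0);
#endif
}

/* Reference version: assumes a compiler that accepts "=b" here. */
static inline int
cpuid_plain(unsigned int ax, unsigned int *p)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm __volatile("cpuid"
	    : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
	return (1);
#else
	(void)ax;
	p[0] = p[1] = p[2] = p[3] = 0;
	return (0);
#endif
}
```

Calling both with leaf 0 and comparing the four output words is a cheap
sanity check that the save/restore sequence does not lose CPUID's %ebx.
It says nothing about the relative speed of the two sequences.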

There is no reason to use the style with strings made unreadable using
soft tabs and newlines.  gcc supported hard newlines 20-30 years ago,
but broke this because C90 or C99 made hard newlines in strings invalid.
This broke lots of my asms.  I now use hard tabs and backslash-hard_newlines
after soft newlines:

        __asm __volatile(" \n\
        pushl   %%ebx           \n\
        cpuid                   \n\
        movl    %%ebx,%1        \n\
        popl    %%ebx"

The Standard C lossage forces the use of \n\ before each hard newline, and readability
forces a hard-to-edit variable number of hard tabs before \n\, but otherwise
the code looks the same as before (opcodes are outdented to column 8 in
large asms, and labels are outdented to column 0, so that the code looks
the same as non-inline asm too).
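
Filled out into a complete function, the style described above might
look like the sketch below.  The function name do_cpuid_user, the int
return value, the "=S" constraint for the word that receives %ebx, and
the amd64 and non-x86 branches are assumptions made for illustration;
the fragment above does not show its operands.  The amd64 branch avoids
pushing in inline asm and simply lets the compiler handle %rbx.

```c
/*
 * Sketch only.  Returns 1 if CPUID was executed, 0 on non-x86 targets
 * (where p[] is zeroed instead).
 */
static inline int
do_cpuid_user(unsigned int ax, unsigned int *p)
{
#if defined(__i386__)
	/* Save %ebx on the stack around CPUID, as in the commit. */
	__asm __volatile("	\n\
	pushl	%%ebx		\n\
	cpuid			\n\
	movl	%%ebx,%1	\n\
	popl	%%ebx"
	    : "=a" (p[0]), "=S" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
	return (1);
#elif defined(__x86_64__)
	/* No push here; the compiler is trusted to manage %rbx. */
	__asm __volatile("	\n\
	cpuid"
	    : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
	return (1);
#else
	(void)ax;
	p[0] = p[1] = p[2] = p[3] = 0;
	return (0);
#endif
}
```

Each \n\ pair puts a newline into the string for the assembler and then
splices the source line, so the compiler sees a single string literal
while the source keeps one instruction per line.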
