https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79437

            Bug ID: 79437
           Summary: Redundant move instruction when getting sign bit of
                    double on 32-bit architecture
           Product: gcc
           Version: 6.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mizvekov at gmail dot com
  Target Milestone: ---

Testing this simple example:

#ifdef DO_BOOL
bool
#else
int
#endif
sign(double a) noexcept {
    __UINT64_TYPE__ r;
    __builtin_memcpy(&r, &a, sizeof(r));
    return r >> 63 & 1;
}

Compiling with: -march=i686 -m32 -Os -fomit-frame-pointer

With DO_BOOL defined (bool sign(double)), this is the code generated:

sign(double):
        mov     eax, DWORD PTR [esp+8]
        shr     eax, 31
        ret

Without DO_BOOL (int sign(double)), this is the code generated instead:

sign(double):
        mov     edx, DWORD PTR [esp+8]
        mov     eax, edx
        shr     eax, 31
        ret

Note the redundant move: the value is loaded into edx and only then copied to
eax, instead of being loaded into eax directly. The expected code is the same
two-instruction sequence as in the bool case above.

These are my findings so far while investigating this:

* There is no difference between punning with memcpy and punning through a
union (see the union sketch below).
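
For reference, a minimal sketch of the union variant, assuming the same
conditional return type as above (the name sign_union is illustrative;
type punning through a union is a documented GCC extension):

#ifdef DO_BOOL
bool
#else
int
#endif
sign_union(double a) noexcept {
    // Read the bit pattern of the double through a union member;
    // well-defined under GCC's documented aliasing rules.
    union { double d; __UINT64_TYPE__ u; } p = { a };
    return p.u >> 63 & 1;
}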

* The problem only happens when punning from floating point to integer.
Variants of the sign function that take a void* or a uint64 directly are not
affected (i.e. returning either bool or int results in the same optimal code);
see the sketches right below.
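
Sketches of those unaffected variants, under the same assumptions (the names
sign_ptr and sign_u64 are illustrative):

#ifdef DO_BOOL
bool
#else
int
#endif
sign_ptr(const void *a) noexcept {
    // Caller must pass a pointer to at least 8 readable bytes.
    __UINT64_TYPE__ r;
    __builtin_memcpy(&r, a, sizeof(r));
    return r >> 63 & 1;
}

#ifdef DO_BOOL
bool
#else
int
#endif
sign_u64(__UINT64_TYPE__ r) noexcept {
    return r >> 63 & 1;
}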

* The problem only happens when punning to a single uint64. When punning to
uint32[2] instead, i.e.:

#ifdef DO_BOOL
bool
#else
int
#endif
sign(double a) noexcept {
    __UINT32_TYPE__ r[2];
    __builtin_memcpy(&r, &a, sizeof(r));
    return r[1] >> 31 & 1;
}

then the optimal code is again generated in both cases.

* Inspecting the results of -fdump-tree-all reveals that the following
transformation already happens early on (in the .003t.original dump):

--- build_bool/test.cc.003t.original    2017-02-08 22:07:22.749603900 -0200
+++ build_int/test.cc.003t.original     2017-02-08 22:07:11.675433300 -0200
@@ -1,5 +1,5 @@

-;; Function bool sign_mcpy1(double) (null)
+;; Function int sign_mcpy1(double) (null)
 ;; enabled by -tree-original


@@ -10,7 +10,7 @@
         long long unsigned int r;
     <<cleanup_point <<< Unknown tree: expr_stmt
   (void) __builtin_memcpy ((void *) &r, (const void *) &a, 8) >>>>>;
-    return <retval> = (signed long long) r < 0;
+    return <retval> = (int) (r >> 63);
   }
    >>>;
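
In source terms, the two canonicalizations above amount to the following
(a sketch paraphrasing the dump, not a literal quote of it; the function
names are illustrative):

static bool sign_as_bool(__UINT64_TYPE__ r) noexcept {
    // bool case: the sign test becomes a signed comparison against zero
    return (__INT64_TYPE__) r < 0;
}

static int sign_as_int(__UINT64_TYPE__ r) noexcept {
    // int case: the sign bit is extracted with a logical shift right
    return (int) (r >> 63);
}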

This basic structure persists all the way through .211t.optimized:

--- build_bool/test.cc.211t.optimized   2017-02-08 22:07:22.743595700 -0200
+++ build_int/test.cc.211t.optimized    2017-02-08 22:07:11.670441000 -0200
@@ -1,16 +1,16 @@

-;; Function bool sign_mcpy1(double) (_Z10sign_mcpy1d, funcdef_no=0, decl_uid=1665, cgraph_uid=0, symbol_order=0)
+;; Function int sign_mcpy1(double) (_Z10sign_mcpy1d, funcdef_no=0, decl_uid=1665, cgraph_uid=0, symbol_order=0)

-bool sign_mcpy1(double) (double a)
+int sign_mcpy1(double) (double a)
 {
   long long unsigned int _2;
-  signed long long r.1_3;
-  bool _4;
+  long long unsigned int _3;
+  int _4;

   <bb 2>:
   _2 = VIEW_CONVERT_EXPR<long long unsigned int>(a_5(D));
-  r.1_3 = (signed long long) _2;
-  _4 = r.1_3 < 0;
+  _3 = _2 >> 63;
+  _4 = (int) _3;
   return _4;

 }

So the bool-returning case gets a different canonicalization early on (in the
front end?), and the later stages cannot recover the optimal form for the int
case.
