https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79437
Bug ID: 79437
Summary: Redundant move instruction when getting sign bit of double on
         32-bit architecture
Product: gcc
Version: 6.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: mizvekov at gmail dot com
Target Milestone: ---

Testing this simple example:

#ifdef DO_BOOL
bool
#else
int
#endif
sign(double a) noexcept {
    __UINT64_TYPE__ r;
    __builtin_memcpy(&r, &a, sizeof(r));
    return r >> 63 & 1;
}

Compiling with: -march=i686 -m32 -Os -fomit-frame-pointer

With DO_BOOL defined (bool sign(double)), this is the code generated:

sign(double):
        mov     eax, DWORD PTR [esp+8]
        shr     eax, 31
        ret

Without DO_BOOL (int sign(double)), this is the code generated instead:

sign(double):
        mov     edx, DWORD PTR [esp+8]
        mov     eax, edx
        shr     eax, 31
        ret

Note the redundant move: the value is loaded into edx and then copied to
eax, instead of being loaded straight into eax.

These are my findings so far from investigating this:

* There is no difference between punning with memcpy or with a union.

* The problem only happens when punning from floating point to integer.
  Variants of the sign function that accept a void* or a uint64 directly
  are not affected (i.e. returning either bool or int results in the same
  optimal code). Sketches of these variants are included at the end of
  this report for reference.

* The problem only happens when punning to uint64. When punning to
  uint32[2], i.e.:

  sign(double a) noexcept {
      __UINT32_TYPE__ r[2];
      __builtin_memcpy(&r, &a, sizeof(r));
      return r[1] >> 31 & 1;
  }

  then optimal code is again generated in both cases.

* Inspecting the results of -fdump-tree-all reveals that the following
  transformation happens early on (already in the -tree-original dump):

--- build_bool/test.cc.003t.original    2017-02-08 22:07:22.749603900 -0200
+++ build_int/test.cc.003t.original     2017-02-08 22:07:11.675433300 -0200
@@ -1,5 +1,5 @@
-;; Function bool sign_mcpy1(double) (null)
+;; Function int sign_mcpy1(double) (null)

 ;; enabled by -tree-original
@@ -10,7 +10,7 @@
     long long unsigned int r;
   <<cleanup_point <<< Unknown tree: expr_stmt
   (void) __builtin_memcpy ((void *) &r, (const void *) &a, 8) >>>>>;
-  return <retval> = (signed long long) r < 0;
+  return <retval> = (int) (r >> 63);
 }
 >>>;

This basic structure persists until past .211t.optimized:

--- build_bool/test.cc.211t.optimized   2017-02-08 22:07:22.743595700 -0200
+++ build_int/test.cc.211t.optimized    2017-02-08 22:07:11.670441000 -0200
@@ -1,16 +1,16 @@
-;; Function bool sign_mcpy1(double) (_Z10sign_mcpy1d, funcdef_no=0, decl_uid=1665, cgraph_uid=0, symbol_order=0)
+;; Function int sign_mcpy1(double) (_Z10sign_mcpy1d, funcdef_no=0, decl_uid=1665, cgraph_uid=0, symbol_order=0)

-bool sign_mcpy1(double) (double a)
+int sign_mcpy1(double) (double a)
 {
   long long unsigned int _2;
-  signed long long r.1_3;
-  bool _4;
+  long long unsigned int _3;
+  int _4;

   <bb 2>:
   _2 = VIEW_CONVERT_EXPR<long long unsigned int>(a_5(D));
-  r.1_3 = (signed long long) _2;
-  _4 = r.1_3 < 0;
+  _3 = _2 >> 63;
+  _4 = (int) _3;
   return _4;
 }

So the bool-returning case gets a different canonicalization early on (in
the front end?): r >> 63 & 1 is folded into the sign test
(signed long long) r < 0, which is what yields the optimal code, while the
int case keeps the shift form, and later stages cannot fix it.
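
For completeness, a minimal sketch of the union-punning variant referred
to in the first finding above (the function name is illustrative, not
taken from the report; GCC documents type punning through a union as
supported). Per the findings it behaves exactly like the memcpy version:

#ifdef DO_BOOL
bool
#else
int
#endif
sign_union(double a) noexcept {
    // Type punning through a union, equivalent to the memcpy version.
    union { double d; __UINT64_TYPE__ u; } p;
    p.d = a;
    return p.u >> 63 & 1;
}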
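
Likewise, minimal sketches of the unaffected variants from the second
finding (again, the names are illustrative); per the findings these
generate the same optimal code whether returning bool or int:

#ifdef DO_BOOL
bool
#else
int
#endif
sign_ptr(const void *p) noexcept {
    // Punning from a pointer rather than from a floating-point value.
    __UINT64_TYPE__ r;
    __builtin_memcpy(&r, p, sizeof(r));
    return r >> 63 & 1;
}

#ifdef DO_BOOL
bool
#else
int
#endif
sign_u64(__UINT64_TYPE__ r) noexcept {
    // No punning at all: the integer is taken directly.
    return r >> 63 & 1;
}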
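
Finally, given the dumps above, one might expect a manual rewrite of the
shift into the sign-test form that the bool path is folded into to
sidestep the problem in the int case. This is an untested sketch, not
something verified in this report:

int sign_lt(double a) noexcept {
    __UINT64_TYPE__ r;
    __builtin_memcpy(&r, &a, sizeof(r));
    // Sign test, matching the (signed long long) r < 0 form that the
    // bool path receives in the -tree-original dump.
    return (__INT64_TYPE__) r < 0;
}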