[Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434

2022-11-17 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #9 from bartoldeman at users dot sourceforge.net ---
I ended up using -mprefer-vector-width=128 as a workaround myself (via
__attribute__((target("prefer-vector-width=128")))), so there is still some AVX
vectorization.
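
A minimal sketch of how such a per-function workaround might look, assuming GCC
on x86 in a version that accepts this target string. The kernel name dot_kernel
and the macro PREFER_128 are made up for illustration; only the attribute
string comes from the comment above. The guard is an assumption added so the
file still compiles with other compilers or targets, where the attribute is
simply left out.

```c
#include <stddef.h>

/* Hypothetical helper macro: apply the workaround only where GCC's x86
   back end is in use; elsewhere it expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define PREFER_128 __attribute__((target("prefer-vector-width=128")))
#else
#define PREFER_128
#endif

/* Illustrative kernel (not taken from OpenBLAS): the vectorizer is asked to
   prefer 128-bit vectors for this one function only. */
PREFER_128
double dot_kernel(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Only the annotated function is affected; the rest of the translation unit
keeps whatever vector width the command-line options select.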

[Bug tree-optimization/107647] GCC 12.2.0 may produce FMAs even with -ffp-contract=off

2022-11-11 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647

--- Comment #1 from bartoldeman at users dot sourceforge.net ---
According to godbolt it's still producing FMAs on trunk:
https://godbolt.org/z/aWh6d1E4E

[Bug tree-optimization/107647] New: GCC 12.2.0 may produce FMAs even with -ffp-contract=off

2022-11-11 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647

Bug ID: 107647
   Summary: GCC 12.2.0 may produce FMAs even with
-ffp-contract=off
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

I stumbled upon an example where GCC generates an FMA instruction even when
FMAs are disabled using -ffp-contract=off (extracted from
https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/cscal.c):

$ cat cscal.c
void cscal(int n, float da_r, float *x)
{
  for (int i = 0; i < n; i += 4)
    {
      float temp0 = da_r * x[i]   - x[i+1];
      float temp1 = da_r * x[i+2] - x[i+3];
      x[i+1] = da_r * x[i+1] + x[i];
      x[i+3] = da_r * x[i+3] + x[i+2];
      x[i]   = temp0;
      x[i+2] = temp1;
    }
}
$ gcc -S -march=haswell -O2 -ffp-contract=off cscal.c
$ grep fma cscal.s
vfmaddsub231ps  %xmm0, %xmm2, %xmm1

I would expect there to be no FMA instructions in there.
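
To illustrate why an unrequested FMA matters, here is a small sketch of how a
fused multiply-subtract can give a different answer from separate operations.
The function names are made up, and the fused variant is emulated in double
precision instead of calling fmaf (an assumption made to keep the example free
of library dependencies; it is exact for the inputs discussed below but is not
a general substitute for fmaf).

```c
/* Unfused: the product is rounded to float before the subtraction.
   The volatile store keeps the compiler from contracting the two steps. */
float mul_sub_unfused(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod - c;
}

/* Emulated fused operation: the product of two floats is exact in double,
   so rounding to float happens only once, on the final result. */
float mul_sub_fused_emulated(float a, float b, float c)
{
    return (float)((double)a * (double)b - (double)c);
}
```

With a = b = 1 + 2^-12 and c = 1 + 2^-11, the exact product is
1 + 2^-11 + 2^-24. Rounded to float it becomes 1 + 2^-11, so the unfused
version returns 0, while the fused version keeps the 2^-24 term.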

[Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.

2022-10-28 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

bartoldeman at users dot sourceforge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53785|0                           |1
        is obsolete|                            |

--- Comment #3 from bartoldeman at users dot sourceforge.net ---
Created attachment 53786
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53786&action=edit
Corrected test case

In my eagerness to make it as short as possible I made it too short indeed!

[Bug tree-optimization/107451] New: Segmentation fault with vectorized code.

2022-10-28 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Bug ID: 107451
   Summary: Segmentation fault with vectorized code.
   Product: gcc
   Version: 11.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 53785
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case

The following code:

double dot(int n, const double *x, int inc_x, const double *y)
{
    int i, ix;
    double dot[4] = { 0.0, 0.0, 0.0, 0.0 };

    ix = 0;
    for (i = 0; i < n; i++) {
        dot[0] += x[ix]   * y[ix];
        dot[1] += x[ix+1] * y[ix+1];
        dot[2] += x[ix]   * y[ix+1];
        dot[3] += x[ix+1] * y[ix];
        ix += inc_x;
    }

    return dot[0] + dot[1] + dot[2] + dot[3];
}

int main(void)
{
    double x = 0, y = 0;
    return dot(1, &x, 4096*4096, &y);
}

crashes with (on Linux x86-64)

$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault

for GCC 11.3.0 and also the current prerelease (gcc version 11.3.1 20221021),
and also when patched with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212.

The loop code assembly is as follows:

  18:   c5 f9 10 1e             vmovupd (%rsi),%xmm3
  1c:   c5 f9 10 21             vmovupd (%rcx),%xmm4
  20:   ff c2                   inc    %edx
  22:   c4 e3 65 18 0c 06 01    vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
  29:   c4 e3 5d 18 04 01 01    vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
  30:   48 01 c6                add    %rax,%rsi
  33:   48 01 c1                add    %rax,%rcx
  36:   c4 e3 fd 01 c9 11       vpermpd $0x11,%ymm1,%ymm1
  3c:   c4 e3 fd 01 c0 14       vpermpd $0x14,%ymm0,%ymm0
  42:   c4 e2 f5 b8 d0          vfmadd231pd %ymm0,%ymm1,%ymm2
  47:   39 fa                   cmp    %edi,%edx
  49:   75 cd                   jne    18

What happens here is that the vinsertf128 instructions take the elements from
one loop iteration later and put them in the high halves of ymm0 and ymm1.
The vpermpd instructions then throw away those high halves again, so e.g. they
turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively.

So the result is correct, but the superfluous vinsertf128 instructions access
memory potentially past the end of x or y and can thus produce a segfault.
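
The lane selection can be checked with a small scalar model of vpermpd, where
result lane i takes source lane (imm >> 2*i) & 3. This is only a sketch; the
function names permute4 and lanes_read are made up for illustration.

```c
/* Model of the lane selection done by vpermpd: result lane i takes
   source lane (imm >> 2*i) & 3. */
void permute4(const double src[4], unsigned imm, double dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Returns a 4-bit mask of the source lanes that the immediate selects
   (bit k set means source lane k is read). */
unsigned lanes_read(unsigned imm)
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++)
        mask |= 1u << ((imm >> (2 * i)) & 3);
    return mask;
}
```

For both immediates used in the loop (0x11 and 0x14) the mask only has bits 0
and 1 set, i.e. lanes 2 and 3, the half filled by vinsertf128, are never used.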

related issue (coming from OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387
may also be related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(the particular comment shows very similar code, but it's for GCC 12, which
vectorizes by default; OpenBLAS worked around this by disabling the tree
vectorizer there, but only on Mac OS and Windows).

[Bug fortran/107294] Missed optimization: multiplying real with complex number in Fortran (only)

2022-10-17 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294

bartoldeman at users dot sourceforge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |WONTFIX

[Bug fortran/107294] Missed optimization: multiplying real with complex number in Fortran (only)

2022-10-17 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294

bartoldeman at users dot sourceforge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from bartoldeman at users dot sourceforge.net ---
Thanks for the explanation. Looking for an example with NaNs, you get
0.0 * (NaN + 0.0i) = NaN + 0.0i
for C with Annex G.5.1
but
NaN + NaN i
for Fortran, unless you specify -fno-signed-zeros.

program main
  use, intrinsic :: ieee_arithmetic, only: IEEE_Value, IEEE_QUIET_NAN
  use, intrinsic :: iso_fortran_env, only: real32

  real(real32) :: zero, nan
  complex(real32) :: cnan

  nan = IEEE_VALUE(nan, IEEE_QUIET_NAN)
  cnan = cmplx(nan, 0.0)
  zero = 0.0
  print *, zero, cnan, zero * cnan
end

illustrates this
   0. ( NaN,  0.) ( NaN,  NaN)
vs
   0. ( NaN,  0.) ( NaN,  0.)
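
The two results can be reproduced by hand in C without relying on any
compiler's complex-number support. This is a sketch only: the struct cfloat
and both function names are made up, and the second function merely assumes
that the real operand is promoted to (a, 0) before a full complex
multiplication, as described for Fortran above.

```c
typedef struct { float re, im; } cfloat;   /* hypothetical helper type */

/* Real times complex as in C Annex G.5.1: each part is scaled separately. */
cfloat mul_real_complex(float a, cfloat b)
{
    cfloat r = { a * b.re, a * b.im };
    return r;
}

/* The real operand is first promoted to (a, 0), then a full complex
   multiplication is carried out. */
cfloat mul_promoted(float a, cfloat b)
{
    cfloat pa = { a, 0.0f };
    cfloat r = { pa.re * b.re - pa.im * b.im,
                 pa.re * b.im + pa.im * b.re };
    return r;
}
```

For a = 0 and b = (NaN, 0) the first function returns (NaN, 0), while in the
second the term 0 * NaN also ends up in the imaginary part, giving (NaN, NaN).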

[Bug fortran/107294] New: Missed optimization: multiplying real with complex number in Fortran (only)

2022-10-17 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294

Bug ID: 107294
   Summary: Missed optimization: multiplying real with complex
number in Fortran (only)
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

This code:

complex function csmul(a, b)
  real, value :: a
  complex, value :: b
  csmul = a * b
end function csmul

produces this assembly on x86-64 (11.3, -O2)
   0:   66 0f d6 4c 24 f8       movq   %xmm1,-0x8(%rsp)
   6:   f3 0f 10 64 24 fc       movss  -0x4(%rsp),%xmm4
   c:   f3 0f 10 4c 24 f8       movss  -0x8(%rsp),%xmm1
  12:   0f 28 d0                movaps %xmm0,%xmm2
  15:   66 0f ef db             pxor   %xmm3,%xmm3 # xmm3 = 0
  19:   f3 0f 59 d1             mulss  %xmm1,%xmm2
  1d:   0f 28 ec                movaps %xmm4,%xmm5
  20:   f3 0f 59 eb             mulss  %xmm3,%xmm5 # xmm5 = 0
  24:   f3 0f 59 c4             mulss  %xmm4,%xmm0
  28:   f3 0f 59 cb             mulss  %xmm3,%xmm1 # xmm1 = 0
  2c:   f3 0f 5c d5             subss  %xmm5,%xmm2 # xmm2 unchanged
  30:   f3 0f 58 c1             addss  %xmm1,%xmm0 # xmm0 unchanged
  34:   f3 0f 11 54 24 f0       movss  %xmm2,-0x10(%rsp)
  3a:   f3 0f 11 44 24 f4       movss  %xmm0,-0xc(%rsp)
  40:   f3 0f 7e 44 24 f0       movq   -0x10(%rsp),%xmm0
  46:   c3                      retq

Here xmm3 (the imaginary part of a, promoted to complex) is set to 0, but this
is not exploited in the remainder.

On the other hand the assembly for the corresponding C code looks good, with
two mul instructions, as expected:

float _Complex csmul(float a, float _Complex b)
{
  return a * b;
}

0000000000000000 <csmul>:
   0:   66 0f d6 4c 24 f8       movq   %xmm1,-0x8(%rsp)
   6:   f3 0f 10 4c 24 f8       movss  -0x8(%rsp),%xmm1
   c:   f3 0f 59 c8             mulss  %xmm0,%xmm1
  10:   f3 0f 59 44 24 fc       mulss  -0x4(%rsp),%xmm0
  16:   f3 0f 11 4c 24 f0       movss  %xmm1,-0x10(%rsp)
  1c:   f3 0f 11 44 24 f4       movss  %xmm0,-0xc(%rsp)
  22:   f3 0f 7e 44 24 f0       movq   -0x10(%rsp),%xmm0
  28:   c3                      retq

The same issue is still present in trunk, according to godbolt.org.

[Bug tree-optimization/107254] [11/12 Regression] Wrong vectorizer code (Fortran) since r11-1501-gda2b7c7f0a136b4d

2022-10-17 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254

--- Comment #10 from bartoldeman at users dot sourceforge.net ---
Thanks for the fix! I can confirm that, when applied to 11.3 (with files
renamed from .cc to .c), it fixes the issue, and with it, thousands of test
failures in the reference LAPACK test suite.

My findings for LAPACK are in this issue here:
https://github.com/Reference-LAPACK/lapack/issues/732

[Bug tree-optimization/107254] New: Wrong vectorizer code (GCC 11 only, Fortran)

2022-10-13 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254

Bug ID: 107254
   Summary: Wrong vectorizer code (GCC 11 only, Fortran)
   Product: gcc
   Version: 11.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 53703
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53703&action=edit
Test case

The following code gives the wrong result (-1. instead of
0.) with gfortran 11.3 (also tested with the 11.3.1 20221007
prerelease) when given the options
`-O2 -ftree-vectorize -march=core-avx2`
for x86_64.

There's no issue with GCC 9, 10, and 12. It could be related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212, except that that bug also
affects GCC 12.

This issue came up from testing the reference LAPACK with -ftree-vectorize
enabled, where many more tests failed with recent GCC (11/12), see
https://github.com/easybuilders/easybuild-easyconfigs/issues/16380

$ gfortran -O2 -ftree-vectorize -march=core-avx2 dhgeqz2.f90; ./a.out 
  -1. 
$ gfortran -Wall -O2 dhgeqz2.f90; ./a.out 
   0.


subroutine dlartg( f, g, s, r )
  implicit none
  double precision :: f, g, r, s
  double precision :: d, p

  d = sqrt( f*f + g*g )
  p = 1.d0 / d
  if( abs( f ) > 1 ) then
     s = g*sign( p, f )
     r = sign( d, f )
  else
     s = g*sign( p, f )
     r = sign( d, f )
  end if
end subroutine

subroutine dhgeqz( n, h, t )
  implicit none
  integer            n
  double precision   h( n, * ), t( n, * )
  integer            jc
  double precision   c, s, temp, temp2, tempr
  temp2 = 10d0
  call dlartg( 10d0, temp2, s, tempr )
  c = 0.9d0
  s = 1.d0
  do jc = 1, n
     temp = c*h( 1, jc ) + s*h( 2, jc )
     h( 2, jc ) = -s*h( 1, jc ) + c*h( 2, jc )
     h( 1, jc ) = temp
     temp2 = c*t( 1, jc ) + s*t( 2, jc )
     ! t(2,2)=-s*t(1,2)+c*t(2,2)=-0.9*0+1*0=0
     t( 2, jc ) = -s*t( 1, jc ) + c*t( 2, jc )
     t( 1, jc ) = temp2
  enddo
end subroutine dhgeqz

program test
  implicit none
  double precision h(2,2), t(2,2)  
  h = 0
  t(1,1) = 1
  t(2,1) = 0
  t(1,2) = 0
  t(2,2) = 0
  call dhgeqz( 2, h, t )
  print *,t(2,2)
end program test
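
For reference, the rotation loop from dhgeqz can be sketched as plain scalar C
to see which values a correct build should produce. This is an illustrative
port, not the LAPACK code: the name rotate_rows and the flat column-major
layout are assumptions made for the sketch.

```c
/* Column-major 2 x ncols matrix, as in the Fortran test: element (i, j)
   (1-based) lives at m[(i - 1) + 2 * (j - 1)]. */
void rotate_rows(int ncols, double c, double s, double *m)
{
    for (int j = 0; j < ncols; j++) {
        double top    = m[2 * j];       /* m(1, j+1) */
        double bottom = m[2 * j + 1];   /* m(2, j+1) */
        m[2 * j]     =  c * top + s * bottom;
        m[2 * j + 1] = -s * top + c * bottom;
    }
}
```

With the inputs of the test program (c = 0.9, s = 1, t(1,1) = 1 and all other
elements 0), t(2,1) becomes -1 while t(2,2) stays 0, which is the value the
program is expected to print.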

[Bug fortran/103023] ICE (Segmentation fault) with !$OMP DECLARE SIMD(func) linear(ref(u))

2021-11-01 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103023

--- Comment #2 from bartoldeman at users dot sourceforge.net ---
Yes, this is mainly about the ICE.

It was stripped down from this, which HAS uniform.

subroutine func(u,f,ndim)
  !$OMP DECLARE SIMD(func) uniform(ndim) linear(ref(f,u):1)
  integer, intent(in) :: ndim
  double precision, intent(in) :: u(ndim)
  double precision, intent(out) :: f(ndim)
  f(1) = u(1) + u(2)
  f(2) = u(1) - u(2)
end subroutine func

subroutine main(u,f)
  double precision, intent(in) :: u(8)
  double precision, intent(out) :: f(8)
!$OMP SIMD
  do i=1,8,2
 call func(u(i),f(i),2)
  enddo
end subroutine main

If I leave out ndim and hardcode "2" in func (:: u(2) and :: f(2)), or let the
auto-vectorizer and inliner do their work, this produces good code (though it
would be better with u and f transposed, as the code basically transposes them
into two ymm registers in the asm output).

With general "ndim" that could still work, e.g. with ndim=3 and 3 equations for
u(1:3) -> f(1:3), you'd work with 3 vector registers.

Now you may wonder why "ndim" here, since we know it's "2": this comes from
feeding a user-defined function into a larger program (that processes e.g.
maps) where that same user needs to specify ndim as a parameter.

Intel (ifort) doesn't like this at all from what I can see:

openfun.f90(1): error #6080: Only scalar variables may be referenced in a
LINEAR or MONOTONIC clause.   [U]
subroutine func(u,f)
^
openfun.f90(1): error #6080: Only scalar variables may be referenced in a
LINEAR or MONOTONIC clause.   [F]
subroutine func(u,f)
--^
compilation aborted for openfun.f90 (code 1)

[Bug fortran/103023] New: ICE (Segmentation fault) with !$OMP DECLARE SIMD(func) linear(ref(u))

2021-11-01 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103023

Bug ID: 103023
   Summary: ICE (Segmentation fault) with !$OMP DECLARE SIMD(func)
linear(ref(u))
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 51717
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51717&action=edit
Test case for crash

For the following Fortran code gfortran gives a SIGSEGV (tested GCC 9.3 and
10.3 locally, 11.2 and trunk on Godbolt):

subroutine func(u,ndim)
  !$OMP DECLARE SIMD(func) linear(ref(u))
  integer, intent(in) :: ndim
  double precision, intent(in) :: u(ndim)
end subroutine func

Here's the output for 10.3:

$ gfortran -c -fopenmp-simd openfun2.f90
openfun2.f90:1:15:

1 | subroutine func(u,ndim)
  |   1
internal compiler error: Segmentation fault

0xc147cf crash_signal
        ../../gcc/toplev.c:328
0x948ae6 size_binop_loc(unsigned int, tree_code, tree_node*, tree_node*)
        ../../gcc/fold-const.c:1906
0x7b8258 gfc_trans_omp_clauses
        ../../gcc/fortran/trans-openmp.c:2324
0x7bb168 gfc_trans_omp_declare_simd(gfc_namespace*)
        ../../gcc/fortran/trans-openmp.c:5838
0x77b767 gfc_create_function_decl(gfc_namespace*, bool)
        ../../gcc/fortran/trans-decl.c:3069
0x77b767 gfc_generate_function_code(gfc_namespace*)
        ../../gcc/fortran/trans-decl.c:6744
0x6f679e translate_all_program_units
        ../../gcc/fortran/parse.c:6306
0x6f679e gfc_parse_file()
        ../../gcc/fortran/parse.c:6567
0x74dfbf gfc_be_parse_file
        ../../gcc/fortran/f95-lang.c:210
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug rtl-optimization/101683] Floating point exception for double->unsigned conversion on avx512 only

2021-07-30 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101683

--- Comment #6 from bartoldeman at users dot sourceforge.net ---
"really not many people care about floating point exceptions". I think more
people should :) but this is indeed the context.

We found this issue on a supercomputer running OpenFOAM (which can enable FP
exceptions, see https://cpp.openfoam.org/v3/a02284.html), and a small simple
MPI program with FP exceptions enabled. Even then it crashed in an underlying
library, and not OpenFOAM itself, see
https://github.com/ComputeCanada/software-stack/issues/74

In the end the combination of MPI and FP exceptions easily triggers it, but the
vast majority of jobs don't crash, so even on our cluster this is very rare
indeed. And many other clusters don't compile the UCX library with avx512
optimizations enabled or use precompiled binaries without those enabled.

[Bug target/101683] New: Floating point exception for double->unsigned conversion on avx512 only

2021-07-29 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101683

Bug ID: 101683
   Summary: Floating point exception for double->unsigned
conversion on avx512 only
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 51222
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51222&action=edit
File to reproduce

For this code:

#define _GNU_SOURCE
#include <fenv.h>

int main(int argc, char **argv) {
    feenableexcept(FE_INVALID);
    double argcm10 = argc / -0.1;
    return (unsigned)(argcm10 < 0.0 ? 0 : argcm10);
}

$ gcc -O -march=skylake-avx512 fpexcept.c -lm
$ ./a.out 
Floating point exception


the instructions
        vcvttsd2usi     %xmm0, %eax
        vxorpd          %xmm1, %xmm1, %xmm1
        vucomisd        %xmm0, %xmm1
        movl            $0, %edx
        cmova           %edx, %eax
are generated just after the division, so the conversion happens before the
comparison.
"If a converted result cannot be represented in the destination format, the
floating-point invalid exception is raised, and if this exception is masked,
the integer value 2^w – 1 is returned, where w represents the number of bits in
the destination format."

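So when the exception is masked, for argcm10 = -10.0 the value 2^w-1 is
discarded and all is well, since argcm10 < 0. When it is unmasked, however,
the conversion itself traps before the comparison can discard its result.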
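
The intent at the source level can be written out as a guarded conversion in
which only in-range values ever reach the cast. This is a sketch under
assumptions: the name to_unsigned_clamped and the upper-bound check are not
part of the reduced test case, and a compiler affected by this bug may still
move the conversion ahead of the checks, so it is not a workaround.

```c
#include <limits.h>

/* Source-level intent of the reduced test case, extended with an upper
   bound: only values that fit in unsigned are ever converted. */
unsigned to_unsigned_clamped(double v)
{
    if (!(v > 0.0))                    /* negative, zero, or NaN */
        return 0;
    if (v >= (double)UINT_MAX + 1.0)   /* too large for unsigned */
        return UINT_MAX;
    return (unsigned)v;                /* in range: conversion is well defined */
}
```

For argcm10 = -10.0 the first check already returns 0, so in the abstract
machine the out-of-range conversion is never evaluated.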

I can reproduce this issue with 9.3 as well, but not with 8.4 (the generated
code is correct for 8.4). I have not tried 11.1 yet.

Note: I found this issue with the UCX library when compiled with
-march=skylake-avx512; this example is stripped down from:
https://github.com/openucx/ucx/blob/f5362f5e6f80d930b88c44c63b4d8d71cf91d214/src/ucp/core/ucp_ep.c#L2699

[Bug fortran/93734] New: Invalid code generated with -O2 -march=haswell -ftree-vectorize

2020-02-13 Thread bartoldeman at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93734

Bug ID: 93734
   Summary: Invalid code generated with -O2 -march=haswell
-ftree-vectorize
   Product: gcc
   Version: 8.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 47837
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47837&action=edit
Fortran code that prints 0 if correct, and -9 if miscompiled

The attached code prints -9. if compiled using

gfortran -O2 -march=haswell -ftree-vectorize bug.f90 -o bug
./bug
  -9.
using
GNU Fortran (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Also reproducible on GCC 9.2.0, but not with GCC 7.3.0 and earlier.

The correct answer is 1-1=0.

(I found this issue first when compiling the reference BLAS using those options
and running the "zblat2" tests; the test is a much reduced version of ztrsv,
see
http://www.netlib.org/lapack/explore-html/dc/dc1/group__complex16__blas__level2_ga99cc66f0833474d6607e6ea7dbe2f9bd.html#ga99cc66f0833474d6607e6ea7dbe2f9bd)

[Bug target/52838] New: [x32] missed optimization for pointer return value

2012-04-02 Thread bartoldeman at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52838

 Bug #: 52838
   Summary: [x32] missed optimization for pointer return value
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bartolde...@users.sourceforge.net


The program test.c:

extern void *foo(void);
extern void bar(void*);

void test(void)
{
  bar(foo());
}

when compiled with
gcc-4.7 -mx32 -Os -S test.c
produces:
        .file   "test.c"
        .text
        .globl  test
        .type   test, @function
test:
.LFB0:
        .cfi_startproc
        pushq   %rax
        .cfi_def_cfa_offset 16
        call    foo
        popq    %rdx
        .cfi_def_cfa_offset 8
        movq    %rax, %rdi
        jmp     bar
        .cfi_endproc
.LFE0:
        .size   test, .-test
        .ident  "GCC: (Debian 4.7.0-1) 4.7.0"
        .section        .note.GNU-stack,"",@progbits

Here movq %rax, %rdi could be replaced by movl %eax, %edi, saving one
prefix byte 0x48.