from:"agner at agner dot org"

[Bug c++/111897] Initialization of _Float16 with f.p. constant gives false warning

2023-10-25 Thread agner at agner dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111897

--- Comment #3 from Agner Fog  ---
I have asked the authors of the linked document. They say that the example in
the document is wrong. The latest version still has the error in the example:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html

The compiler should give warnings for all of these:
_Float16 A = 1.5;
_Float16 B(1.5);
_Float16 C{1.5};

but no warning for integers:
_Float16 D = 2;

It is fine to give a warning rather than an error message when the intention is
obvious and there is no ambiguity.
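
As a rough illustration of the distinction above (a sketch, not taken from P1467 or the GCC sources): a literal with the f16 suffix, an explicit cast, and an integer initializer should all be accepted without the conversion-rank diagnostic, whereas a bare double constant such as 1.5 draws it. The alias `half` and the three function names are made up for this example, and the fallback to float is an assumption so that the code still builds where std::float16_t is not available.

```cpp
#include <cassert>

#if defined(__STDCPP_FLOAT16_T__) && __has_include(<stdfloat>)
#include <stdfloat>
using half = std::float16_t;   // illustrative alias, not part of the standard
// The f16 suffix gives the literal the exact type, so no conversion is needed.
inline half half_from_literal() { return 1.5f16; }
#else
// Assumed fallback so the sketch compiles where std::float16_t is missing.
using half = float;
inline half half_from_literal() { return 1.5f; }
#endif

// An explicit cast states the narrowing from double openly.
inline half half_from_double() { return static_cast<half>(1.5); }

// Integer to floating point is a standard conversion, so it stays implicit.
inline half half_from_int() { return 2; }
```

With a C++23 compiler that defines `__STDCPP_FLOAT16_T__`, the first branch is taken; otherwise the float fallback is used and the example only demonstrates the three spellings.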

Below is my conversation with David Olsen:


That’s correct.  Conversions between integral and floating-point types are
standard conversions, in both directions, which means they are implicit
conversions.  That was covered by preexisting wording in the standard, so P1467
doesn’t talk about those conversions.  There isn’t a good way for the standard
to clearly specify which conversions are lossless and which are potentially
lossy, so we didn’t try to limit int/float conversions involving extended
floating-point types to just the safe conversions.


From: Agner Fog 
Sent: Tuesday, October 24, 2023 10:26 PM
To: David Olsen ; gri...@griwes.info
Subject: Re: Problem with P1467R4 C++ std. proposal



Thank you for a clear answer.

I don't see any mention of implicit conversion from integer to extended
floating point types. Is that allowed?

gcc 13.1 gives no warning for implicit conversion from integer to float16:


_Float16 A = 3; // no warning
_Float16 A = 3.; // warning


Is that correct?

- Agner



On 24/10/2023 19.29, David Olsen wrote:

The final version of P1467 is R9,
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html .  The
GCC 13 release notes contain a link to that version.  Where do you see the link
to R4?

The issue of initialization of extended floating-point type variables was
raised and discussed during the standardization process.  R9 contains a long
discussion of the issue, with some of the ways that we tried to fix the problem
of initialization. 
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html#implicit-constant
 After some back and forth, the consensus was to leave the implicit conversion
rules unchanged for initialization, because the potential solutions have their
own problems.  So all of

_Float16 A = 1.5;
_Float16 B(1.5);
_Float16 C{1.5};

are ill-formed and should result in a compiler diagnostic.  I am pleased
that GCC reports only a warning and not a hard error, since the user's intent
is obvious. 

Yes, the example in section 5.6.1 is wrong.  The mistake is still there in
R9 of the paper.  It should be

float f32 = 1.0f;
std::float16_t f16 = 2.0f16;
std::bfloat16_t b16 = 3.0bf16; 

On the more general issue of how much the new extended floating-point types
should behave like the existing standard floating-point types, there was a long
and useful discussion about this topic at a committee meeting in Feb 2020. 
There is general agreement that many of the defaults in C++ are wrong and can
make it easier to write badly behaving code.  Whenever new features are added,
there is tension between the consistency of having them behave like existing
features and the safety of choosing different defaults that makes it easier to
write correct code.  These competing goals and the consequences and tradeoffs
of both of them were explicitly laid out and discussed at the Feb 2020 meeting,
and at the end of the discussion there was strong consensus (though not
unanimity) to go with safety over consistency for implicit conversions of
extended floating-point types. 

int16_t is in the std namespace in C++.  For C compatibility it is also
available at global scope if you include <stdint.h> (defined by the C standard)
instead of <cstdint> (defined by the C++ standard).  The C++ standard doesn't
define any names at global scope other than 'operator new' and 'operator
delete'.  Defining float16_t at global scope would have been a huge
departure from existing practice.
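
A small sketch of the namespace point, under the assumption that the implementation provides the optional int16_t type (mainstream ones do). The helper `clamp_to_int16` is invented for illustration and is not part of the thread.

```cpp
#include <cassert>
#include <cstdint>      // C++ header: declares std::int16_t
#include <stdint.h>     // C header: declares ::int16_t at global scope
#include <type_traits>

// With both headers included, the two spellings name the same type.
static_assert(std::is_same<std::int16_t, ::int16_t>::value,
              "std::int16_t and ::int16_t are the same type");

// Illustrative helper: saturate an int to the int16_t range, using the
// std-qualified name that <cstdint> guarantees.
inline std::int16_t clamp_to_int16(int v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return static_cast<std::int16_t>(v);
}
```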


-Original Message-

From: Agner Fog 
Sent: Tuesday, October 24, 2023 8:03 AM
To: David Olsen ; gri...@griwes.info
Subject: Problem with P1467R4 C++ std. proposal


Dear David Olsen and Michał Dominiak 

I don't know who to contact regarding C++ standard development, so I am
taking the liberty to contact you as the authors of P1467R4,

   
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1467r4.html#implicit

It is unclear whether the rules for implicit conversion relate to
initialization and assignment.

gcc version 13.1 gives warnings for the following cases with reference to
the above document:

_Float16 A = 1.5;
_Float16 B(1.5);
_Float16 C{1.5}; 

The last one should probably not have a warning, the other ones are
unclear. Initialization with an integer constant gives no warning message.

The example

[Bug c++/111897] Initialization of _Float16 with f.p. constant gives false warning

2023-10-20 Thread agner at agner dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111897

--- Comment #2 from Agner Fog  ---
Thank you Jonathan.

The problem is that the C++ standard is becoming so complicated that nobody can
master it, not even the persons who wrote the example in the proposal.

`_Float16 A{1.0};` gives a warning, which apparently is wrong.
`_Float16 A = 1;` gives no warning.
`_Float16 A = 1.5f16;` gives no warning, but I am not sure the f16 suffix is
supported by all compilers

[Bug c++/111897] New: Initialization of _Float16 with f.p. constant gives false warning

2023-10-20 Thread agner at agner dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111897

Bug ID: 111897
   Summary: Initialization of _Float16 with f.p. constant gives
false warning
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: agner at agner dot org
  Target Milestone: ---

Initializing a _Float16 gives false warning. Example:

  _Float16 A = 1.0;

This gives the "warning: converting to ‘_Float16’ from ‘double’ with greater
conversion rank", with a link to 
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1467r4.html#implicit

However, this link says that implicit conversion is allowed in initialization
with a constant. See section 5.7.3 and the example in 5.6.1 in the linked
document.

[Bug middle-end/108920] Condition falsely optimized out

2023-02-25 Thread agner at agner dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108920

Agner Fog  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Agner Fog  ---
I am not sure I have identified the problem correctly, but there is no need to
spend more time on it since the problem disappears with version 9.4.0.

You may close this issue.

[Bug middle-end/108920] Condition falsely optimized out

2023-02-24 Thread agner at agner dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108920

--- Comment #3 from Agner Fog  ---
It seems to work with gcc 9.4.0.
Thank you

[Bug c++/108920] New: Condition falsely optimized out

2023-02-24 Thread agner at agner dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108920

Bug ID: 108920
   Summary: Condition falsely optimized out
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: agner at agner dot org
  Target Milestone: ---

Created attachment 54526
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54526&action=edit
code to reproduce error

The attached file test.cpp gives wrong code when optimized with -O2 or higher.
To reproduce error, do:

g++ -O2 -m64 -S -o t1.s test.cpp
g++ -O2 -m64 -S -DFIX -o t2.s test.cpp


The condition in line 104 in test.cpp is optimized away in t1.s

The workaround on line 73 is preventing this false optimization with -DFIX to
generate correct code in t2.s
See t2.s line 252-255

[Bug target/89597] Inconsistent vector calling convention on windows with Clang and MSVC

2019-12-15 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89597

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #1 from Agner Fog  ---
I can confirm this. 

When compiling for a Win64 target, gcc version 9.2.0 (and earlier) returns
128-bit intrinsic vectors in XMM0, while 256-bit and 512-bit intrinsic vectors
are returned through a pointer. Clang, MS and Intel compilers return all these
vectors in registers.

The Microsoft Windows documentation for x64 calling convention says:

"Non-scalar types including floats, doubles, and vector types such as __m128,
__m128i, __m128d are returned in XMM0."
(https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019#return-values)

Obviously, this document needs to be updated, but the only logical
interpretation is that the wording "vector types such as __m128" includes
larger intrinsic vectors, which must necessarily be returned in YMM0 or ZMM0.

Test case:
```
__m128 square_x (__m128 x) {
return _mm_mul_ps( x , x);
}

__m256 square_y (__m256 y) {
return _mm256_mul_ps( y , y);
}

__m512 square_z (__m512 z) {
return _mm512_mul_ps( z , z);
}
```

Disassembly (Intel syntax):
```
_Z8square_xDv4_f:; Function begin
vmovaps xmm0, oword [rcx]
vmulps  xmm0, xmm0, xmm0 
ret  
; _Z8square_xDv4_f End of function


_Z8square_yDv8_f:; Function begin
vmovaps ymm0, yword [rdx]
vmulps  ymm0, ymm0, ymm0 
mov rax, rcx 
vmovaps yword [rcx], ymm0
vzeroupper   
ret  
; _Z8square_yDv8_f End of function


_Z8square_zDv16_f:; Function begin
vmovaps zmm0, zword [rdx]
vmulps  zmm0, zmm0, zmm0 
mov rax, rcx 
vmovaps zword [rcx], zmm0
vzeroupper   
ret  
; _Z8square_zDv16_f End of function

```

Same, compiled with Clang, MS or Intel compilers:

```
_Z8square_yDv8_f:; Function begin
vmovaps ymm0, yword [rcx]
vmulps  ymm0, ymm0, ymm0 
ret  
; _Z8square_yDv8_f End of function


_Z8square_zDv16_f:; Function begin
vmovaps zmm0, zword [rcx]
vmulps  zmm0, zmm0, zmm0 
ret  
; _Z8square_zDv16_f End of function
```

... And while we are at it: It would be nice if you could support __vectorcall
for win64 targets (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89485)

[Bug target/89485] Support vectorcall calling convention on windows

2019-08-07 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89485

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #1 from Agner Fog  ---
I can confirm that the Clang, MS, and Intel compilers all transfer vectors in
registers for function parameters and function return in 64-bit Windows when
__vectorcall is specified. There are still 32 or 40 bytes of superfluous shadow
space allocated on the stack.

Clang adds @@ to the mangled function name.


Please support __vectorcall in Gcc as well.

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2019-08-07 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #2 from Agner Fog  ---
Clang does this. Gcc should do the same:

_Z3fooDv16_f:   # @_Z3fooDv16_f
.cfi_startproc
# %bb.0:
vaddps  .LCPI1_0(%rip){1to16}, %zmm0, %zmm0
retq

[Bug target/83250] _mm256_zextsi128_si256 missing for AVX2 zero extension

2019-06-21 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83250

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #1 from Agner Fog  ---
I can confirm this bug.

_mm256_zextsi128_si256 and several similar intrinsic functions are supported by
Clang and MS compilers, but not by Gcc.

Test case:

   #include  

   __m256i zero_upper_part(__m256i a) {
   return _mm256_zextsi128_si256(_mm256_castsi256_si128(a));
   }

Result:
test.cpp: In function '__m256i zero_upper_part(__m256i)':
test.cpp:6:12: error: '_mm256_zextsi128_si256' was not declared in this scope
 return _mm256_zextsi128_si256(_mm256_castsi256_si128(a));
^~
test.cpp:6:12: note: suggested alternative: '_mm256_castsi128_si256'


The suggested alternative is *dangerous*: The upper part of the ymm register is
undefined after _mm256_castsi128_si256, while it is zero after
_mm256_zextsi128_si256.

_mm256_castsi128_si256 works most of the time, but sometimes a compiler will
optimize away the undefined upper part so that it is no longer zero. This can give
some nasty bugs.
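
Until the zext intrinsics are available, one possible workaround is to insert the 128-bit value into an explicitly zeroed 256-bit register, which leaves no undefined bits for the optimizer to drop. The following is only a sketch: `zext128_to_256` and `zero_upper_part_selfcheck` are hypothetical names, and the code assumes compilation with AVX2 enabled (for example -mavx2). Without AVX2 the self-check has nothing to test and simply returns true.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>

// Hypothetical helper: zero-extend 128 -> 256 bits by inserting the low half
// into an all-zero vector.
static inline __m256i zext128_to_256(__m128i lo) {
    return _mm256_inserti128_si256(_mm256_setzero_si256(), lo, 0);
}

// Same job as the test case above, without _mm256_zextsi128_si256.
static inline __m256i zero_upper_part(__m256i a) {
    return zext128_to_256(_mm256_castsi256_si128(a));
}
#endif

// Returns true if the lower four 32-bit lanes are kept and the upper four
// are zero. Returns true unconditionally when AVX2 is not enabled.
inline bool zero_upper_part_selfcheck() {
#if defined(__AVX2__)
    alignas(32) std::int32_t out[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(out),
                       zero_upper_part(_mm256_set1_epi32(7)));
    for (int i = 0; i < 4; ++i) if (out[i] != 7) return false;
    for (int i = 4; i < 8; ++i) if (out[i] != 0) return false;
    return true;
#else
    return true;
#endif
}
```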

[Bug c++/89325] [7/8/9/10 Regression] False warnings about "optimization attribute" on operators when -fno-ipa-cp-clone

2019-04-30 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89325

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #5 from Agner Fog  ---
I have the same problem.
Minimal test case:

#include 

struct Test {
float f;
};

Test round(Test const & a) __attribute__
((optimize("-fno-unsafe-math-optimizations")));
Test round(Test const & a) {return a;}

[Bug target/65782] Assembly failure (invalid register for .seh_savexmm) with -O3 -mavx512f on mingw-w64

2019-04-29 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65782

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #6 from Agner Fog  ---
I get the same error with G++ 7.4.0 Cygwin when compiling with option
-mavx512vl -m64. 

A workaround is to use -fno-asynchronous-unwind-tables

Registers xmm16-31 should be considered clobbered in Win64. See
https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin

[Bug target/41084] Filling xmm register with all bit set is not optimized

2018-05-15 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084

Agner Fog  changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #1 from Agner Fog  ---
What is the status of this bug? It doesn't seem to have been fixed.

[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)

2014-09-25 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253

--- Comment #13 from Agner Fog agner at agner dot org ---
Thank you. I agree that integer overflow should be well-defined when using
intrinsics.

Is it possible to do the same optimization with boolean vector intrinsics, such
as _mm_and_epi32 and _mm_or_ps to enable optimizations such as algebraic
reduction and constant propagation?

[Bug target/63351] Optimization: contract broadcast intrinsics when AVX512 is enabled

2014-09-24 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351

--- Comment #2 from Agner Fog agner at agner dot org ---
AVX512 allows all _memory_ source operands to broadcast from a scalar on almost
all vector instructions for 128-, 256- and 512-bit vectors with 32- or 64-bit
elements. See section 4.6.1 in Intel® Architecture Instruction Set Extensions
Programming Reference
https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf

This feature comes for free; there is no performance cost to broadcasting other
than making the instruction prefix longer for vector sizes smaller than 512.

This feature has no explicit support in intrinsic functions, so the only way to
utilize this excellent optimization opportunity without using assembly is to
contract broadcast intrinsics with subsequent instructions.

An obvious application is to store scalar constants as 32 or 64 bit constants
rather than as full vectors.

Often, it is not known to the programmer whether a variable is stored in memory
or in a register. If a scalar variable is already in a register then it is
better to use a broadcast instruction. If the scalar variable is in memory then
it is better to contract the broadcast into the vector instruction that uses
it, even if the broadcasted value is used multiple times.

[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)

2014-09-23 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253

Agner Fog agner at agner dot org changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #8 from Agner Fog agner at agner dot org ---
The same problem applies to other kinds of optimizations, such as algebraic
reductions and constant propagation. 

The method of using operators such as * and + is not portable to other
compilers, and it doesn't work with integer vectors for other integer sizes
than 64-bits. (I know that there is no integer FMA on Intel CPUs, but I am also
talking about other optimizations).

Here are some other examples of optimizations I would like gcc to do:

#include <x86intrin.h>

void dummy2(__m128 a, __m128 b);
void dummyi2(__m128i a, __m128i b);

void commutative(__m128 a, __m128 b) {
    // expect reduce a+b = b+a. This is the only reduction that actually works!
    dummy2(_mm_add_ps(a,b), _mm_add_ps(b,a));
}

void associative(__m128i a, __m128i b, __m128i c) {
    // expect reduce (a+b)+c = a+(b+c)
    dummyi2(_mm_add_epi32(_mm_add_epi32(a,b),c),
            _mm_add_epi32(a,_mm_add_epi32(b,c)));
}

void distributive(__m128i a, __m128i b, __m128i c) {
    // expect reduce a*b+a*c = a*(b+c)
    dummyi2(_mm_add_epi32(_mm_mul_epi32(a,b),_mm_mul_epi32(a,c)),
            _mm_mul_epi32(a,_mm_add_epi32(b,c)));
}

void constant_propagation() {
    // expect store c and d as precalculated constants
    __m128i a = _mm_setr_epi32(1,2,3,4);
    __m128i b = _mm_set1_epi32(5);
    __m128i c = _mm_add_epi32(a,b);
    __m128i d = _mm_mul_epi32(a,b);
    dummyi2(c,d);
}

Of course, the same applies to 256-bit and 512-bit vectors.
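
For comparison, here is a sketch of the operator-based method mentioned above, written with the GCC/Clang generic vector extension rather than intrinsics. It is specific to those compilers, which is exactly the portability limitation noted. The type name `v4si` and the function names are chosen for this example only.

```cpp
#include <cassert>

// GCC/Clang extension: four 32-bit ints packed into a 16-byte vector.
typedef int v4si __attribute__((vector_size(16)));

// Written with plain operators, the expressions are visible to the optimizer
// as ordinary arithmetic, so algebraic reduction is at least possible.
inline v4si distributive_lhs(v4si a, v4si b, v4si c) { return a * b + a * c; }
inline v4si distributive_rhs(v4si a, v4si b, v4si c) { return a * (b + c); }

// Operands are compile-time constants, so the sum can be precalculated.
inline v4si constant_sum() {
    v4si a = {1, 2, 3, 4};
    v4si b = {5, 5, 5, 5};
    return a + b;
}
```

Whether a given compiler version actually folds or reduces these expressions has to be checked in the generated assembly; the sketch only shows the source form.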

[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)

2014-09-23 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253

--- Comment #9 from Agner Fog agner at agner dot org ---
Many programmers are using a vector class library rather than writing intrinsic
functions directly. Such libraries have overloaded operators which are inlined
to produce intrinsic functions. Therefore, we cannot expect programmers to make
optimizations like FMA contraction, algebraic reduction, constant propagation,
etc. manually.
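
To make the point concrete, here is a minimal sketch of how such a library wraps intrinsics behind operators. `Vec4i` is an illustrative stand-in and not the interface of any particular library; the scalar branch is an assumption added so that the sketch also builds on targets without SSE2.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

class Vec4i {
#if defined(__SSE2__)
    __m128i v;
    explicit Vec4i(__m128i x) : v(x) {}
#else
    std::int32_t v[4];  // scalar fallback for non-x86 targets
#endif
public:
    Vec4i(std::int32_t a, std::int32_t b, std::int32_t c, std::int32_t d)
#if defined(__SSE2__)
        : v(_mm_setr_epi32(a, b, c, d)) {}
#else
        : v{a, b, c, d} {}
#endif
    explicit Vec4i(std::int32_t s) : Vec4i(s, s, s, s) {}

    // The user writes a + b; after inlining, the compiler sees only the
    // intrinsic call, so any further optimization is up to the compiler.
    friend Vec4i operator+(const Vec4i& a, const Vec4i& b) {
#if defined(__SSE2__)
        return Vec4i(_mm_add_epi32(a.v, b.v));
#else
        return Vec4i(a.v[0] + b.v[0], a.v[1] + b.v[1],
                     a.v[2] + b.v[2], a.v[3] + b.v[3]);
#endif
    }

    // Read one element back (index taken modulo 4).
    std::int32_t extract(int i) const {
#if defined(__SSE2__)
        alignas(16) std::int32_t t[4];
        _mm_store_si128(reinterpret_cast<__m128i*>(t), v);
        return t[i & 3];
#else
        return v[i & 3];
#endif
    }
};
```

User code such as `Vec4i(1,2,3,4) + Vec4i(5)` never mentions an intrinsic, so there is no place where the programmer could apply constant propagation by hand.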

I don't know whether this more general discussion of optimizations on code with
intrinsics fits into this bug or needs to be discussed elsewhere.

[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)

2014-09-23 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253

--- Comment #11 from Agner Fog agner at agner dot org ---
Thanks for the links Marc. 
You are right, the discussion in the gcc-patches mailing list ignores integer
vectors. You need a solution that also allows optimizations on integer
intrinsic functions (perhaps cast the vector type?). I am not on any internal
mailing list, so please post it there for me.

The proposed solution of using vector extensions will not work on masked vector
intrinsics in AVX512, so it wouldn't enable e.g. constant propagation through a
masked intrinsic, but that is probably too much to ask for :) I will add a new
bug report for contraction of broadcast with AVX512.

[Bug c/63351] New: Optimization: contract broadcast intrinsics when AVX512 is enabled

2014-09-23 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351

Bug ID: 63351
   Summary: Optimization: contract broadcast intrinsics when
AVX512 is enabled
   Product: gcc
   Version: 4.9.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: agner at agner dot org

The AVX512 instruction set allows instructions with broadcast, but there are no
corresponding intrinsic functions. The programmer has to write a broadcast
intrinsic followed by some other intrinsic and rely on the compiler to contract
this into a single instruction.

I would expect the optimizer to contract a broadcast intrinsic with any
subsequent intrinsic into a single instruction. For example:

// gcc -Ofast -mavx512f

#include <x86intrin.h>

void dummyz(__m512i a, __m512i b);

void broadcastz(__m512i a, int b) {
// expect reduction to instruction with broadcast,
// something like: vpaddd b, %zmm0, %zmm3 {1to16}
__m512i bb = _mm512_set1_epi32(b);
__m512i ab = _mm512_add_epi32(a,bb);
__m512i cc = _mm512_set1_epi32(5);
__m512i ac = _mm512_add_epi32(a,cc);
dummyz(ab, ac);
}


This should actually be possible for smaller vector sizes as well when AVX512
is enabled:

void dummyx(__m128 a, __m128 b);

void broadcastx(__m128 a, float b) {
// broadcasting should even be possible with smaller vectors
__m128 bb = _mm_set1_ps(b);
__m128 ab = _mm_add_ps(a,bb);
__m128 cc = _mm_set1_ps(5.0);
__m128 ac = _mm_add_ps(a,cc);
dummyx(ab, ac);
}

[Bug c/61878] New: Missing intrinsic functions in avx512intrin.h

2014-07-22 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61878

Bug ID: 61878
   Summary: Missing intrinsic functions in avx512intrin.h
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: agner at agner dot org

A few compare functions are missing in avx512intrin.h, e.g.
_mm512_cmpgt_epu32_mask and _mm512_cmpgt_epu64_mask

All intrinsic functions for typecasting are also missing. Please add these
functions, as they are indispensable.

See https://software.intel.com/en-us/node/513903 and
https://software.intel.com/sites/landingpage/IntrinsicsGuide/ for documentation
of these functions.

Definitions copied from Intel's zmmintrin.h:

/* Conversion from one type to another, no change in value. */

extern __m512  __ICL_INTRINCC _mm512_castpd_ps(__m512d);
extern __m512i __ICL_INTRINCC _mm512_castpd_si512(__m512d);
extern __m512d __ICL_INTRINCC _mm512_castps_pd(__m512);
extern __m512i __ICL_INTRINCC _mm512_castps_si512(__m512);
extern __m512  __ICL_INTRINCC _mm512_castsi512_ps(__m512i);
extern __m512d __ICL_INTRINCC _mm512_castsi512_pd(__m512i);

/*
 * Casts from a larger type to a smaller type.
 */
extern __m128d __ICL_INTRINCC _mm512_castpd512_pd128(__m512d);
extern __m128  __ICL_INTRINCC _mm512_castps512_ps128(__m512);
extern __m128i __ICL_INTRINCC _mm512_castsi512_si128(__m512i);
extern __m256d __ICL_INTRINCC _mm512_castpd512_pd256(__m512d);
extern __m256  __ICL_INTRINCC _mm512_castps512_ps256(__m512);
extern __m256i __ICL_INTRINCC _mm512_castsi512_si256(__m512i);

/*
 * Casts from a smaller type to a larger type.
 * Upper elements of the result are undefined.
 */
extern __m512d __ICL_INTRINCC _mm512_castpd128_pd512(__m128d);
extern __m512  __ICL_INTRINCC _mm512_castps128_ps512(__m128);
extern __m512i __ICL_INTRINCC _mm512_castsi128_si512(__m128i);
extern __m512d __ICL_INTRINCC _mm512_castpd256_pd512(__m256d);
extern __m512  __ICL_INTRINCC _mm512_castps256_ps512(__m256);
extern __m512i __ICL_INTRINCC _mm512_castsi256_si512(__m256i);

[Bug c/61855] New: _MM_MANTISSA_NORM_ENUM in avx512intrin.h disabled when optimization off

2014-07-20 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61855

Bug ID: 61855
   Summary: _MM_MANTISSA_NORM_ENUM in avx512intrin.h disabled when
optimization off
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: agner at agner dot org

Created attachment 33159
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33159&action=edit
test code to replicate bug

Definitions  _MM_MANTISSA_NORM_ENUM and _MM_MANTISSA_SIGN_ENUM in
avx512intrin.h are disabled when optimization is off.

To replicate error, compile attached file with 
gcc -c -mavx512f -O0 bug2.c

Workaround: gcc -c -mavx512f -O1 bug2.c

[Bug c++/61794] New: internal error: unrecognizable insn, from avx512 extract instruction

2014-07-13 Thread agner at agner dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61794

Bug ID: 61794
   Summary: internal error: unrecognizable insn, from avx512
extract instruction
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: agner at agner dot org

Created attachment 33117
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33117&action=edit
c++ file producing error

The attached file bug1.cpp generates internal error when compiling:
g++ -mavx512f bug1.cpp

g++ version: 4.9 and 4.10.0 20140706
binutils version: 2.24
Ubuntu version: 12.04.2 LTS


Error message:
==
a@a-desktop:~/avx512$ g++ -mavx512f bug1.cpp
bug1.cpp: In member function ‘int32_t Vec16i::extract(uint32_t) const’:
bug1.cpp:59:5: error: unrecognizable insn:
 }
 ^
(insn 29 28 30 8 (set (reg:V4SI 89 [ D.12727 ])
(vec_merge:V4SI (vec_select:V4SI (reg:V16SI 88 [ D.12726 ])
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
]))
(reg:V4SI 86 [ D.12724 ])
(reg:QI 113))) bug1.cpp:38 -1
 (nil))
bug1.cpp:59:5: internal compiler error: in extract_insn, at recog.c:2204
0xb25c68 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../.././gcc/rtl-error.c:109
0xb25c99 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././gcc/rtl-error.c:117
0xaf609e extract_insn(rtx_def*)
../.././gcc/recog.c:2204
0x980803 instantiate_virtual_regs_in_insn
../.././gcc/function.c:1561
0x980803 instantiate_virtual_regs
../.././gcc/function.c:1932
0x980803 execute
../.././gcc/function.c:1983
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

[Bug c/53071] New: Wrong instruction replacement when compiling for xop

2012-04-22 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53071

 Bug #: 53071
   Summary: Wrong instruction replacement when compiling for xop
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: critical
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ag...@agner.org


Created attachment 27216
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27216
source code and asm output

An intrinsic vector multiply function is replaced by xop instructions when the
attached file is compiled with -mxop. Perhaps the compiler is trying to combine
a shift instruction and a multiply instruction into a single xop instruction,
but it ends up with something wrong.

This is perhaps related to bugs # 52908 and 52910.

[Bug target/52910] xop-mul-1:f13 miscompiled on bulldozer (-mxop)

2012-04-22 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52910

Agner Fog agner at agner dot org changed:

   What|Removed |Added

 CC||agner at agner dot org

--- Comment #1 from Agner Fog agner at agner dot org 2012-04-22 14:35:35 UTC 
---
Confirm: I have seen a similar bug (gcc 4.7.0)

[Bug target/52932] AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type

2012-04-13 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932

--- Comment #10 from Agner Fog agner at agner dot org 2012-04-13 16:50:33 UTC 
---
_mm256_permutevar8x32_epi32 has the operands in wrong order. They need 
to be swapped. Did you fix this too?


On 12-04-2012 20:37, uros at gcc dot gnu.org wrote:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932

 --- Comment #9 from uros at gcc dot gnu.org 2012-04-12 18:37:47 UTC ---
 Author: uros
 Date: Thu Apr 12 18:37:42 2012
 New Revision: 186388

 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186388
 Log:
  PR target/52932
  * config/i386/avx2intrin.h (_mm256_permutevar8x32_ps): Change second
  argument type to __m256i.  Update call to __builtin_ia32_permvarsf256.
  * config/i386/sse.md (UNSPEC_VPERMVAR): New.
  (UNSPEC_VPERMSI, UNSPEC_VPERMSF): Remove.
  (avx2_permvarv8sf, avx2_permvarv8si): Switch operands 1 and 2.
  (avx2_permvar<mode>): Macroize insn from avx2_permvarv8sf and
  avx2_permvarv8si using VI4F_256 mode iterator.
  * config/i386/i386.c (bdesc_args) <__builtin_ia32_permvarsf256>:
  Update builtin type to V8SF_FTYPE_V8SF_V8SI.
  (ix86_expand_vec_perm): Update calls to gen_avx2_permvarv8si and
  gen_avx2_permvarv8sf.
  (expand_vec_perm_pshufb): Ditto.

 testsuite/ChangeLog:

  PR target/52932
  * gcc.target/i386/avx2-vpermps-1.c (avx2_test): Use __m256i type for
  second function argument.
  * gcc.target/i386/avx2-vpermps-2.c (init_permps): Update declaration.
  (calc_permps): Update declaration.  Calculate result correctly.
  (avx2_test): Change src2 type to union256i_d.
  * gcc.target/i386/avx2-vpermd-2.c (calc_permd): Calculate result
  correctly.


 Modified:
  trunk/gcc/ChangeLog
  trunk/gcc/config/i386/avx2intrin.h
  trunk/gcc/config/i386/i386.c
  trunk/gcc/config/i386/sse.md
  trunk/gcc/testsuite/ChangeLog
  trunk/gcc/testsuite/gcc.target/i386/avx2-vpermd-2.c
  trunk/gcc/testsuite/gcc.target/i386/avx2-vpermps-1.c
  trunk/gcc/testsuite/gcc.target/i386/avx2-vpermps-2.c

[Bug c/52932] New: AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type

2012-04-11 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932

 Bug #: 52932
   Summary: AVX2 intrinsic _mm256_permutevar8x32_ps has wrong
parameter type
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ag...@agner.org


The intrinsic _mm256_permutevar8x32_ps in avx2intrin.h has the second parameter
of type __m256, the correct type is __m256i.

The Intel programming reference has the type __m256, which is wrong. This error
may have propagated into gnu avx2intrin.h.

The correct type is specified by Intel at
http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/intref_cls/common/intref_avx2_permutevar8x32_ps.htm

Both Intel and Microsoft immintrin.h files have the type __m256i, which appears
to be the only logically correct choice.

Excerpt from Intel version of immintrin.h:
extern __m256  __cdecl _mm256_permutevar8x32_ps(__m256, __m256i);

For the sake of compatibility with other compilers, and for logical reasons, I
would prefer __m256i.

The function works after type-casting the parameter.

[Bug c/49820] Explicit check for integer negative after abs optimized away

2011-07-27 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820

--- Comment #15 from Agner Fog agner at agner dot org 2011-07-27 14:27:33 UTC 
---
How do you define clever things? Checking that a variable is within the
allowed range is certainly a standard thing that every SW teacher tells you to
do. I think it is reasonable to expect -Wall to do what it says and set a very
high warning level. Optimizing away an overflow check is such a dangerous thing
to do that it requires a warning.

I think it may be wise to distinguish between optimizing away a whole branch or
loop, versus just making calculations more efficient, e.g. simplifying
expressions or making induction variables. If a branch can be optimized away
then it is either violating the intentions of the programmer or the program has
a logical error. A warning would be in place in either case.

What I am suggesting is that optimizing away a branch should give a warning at
a lower level than simplifying an arithmetic expression. I know this might be
somewhat complicated to implement, but it would be useful for catching the
situation where an overflow check is optimized away.

Checking for overflow in a safe way is so complicated and tedious that it is
practically never done (see
https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow
)

Sorry for being persistent, but I think this issue has serious security
implications.
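To illustrate what a "safe" check looks like, here is a minimal sketch of an
addition that tests the operands before adding, so that no signed overflow
ever takes place and the compiler has nothing to optimize away. The name
checked_add is hypothetical and not taken from the bug report or from the CERT
page.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *result and returns true only if the sum fits in an int.
   The range test is done before the addition, so the addition itself can
   never overflow and the check cannot be removed as "always false". */
bool checked_add(int a, int b, int *result) {
    assert(result != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* the sum would not be representable */
    }
    *result = a + b;
    return true;
}
```

A caller would test the return value instead of inspecting the sum afterwards.
Every arithmetic operation needs its own variant of this test, which is the
tedium referred to above.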

[Bug c/49820] Explicit check for integer negative after abs optimized away

2011-07-26 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820

--- Comment #13 from Agner Fog agner at agner dot org 2011-07-26 19:31:48 UTC 
---
My example does indeed give a warning when compiled with -Wstrict-overflow=2.
Unfortunately, -Wall implies only -Wstrict-overflow=1 so I got no warning in
the first place. I think the warning levels need to be adjusted so that we get
the warning with -Wall because the consequences are no less serious than
ignoring an overflow check with if (a + const < a), which gives a warning with
-Wstrict-overflow=1.

[Bug c/49820] Explicit check for integer negative after abs optimized away

2011-07-25 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820

--- Comment #10 from Agner Fog agner at agner dot org 2011-07-25 07:43:58 UTC 
---
I still think that a compiler should be predictable and consistent. It is
inconsistent that  a+5<a = false  produces a warning, while  abs(a)<0 = false
does not. Both expressions could be intended overflow checks.

Besides, some compilers produce a warning when a branch condition is always
true or always false. That is sound behavior because it is likely to be a bug.
gcc does not produce a warning when optimizing away something like
if (2+2 != 4)

[Bug c/49820] Explicit check for integer negative after abs optimized away

2011-07-25 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820

--- Comment #12 from Agner Fog agner at agner dot org 2011-07-25 14:21:52 UTC 
---
No the behavior is not predictable when it sometimes warns about ignoring
overflow, and sometimes not. Please add a warning when it optimizes away an
overflow check after the abs function.

Unsafe optimizations are sometimes good, sometimes causing hard-to-find bugs.
The programmer can't always predict what kind of optimizations the compiler
makes. A warning feature is the best way to enable the programmer to check if
the compiler does the right thing. The programmer can then turn off specific
warnings after verifying that the optimizations are OK.

[Bug c/49820] Explicit check for integer negative after abs optimized away

2011-07-24 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820

--- Comment #8 from Agner Fog agner at agner dot org 2011-07-24 08:16:39 UTC 
---
Thanks for your comments.

Why is the behavior different for signed and unsigned?
The expression (a + 5 < a) is reduced to always false when a is signed, but not
when a is unsigned.

-Wall produces the warning "assuming signed overflow does not occur when
assuming that (X + c) < X is always false" in the above example, but there is
no warning when it assumes that abs(a) < 0 is always false.

I believe that the behavior of a compiler must be predictable. An ordinary
programmer would never predict that the compiler can optimize away an explicit
check for overflow, no matter how many C++ textbooks he has read. If the
compiler can remove a security check without warning then we have a security
issue.

To say that the behavior in case of overflow is undefined is not the same as
denying that overflow can occur. I think we need a sensible compromise that
allows the compiler to e.g. optimize a loop under the assumption that the loop
counter doesn't overflow, but doesn't allow it to optimize away an explicit
overflow check. I know the compiler can't guess the programmers' intentions,
but then we must at least have a warning. Any optimization rule that allows the
compiler to optimize away an overflow check without warning is unacceptable in
my opinion.
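For comparison, here is a minimal sketch of the unsigned case. Unsigned
arithmetic is defined to wrap modulo 2^N, so the comparison after the addition
is a valid overflow test and the compiler has to keep it. The function name
add_wraps_unsigned is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds c to a, stores the (possibly wrapped) sum in *sum, and returns true
   exactly when the addition wrapped around. This is well defined because
   unsigned arithmetic wraps modulo 2^N by definition. */
bool add_wraps_unsigned(unsigned a, unsigned c, unsigned *sum) {
    assert(sum != NULL);
    *sum = a + c;        /* defined behavior even when it wraps */
    return *sum < a;     /* the wrapped sum is smaller than the operand */
}
```

The same comparison written with signed operands relies on undefined behavior,
which is why the two cases are treated differently.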

[Bug c/49820] New: Explicit check for integer negative after abs optimized away

2011-07-23 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820

   Summary: Explicit check for integer negative after abs
optimized away
   Product: gcc
   Version: 4.5.2
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ag...@agner.org


Created attachment 24812
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24812
Example generating bug

The integer abs function can overflow if the argument is 0x80000000. An
intended check for overflow is ignored. The gcc compiler optimizes away a check
for the value < 0 after abs with -O2 optimization:
int b;
b = abs(b);
if (b < 0)  // check for overflow optimized away

The error occurs when compiling the attached file with -O2, 32 or 64 bit mode,
C or C++. The C/C++ language does not normally need to check for overflow, but
it should acknowledge an explicit check for overflow.
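A check that does not depend on the result of abs can be sketched as follows.
It tests for the one problematic argument before calling abs, assuming a two's
complement int where INT_MIN has no positive counterpart. The name checked_abs
is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stores abs(value) in *result and returns true, or returns false when
   value is INT_MIN (assumed two's complement), whose magnitude does not
   fit in an int. abs is never called with an argument that overflows. */
bool checked_abs(int value, int *result) {
    assert(result != NULL);
    if (value == INT_MIN) {
        return false;
    }
    *result = abs(value);
    return true;
}
```

Because the test is made on the argument rather than on the result, there is
no overflow for the optimizer to assume away.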

[Bug c/40528] Add a new ifunc attribute

2011-07-08 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528

--- Comment #16 from Agner Fog agner at agner dot org 2011-07-08 08:52:32 UTC 
---
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > What is the status of this issue?
> >
> > It is implemented on ifunc branch.
> >
> > > Is option 3 implemented?
> >
> > Yes, on ifunc branch.
> >
> > > Which versions of Linux and binutils support IFUNC?
> Still doesn't work.
> warning: ‘ifunc’ attribute directive ignored
> GNU Binutils for Ubuntu 2.21.0.20110327

The ifunc attribute is described in
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html but it doesn't work
(see my previous comment). After some experimentation I found that the method
described below works. Either the compiler should be fixed or the onlinedocs
should be changed.

// Example of gnu indirect function
#include <stdio.h>
#include <time.h>

// Define different versions of my function
int myfunc1() {
   return 1;
}

int myfunc2() {
   return 2;
}

// Type definition for pointer to my function
typedef int (*MyFunctionPointer)(void);

// Prototype for the common entry point
extern "C"  // remove this line if not C++
int myfunc();
__asm__ (".type myfunc, @gnu_indirect_function");

// Make the dispatcher function
MyFunctionPointer myfunc_dispatch (void) __asm__ ("myfunc");
MyFunctionPointer myfunc_dispatch (void)  {

   if (time(0) & 1) {
      // If time is odd at first call, use version 1
      return myfunc1;
   }
   else {
      // else use version 2
      return myfunc2;
   }
}

int main() {
   // Test the call to myfunc
   printf("\nCalled function number %i\n", myfunc());
   return 0;
}
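
For platforms without IFUNC support, a similar effect can be sketched in plain
C with a function pointer that is resolved on the first call. This is only an
illustration, not the mechanism discussed in this bug: the names myfunc_select,
myfunc_impl and myfunc_first_call are hypothetical, every call goes through a
pointer, and the first call is not thread safe.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef int (*MyFunctionPointer)(void);

static int myfunc1(void) { return 1; }
static int myfunc2(void) { return 2; }

/* Same selection rule as the dispatcher above: odd time picks version 1 */
static MyFunctionPointer myfunc_select(void) {
    return (time(NULL) & 1) ? myfunc1 : myfunc2;
}

static int myfunc_first_call(void);

/* Points at the resolving stub until the first call has been made */
static MyFunctionPointer myfunc_impl = myfunc_first_call;

/* Resolves the implementation once, then forwards the call to it */
static int myfunc_first_call(void) {
    myfunc_impl = myfunc_select();
    assert(myfunc_impl != NULL);
    return myfunc_impl();
}

/* Common entry point: all later calls go straight to the chosen version */
int myfunc(void) {
    return myfunc_impl();
}
```

Unlike a gnu_indirect_function symbol, which is resolved by the dynamic linker,
this costs one indirect call on every use.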

[Bug c/40528] Add a new ifunc attribute

2011-05-30 Thread agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528

--- Comment #15 from Agner Fog agner at agner dot org 2011-05-30 13:13:06 UTC 
---
(In reply to comment #14)
> (In reply to comment #13)
> > What is the status of this issue?
>
> It is implemented on ifunc branch.
>
> > Is option 3 implemented?
>
> Yes, on ifunc branch.
>
> > Which versions of Linux and binutils support IFUNC?
>
> You need at least glibc 2.11 and binutils 2.20.
>
> > Any plans for BSD and Mac?
> >
>
> You have to ask BSD and Mac people since IFUNC support
> needs to be implemented in both binutils and the C
> library.

Still doesn't work. 
warning: ‘ifunc’ attribute directive ignored
GNU Binutils for Ubuntu 2.21.0.20110327
Where can I find an implementation with ifunc branch?

[Bug c/40528] Add a new ifunc attribute

2010-02-21 Thread agner at agner dot org



--- Comment #13 from agner at agner dot org  2010-02-21 16:21 ---
What is the status of this issue?
Is option 3 implemented?
Which versions of Linux and binutils support IFUNC?
Any plans for BSD and Mac?


-- 

agner at agner dot org changed:

   What|Removed |Added

 CC||agner at agner dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528

[Bug c++/37880] New: Documentation of option -mcmodel=medium is wrong

2008-10-21 Thread agner at agner dot org

The documentation of option -mcmodel=medium says:
-mcmodel=medium
Generate code for the medium model: The program is linked in the lower 2 GB of
the address space but symbols can be located anywhere in the address space.
Programs can be statically or dynamically linked, but building of shared
libraries are not supported with the medium model.

This is misleading since the compiler still uses 32-bit addresses for data
objects on Linux (and BSD?) targets. The program data are still limited to
addresses < 2 GB. Dynamically allocated memory (new or malloc) can probably
exceed the 2 GB address limit in both the small and the medium memory model.
Whatever the difference is between small and medium memory models, it is not
covered by the above explanation. On Mac OS X (Darwin) targets, all addresses
are above 4 GB by default for both small and medium models. On Windows targets,
a DLL can be loaded at addresses > 2 GB though this rarely happens.

Example:
-- code file a.cpp ---
int * mypointer = 0;
int myarray[100] = {0};

int myfunction (int x) {
   mypointer = myarray + 1;
   return myarray[x];
}
-- end of code file a.cpp ---

Command line:
g++ -m64 -mcmodel=medium -S a.cpp

gcc version:
gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

-- assembly output (excerpt) ---
_Z10myfunctioni:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        movl    %edi, -4(%rbp)
        movl    $myarray+4, %eax       # uses 32-bit zero-extended address here!
        movq    %rax, mypointer(%rip)
        movl    -4(%rbp), %eax
        cltq
        movl    myarray(,%rax,4), %eax # uses 32-bit sign-extended address here!
        leave
        ret
-- end of assembly output (excerpt) ---

This example shows that the statement "symbols can be located anywhere in the
address space" is misleading. Static symbols must be located at addresses < 2 GB
for the above code to work. Does the above statement apply to symbols on the
stack or only to objects allocated with new or malloc? The statement is
definitely wrong for Mac targets, and possibly also for Windows targets.

If a correct description would be too long then there may be a link to more
exact descriptions elsewhere.
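
The 2 GB limit follows from the two addressing forms in the assembly excerpt.
The following sketch, with hypothetical function names, tests which 64-bit
addresses can be expressed by a zero-extended and by a sign-extended 32-bit
constant. Only addresses below 0x80000000 satisfy both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* movl $sym, %eax : the 32-bit immediate is zero-extended to 64 bits */
bool fits_zero_extended_32(uint64_t addr) {
    return addr <= UINT64_C(0xFFFFFFFF);
}

/* sym(,%rax,4) : the 32-bit displacement is sign-extended to 64 bits */
bool fits_sign_extended_32(uint64_t addr) {
    return addr <= UINT64_C(0x7FFFFFFF) ||
           addr >= UINT64_C(0xFFFFFFFF80000000);
}

/* A static symbol used in both ways must pass both tests */
bool usable_in_both_forms(uint64_t addr) {
    return fits_zero_extended_32(addr) && fits_sign_extended_32(addr);
}

/* Spot checks of the boundary at 2 GB */
void check_2gb_boundary(void) {
    assert(usable_in_both_forms(UINT64_C(0x7FFFFFFF)));
    assert(!usable_in_both_forms(UINT64_C(0x80000000)));
}
```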


-- 
   Summary: Documentation of option -mcmodel=medium is wrong
   Product: gcc
   Version: 4.2.3
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: agner at agner dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37880

[Bug target/13685] Building simple test application with -march=pentium3 -Os gives SIGSEGV (unaligned sse instruction)

2006-09-23 Thread agner at agner dot org



--- Comment #26 from agner at agner dot org  2006-09-23 08:23 ---
Thank you for fixing this, but you need to tell the world which solution you
have chosen. Please see the discussion at the duplicate bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537 for arguments for and against
each possible solution.

You need to specify whether the chosen solution is to enforce 16 byte stack
alignment regardless of -Os option or the solution is to make no assumption
about stack alignment when making XMM code. This is an ABI issue that has to be
standardized and made public. The makers of the Intel compiler are waiting for
a resolution to this issue so that they can make their compiler compatible with
GCC. For the same reason, assembly programmers need to know whether stack
alignment is required or not.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13685

[Bug target/27537] XMM alignment fault when compiling for i386 with -Os

2006-08-23 Thread agner at agner dot org



--- Comment #11 from agner at agner dot org  2006-08-23 08:04 ---
This problem wouldn't have happened if the ABI had been better maintained.
Somebody decides to change the calling convention without properly documenting
the change, and somebody else makes another change that is incompatible because
the alignment requirement has never made it into the ABI documents.

Let me help you make a decision on this issue by summarizing the pros and
cons of 16-byte stack alignment in 32-bit x86 Linux/BSD.

Advantages of enforcing 16-byte stack alignment:

1.
The use of XMM code is becoming more common now that all new computers have
support for the SSE2 or higher instruction set. The necessary alignment of XMM
variables can be implemented more efficiently when the stack is aligned.

2.
Variables of type double (double precision floating point) are accessed more
efficiently when aligned by 8. This is easily achieved when the stack is
aligned.

3.
Function parameters of type double will automatically get the optimal
alignment, unless the parameter is preceded by an odd number of smaller
parameters (including any 'this' pointer and return pointer). This means that
more than 50% of function parameters of type double will be optimally aligned,
versus 50% without stack alignment. The C/C++ programmer will be able to ensure
optimal alignment by manipulating the order of function parameters.

4.
Functions that need to align local variables can do so without using EBP as
stack frame. This frees EBP for other purposes. General purpose registers are a
scarce resource in 32-bit mode.

5.
16-byte stack alignment is officially enforced in Intel-based Mac OS X. It is
desirable to have identical ABIs for Linux, BSD and Mac. This makes it
possible to use the same compilers and the same function libraries for all
three platforms (except for the object file format, which can be converted).

6.
The stack alignment requires no extra instructions in leaf functions, which are
more likely to contain the critical innermost loop than non-leaf functions.

7.
The stack alignment requires no extra instructions in a non-leaf function if
the function adjusts the stack pointer anyway for the sake of allocating local
storage.

8.
Stack alignment is already implemented in Gcc and existing code relies on it.


Disadvantages of enforcing 16-byte stack alignment:

1.
A non-leaf function without any stack space allocated for local storage needs
one or two extra instructions for conforming to the stack alignment
requirement.

2.
The alignment requirement results in unused space in the stack. This takes up
to 12 bytes of extra space in the data cache for each function calling level
except the innermost. Assuming that only the innermost three function levels
matter in terms of speed, and that the number of unused bytes is 8 on average
for all but the innermost function, the total amount of space wasted in the
data cache is 16 bytes.

3.
The Intel compiler does not enforce stack alignment. However, the Intel people
are ready to change this as soon as you Gnu people make a decision on this
issue. I have contact with the Intel people about this issue.

4.
Stack alignment is not enforced in 32-bit Windows. Compatibility with the
Windows ABI might be desirable.
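
The figure of "up to 12 bytes" in disadvantage 2 can be reproduced with a small
sketch. It assumes a stack that grows downward and a stack pointer that is
already a multiple of 4, as in 32-bit x86. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes that must be skipped to move a downward-growing stack
   pointer to the next lower multiple of `alignment` (a power of two). */
size_t padding_to_align_down(uintptr_t sp, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size_t)(sp & (alignment - 1));
}

/* The aligned stack pointer after skipping the padding */
uintptr_t align_down(uintptr_t sp, size_t alignment) {
    return sp - padding_to_align_down(sp, alignment);
}
```

For a stack pointer that is a multiple of 4, the padding needed for 16-byte
alignment is 0, 4, 8 or 12 bytes, so 12 is the worst case per calling level.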


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537

[Bug target/27537] XMM alignment fault when compiling for i386 with -Os

2006-08-03 Thread agner at agner dot org



--- Comment #8 from agner at agner dot org  2006-08-03 20:20 ---
hjl wrote:
Apparently, it was done on purpose

Yes, the -Os non-alignment was obviously done on purpose. The problem is that
other modules that may be called from the -Os module rely on the stack being
aligned by 16. The wrong alignment makes the program crash when xmm registers
are used. The alignment must be strictly enforced in the ABI if any function
relies on it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537

[Bug target/27537] XMM alignment fault when compiling for i386 with -Os

2006-06-08 Thread agner at agner dot org



--- Comment #6 from agner at agner dot org  2006-06-08 06:27 ---
Comment #5 From hjl confirms my point: The error can occur in an optimized part
of the program that uses XMM registers when some other, noncritical, part of
the program is compiled with -Os

We need a comment from the ABI people about which solution to choose because
the Intel compiler people are working on a fix to make the two compilers
compatible.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537

[Bug target/27537] XMM alignment fault when compiling for i386 with -Os

2006-05-11 Thread agner at agner dot org



--- Comment #4 from agner at agner dot org  2006-05-11 07:11 ---
Thanks for confirming this bug. If Gcc relies on the stack being aligned then
it has to be an official ABI requirement. 

It makes perfect sense to compile the whole program, or some of it, with -Os
and also use XMM. -Os can be the optimal option if code cache or data cache is
a critical resource. It is also a perfectly justifiable solution to compile the
part of the program that contains the innermost loop with -O3 and the rest of
the program with -Os. The error also occurs if part of the program is compiled
with the Intel C++ compiler, because the Intel people follow the official ABI
which hasn't been updated for many years. The Intel compiler is intended to be
100% binary compatible with Gnu.

Gcc is no longer a hobby project for a limited group of nerds. It is one of the
most used compilers in the world and it is used for critical applications.
Therefore, you have to be strict about ABI standards. Either the ABI must be
changed and made public, or the compiler must be changed so that it doesn't
rely on the stack being aligned by 16.

I can find the SYSTEM V. APPLICATION BINARY INTERFACE. Intel386 Architecture
Processor Supplement at www.caldera.com. It says DRAFT COPY, March 19, 1997.
Nothing indicates that this is the official or the latest version. It says
nothing about MMX or XMM. I have documented the things that are not clear from
the ABI in http://www.agner.org/assem/calling_conventions.pdf as well as I can.
I am going to change this document when this issue is resolved.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537

[Bug c++/27537] New: XMM alignment fault when compiling for i386 with -Os. Needs ABI specification.

2006-05-10 Thread agner at agner dot org

/gcc/x86_64-redhat-linux/4.1.0/32/crtend.o
/usr/lib/gcc/x86_64-redhat-linux/4.1.0/../../../../lib/crtn.o
[EMAIL PROTECTED] t]# ./a.out
Segmentation fault
[EMAIL PROTECTED] t]#


-- 
   Summary: XMM alignment fault when compiling for i386 with -Os.
Needs ABI specification.
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: critical
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: agner at agner dot org
  GCC host triplet: x64
GCC target triplet: ia32


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537
