[Bug c++/111897] Initialization of _Float16 with f.p. constant gives false warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111897 --- Comment #3 from Agner Fog --- I have asked the authors of the linked document. They say that the example in the document is wrong. The latest version still has the error in the example: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html The compiler should give warnings for all of these: _Float16 A = 1.5; _Float16 B(1.5); _Float16 C{1.5}; but no warning for integers: _Float16 D = 2; It is fine to give a warning rather than an error message when the intention is obvious and there is no ambiguity. Below is my conversation with David Olsen: That’s correct. Conversions between integral and floating-point types are standard conversions, in both directions, which means they are implicit conversions. That was covered by preexisting wording in the standard, so P1467 doesn’t talk about those conversions. There isn’t a good way for the standard to clearly specify which conversions are lossless and which are potentially lossy, so we didn’t try to limit int/float conversions involving extended floating-point types to just the safe conversions. From: Agner Fog Sent: Tuesday, October 24, 2023 10:26 PM To: David Olsen ; gri...@griwes.info Subject: Re: Problem with P1467R4 C++ std. proposal Thank you for a clear answer. I don't see any mentioning of implicit conversion from integer to extended floating point types. Is that allowed? gcc 13.1 gives no warning for implicit conversion from integer to float16: _Float16 A = 3; // no warning _Float16 A = 3.; // warning Is that correct? - Agner On 24/10/2023 19.29, David Olsen wrote: The final version of P1467 is R9, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html . The GCC 13 release notes contain a link to that version. Where do you see the link to R4? The issue of initialization of extended floating-point type variables was raised and discussed during the standardization process. 
R9 contains a long discussion of the issue, with some of the ways that we tried to fix the problem of initialization. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html#implicit-constant After some back and forth, the consensus was to leave the implicit conversion rules unchanged for initialization, because the potential solutions have their own problems. So all of _Float16 A = 1.5; _Float16 B(1.5); _Float16 C{1.5}; are ill-formed and should result in a compiler diagnostic. I am pleased that GCC reports only a warning and not a hard error, since the user's intent is obvious. Yes, the example in section 5.6.1 is wrong. The mistake is still there in R9 of the paper. It should be float f32 = 1.0f; std::float16_t f16 = 2.0f16; std::bfloat16_t b16 = 3.0bf16; On the more general issue of how much the new extended floating-point types should behave like the existing standard floating-point types, there was a long and useful discussion about this topic at a committee meeting in Feb 2020. There is general agreement that many of the defaults in C++ are wrong and can make it easier to write badly behaving code. Whenever new features are added, there is tension between the consistency of having them behave like existing features and the safety of choosing different defaults that make it easier to write correct code. These competing goals and the consequences and tradeoffs of both of them were explicitly laid out and discussed at the Feb 2020 meeting, and at the end of the discussion there was strong consensus (though not unanimity) to go with safety over consistency for implicit conversions of extended floating-point types. int16_t is in the std namespace in C++. For C compatibility it is also available at global scope if you include <stdint.h> (defined by the C standard) instead of <cstdint> (defined by the C++ standard). The C++ standard doesn't define any names at global scope other than 'operator new' and 'operator delete'. 
Defining float16_t at global scope would have been a huge departure from existing practice. -----Original Message----- From: Agner Fog Sent: Tuesday, October 24, 2023 8:03 AM To: David Olsen ; gri...@griwes.info Subject: Problem with P1467R4 C++ std. proposal Dear David Olsen and Michał Dominiak I don't know who to contact regarding C++ standard development, so I am taking the liberty to contact you as the authors of P1467R4, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1467r4.html#implicit It is unclear whether the rules for implicit conversion relate to initialization and assignment. gcc version 13.1 gives warnings for the following cases with reference to the above document: _Float16 A = 1.5; _Float16 B(1.5); _Float16 C{1.5}; The last one should probably not have a warning; the other ones are unclear. Initialization with an integer constant gives no warning message. The example
[Bug c++/111897] Initialization of _Float16 with f.p. constant gives false warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111897 --- Comment #2 from Agner Fog --- Thank you Jonathan. The problem is that the C++ standard is becoming so complicated that nobody can master it, not even the persons who wrote the example in the proposal. `_Float16 A{1.0};` gives a warning, which apparently is wrong. `_Float16 A = 1;` gives no warning. `_Float16 A = 1.5f16;` gives no warning, but I am not sure the f16 suffix is supported by all compilers
[Bug c++/111897] New: Initialization of _Float16 with f.p. constant gives false warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111897 Bug ID: 111897 Summary: Initialization of _Float16 with f.p. constant gives false warning Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: agner at agner dot org Target Milestone: --- Initializing a _Float16 gives false warning. Example: _Float16 A = 1.0; This gives the "warning: converting to ‘_Float16’ from ‘double’ with greater conversion rank", with a link to https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1467r4.html#implicit However, this link says that implicit conversion is allowed in initialization with a constant. See section 5.7.3 and the example in 5.6.1 in the linked document.
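Going by the follow-up comments on this report, the diagnostic concerns only the implicit narrowing from double; an explicit conversion or an integer initializer does not trigger it. The following is a minimal sketch of those two forms, not a statement of what the standard mandates. It assumes a compiler and target that provide `_Float16` (detected here through the predefined macro `__FLT16_MAX__`). The alias `half_t` and the helper names `half_from_double` and `half_two` are hypothetical and do not come from the report.

```cpp
// half_t is a hypothetical alias used only in this sketch. GCC and Clang
// predefine __FLT16_MAX__ when _Float16 is available for the target.
#if defined(__FLT16_MAX__)
using half_t = _Float16;
#else
using half_t = float;  // assumption: fallback so the sketch still compiles
#endif

// Explicit conversion: the narrowing from double is spelled out, so it is
// not an implicit conversion to a type of lower conversion rank.
half_t half_from_double(double x) {
    return static_cast<half_t>(x);
}

// Integer initializer: integer to floating-point is a standard conversion,
// which the comments above report as accepted without a warning.
half_t half_two() {
    half_t d = 2;
    return d;
}
```

A caller could write `half_t a = half_from_double(1.5);` where the original code had `_Float16 A = 1.5;`. Whether the `f16` literal suffix is a better choice depends on the compiler versions that must be supported.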
[Bug middle-end/108920] Condition falsely optimized out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108920 Agner Fog changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED --- Comment #4 from Agner Fog --- I am not sure I have identified the problem correctly, but there is no need to spend more time on it since the problem disappears with version 9.4.0. You may close this issue.
[Bug middle-end/108920] Condition falsely optimized out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108920 --- Comment #3 from Agner Fog --- It seems to work with gcc 9.4.0. Thank you
[Bug c++/108920] New: Condition falsely optimized out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108920 Bug ID: 108920 Summary: Condition falsely optimized out Product: gcc Version: 9.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: agner at agner dot org Target Milestone: ---
Created attachment 54526 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54526&action=edit code to reproduce error
The attached file test.cpp gives wrong code when optimized with -O2 or higher. To reproduce the error, do:
    g++ -O2 -m64 -S -o t1.s test.cpp
    g++ -O2 -m64 -S -DFIX -o t2.s test.cpp
The condition in line 104 in test.cpp is optimized away in t1.s. The workaround on line 73, enabled with -DFIX, prevents this false optimization and generates correct code in t2.s. See t2.s lines 252-255.
[Bug target/89597] Inconsistent vector calling convention on windows with Clang and MSVC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89597 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #1 from Agner Fog ---
I can confirm this. When compiling for a Win64 target, gcc version 9.2.0 (and earlier) returns 128-bit intrinsic vectors in XMM0, while 256-bit and 512-bit intrinsic vectors are returned through a pointer. Clang, MS and Intel compilers return all these vectors in registers.
The Microsoft Windows documentation for x64 calling convention says: "Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0." (https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019#return-values)
Obviously, this document needs to be updated, but the only logical interpretation is that the wording "vector types such as __m128" includes larger intrinsic vectors, which must necessarily be returned in YMM0 or ZMM0.
Test case:
```
__m128 square_x (__m128 x) {
    return _mm_mul_ps( x , x);
}
__m256 square_y (__m256 y) {
    return _mm256_mul_ps( y , y);
}
__m512 square_z (__m512 z) {
    return _mm512_mul_ps( z , z);
}
```
Disassembly (Intel syntax):
```
_Z8square_xDv4_f:; Function begin
        vmovaps xmm0, oword [rcx]
        vmulps  xmm0, xmm0, xmm0
        ret
; _Z8square_xDv4_f End of function

_Z8square_yDv8_f:; Function begin
        vmovaps ymm0, yword [rdx]
        vmulps  ymm0, ymm0, ymm0
        mov     rax, rcx
        vmovaps yword [rcx], ymm0
        vzeroupper
        ret
; _Z8square_yDv8_f End of function

_Z8square_zDv16_f:; Function begin
        vmovaps zmm0, zword [rdx]
        vmulps  zmm0, zmm0, zmm0
        mov     rax, rcx
        vmovaps zword [rcx], zmm0
        vzeroupper
        ret
; _Z8square_zDv16_f End of function
```
Same, compiled with Clang, MS or Intel compilers:
```
_Z8square_yDv8_f:; Function begin
        vmovaps ymm0, yword [rcx]
        vmulps  ymm0, ymm0, ymm0
        ret
; _Z8square_yDv8_f End of function

_Z8square_zDv16_f:; Function begin
        vmovaps zmm0, zword [rcx]
        vmulps  zmm0, zmm0, zmm0
        ret
; _Z8square_zDv16_f End of function
```
...
And while we are at it: It would be nice if you could support __vectorcall for win64 targets (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89485)
[Bug target/89485] Support vectorcall calling convention on windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89485 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #1 from Agner Fog --- I can confirm that the Clang, MS, and Intel compilers all pass vectors in registers for function parameters and function return values in 64-bit Windows when __vectorcall is specified. There are still 32 or 40 bytes of superfluous shadow space allocated on the stack. Clang adds @@ to the mangled function name. Please support __vectorcall in Gcc as well.
[Bug target/87767] Missing AVX512 memory broadcast for constant vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #2 from Agner Fog --- Clang does this. Gcc should do the same: _Z3fooDv16_f: # @_Z3fooDv16_f .cfi_startproc # %bb.0: vaddps .LCPI1_0(%rip){1to16}, %zmm0, %zmm0 retq
[Bug target/83250] _mm256_zextsi128_si256 missing for AVX2 zero extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83250 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #1 from Agner Fog ---
I can confirm this bug. _mm256_zextsi128_si256 and several similar intrinsic functions are supported by Clang and MS compilers, but not by Gcc.
Test case:
    #include <immintrin.h>
    __m256i zero_upper_part(__m256i a) {
        return _mm256_zextsi128_si256(_mm256_castsi256_si128(a));
    }
Result:
    test.cpp: In function '__m256i zero_upper_part(__m256i)':
    test.cpp:6:12: error: '_mm256_zextsi128_si256' was not declared in this scope
         return _mm256_zextsi128_si256(_mm256_castsi256_si128(a));
                ^~
    test.cpp:6:12: note: suggested alternative: '_mm256_castsi128_si256'
The suggested alternative is *dangerous*: The upper part of the ymm register is undefined after _mm256_castsi128_si256, while it is zero after _mm256_zextsi128_si256. _mm256_castsi128_si256 works most of the time, but sometimes a compiler will optimize away the undefined upper part so that it is no longer zero. This can give some nasty bugs.
[Bug c++/89325] [7/8/9/10 Regression] False warnings about "optimization attribute" on operators when -fno-ipa-cp-clone
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89325 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #5 from Agner Fog --- I have the same problem. Minimal test case: #include struct Test { float f; }; Test round(Test const & a) __attribute__ ((optimize("-fno-unsafe-math-optimizations"))); Test round(Test const & a) {return a;}
[Bug target/65782] Assembly failure (invalid register for .seh_savexmm) with -O3 -mavx512f on mingw-w64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65782 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #6 from Agner Fog --- I get the same error with G++ 7.4.0 on Cygwin when compiling with the options -mavx512vl -m64. A workaround is to use -fno-asynchronous-unwind-tables. Registers xmm16-31 should be considered clobbered in Win64. See https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin
[Bug target/41084] Filling xmm register with all bit set is not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084 Agner Fog changed: What|Removed |Added CC||agner at agner dot org --- Comment #1 from Agner Fog --- What is the status of this bug? It doesn't seem to have been fixed.
[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 --- Comment #13 from Agner Fog agner at agner dot org --- Thank you. I agree that integer overflow should be well-defined when using intrinsics. Is it possible to do the same optimization with boolean vector intrinsics, such as _mm_and_epi32 and _mm_or_ps to enable optimizations such as algebraic reduction and constant propagation?
[Bug target/63351] Optimization: contract broadcast intrinsics when AVX512 is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351 --- Comment #2 from Agner Fog agner at agner dot org --- AVX512 allows all _memory_ source operands to broadcast from a scalar on almost all vector instructions for 128-, 256- and 512-bit vectors with 32- or 64-bit elements. See section 4.6.1 in Intel® Architecture Instruction Set Extensions Programming Reference https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf This feature comes for free; there is no performance cost to broadcasting other than making the instruction prefix longer for vector sizes smaller than 512. This feature has no explicit support in intrinsic functions, so the only way to utilize this excellent optimization opportunity without using assembly is to contract broadcast intrinsics with subsequent instructions. An obvious application is to store scalar constants as 32 or 64 bit constants rather than as full vectors. Often, it is not known to the programmer whether a variable is stored in memory or in a register. If a scalar variable is already in a register then it is better to use a broadcast instruction. If the scalar variable is in memory then it is better to contract the broadcast into the vector instruction that uses it, even if the broadcasted value is used multiple times.
[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 Agner Fog agner at agner dot org changed: What|Removed |Added CC||agner at agner dot org --- Comment #8 from Agner Fog agner at agner dot org ---
The same problem applies to other kinds of optimizations, such as algebraic reductions and constant propagation. The method of using operators such as * and + is not portable to other compilers, and it doesn't work with integer vectors for integer sizes other than 64 bits. (I know that there is no integer FMA on Intel CPUs, but I am also talking about other optimizations). Here are some other examples of optimizations I would like gcc to do:
    #include <x86intrin.h>

    void dummy2(__m128 a, __m128 b);
    void dummyi2(__m128i a, __m128i b);

    void commutative(__m128 a, __m128 b) {
        // expect reduce a+b = b+a. This is the only reduction that actually works!
        dummy2(_mm_add_ps(a,b), _mm_add_ps(b,a));
    }

    void associative(__m128i a, __m128i b, __m128i c) {
        // expect reduce (a+b)+c = a+(b+c)
        dummyi2(_mm_add_epi32(_mm_add_epi32(a,b),c), _mm_add_epi32(a,_mm_add_epi32(b,c)));
    }

    void distributive(__m128i a, __m128i b, __m128i c) {
        // expect reduce a*b+a*c = a*(b+c)
        dummyi2(_mm_add_epi32(_mm_mul_epi32(a,b),_mm_mul_epi32(a,c)), _mm_mul_epi32(a,_mm_add_epi32(b,c)));
    }

    void constant_propagation() {
        // expect store c and d as precalculated constants
        __m128i a = _mm_setr_epi32(1,2,3,4);
        __m128i b = _mm_set1_epi32(5);
        __m128i c = _mm_add_epi32(a,b);
        __m128i d = _mm_mul_epi32(a,b);
        dummyi2(c,d);
    }
Of course, the same applies to 256-bit and 512-bit vectors.
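For comparison, the operator-based method mentioned in this comment can be sketched with the generic vector extension that GCC and Clang provide. This is only an illustration of the style, under the assumption that one of those compilers is used. The type name `v4si` and the function names are hypothetical, and nothing here claims which of these expressions a particular compiler version actually folds.

```cpp
// v4si is a hypothetical name for a vector of four 32-bit ints, declared
// with the GCC/Clang generic vector extension (not standard C++).
typedef int v4si __attribute__((vector_size(16)));

// Lane-wise sum of two constant vectors, written with a plain operator so
// the optimizer sees ordinary arithmetic rather than an opaque intrinsic.
v4si sum_of_constants() {
    const v4si a = {1, 2, 3, 4};
    const v4si b = {5, 5, 5, 5};
    return a + b;
}

// Lane-wise product of the same constants.
v4si product_of_constants() {
    const v4si a = {1, 2, 3, 4};
    const v4si b = {5, 5, 5, 5};
    return a * b;
}

// Evaluates both sides of a*b + a*c = a*(b+c) lane by lane and compares
// them; an optimizer that applies the identity can reduce this to "true".
bool distributive_holds(v4si a, v4si b, v4si c) {
    const v4si lhs = a * b + a * c;
    const v4si rhs = a * (b + c);
    for (int i = 0; i < 4; ++i) {
        if (lhs[i] != rhs[i]) return false;
    }
    return true;
}
```

The portability objection raised in the comment still applies: this syntax is a compiler extension, so code that must also build with other compilers would need the intrinsic form.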
[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 --- Comment #9 from Agner Fog agner at agner dot org --- Many programmers are using a vector class library rather than writing intrinsic functions directly. Such libraries have overloaded operators which are inlined to produce intrinsic functions. Therefore, we cannot expect programmers to make optimizations like FMA contraction, algebraic reduction, constant propagation, etc. manually. I don't know whether this more general discussion of optimizations on code with intrinsics fits into this bug or needs to be discussed elsewhere.
[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 --- Comment #11 from Agner Fog agner at agner dot org --- Thanks for the links Marc. You are right, the discussion in the gcc-patches mailing list ignores integer vectors. You need a solution that also allows optimizations on integer intrinsic functions (perhaps cast the vector type?). I am not on any internal mailing list, so please post it there for me. The proposed solution of using vector extensions will not work on masked vector intrinsics in AVX512, so it wouldn't enable e.g. constant propagation through a masked intrinsic, but that is probably too much to ask for :) I will add a new bug report for contraction of broadcast with AVX512.
[Bug c/63351] New: Optimization: contract broadcast intrinsics when AVX512 is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351 Bug ID: 63351 Summary: Optimization: contract broadcast intrinsics when AVX512 is enabled Product: gcc Version: 4.9.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: agner at agner dot org
The AVX512 instruction set allows instructions with broadcast, but there are no corresponding intrinsic functions. The programmer has to write a broadcast intrinsic followed by some other intrinsic and rely on the compiler to contract this into a single instruction. I would expect the optimizer to contract a broadcast intrinsic with any subsequent intrinsic into a single instruction. For example:
    // gcc -Ofast -mavx512f
    #include <x86intrin.h>

    void dummyz(__m512i a, __m512i b);

    void broadcastz(__m512i a, int b) {
        // expect reduction to instruction with broadcast,
        // something like: vpaddd b, %zmm0, %zmm3 {1to16}
        __m512i bb = _mm512_set1_epi32(b);
        __m512i ab = _mm512_add_epi32(a,bb);
        __m512i cc = _mm512_set1_epi32(5);
        __m512i ac = _mm512_add_epi32(a,cc);
        dummyz(ab, ac);
    }
This should actually be possible for smaller vector sizes as well when AVX512 is enabled:
    void dummyx(__m128 a, __m128 b);

    void broadcastx(__m128 a, float b) {
        // broadcasting should even be possible with smaller vectors
        __m128 bb = _mm_set1_ps(b);
        __m128 ab = _mm_add_ps(a,bb);
        __m128 cc = _mm_set1_ps(5.0);
        __m128 ac = _mm_add_ps(a,cc);
        dummyx(ab, ac);
    }
[Bug c/61878] New: Missing intrinsic functions in avx512intrin.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61878 Bug ID: 61878 Summary: Missing intrinsic functions in avx512intrin.h Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: agner at agner dot org
A few compare functions are missing in avx512intrin.h, e.g. _mm512_cmpgt_epu32_mask and _mm512_cmpgt_epu64_mask. All intrinsic functions for typecasting are also missing. Please add these functions, as they are indispensable. See https://software.intel.com/en-us/node/513903 and https://software.intel.com/sites/landingpage/IntrinsicsGuide/ for documentation of these functions.
Definitions copied from Intel's zmmintrin.h:
    /* Conversion from one type to another, no change in value. */
    extern __m512  __ICL_INTRINCC _mm512_castpd_ps(__m512d);
    extern __m512i __ICL_INTRINCC _mm512_castpd_si512(__m512d);
    extern __m512d __ICL_INTRINCC _mm512_castps_pd(__m512);
    extern __m512i __ICL_INTRINCC _mm512_castps_si512(__m512);
    extern __m512  __ICL_INTRINCC _mm512_castsi512_ps(__m512i);
    extern __m512d __ICL_INTRINCC _mm512_castsi512_pd(__m512i);

    /* Casts from a larger type to a smaller type. */
    extern __m128d __ICL_INTRINCC _mm512_castpd512_pd128(__m512d);
    extern __m128  __ICL_INTRINCC _mm512_castps512_ps128(__m512);
    extern __m128i __ICL_INTRINCC _mm512_castsi512_si128(__m512i);
    extern __m256d __ICL_INTRINCC _mm512_castpd512_pd256(__m512d);
    extern __m256  __ICL_INTRINCC _mm512_castps512_ps256(__m512);
    extern __m256i __ICL_INTRINCC _mm512_castsi512_si256(__m512i);

    /*
     * Casts from a smaller type to a larger type.
     * Upper elements of the result are undefined.
     */
    extern __m512d __ICL_INTRINCC _mm512_castpd128_pd512(__m128d);
    extern __m512  __ICL_INTRINCC _mm512_castps128_ps512(__m128);
    extern __m512i __ICL_INTRINCC _mm512_castsi128_si512(__m128i);
    extern __m512d __ICL_INTRINCC _mm512_castpd256_pd512(__m256d);
    extern __m512  __ICL_INTRINCC _mm512_castps256_ps512(__m256);
    extern __m512i __ICL_INTRINCC _mm512_castsi256_si512(__m256i);
[Bug c/61855] New: _MM_MANTISSA_NORM_ENUM in avx512intrin.h disabled when optimization off
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61855 Bug ID: 61855 Summary: _MM_MANTISSA_NORM_ENUM in avx512intrin.h disabled when optimization off Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: agner at agner dot org
Created attachment 33159 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33159&action=edit test code to replicate bug
Definitions _MM_MANTISSA_NORM_ENUM and _MM_MANTISSA_SIGN_ENUM in avx512intrin.h are disabled when optimization is off. To replicate the error, compile the attached file with
    gcc -c -mavx512f -O0 bug2.c
Workaround:
    gcc -c -mavx512f -O1 bug2.c
[Bug c++/61794] New: internal error: unrecognizable insn, from avx512 extract instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61794 Bug ID: 61794 Summary: internal error: unrecognizable insn, from avx512 extract instruction Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: major Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: agner at agner dot org
Created attachment 33117 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33117&action=edit c++ file producing error
The attached file bug1.cpp generates an internal error when compiling:
    g++ -mavx512f bug1.cpp
g++ version: 4.9 and 4.10.0 20140706
binutils version: 2.24
Ubuntu version: 12.04.2 LTS
Error message:
==
    a@a-desktop:~/avx512$ g++ -mavx512f bug1.cpp
    bug1.cpp: In member function ‘int32_t Vec16i::extract(uint32_t) const’:
    bug1.cpp:59:5: error: unrecognizable insn:
     }
     ^
    (insn 29 28 30 8 (set (reg:V4SI 89 [ D.12727 ])
            (vec_merge:V4SI (vec_select:V4SI (reg:V16SI 88 [ D.12726 ])
                    (parallel [
                            (const_int 0 [0])
                            (const_int 1 [0x1])
                            (const_int 2 [0x2])
                            (const_int 3 [0x3])
                        ]))
                (reg:V4SI 86 [ D.12724 ])
                (reg:QI 113))) bug1.cpp:38 -1
         (nil))
    bug1.cpp:59:5: internal compiler error: in extract_insn, at recog.c:2204
    0xb25c68 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
            ../.././gcc/rtl-error.c:109
    0xb25c99 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
            ../.././gcc/rtl-error.c:117
    0xaf609e extract_insn(rtx_def*)
            ../.././gcc/recog.c:2204
    0x980803 instantiate_virtual_regs_in_insn
            ../.././gcc/function.c:1561
    0x980803 instantiate_virtual_regs
            ../.././gcc/function.c:1932
    0x980803 execute
            ../.././gcc/function.c:1983
    Please submit a full bug report, with preprocessed source if appropriate.
    Please include the complete backtrace with any bug report.
    See http://gcc.gnu.org/bugs.html for instructions.
[Bug c/53071] New: Wrong instruction replacement when compiling for xop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53071 Bug #: 53071 Summary: Wrong instruction replacement when compiling for xop Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: critical Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: ag...@agner.org Created attachment 27216 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27216 source code and asm output An intrinsic vector multiply function is replaced by xop instructions when the attached file is compiled with -mxop. Perhaps the compiler is trying to combine a shift instruction and a multiply instruction into a single xop instruction, but it ends up with something wrong. This is perhaps related to bugs # 52908 and 52910.
[Bug target/52910] xop-mul-1:f13 miscompiled on bulldozer (-mxop)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52910 Agner Fog agner at agner dot org changed: What|Removed |Added CC||agner at agner dot org --- Comment #1 from Agner Fog agner at agner dot org 2012-04-22 14:35:35 UTC --- Confirm: I have seen a similar bug (gcc 4.7.0)
[Bug target/52932] AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932 --- Comment #10 from Agner Fog agner at agner dot org 2012-04-13 16:50:33 UTC ---
_mm256_permutevar8x32_epi32 has the operands in the wrong order. They need to be swapped. Did you fix this too?

On 12-04-2012 20:37, uros at gcc dot gnu.org wrote:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932 --- Comment #9 from uros at gcc dot gnu.org 2012-04-12 18:37:47 UTC ---
Author: uros
Date: Thu Apr 12 18:37:42 2012
New Revision: 186388
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186388
Log:
    PR target/52932
    * config/i386/avx2intrin.h (_mm256_permutevar8x32_ps): Change second argument type to __m256i. Update call to __builtin_ia32_permvarsf256.
    * config/i386/sse.md (UNSPEC_VPERMVAR): New.
    (UNSPEC_VPERMSI, UNSPEC_VPERMSF): Remove.
    (avx2_permvarv8sf, avx2_permvarv8si): Switch operands 1 and 2.
    (avx2_permvar<mode>): Macroize insn from avx2_permvarv8sf and avx2_permvarv8si using VI4F_256 mode iterator.
    * config/i386/i386.c (bdesc_args) <__builtin_ia32_permvarsf256>: Update builtin type to V8SF_FTYPE_V8SF_V8SI.
    (ix86_expand_vec_perm): Update calls to gen_avx2_permvarv8si and gen_avx2_permvarv8sf.
    (expand_vec_perm_pshufb): Ditto.
testsuite/ChangeLog:
    PR target/52932
    * gcc.target/i386/avx2-vpermps-1.c (avx2_test): Use __m256i type for second function argument.
    * gcc.target/i386/avx2-vpermps-2.c (init_permps): Update declaration.
    (calc_permps): Update declaration. Calculate result correctly.
    (avx2_test): Change src2 type to union256i_d.
    * gcc.target/i386/avx2-vpermd-2.c (calc_permd): Calculate result correctly.
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/avx2intrin.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/avx2-vpermd-2.c
    trunk/gcc/testsuite/gcc.target/i386/avx2-vpermps-1.c
    trunk/gcc/testsuite/gcc.target/i386/avx2-vpermps-2.c
[Bug c/52932] New: AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932 Bug #: 52932 Summary: AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: ag...@agner.org The intrinsic _mm256_permutevar8x32_ps in avx2intrin.h has the second parameter of type __m256; the correct type is __m256i. The Intel programming reference has the type __m256, which is wrong. This error may have propagated into gnu avx2intrin.h. The correct type is specified by Intel at http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/intref_cls/common/intref_avx2_permutevar8x32_ps.htm Both Intel and Microsoft immintrin.h files have the type __m256i, which appears to be the only logically correct choice. Excerpt from Intel version of immintrin.h: extern __m256 __cdecl _mm256_permutevar8x32_ps(__m256, __m256i); For the sake of compatibility with other compilers, and for logical reasons, I would prefer __m256i. The function works after type-casting the parameter.
[Bug c/49820] Explicit check for integer negative after abs optimized away
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820 --- Comment #15 from Agner Fog agner at agner dot org 2011-07-27 14:27:33 UTC --- How do you define clever things? Checking that a variable is within the allowed range is certainly a standard thing that every SW teacher tells you to do. I think it is reasonable to expect -Wall to do what it says and set a very high warning level. Optimizing away an overflow check is such a dangerous thing to do that it requires a warning. I think it may be wise to distinguish between optimizing away a whole branch or loop, versus just making calculations more efficient, e.g. simplifying expressions or making induction variables. If a branch can be optimized away then it is either violating the intentions of the programmer or the program has a logical error. A warning would be in place in either case. What I am suggesting is that optimizing away a branch should give a warning at a lower level than simplifying an arithmetic expression. I know this might be somewhat complicated to implement, but it would be useful for catching the situation where an overflow check is optimized away. Checking for overflow in a safe way is so complicated and tedious that it is practically never done (see https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow ) Sorry for being persistent, but I think this issue has serious security implications.
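One way to make such checks less tedious is the overflow-checking builtins that GCC and Clang added in releases after this discussion. The sketch below assumes one of those compilers. The wrapper names `checked_add` and `checked_mul` are hypothetical; only `__builtin_add_overflow` and `__builtin_mul_overflow` are real builtins, and they are not standard C++.

```cpp
// Hypothetical wrappers around the GCC/Clang overflow builtins. Each builtin
// performs the operation as if with infinite precision, stores the wrapped
// result through the pointer, and returns true when the result did not fit.

// Returns true and stores a + b in *result when the sum fits in int.
bool checked_add(int a, int b, int* result) {
    return !__builtin_add_overflow(a, b, result);
}

// Returns true and stores a * b in *result when the product fits in int.
bool checked_mul(int a, int b, int* result) {
    return !__builtin_mul_overflow(a, b, result);
}
```

Because the check is part of the operation itself, there is no separate comparison that relies on signed wraparound for the optimizer to remove.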
[Bug c/49820] Explicit check for integer negative after abs optimized away
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820 --- Comment #13 from Agner Fog agner at agner dot org 2011-07-26 19:31:48 UTC --- My example does indeed give a warning when compiled with -Wstrict-overflow=2. Unfortunately, -Wall implies only -Wstrict-overflow=1 so I got no warning in the first place. I think the warning levels need to be adjusted so that we get the warning with -Wall because the consequences are no less serious than ignoring an overflow check with if(a+const<a), which gives a warning with -Wstrict-overflow=1
[Bug c/49820] Explicit check for integer negative after abs optimized away
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820 --- Comment #10 from Agner Fog agner at agner dot org 2011-07-25 07:43:58 UTC --- I still think that a compiler should be predictable and consistent. It is inconsistent that a+5<a = false produces a warning, while abs(a)<0 = false does not. Both expressions could be intended overflow checks. Besides, some compilers produce a warning when a branch condition is always true or always false. That is sound behavior because it is likely to be a bug. gcc does not produce a warning when optimizing away something like if (2+2 != 4)
[Bug c/49820] Explicit check for integer negative after abs optimized away
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820 --- Comment #12 from Agner Fog agner at agner dot org 2011-07-25 14:21:52 UTC --- No, the behavior is not predictable when it sometimes warns about ignoring overflow, and sometimes not. Please add a warning when it optimizes away an overflow check after the abs function. Unsafe optimizations are sometimes good, but sometimes they cause hard-to-find bugs. The programmer can't always predict what kind of optimizations the compiler makes. A warning feature is the best way to enable the programmer to check if the compiler does the right thing. The programmer can then turn off specific warnings after verifying that the optimizations are OK.
[Bug c/49820] Explicit check for integer negative after abs optimized away
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820 --- Comment #8 from Agner Fog <agner at agner dot org> 2011-07-24 08:16:39 UTC --- Thanks for your comments. Why is the behavior different for signed and unsigned? The expression (a + 5 < a) is reduced to always false when a is signed, but not when a is unsigned. -Wall produces the warning "assuming signed overflow does not occur when assuming that (X + c) < X is always false" in the above example, but there is no warning when it assumes that abs(a) < 0 is always false. I believe that the behavior of a compiler must be predictable. An ordinary programmer would never predict that the compiler can optimize away an explicit check for overflow, no matter how many C++ textbooks he has read. If the compiler can remove a security check without warning then we have a security issue. To say that the behavior in case of overflow is undefined is not the same as denying that overflow can occur. I think we need a sensible compromise that allows the compiler to, e.g., optimize a loop under the assumption that the loop counter doesn't overflow, but doesn't allow it to optimize away an explicit overflow check. I know the compiler can't guess the programmer's intentions, but then we must at least have a warning. Any optimization rule that allows the compiler to optimize away an overflow check without warning is unacceptable in my opinion.
[Bug c/49820] New: Explicit check for integer negative after abs optimized away
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49820
Summary: Explicit check for integer negative after abs optimized away
Product: gcc
Version: 4.5.2
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ag...@agner.org

Created attachment 24812 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24812
Example generating bug

The integer abs function can overflow if the argument is 0x80000000. An intended check for overflow is ignored. The gcc compiler optimizes away a check for a value < 0 after abs with -O2 optimization:

    int b;
    b = abs(b);
    if (b < 0)   // check for overflow optimized away

The error occurs when compiling the attached file with -O2, 32 or 64 bit mode, C or C++. The C/C++ language does not normally need to check for overflow, but it should acknowledge an explicit check for overflow.
[Bug c/40528] Add a new ifunc attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528 --- Comment #16 from Agner Fog <agner at agner dot org> 2011-07-08 08:52:32 UTC ---
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > What is the status of this issue?
> > It is implemented on ifunc branch.
> > > Is option 3 implemented?
> > Yes, on ifunc branch.
> > > Which versions of Linux and binutils support IFUNC?
> Still doesn't work.
> warning: ‘ifunc’ attribute directive ignored
> GNU Binutils for Ubuntu 2.21.0.20110327

The ifunc attribute is described in http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html but it doesn't work (see my previous comment). After some experimentation I found that the method described below works. Either the compiler should be fixed or the onlinedocs should be changed.

// Example of gnu indirect function
#include <stdio.h>
#include <time.h>

// Define different versions of my function
int myfunc1() {
   return 1;
}

int myfunc2() {
   return 2;
}

// Type definition for pointer to my function
typedef int (*MyFunctionPointer)(void);

// Prototype for the common entry point
extern "C"  // remove this line if not C++
int myfunc();
__asm__ (".type myfunc, @gnu_indirect_function");

// Make the dispatcher function
MyFunctionPointer myfunc_dispatch (void) __asm__ ("myfunc");
MyFunctionPointer myfunc_dispatch (void) {
   if (time(0) & 1) {
      // If time is odd at first call, use version 1
      return myfunc1;
   }
   else {
      // else use version 2
      return myfunc2;
   }
}

int main() {
   // Test the call to myfunc
   printf("\nCalled function number %i\n", myfunc());
   return 0;
}
[Bug c/40528] Add a new ifunc attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528 --- Comment #15 from Agner Fog <agner at agner dot org> 2011-05-30 13:13:06 UTC ---
(In reply to comment #14)
> (In reply to comment #13)
> > What is the status of this issue?
> It is implemented on ifunc branch.
> > Is option 3 implemented?
> Yes, on ifunc branch.
> > Which versions of Linux and binutils support IFUNC?
> You need at least glibc 2.11 and binutils 2.20.
> > Any plans for BSD and Mac?
> You have to ask BSD and Mac people since IFUNC support needs to be
> implemented in both binutils and the C library.

Still doesn't work.
warning: ‘ifunc’ attribute directive ignored
GNU Binutils for Ubuntu 2.21.0.20110327

Where can I find an implementation with ifunc branch?
[Bug c/40528] Add a new ifunc attribute
--- Comment #13 from agner at agner dot org 2010-02-21 16:21 ---
What is the status of this issue?
Is option 3 implemented?
Which versions of Linux and binutils support IFUNC?
Any plans for BSD and Mac?

--
agner at agner dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528
[Bug c++/37880] New: Documentation of option -mcmodel=medium is wrong
The documentation of option -mcmodel=medium says:

-mcmodel=medium
    Generate code for the medium model: The program is linked in the lower 2 GB of the address space but symbols can be located anywhere in the address space. Programs can be statically or dynamically linked, but building of shared libraries are not supported with the medium model.

This is misleading since the compiler still uses 32-bit addresses for data objects on Linux (and BSD?) targets. The program data are still limited to addresses < 2 GB. Dynamically allocated memory (new or malloc) can probably exceed the 2 GB address limit in both the small and the medium memory model. Whatever the difference is between small and medium memory models, it is not covered by the above explanation.

On Mac OS X (Darwin) targets, all addresses are above 4 GB by default for both small and medium models. On Windows targets, a DLL can be loaded at addresses > 2 GB though this rarely happens.

Example:

-- code file a.cpp ---
int * mypointer = 0;
int myarray[100] = {0};

int myfunction (int x) {
   mypointer = myarray + 1;
   return myarray[x];
}
-- end of code file a.cpp ---

Command line: g++ -m64 -mcmodel=medium -S a.cpp

gcc version: gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

-- assembly output (excerpt) ---
_Z10myfunctioni:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        movl    %edi, -4(%rbp)
        movl    $myarray+4, %eax        # uses 32-bit zero-extended address here!
        movq    %rax, mypointer(%rip)
        movl    -4(%rbp), %eax
        cltq
        movl    myarray(,%rax,4), %eax  # uses 32-bit sign-extended address here!
        leave
        ret
-- end of assembly output (excerpt) ---

This example shows that the statement "symbols can be located anywhere in the address space" is misleading. Static symbols must be located at addresses < 2 GB for the above code to work. Does the above statement apply to symbols on the stack or only to objects allocated with new or malloc? The statement is definitely wrong for Mac targets, and possibly also for Windows targets.

If a correct description would be too long then there may be a link to more exact descriptions elsewhere.

--
Summary: Documentation of option -mcmodel=medium is wrong
Product: gcc
Version: 4.2.3
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: agner at agner dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37880
[Bug target/13685] Building simple test application with -march=pentium3 -Os gives SIGSEGV (unaligned sse instruction)
--- Comment #26 from agner at agner dot org 2006-09-23 08:23 --- Thank you for fixing this, but you need to tell the world which solution you have chosen. Please see the discussion at the duplicate bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537 for arguments for and against each possible solution. You need to specify whether the chosen solution is to enforce 16-byte stack alignment regardless of the -Os option, or to make no assumption about stack alignment when generating XMM code. This is an ABI issue that has to be standardized and made public. The makers of the Intel compiler are waiting for a resolution to this issue so that they can make their compiler compatible with GCC. For the same reason, assembly programmers need to know whether stack alignment is required or not. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13685
[Bug target/27537] XMM alignment fault when compiling for i386 with -Os
--- Comment #11 from agner at agner dot org 2006-08-23 08:04 ---
This problem wouldn't have happened if the ABI had been better maintained. Somebody decides to change the calling convention without properly documenting the change, and somebody else makes another change that is incompatible because the alignment requirement has never made it into the ABI documents.

Let me help you make a decision on this issue by summarizing the pros and cons of 16-byte stack alignment in 32-bit x86 Linux/BSD.

Advantages of enforcing 16-byte stack alignment:
------------------------------------------------
1. The use of XMM code is becoming more common now that all new computers support the SSE2 or higher instruction sets. The necessary alignment of XMM variables can be implemented more efficiently when the stack is aligned.

2. Variables of type double (double precision floating point) are accessed more efficiently when aligned by 8. This is easily achieved when the stack is aligned.

3. Function parameters of type double will automatically get the optimal alignment, unless the parameter is preceded by an odd number of smaller parameters (including any 'this' pointer and return pointer). This means that more than 50% of function parameters of type double will be optimally aligned, versus 50% without stack alignment. The C/C++ programmer will be able to ensure optimal alignment by manipulating the order of function parameters.

4. Functions that need to align local variables can do so without using EBP as stack frame. This frees EBP for other purposes. General purpose registers are a scarce resource in 32-bit mode.

5. 16-byte stack alignment is officially enforced in Intel-based Mac OS X. It is desirable to have identical ABIs for Linux, BSD and Mac. This makes it possible to use the same compilers and the same function libraries for all three platforms (except for the object file format, which can be converted).

6. The stack alignment requires no extra instructions in leaf functions, which are more likely to contain the critical innermost loop than non-leaf functions.

7. The stack alignment requires no extra instructions in a non-leaf function if the function adjusts the stack pointer anyway for the sake of allocating local storage.

8. Stack alignment is already implemented in Gcc and existing code relies on it.

Disadvantages of enforcing 16-byte stack alignment:
---------------------------------------------------
1. A non-leaf function without any stack space allocated for local storage needs one or two extra instructions for conforming to the stack alignment requirement.

2. The alignment requirement results in unused space in the stack. This takes up to 12 bytes of extra space in the data cache for each function calling level except the innermost. Assuming that only the innermost three function levels matter in terms of speed, and that the number of unused bytes is 8 on average for all but the innermost function, the total amount of space wasted in the data cache is 16 bytes.

3. The Intel compiler does not enforce stack alignment. However, the Intel people are ready to change this as soon as you Gnu people make a decision on this issue. I have contact with the Intel people about this issue.

4. Stack alignment is not enforced in 32-bit Windows. Compatibility with the Windows ABI might be desirable.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537
[Bug target/27537] XMM alignment fault when compiling for i386 with -Os
--- Comment #8 from agner at agner dot org 2006-08-03 20:20 --- hjl wrote: > Apparently, it was done on purpose. Yes, the -Os non-alignment was obviously done on purpose. The problem is that other modules that may be called from the -Os module rely on the stack being aligned by 16. The wrong alignment makes the program crash when XMM registers are used. The alignment must be strictly enforced in the ABI if any function relies on it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537
[Bug target/27537] XMM alignment fault when compiling for i386 with -Os
--- Comment #6 from agner at agner dot org 2006-06-08 06:27 --- Comment #5 from hjl confirms my point: the error can occur in an optimized part of the program that uses XMM registers when some other, noncritical, part of the program is compiled with -Os. We need a comment from the ABI people about which solution to choose, because the Intel compiler people are working on a fix to make the two compilers compatible. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537
[Bug target/27537] XMM alignment fault when compiling for i386 with -Os
--- Comment #4 from agner at agner dot org 2006-05-11 07:11 --- Thanks for confirming this bug. If Gcc relies on the stack being aligned then it has to be an official ABI requirement. It makes perfect sense to compile the whole program, or some of it, with -Os and also use XMM. -Os can be the optimal option if code cache or data cache is a critical resource. It is also a perfectly justifiable solution to compile the part of the program that contains the innermost loop with -O3 and the rest of the program with -Os. The error also occurs if part of the program is compiled with the Intel C++ compiler, because the Intel people follow the official ABI, which hasn't been updated for many years. The Intel compiler is intended to be 100% binary compatible with Gnu. Gcc is no longer a hobby project for a limited group of nerds. It is one of the most used compilers in the world and it is used for critical applications. Therefore, you have to be strict about ABI standards. Either the ABI must be changed and made public, or the compiler must be changed so that it doesn't rely on the stack being aligned by 16. I can find the SYSTEM V APPLICATION BINARY INTERFACE, Intel386 Architecture Processor Supplement at www.caldera.com. It says DRAFT COPY, March 19, 1997. Nothing indicates that this is the official or the latest version. It says nothing about MMX or XMM. I have documented the things that are not clear from the ABI in http://www.agner.org/assem/calling_conventions.pdf as well as I can. I am going to change this document when this issue is resolved. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537
[Bug c++/27537] New: XMM alignment fault when compiling for i386 with -Os. Needs ABI specification.
/gcc/x86_64-redhat-linux/4.1.0/32/crtend.o /usr/lib/gcc/x86_64-redhat-linux/4.1.0/../../../../lib/crtn.o
[EMAIL PROTECTED] t]# ./a.out
Segmentation fault
[EMAIL PROTECTED] t]#

--
Summary: XMM alignment fault when compiling for i386 with -Os. Needs ABI specification.
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: agner at agner dot org
GCC host triplet: x64
GCC target triplet: ia32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537