[Bug c++/63707] Brace initialization of array sometimes fails if no copy constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707

Eyal Rozenberg changed:

           What    |Removed                     |Added
                 CC|                            |eyalroz at technion dot ac.il

--- Comment #12 from Eyal Rozenberg ---
Rejecting valid code, especially valid code which is not contrived and can well occur in people's real-life work, seems like a high-priority bug. The last substantive comment here, other than dupe-marking-related comments two years ago, is comment #8, asking for this to be fixed - four and a half years ago.

Jonathan and others - please try to prioritize fixing this, and even if you can't for some reason - at least explain why this can't be fixed promptly.

See also: https://stackoverflow.com/q/65138048/1593077
[Bug c++/97553] [missed optimization] constexprness not noticed when UBsan enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553 --- Comment #5 from Eyal Rozenberg --- (In reply to Jakub Jelinek from comment #4) > Depends on what you mean by properly. -O3 can be used with sanitization, > but expecting the code to be optimized the same way as without sanitization > is wrong, it is more important to catch as many bugs as possible, and the > runtime instrumentation slows things down anyway. The sanitization is not > meant to be used for production code, only when debugging it. I wonder, then, if some kind of notice isn't called for when -O3 and UBsan are used together.
[Bug c++/97553] [missed optimization] constexprness not noticed when UBsan enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553 --- Comment #3 from Eyal Rozenberg --- > And, the runtime sanitization intentionally isn't heavily optimized away, > because the intent is to detect when the code is invalid, so it can't e.g. > optimize away those checks based on assumption that undefined behavior will > not happen. So, doesn't that essentially mean that -O3 cannot properly apply with UBsan?
[Bug c++/97553] New: [missed optimization] constexprness not noticed when UBsan enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553

Bug ID: 97553
Summary: [missed optimization] constexprness not noticed when UBsan enabled
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

(GodBolt example: https://godbolt.org/z/Kvan5c)

Consider the following code:

    #include <string_view>

    constexpr std::string_view f() { return "hello"; }

    static constexpr std::string_view g() {
        auto x { f() };
        return x.substr(1, 3);
    }

    int foo() { return g().length(); }

If you compile it with flags `--std=c++17 -O3`, it results in a pleasant:

    foo():
        mov eax, 3
        ret

but if you also enable undefined-behavior sanitization, i.e. `--std=c++17 -fsanitize=undefined -O3`, then you get a much longer program with UB-related instrumentation - which is never used.

I'm not sure if it's because some optimizations are disabled with UBsan, in which case this might be a "misfeature", or whether they're enabled but the optimization is just missed.
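A minimal sketch of one possible user-side workaround, assuming the goal is only to guarantee that the length is computed during translation: binding the result to a `constexpr` variable (and checking it with `static_assert`) requires constant evaluation by the language rules, independently of what the optimizer does. The name `foo_forced` is made up for this illustration, and the sketch makes no claim about which instrumentation `-fsanitize=undefined` still emits around it.

```cpp
#include <string_view>

constexpr std::string_view f() { return "hello"; }

constexpr std::string_view g() {
    auto x { f() };
    return x.substr(1, 3);
}

// Checked during compilation; this fails to compile if g() cannot be
// evaluated as a constant expression.
static_assert(g().length() == 3, "g() must be usable in a constant expression");

// Hypothetical variant of foo() from the report: the constexpr variable
// forces g() to be evaluated at compile time.
int foo_forced() {
    constexpr auto len = g().length();
    return static_cast<int>(len);
}
```

This only changes where the language requires constant evaluation; it does not address the report's question of whether the optimization is disabled or missed.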
[Bug c/97274] Need ability to ensure no warning about tmpnam
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97274 --- Comment #2 from Eyal Rozenberg --- (In reply to Jonathan Wakely from comment #1) > The linker issues the warning, because the symbol in glibc is annotated to > cause a warning. It has nothing to do with GCC. Hmm. There's still a question of responsibility: * Supposing at least some part of GCC is aware that a symbol used is annotated in the library to cause a warning - should it not offer some mechanism for circumventing that warning? Seeing how it's a "legitimate" standard library function? * Otherwise, would this be a bug to file against the linker, or against the library?
[Bug c/97274] New: Need ability to ensure no warning about tmpnam
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97274

Bug ID: 97274
Summary: Need ability to ensure no warning about tmpnam
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

If you use tmpnam, or std::tmpnam in C++, you get a linker (not compiler, linker) warning:

    /usr/bin/ld: CMakeFiles/simpleDrvRuntimePTX.dir/modified_cuda_samples/simpleDrvRuntimePTX/simpleDrvRuntimePTX.cpp.o: in function `create_ptx_file[abi:cxx11]()':
    /home/eyalroz/src/mine/cuda-api-wrappers/examples/modified_cuda_samples/simpleDrvRuntimePTX/simpleDrvRuntimePTX.cpp:105: warning: the use of `tmpnam' is dangerous, better use `mkstemp'

There should be a way to disable that warning, when invoking the compiler without separate linking, or simply when invoking the linker.

I'm not sure if this bug should even be filed here, since it's not obvious to me who is "responsible" for the linker emitting this warning.
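For reference, the replacement that the warning text itself recommends can be sketched as below. This is only an illustration for POSIX systems, not a way to suppress the warning; the helper name `make_temp_file` and the `/tmp/example` prefix are assumptions made for the example.

```cpp
#include <cstdlib>   // mkstemp (declared in <stdlib.h> on POSIX systems)
#include <string>
#include <unistd.h>  // close

// Hypothetical helper: creates a uniquely named empty file and returns its
// path, or an empty string on failure.
std::string make_temp_file() {
    std::string path = "/tmp/exampleXXXXXX";  // the template must end in six X's
    int fd = mkstemp(&path[0]);               // replaces the X's and creates the file
    if (fd == -1) {
        return {};
    }
    close(fd);                                // the caller reopens the file by name
    return path;
}
```

Unlike `tmpnam`, `mkstemp` creates the file itself, so there is no window between choosing a name and creating the file. The caller is responsible for removing the file when done.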
[Bug c++/96283] "undefined vtable" error should indicate which members are missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96283

--- Comment #2 from Eyal Rozenberg ---
(In reply to Andrew Pinski from comment #1)
> (In reply to Eyal Rozenberg from comment #0)
> > I'm assuming the compiler provides the linker with enough information to
> > realize which virtual methods' implementations are missing
>
> It does not.

Ok, still - the linker knows which virtual methods it needs, and it knows which are provided by each compiled translation unit. Isn't that enough?

> Also technically the C++ ABI can really be changed.

I don't understand how this sentence relates to the previous part of your reply. If you mean - can change between compilation and linking - that's theoretically possible but would be the cause of all sorts of trouble.
[Bug c++/96283] New: "undefined vtable" error should indicate which members are missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96283

Bug ID: 96283
Summary: "undefined vtable" error should indicate which members are missing
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

Consider the following code:

    class Base {
    public:
        virtual void vmethod();
    };

    class foo : public Base {
        int x;
        void vmethod() override;
    };

    int main() { foo f; }

This will yield the errors (irrelevant paths snipped):

    ld: prog.o: in function `Base::Base()':
    :1: undefined reference to `vtable for Base'
    ld: prog.o: in function `foo::foo()':
    :6: undefined reference to `vtable for foo'

While this is true, it is a bit confusing. But even supposing I looked up what this error means and realized what was going on, I would still need to go over all the methods of one or two of the classes to find the one that's missing its implementation. In this simple example that's not so difficult, but sometimes it's quite the nuisance.

I'm assuming the compiler provides the linker with enough information to realize which virtual methods' implementations are missing, so that the linker can finally print an error message saying which methods are still missing after it has run. In this specific case, the linker should complain about vmethod() missing its definition.

GodBolt: https://godbolt.org/z/9Ejn4s
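As background for why the linker talks about the vtable rather than about `vmethod()`: on targets following the Itanium C++ ABI, a class's vtable is emitted in the translation unit that defines its "key function", the first virtual member function that is neither pure nor inline. If that function is never defined, no vtable is emitted anywhere. A minimal sketch of the example with the definitions supplied follows; the members of `foo` are made public here, and `x` is given a visible effect, purely so the behaviour can be checked.

```cpp
class Base {
public:
    virtual void vmethod();   // key function of Base
};

class foo : public Base {
public:
    int x = 0;
    void vmethod() override;  // key function of foo
};

// Supplying these out-of-line definitions is what causes the two vtables
// to be emitted, which resolves both "undefined reference" errors.
void Base::vmethod() {}
void foo::vmethod() { x = 1; }
```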
[Bug c++/95148] New: -Wtype-limits always-false warning triggered despite comparison being avoided
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95148

Bug ID: 95148
Summary: -Wtype-limits always-false warning triggered despite comparison being avoided
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

Consider the following program:

    #include <type_traits>

    int main() {
        unsigned x { 5 };
        return (std::is_signed<unsigned>::value and (x < 0)) ? 1 : 0;
    }

When compiling it with GCC versions 11.0 20200511, 10.1, 9.2.1, 8.3.0, I get the warning:

    a.cpp:5:52: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]

I should not be getting this warning, because when x is unsigned, the comparison is never performed, due to the short-circuit semantics of `and`. This can be easily determined by the compiler - and probably is. No less importantly, the author of such a line in a program clearly specified his/her intent here with this check.

clang doesn't seem to issue a warning in this case.
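Until the warning is refined, one way to express the same intent in generic C++17 code is `if constexpr`: inside a template, the discarded branch is not instantiated, so for an unsigned type the `< 0` comparison never exists. The sketch below is an assumption-laden illustration rather than something taken from the report; the name `is_negative` is hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper: the comparison is only instantiated for signed types.
template <typename T>
constexpr bool is_negative(T value) {
    if constexpr (std::is_signed<T>::value) {
        return value < 0;
    } else {
        (void) value;  // avoid an unused-parameter warning in this branch
        return false;
    }
}
```

Note that this relies on the check living in a template; in a non-template function both branches are still compiled, so the original complaint about the short-circuit form stands.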
[Bug libstdc++/94559] Nitpick: std::array constexpr_fill test isn't constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94559

Eyal Rozenberg changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Eyal Rozenberg ---
Sorry, I misread.
[Bug libstdc++/94559] New: Nitpick: constexpr_fill test isn't constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94559 Bug ID: 94559 Summary: Nitpick: constexpr_fill test isn't constexpr Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- This test: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/testsuite/23_containers/array/requirements/constexpr_fill.cc is named constexpr_fill, but that test is a runtime one.
[Bug c++/42633] hinting gcc that restricted pointer dont alias with members of structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42633

Eyal Rozenberg changed:

           What    |Removed                     |Added
                 CC|                            |eyalroz at technion dot ac.il

--- Comment #5 from Eyal Rozenberg ---
This bug is marked assigned, and there's a patch, but 10 years have passed. Torben, Richard - ping.
[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293 --- Comment #6 from Eyal Rozenberg --- (In reply to Richard Biener from comment #5) > DSE part ... DCE DSE = Dead Statement Elimination? DCE = Dead Code Elimination?
[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293 --- Comment #3 from Eyal Rozenberg --- (In reply to Marc Glisse from comment #1) You should probably post that comment on the second, related, bug 94294 - which is about the fact that GCC keeps the new and delete. This one is strictly about the population of the string, given that new and delete are called.
[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

--- Comment #2 from Eyal Rozenberg ---
Note: The bugs also manifest with this slightly simpler program:

    #include <string>

    int bar() {
        std::string second { "Hey... no small-string optimization for me please!" };
        return 123;
    }

See: https://godbolt.org/z/LjmNYi
[Bug tree-optimization/94294] [missed optimization] new+delete of unused local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294

--- Comment #1 from Eyal Rozenberg ---
Note: The bugs also manifest with this simpler program:

    #include <string>

    int bar() {
        std::string second { "Hey... no small-string optimization for me please!" };
        return 123;
    }

See: https://godbolt.org/z/LjmNYi
[Bug tree-optimization/94294] New: [missed optimization] new+delete of unused local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294

Bug ID: 94294
Summary: [missed optimization] new+delete of unused local string not removed
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

(Relevant Godbolt: https://godbolt.org/z/GygbjZ)

This is the second of two apparent bugs, following bug 94293. They both manifest when compiling the following program:

    #include <string>

    int bar() {
        struct poor_mans_pair {
            int first;
            std::string second;
        };
        poor_mans_pair p { 123, "Hey... no small-string optimization for me please!" };
        return p.first;
    }

For x86_64, this would ideally compile into:

    bar():
            mov     eax, 123
            ret

but when compiling this with GCC 10.0.1 20200322 (or GCC 9.x etc.), we get assembly which calls operator new(), populates the string, calls operator delete(), then returns 123:

    bar():
            sub     rsp, 8
            mov     edi, 51
            call    operator new(unsigned long)
            movdqa  xmm0, XMMWORD PTR .LC0[rip]
            mov     esi, 51
            mov     rdi, rax
            movups  XMMWORD PTR [rax], xmm0
            movdqa  xmm0, XMMWORD PTR .LC1[rip]
            movups  XMMWORD PTR [rax+16], xmm0
            movdqa  xmm0, XMMWORD PTR .LC2[rip]
            movups  XMMWORD PTR [rax+32], xmm0
            mov     eax, 8549
            mov     WORD PTR [rdi+48], ax
            mov     BYTE PTR [rdi+50], 0
            call    operator delete(void*, unsigned long)
            mov     eax, 123
            add     rsp, 8
            ret
    .LC0:
            .quad   7935393319309894984
            .quad   3273110194895396975
    .LC1:
            .quad   8007513861377913971
            .quad   8386118574366356592
    .LC2:
            .quad   2338053640980164457
            .quad   8314037903514690925

This bug report is about how the allocation and de-allocation are not elided/optimized-away, even though the std::string variable is local and unused.

AFAICT, g++ is not required to do this. And, in fact, clang++ doesn't do this with its libc++. cppreference says that, starting in C++14,

> New-expressions are allowed to elide or combine allocations made
> through replaceable allocation functions. In case of elision, the
> storage may be provided by the compiler without making the call to
> an allocation function (this also permits optimizing out unused
> new-expression)

and this is, indeed, the case of an unused new-expression. Well, eventually-unused.

Note: I suppose it's theoretically possible that this bug only manifests because bug 94293 prevents the allocated space from being recognized as unused; but I can't tell whether that's the case.
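The elision rule quoted from cppreference can be illustrated with a much smaller case than `std::string`. In the sketch below (the function name `baz` is made up for this example), the allocation does not escape the function, so a C++14 compiler is permitted, though not required, to drop the `new`/`delete` pair and return the constant directly. Whether a particular GCC or clang version actually does so depends on the version and flags, and is not claimed here.

```cpp
// Hypothetical minimal case of an "eventually unused" new-expression.
int baz() {
    int* p = new int(123);  // allocation through a replaceable allocation function
    int value = *p;
    delete p;               // the storage never escapes baz()
    return value;           // observable result is 123 either way
}
```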
[Bug c++/94293] New: [missed optimization] Useless statements populating local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

Bug ID: 94293
Summary: [missed optimization] Useless statements populating local string not removed
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

(Relevant Godbolt: https://godbolt.org/z/GygbjZ)

This is the first of two apparent bugs manifesting when compiling the following program:

    #include <string>

    int bar() {
        struct poor_mans_pair {
            int first;
            std::string second;
        };
        poor_mans_pair p { 123, "Hey... no small-string optimization for me please!" };
        return p.first;
    }

For x86_64, this would ideally compile into:

    bar():
            mov     eax, 123
            ret

but when compiling this with GCC 10.0.1 20200322 (or GCC 9.x etc.), we get assembly which calls operator new(), populates the string, calls operator delete(), then returns 123:

    bar():
            sub     rsp, 8
            mov     edi, 51
            call    operator new(unsigned long)
            movdqa  xmm0, XMMWORD PTR .LC0[rip]
            mov     esi, 51
            mov     rdi, rax
            movups  XMMWORD PTR [rax], xmm0
            movdqa  xmm0, XMMWORD PTR .LC1[rip]
            movups  XMMWORD PTR [rax+16], xmm0
            movdqa  xmm0, XMMWORD PTR .LC2[rip]
            movups  XMMWORD PTR [rax+32], xmm0
            mov     eax, 8549
            mov     WORD PTR [rdi+48], ax
            mov     BYTE PTR [rdi+50], 0
            call    operator delete(void*, unsigned long)
            mov     eax, 123
            add     rsp, 8
            ret
    .LC0:
            .quad   7935393319309894984
            .quad   3273110194895396975
    .LC1:
            .quad   8007513861377913971
            .quad   8386118574366356592
    .LC2:
            .quad   2338053640980164457
            .quad   8314037903514690925

This bug report is about the population of the string, i.e. let's ignore the question of whether any memory should be allocated at all.

g++ should be aware that the string has no visibility outside `bar()` (except through access using raw arbitrary memory addresses from another thread while `bar()` is executing). Also, IANALL, even if the allocation can be considered observable behavior which needs to be maintained - values at that memory location, which may transiently be present, do not constitute such behavior. Why even set those values, therefore, when they are not used?

At least these string constants and population statements should be optimized away, into something like (hand-written assembly):

    bar():
            sub     rsp, 8
            mov     edi, 51
            call    operator new(unsigned long)
            mov     rdi, rax
            mov     esi, 51
            call    operator delete(void*, unsigned long)
            mov     eax, 123
            add     rsp, 8
            ret
[Bug c++/93739] Ability to print a type name without aborting compilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93739 --- Comment #4 from Eyal Rozenberg --- (In reply to Eyal Rozenberg from comment #3) A couple more points: * The error I get (https://godbolt.org/z/5GpR2T) doesn't have the "your type here" string. * This forces you to define a variable you're not using. So, you need to also say [[unused]] in some cases - more clutter.
[Bug c++/93739] Ability to print a type name without aborting compilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93739 --- Comment #3 from Eyal Rozenberg --- (In reply to Jonathan Wakely from comment #2) > Oops, that was meant to be print_type() Ok, that's a better kludge than mine - it doesn't have the more serious shortcomings. That makes the motivation for this feature request more about convenience / aesthetics. It would still be nice to have.
[Bug c++/93739] New: Ability to print a type name without aborting compilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93739

Bug ID: 93739
Summary: Ability to print a type name without aborting compilation
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

Over the past several years, C++ has seen increased use of type deduction via auto variables (C++11), auto return type (C++14), template deduction guides and more. The language already had delicate rules regarding decays, references being added or removed, etc. All of this motivates the developer to sometimes want to ascertain what the type of an expression or the value of a template parameter is, in their program, while it is being compiled - with the information printed to the standard error stream like warnings and errors are. (And nota bene: Not at run-time).

This is partly doable today already, e.g. with the kludge in the following example:

    using mystery = int; // ... etc etc
    template <typename T> struct has_type{};
    using foo = typename has_type<mystery>::mystery;

This results in an error whose first line is:

    :6:59: error: 'mystery' in 'struct has_type<int>' does not name a type

and tells us the type of mystery is int. This has several drawbacks:

* Abuse of a mechanism with a different intent (although in C++ that is sometimes considered a good idea...)
* Irrelevant clutter in the output - you need to pay attention and know what you're looking for.
* [MOST IMPORTANT] Compilation stops when hitting this type check.

(consequences of the above)

* Can't check more than one type at a time this way.
* Can't be used in compilation log parsing - must be introduced and then removed.
* Effectively unusable if the same expression has a different type when called from different locations - i.e. within templates: Compilation will stop with the first instantiation, not the one you want (and preventing this stoppage is overkill).

I therefore ask that a feature be added to GCC, allowing the printing of a type's name at some appropriate point during compilation. I'm not familiar with the various passes and stages of parsing and comprehending C++ programs in GCC, but obviously the type is deduced at some point - the same point where the error in the above example can be printed. Instead, I suggest that some pragma (a new or an existing one) be able to print type names.

Syntax ideas (a bike-shed issue of course):

    #pragma print_type_of( mystery )
    #pragma message( typeof(mystery) )
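In the meantime, a warning-based kludge avoids the "compilation stops" drawback. The sketch below is an assumption of mine and not necessarily the `print_type()` suggested elsewhere in this thread: it marks a function template `[[deprecated]]`, so that each use produces a warning which, on the compilers I am aware of, is expected to name the instantiation (something like `[with T = long int]`). The macro `PRINT_TYPE_OF` and the switch `SHOW_TYPES` are hypothetical names; without `-DSHOW_TYPES` the macro expands to nothing and the build stays warning-free.

```cpp
// Hypothetical helper: using it triggers a deprecation *warning* (not an
// error) whose text is expected to mention the deduced template argument.
template <typename T>
[[deprecated("type printed by print_type")]] constexpr void print_type() {}

#ifdef SHOW_TYPES
#define PRINT_TYPE_OF(expr) print_type<decltype(expr)>()
#else
#define PRINT_TYPE_OF(expr) ((void) 0)
#endif

int twice(int v) {
    auto r = v * 2L;       // deduced type of r: long
    PRINT_TYPE_OF(r);      // with -DSHOW_TYPES, warns and names the type
    return static_cast<int>(r);
}
```

This still shares the "abuse of a mechanism" and "clutter" drawbacks listed above, and it breaks under `-Werror`, which is part of why a dedicated pragma would be nicer.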
[Bug ipa/89924] [missed-optimization] Function not de-virtualized within the same TU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

--- Comment #9 from Eyal Rozenberg ---
(In reply to Jason Merrill from comment #8)
> I think if the object were not an actual Aint, performing the standard
> conversion to A* should be undefined, allowing the devirtualization. But
> I'm not finding actual wording to that effect in the current draft.

I'm not sure you _should_ find such language, because it's unnecessary. A function getting a T* is allowed to assume that the pointer is pointing to a valid T (and if I were a language lawyer I would tell you where that's stated). Implicitly, therefore, a C++ program is not required to have any defined behavior when that T* does not point to a valid T.
[Bug c++/90703] New: A virtuous bug: `=delete` accepted on second declaration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90703 Bug ID: 90703 Summary: A virtuous bug: `=delete` accepted on second declaration Product: gcc Version: 9.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- (based on this SE question: https://stackoverflow.com/q/56409551/1593077 and this GodBolt test case: https://godbolt.org/z/YNstQX ) Consider this code: template int foo(); template int foo() = delete; it seems to be invalid in C++11 and onward: [dcl.fct.def.delete] 4 ... A deleted definition of a function shall be the first declaration of the function... Un(?)fortunately, GCC accepts this code as valid C++11, beginning with 4.7.1 and all the way up to the "trunk" version that GodBolt uses. Specifically, version 9.1 accepts it. (Personally I feel the standard should correspond to GCC's behavior on this matter but it's not for me to decide.)
[Bug other/90566] New: Support demangling with underscore-prefixed string after mangled name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90566

Bug ID: 90566
Summary: Support demangling with underscore-prefixed string after mangled name
Product: gcc
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

libiberty performs the demangling for c++filt, the most commonly-used (and perhaps only?) tool for demangling C++ names in object files and related file formats. One of the "ecosystems" which produces such files is CUDA; specifically in its intermediary representation for GPU code.

Now, a GPU-device-side function, compiled with clang to PTX, can look like this, for example:

    .visible .entry _Z6squarePii(
        .param .u64 _Z6squarePii_param_0,
        .param .u32 _Z6squarePii_param_1
    )
    {
        ld.param.u32        %r1, [_Z6squarePii_param_1];
        mov.u32             %r2, %ctaid.x;
        setp.ge.s32         %p1, %r2, %r1;
        @%p1 bra            LBB6_2;
        ld.param.u64        %rd2, [_Z6squarePii_param_0];
        cvta.to.global.u64  %rd3, %rd2;
        mul.wide.u32        %rd4, %r2, 4;
        add.s64             %rd1, %rd3, %rd4;
        ld.global.u32       %r3, [%rd1];
        mul.lo.s32          %r4, %r3, %r3;
        st.global.u32       [%rd1], %r4;
        ret;
    }

(see https://godbolt.org/z/GcDTVh for clang and nvcc output)

which clearly has mangled names. However, it seems the function parameter name is somewhat malformed, or non-standard - being a mangled name, followed immediately by an underscore and more text: mangledblahblah_param_0. When demangling, the function name gets demangled fine, but the parameter does not:

    .visible .entry square(int*, int)(
        .param .u64 _Z6squarePii_param_0,
        .param .u32 _Z6squarePii_param_1
    )

and from what the c++filt people say - this is libiberty's output.

I ask that libiberty either auto-detect this case, or have an option to detect it; and when that's turned on, demangle the above into:

    .visible .entry square(int*, int)(
        .param .u64 square(int*, int)_param_0,
        .param .u32 square(int*, int)_param_1
    )

or

    .visible .entry square(int*, int)(
        .param .u64 square(int*, int) param_0,
        .param .u32 square(int*, int) param_1
    )

or something else that's meaningful.

Caveat: I realize that libiberty is FOSS and CUDA involves a bunch of closed-source software by a company notorious for keeping code and specs closed, and not making it easy for FOSS developers. Still, we are talking about something clang compiles; and it's only being mindful of an underscore.

(Note: I first filed this as a bug against c++filt, https://sourceware.org/bugzilla/show_bug.cgi?id=24557 , and was directed to file here.)
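Until something like this exists in libiberty, the first requested output format can be approximated in user code. The sketch below is only an illustration under stated assumptions: it uses `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and clang), the helper name `demangle_with_suffix` is made up, and treating the first occurrence of `_param_` as the start of the suffix is my own heuristic, not anything libiberty or c++filt does.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangles the part before "_param_" and re-appends
// the suffix unchanged. Returns the input as-is if demangling fails.
std::string demangle_with_suffix(const std::string& symbol) {
    const std::string marker = "_param_";
    const std::size_t pos = symbol.find(marker);
    // substr(0, npos) yields the whole string when the marker is absent.
    const std::string mangled = symbol.substr(0, pos);
    const std::string suffix =
        (pos == std::string::npos) ? std::string() : symbol.substr(pos);

    int status = 0;
    char* demangled =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        std::free(demangled);
        return symbol;
    }
    std::string result = std::string(demangled) + suffix;
    std::free(demangled);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

With this, `_Z6squarePii_param_0` becomes `square(int*, int)_param_0`. The heuristic would misfire on a function whose own name contains `_param_`, which is one reason proper support in the demangler would be preferable.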
[Bug tree-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271 --- Comment #10 from Eyal Rozenberg --- (In reply to rguent...@suse.de from comment #9) > You'd have to experiment with different GCC versions, but yes. I was hoping for a more concrete suggestion (which works with multiple GCC versions)...
[Bug tree-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271 --- Comment #8 from Eyal Rozenberg --- (In reply to rguent...@suse.de from comment #5) > int foo3() > { > struct { int x; int y; } s; > s.x = 3; > char c = 1; > return replace_bytes_3(&s.x,c); > } > > Coalescing successful! > Merged into 1 stores This is very interesting! Do you think I could somehow adapt this example into a workaround, for existing GCC versions, rather than wait for the bug fix?
[Bug tree-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271 --- Comment #6 from Eyal Rozenberg --- > Is the example from real-world code? Yes. Example: Some machines support atomic instructions on aligned 32 bits or on 64 bits, but not directly on 1, 2, 3, 5, 6 or 7 bytes. So in order to atomically change a value of one of those "undesirable" sizes, you have to work on its corresponding 4-byte or 8-byte stretch: You read it, change it in the middle, then apply atomic compare-and-swap to it.
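The read / change-in-the-middle / compare-and-swap sequence described above can be sketched as follows. This is an illustration only, with a hypothetical function name, using `std::atomic<std::uint32_t>` as the aligned 4-byte stretch; the caller must ensure `k < 4`.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: atomically replaces the byte at offset k (k < 4)
// of a 32-bit word, using only a 32-bit compare-and-swap.
void atomic_replace_byte(std::atomic<std::uint32_t>& word,
                         std::size_t k, unsigned char byte) {
    std::uint32_t expected = word.load();
    std::uint32_t desired;
    do {
        desired = expected;  // start from the value we last observed
        // Change one byte "in the middle" of the local copy.
        std::memcpy(reinterpret_cast<unsigned char*>(&desired) + k, &byte, 1);
        // On failure, compare_exchange_weak reloads `expected` and we retry.
    } while (!word.compare_exchange_weak(expected, desired));
}
```

The `memcpy` into the local copy is exactly the kind of "faux" memcpy this bug is about: ideally it would be performed entirely in registers.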
[Bug rtl-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

--- Comment #1 from Eyal Rozenberg ---
Can also reproduce this in C, with slightly different code:

    int replace_bytes_3(int v1, char v2)
    {
        memcpy( (void*) (((char*)&v1)+1), &v2, sizeof(v2) );
        return v1;
    }

    int foo3()
    {
        int x = 3;
        char c = 1;
        return replace_bytes_3(x,c);
    }

GodBolt: https://godbolt.org/z/1K89xh

Again, clang optimizes this correctly. Note specifically the way it handles the non-inlined replace_bytes_3.
[Bug rtl-optimization/90271] New: [missed-optimization] failure to keep variables in registers during "faux" memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

Bug ID: 90271
Summary: [missed-optimization] failure to keep variables in registers during "faux" memcpy
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

Example on GodBolt: https://godbolt.org/z/Q17L1u

Consider the following functions:

    template <typename T1, typename T2>
    inline void replace_bytes (T1& v1, const T2& v2, std::size_t k) noexcept
    {
        if (k > sizeof(T1) - sizeof(T2)) { return; }
        std::memcpy( (void*) (((char*)&v1)+k), (const void*) &v2, sizeof(T2) );
    }

For plain-old-data types, this is nothing but the manipulation of v1's bytes (and there are no pointer aliasing issues). So, at least when k is known at compile-time, the compiler should IMHO keep the activity to within registers. And yet - GCC doesn't: With the extra code

    int foo1()
    {
        int x = 3;
        char c = 1;
        replace_bytes(x,c,1);
        return x;
    }

we get (at maximum optimization):

    foo1():
        mov DWORD PTR [rsp-4], 3
        mov BYTE PTR [rsp-3], 1
        mov eax, DWORD PTR [rsp-4]
        ret

This, while clang _does_ optimize fully and has foo1() simply return 259 (= 256+3). Even if we make k a template parameter - it doesn't help.
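To spell out the arithmetic behind 259: on a little-endian target such as x86_64, the byte at offset 1 of a 32-bit integer carries weight 256, so overwriting it with 1 in the value 3 gives 1*256 + 3 = 259. The register-only computation one would hope for is equivalent to a mask and a shift, sketched below with a hypothetical function name; it assumes little-endian byte numbering and `k < 4`.

```cpp
#include <cstdint>

// Hypothetical register-only equivalent of replacing byte k of a 32-bit
// value, with bytes numbered as on a little-endian machine.
constexpr std::uint32_t replace_byte_le(std::uint32_t value, unsigned k,
                                        std::uint8_t byte) {
    const unsigned shift = 8u * k;                             // bit offset of byte k
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;   // selects byte k
    return (value & ~mask) | (std::uint32_t{byte} << shift);   // clear, then insert
}

static_assert(replace_byte_le(3, 1, 1) == 259, "3 with byte 1 set to 1 is 256 + 3");
```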
[Bug ipa/89924] [missed-optimization] Function not de-virtualized within the same TU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

--- Comment #5 from Eyal Rozenberg ---
(In reply to Jan Hubicka from comment #3)
> The reason why we do not devirtualize is that only information about Aint is
> the type of function parameter

"Only"? :-)

> and we do not believe it implies the type of
> memory location it points to because there is no read or anything from that
> pointer before it is casted to struct A* and pointer of a given type does
> not need to necessarily point to memory location of the same type unless you
> dereference it.
>
> Is it really valid to devirtualize here?

IANALL, but yes. You're using terms like "belief" and talk about speculative inference based on partial evidence. Why? foo_virtual gets a pointer to an Aint. Why should the compiler need to second-guess this?
[Bug tree-optimization/89924] New: [missed-optimization] Function not de-virtualized within the same TU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

Bug ID: 89924
Summary: [missed-optimization] Function not de-virtualized within the same TU
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

Related StackOverflow question: https://stackoverflow.com/q/55464578/1593077
GodBolt example: https://godbolt.org/z/l0vdFG

In the following code:

    struct A {
        virtual A& operator+=(const A& other) noexcept = 0;
    };

    void foo_inner(int *p) noexcept { *p += *p; }
    void foo_virtual_inner(A *p) noexcept { *p += *p; }

    void foo(int *p) noexcept
    {
        return foo_inner(p);
    }

    struct Aint : public A {
        int i;
        A& operator+=(const A& other) noexcept override final
        {
            i += dynamic_cast<const Aint&>(other).i;
            // i += reinterpret_cast<const Aint&>(other).i;
            return *this;
        }
    };

    void foo_virtual(Aint *p) noexcept
    {
        return foo_virtual_inner(p);
    }

Both functions, `foo()` and `foo_virtual()`, should compile to the same thing. But g++ 8.3 (on x86_64) with -O3 produces:

```
foo(int*):
        sal     DWORD PTR [rdi]
        ret
foo_virtual(Aint*):
        mov     rax, QWORD PTR [rdi]
        mov     rax, QWORD PTR [rax]
        cmp     rax, OFFSET FLAT:Aint::operator+=(A const&)
        jne     .L19
        push    rbx
        xor     ecx, ecx
        mov     edx, OFFSET FLAT:typeinfo for Aint
        mov     esi, OFFSET FLAT:typeinfo for A
        mov     rbx, rdi
        call    __dynamic_cast
        test    rax, rax
        je      .L20
        mov     eax, DWORD PTR [rax+8]
        add     DWORD PTR [rbx+8], eax
        pop     rbx
        ret
.L19:
        mov     rsi, rdi
        jmp     rax
foo_virtual(Aint*) [clone .cold.1]:
.L20:
        call    __cxa_bad_cast
```

i.e. it doesn't manage to de-virtualize `Aint::operator+=` - although it really should. It has all the necessary information, as far as I can tell.

As a side note, even regardless of de-virtualization, there's a whole lot of code there, while with clang 8, we only get:

```
foo_virtual(Aint*):                    # @foo_virtual(Aint*)
        mov     rax, qword ptr [rdi]
        mov     rax, qword ptr [rax]
        mov     rsi, rdi
        jmp     rax
```

which at least doesn't need the type info.
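Independently of what the optimizer manages to prove, the language offers a way for the caller to opt out of dynamic dispatch: a call through an explicitly qualified name does not use the virtual call mechanism. The sketch below applies that to the example; `foo_direct` is a hypothetical name, `i` is default-initialized here only to keep the example self-contained, and no claim is made about the assembly any particular compiler emits for it.

```cpp
struct A {
    virtual A& operator+=(const A& other) noexcept = 0;
};

struct Aint : public A {
    int i = 0;
    A& operator+=(const A& other) noexcept override final {
        i += dynamic_cast<const Aint&>(other).i;
        return *this;
    }
};

// Hypothetical variant of foo_virtual(): the qualified name selects
// Aint::operator+= statically instead of going through the vtable.
void foo_direct(Aint* p) noexcept {
    p->Aint::operator+=(*p);
}
```

This is a source-level workaround only; the `dynamic_cast` inside the operator is untouched, and the report's point that the compiler could deduce the same thing from the `Aint*` parameter still stands.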
[Bug ipa/89567] [missed-optimization] Should not be initializing unused struct parameter members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567 --- Comment #4 from Eyal Rozenberg --- > In the first excample, the interproceudral constant propagation pass > (IPA-CP) found that foo1 is so small that copying all of it might be > worth not passing the unused argument and so it does, that is why > you'll find function foo1 twice in the assembly. Why does this have anything to do with constant propagation? I also don't understand the sense in two identical copies. It also sounds like "the wrong optimization" is being used if it's not about noticing unused parameters. > This functionality > in the pass is there just "on the side" and it is not easy to make it > also work with aggegates, not even desireable (that is the job of a > different pass, see below). > > Both examples are compiled better if you make foo1 and foo2 static. This really makes no sense to me! bar() is not affected by other TUs at all... > In the latter case, you get exactly what you want, the structure is be > split and only the used part survives. In the first example, you > don't get a clone emitted which you probably don't need. Both of > these transformation are done by a pass called interprocedural scalar > replacement of aggregates (IPA-SRA), which specifically also aims to > remove unused arguments, but it never creates multiple clones. I like this pass :-) ... so, why does it work for the static case with bar2() but doesn't work with bar1() ? > I'm afraid you'd need to provide a strong real-world use-case to make > me investigate how to make IPA-SRA clone so you might not need static > and/or LTO because that would mean devising a cost/benefit > (size/speedup) heuristics and that is not easy. For now I'm just trying to understand why this isn't already happening. Then I'll perhaps try to understand why clang does do this. But - don't necessarily clone. IIUC, cloning would possibly mean removing that parameter even though it's a field of a struct. 
But even if you _don't_ clone, functions calling foo() should still not have to initialize that member. It seems like we're talking about different optimizations.
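For readers following the quoted suggestion, here is a minimal sketch of the `static` variant of the struct example. It reuses `two_ints`, `foo2` and `bar2` from the original report; giving `foo2` internal linkage is the only change. Whether the load of `a[10]` actually disappears depends on the GCC version and flags, so it should be checked in the generated assembly rather than assumed.

```cpp
// Sketch of the "make it static" suggestion from the quoted comment.
struct two_ints { int x, y; };

// Internal linkage: every caller of foo2 is visible in this translation
// unit, so the compiler is free to change how the argument is passed.
__attribute__((noinline))
static int foo2(two_ints s) { return s.x; }

int bar2(int* a) {
    two_ints ti = { a[5], a[10] };  // a[10] only feeds the unused member y
    int b = foo2(ti);
    return b * b;
}
```

The observable behaviour is unchanged; only the linkage of `foo2` differs from the code in the report.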
[Bug ipa/89567] [missed-optimization] Should not be initializing unused struct parameter members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567 --- Comment #2 from Eyal Rozenberg --- (In reply to Richard Biener from comment #1) > You are looking for IPA DSE I'm not a compiler expert and don't know what this means. Even literally, I don't know what these acronyms stand for. > by marshalling through a struct you make GCCs job a lot harder... Well, first - yes, I suppose this could make things harder. However, as GCC does its magic, I presume that at some point the struct abstraction is lost, and we only have code which passes values to another function in registers, and one of these values is unused. So at some point along the way this might be easier than analyzing structs. > "pro-active" IPA-SRA might help here, Again, I have no idea what that is... could I trouble you to elaborate just a bit more?
[Bug rtl-optimization/89567] New: [missed-optimization] Should not be initializing unused struct parameter members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567 Bug ID: 89567 Summary: [missed-optimization] Should not be initializing unused struct parameter members Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- The issue is captured in the example here: https://gcc.godbolt.org/z/_U4X80 The issue was first described in this StackOverflow question: https://stackoverflow.com/q/54964323/1593077 Consider the following code: __attribute__((noinline)) int foo1(int x, int y) { return x; } int bar1(int* a) { int b = foo1(a[5], a[10]); return b * b; } GCC (with -O3) optimizes out the initialization of the y parameter with the a[10] argument, saving one of the two memory reads. This is good. Now suppose we put those two int parameters into a struct: struct two_ints { int x, y; }; __attribute__((noinline)) int foo2(struct two_ints s) { return s.x; } int bar2(int* a) { struct two_ints ti = { a[5], a[10] }; int b = foo2(ti); return b * b; } There shouldn't be any difference, right? The parameters (certainly as far as the assembly, which recognizes no such thing as "structs", is concerned) are two integers; and the second one is not used. So I would expect to see the same assembly code. Yet... I don't. Both integers are initialized and two `mov eax, DWORD PTR [rdx+something]` instructions are executed. This behavior also occurs with "GCC trunk" on GodBolt, i.e. GCC version 9.0.1.
[Bug tree-optimization/89479] __restrict on a pointer ignored when a function is passed alongside it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479 --- Comment #6 from Eyal Rozenberg --- Thanks to a friendly StackOverflow user, I should also report that (about) the same code produces the same compiler behavior disparity for proper C: https://godbolt.org/z/kVYqp8 with the slight modification being `void g(void)` instead of `void g()` in the function signatures.
[Bug tree-optimization/89479] __restrict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479 --- Comment #5 from Eyal Rozenberg --- (In reply to Richard Biener from comment #4) > exposing __restrict to the IL). Is "IL" an acronym for "Intermediate Language"? Remember many bug posters/readers are not GCC developers and don't know all the lingo. > To elaborate further to successfully mark a function call > with clique == 1 and base == 0 we have to prove the pointer marked restrict > doesn't escape the function through calls Certainly, calling g() could be just the same as writing to an alias of the x pointer. But - __restrict is how we guarantee this doesn't happen (or can be ignored) even when the compiler can't prove that's the case on its own. So I'm not sure I understand what you're suggesting with your comment. I suppose you could try and "disprove the __restrict" to give a warning, but other than that - why not just respect it?
[Bug tree-optimization/89479] __restrict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479 Eyal Rozenberg changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|DUPLICATE |--- --- Comment #2 from Eyal Rozenberg --- (In reply to Marc Glisse from comment #1) > Seems similar enough. With respect - this is not about x being a const __restrict pointer; what I said (including the clang behavior) applies exactly the same when we remove the const. See: https://godbolt.org/z/hH643a (where the const is gone).
[Bug rtl-optimization/89479] New: __restrict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479 Bug ID: 89479 Summary: __restrict Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- (This is all illustrated at: https://godbolt.org/z/nz2YXE ) Let us make our language C++17. Consider the following function: int foo(const int* x, void g()) { int result = *x; g(); result += *x; return result; } Since we have no aliasing guarantees, we must assume the invocation of g() may change the value at address x, so we must perform two reads from x to compute the result - one before the call and one after. If, however, we add the __restrict__ specifier to x: int bar(const int* __restrict__ x, void g()) { int result = *x; g(); result += *x; return result; } we may assume x "points to an unaliased integer" (as per https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html ). That means we can read from address x just once, and double the value to get our result. I realize there's a subtle point here, which is whether being "unaliased" also applies to g()'s behavior. It is my understanding that it does. Well, clang 7.0 understands things the way I do, and indeed optimizes one of the reads away in `bar()`. But - g++ 8.3 (and g++ "trunk", whatever that means on GodBolt) doesn't do so, and reads _twice_ from x both in `foo()` and in `bar()`.
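To make the expected transformation concrete, here is a small sketch that puts `bar()` from the report next to a hand-written single-read version. The name `bar_single_read` is hypothetical and not part of the report; it only spells out "read once, then double". The two functions can differ only when `g()` modifies the object `x` points to, which is the case the report argues `__restrict__` rules out.

```cpp
// bar() as written in the report: two reads of *x around the call to g().
int bar(const int* __restrict__ x, void g()) {
    int result = *x;
    g();
    result += *x;
    return result;
}

// Hypothetical hand-written equivalent: a single read, then doubling.
int bar_single_read(const int* __restrict__ x, void g()) {
    int value = *x;  // the only load from x
    g();
    return value + value;
}
```

Comparing the assembly of the two functions on a given compiler shows whether it already performs this rewrite on its own.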
[Bug c++/88371] New: Gratuitous (?) warning regarding an implicit conversion in pointer arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88371 Bug ID: 88371 Summary: Gratuitous (?) warning regarding an implicit conversion in pointer arithmetic Product: gcc Version: 8.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- See: https://godbolt.org/z/tYn9SX for a live example and comparison with clang See: https://stackoverflow.com/q/53628998/1593077 for the question motivating this bug report. --- Consider the following program: #include <iostream> template <typename T> struct wrapper { T t; operator T() const { return t; } T get() const { return t; } }; int main() { int a[10]; int* x { a } ; wrapper y1{2}; wrapper y2{2}; wrapper y3{2}; std::cout << (x + y1) << '\n'; std::cout << (x + y2) << '\n'; std::cout << (x + y3) << '\n'; // this triggers a warning std::cout << (x + y3.get()) << '\n'; } When compiling it (with g++ 8.2.0) with -std=c++2a -Wsign-conversion we get: a.cpp: In function ‘int main()’: a.cpp:20:23: warning: conversion to ‘long int’ from ‘long unsigned int’ may change the sign of the result [-Wsign-conversion] std::cout << (x + y3) << '\n'; // this triggers a warning ^~ As far as I can tell, either both the third and the fourth line should trigger a warning, or neither should. Also, a comment on the StackOverflow page suggested this clause: http://eel.is/c++draft/over.match.oper#9 may be relevant here.
[Bug tree-optimization/87925] Missed optimization: Distinct-value if-then-else chains treated differently than switch statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87925 --- Comment #5 from Eyal Rozenberg --- (In reply to Martin Liška from comment #3) > Currently we only do switch -> balanced decision tree (read series of > if-then-else statements). Well definitely a potentially enhancement, > question is whether it worth doing that.. That is a question for another bug. I'm just saying that these two cases (or some expansion thereof, e.g. with some fallthrough on the case side, or with several distinct values in each if statement) should be treated the same. i.e. whatever optimizations are considered for one of them should be considered for the other. (In reply to Martin Liška from comment #4) > This is not problem because it's a const expression that is evaluate in C++ > front-end. Thus you don't get any penalty. The specific use example at the link is a static_assert, but the in() function there works with runtime values as well. So - not just compile-time computation. Another example - un-type-erasure dispatches, which may go into tight loops: https://stackoverflow.com/a/38924396/1593077
[Bug rtl-optimization/87925] Missed optimization: Distinct-value if-then-else chains treated differently than switch statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87925 Eyal Rozenberg changed: What|Removed |Added Version|unknown |9.0 --- Comment #1 from Eyal Rozenberg --- See also: https://stackoverflow.com/questions/53198276/do-compilers-optimize-switches-differently-than-long-if-then-else-chains
[Bug rtl-optimization/87925] New: Missed optimization: Single-value if-then-else chains treated differently than switch'es
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87925 Bug ID: 87925 Summary: Missed optimization: Single-value if-then-else chains treated differently than switch'es Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Have a look at this GodBolt example: https://gcc.godbolt.org/z/zR03rA On one hand, we have: void foo(int i) { switch (i) { case 1: boo<1>(); break; case 2: boo<2>(); break; case 3: boo<3>(); break; case 4: boo<4>(); break; // etc. etc. } } on the other hand we have the same, but using an if-then-else chain: void bar(int i) { if (i == 1) boo<1>(); else if (i == 2) boo<2>(); else if (i == 3) boo<3>(); else if (i == 4) boo<4>(); // etc. etc. } The switch statement gets a jump table; the if-then-else chain - does not. At the link, there are 20 cases; g++ starts using a jump table with 4 switch values. This is not just a matter of programmers needing to remember to prefer switch statements (which it's better not to require of them), but rather that if-then-else chains are sometimes generated by expansion of templated code, e.g. this example for checking for membership in a set of values (= all values of an enum): https://stackoverflow.com/a/53191264/1593077 while switch() statements of variable do not get generated AFAICT. It would thus be quite useful if such generated code would not result in highly-inefficient long chains of comparisons.
[Bug tree-optimization/10624] unroll-loops can't unroll nested constant loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10624 Eyal Rozenberg changed: What|Removed |Added CC||eyalroz at technion dot ac.il --- Comment #8 from Eyal Rozenberg --- This seems to have been solved at some point. Compiling with -O3 -funroll-loops using GCC 8.2 on GodBolt: https://godbolt.org/z/4gBcw- We get: .LC0: .string "%d, %d\n" unroll_me: sub rsp, 8 xor edx, edx mov esi, 1 mov edi, OFFSET FLAT:.LC0 xor eax, eax call printf xor edx, edx mov esi, 2 xor eax, eax mov edi, OFFSET FLAT:.LC0 call printf mov edx, 1 xor eax, eax add rsp, 8 mov esi, 2 mov edi, OFFSET FLAT:.LC0 jmp printf Which is quite unrolled.
[Bug tree-optimization/87543] Inconsistency in noticing a constant result rather than emitting code for a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87543 --- Comment #2 from Eyal Rozenberg --- (In reply to Richard Biener from comment #1) > The issue at -O2 is etc. That is one issue, but there is the question of the changes in behavior between versions and when `-march` is used. I don't know if you guys are actively maintaining 7.x or 6.x ; assuming you do, each of them should at least exhibit coherent behavior here. > because of the awkward IV structure PRE present us with I assume other GCC devs will understand what this means, but for my benefit as the lay reporter - can you define (or link to a definition of) what "PRE" is and what is an "awkward IV structure"? (I'm guessing the acronym expands to Induction Variable.)
[Bug c++/87543] New: Missed opportunity to compute constant return value at compile time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87543 Bug ID: 87543 Summary: Missed opportunity to compute constant return value at compile time Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Brief illustration on GodBolt: https://godbolt.org/z/sQyNGA A related question on StackOverflow: https://stackoverflow.com/q/52677512/1593077 Consider the following two functions: #include <numeric> int f1() { int arr[] = {1, 2, 3, 4, 5}; auto n = sizeof(arr)/sizeof(arr[0]); return std::accumulate(arr, arr + n, 0); } int f2() { int arr[] = {1, 2, 3, 4, 5}; auto n = sizeof(arr)/sizeof(arr[0]); int sum = 0; for(int i = 0; i < n; i++) { sum += arr[i]; } return sum; } Both functions return 15, always; and while they're not marked constexpr, this can clearly be realized by the compiler. In fact, it is, if we compile with -O3 (with GCC 8.2). However, with -O2, we get the following result: f1(): movabs rax, 8589934593 lea rdx, [rsp-40] mov ecx, 1 mov DWORD PTR [rsp-24], 5 mov QWORD PTR [rsp-40], rax lea rsi, [rdx+20] movabs rax, 17179869187 mov QWORD PTR [rsp-32], rax xor eax, eax jmp .L3 .L5: mov ecx, DWORD PTR [rdx] .L3: add rdx, 4 add eax, ecx cmp rdx, rsi jne .L5 ret f2(): mov eax, 15 ret I don't think `std::accumulate` should have any code which should make -O2 fail to notice the optimization opportunity in `f1()`. But if that assertion might be debatable, surely adding -march=skylake to the -O3 can only result in stronger optimization, right? However, it results in _both_ functions, rather than just `f1()`, failing to fully optimize. I asked about part of this issue at StackOverflow and a reply (by Florian Weimer) suggested this might be a regression relative to GCC 6.3 . 
And, indeed, if we switch the GCC version to 6.3 - both functions are not-fully-optimized in -O2, and fully-optimized with -O3: https://godbolt.org/z/JOqCoC If I try GCC 7.3, things get weird in yet a different way: -O2 optimizes both functions fully, and -O3 optimizes just the _first_ one.
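For anyone who needs the constant regardless of optimization level or compiler version, one possible workaround is to make the computation `constexpr` and pin it with a `static_assert`. The sketch below assumes C++14 or later; `sum_constexpr` is a hypothetical name and the function is a variant of `f2()`, not code from the report.

```cpp
// Hypothetical constexpr variant of f2() from the report (C++14 or later).
constexpr int sum_constexpr() {
    int arr[] = {1, 2, 3, 4, 5};
    int sum = 0;
    for (int v : arr) { sum += v; }
    return sum;
}

// Rejected by the compiler unless the sum evaluates to 15 at compile time.
static_assert(sum_constexpr() == 15, "sum must fold to 15");
```

This sidesteps the optimizer entirely, so it says nothing about the inconsistency between `-O2`, `-O3` and `-march` described above.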
[Bug web/85837] Listing of all error and warning messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837 --- Comment #7 from Eyal Rozenberg --- (In reply to Jonathan Wakely from comment #5) > Be the change that you want to see in the world. > > If you want this, make it happen. Well, I already started by filing this bug, but point taken. > (In reply to Eyal Rozenberg from comment #4) > > Fair enough, but, honestly - if the page says "Please, feel free to suggest > > new content in gcc-help mailing list" - practically nobody will contribute. > > Why not? Really? Ok, I'll explain: Many/most people familiar with collaboratively-edited resources such as Wikis or Q&A sites expect either immediate ability to edit content, or a requirement of at most website registration. What this line is telling visitors is (with slight over-dramatization): "Don't expect to be able to edit existing content on this page, ever. Don't expect to easily add content to this page, ever. If you want to even add anything to this page, you have to increase your level of commitment to that of being a mailing list member. You'll have to talk to people on that mailing list. You'll have to convince them your addition is important. Then maybe it'll be added." - this amounts to telling most people to go away.
[Bug web/85837] Listing of all error and warning messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837 --- Comment #4 from Eyal Rozenberg --- (In reply to Jonathan Wakely from comment #3) > There's https://gcc.gnu.org/wiki/VerboseDiagnostics for a few such errors. Well, that's a (tiny) start... however: * I wouldn't have found it if you hadn't provided the link - and I did search the Wiki (albeit not very thoroughly) * I think that has low search engine visibility * I believe there should be some auto-generated skeleton of that (either a single page or multiple pages) which collects all error messages. * I would definitely separate the language-specific errors for different languages (perhaps an even finer separation into pages is called for, but certainly at least that) > This absolutely should be done by users, not the GCC developers. We're all > busy working on GCC already, and if we knew how to make the diagnostics > easier to understand then we'd already have done it. Fair enough, but, honestly - if the page says "Please, feel free to suggest new content in gcc-help mailing list" - practically nobody will contribute. Also, I'm sure that some of this could be adapted from other sources online.
[Bug web/85837] Listing of all error and warning messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837 --- Comment #2 from Eyal Rozenberg --- (In reply to Andrew Pinski from comment #1) > We try to improve error messages rather than list all of the error messages > out. But the listed error messages must balance readability/accessibility with conciseness. Specifically, an error message will never have a short example of a typical error and a correction. Or an explanation, in a few sentences, of some concept referred to by the message, or a quotation of a paragraph from the language standard and so on.
[Bug web/85837] New: Listing of all error and warning messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837 Bug ID: 85837 Summary: Listing of all error and warning messages Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: web Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Compiler error and warning messages are sometimes difficult to understand - especially (but not exclusively) for novice developers. They are also typically concise, and assume some knowledge of relevant terms, which the program author may not know, despite being able to write a program. I also note that many users repeatedly ask questions on web forums and Q&A sites (e.g. StackOverflow) regarding specific error messages they get - not just asking "what's wrong with my code which causes the error?", but rather "What does this message mean? I don't understand what it says." Now, the GCC manual does not seem to include such a listing, and I could not find it on the Wiki either. Assuming it indeed doesn't exist - I believe that it should. If it does exist, then the bug is that it's difficult to notice/locate. Note that to realize such a listing it should be possible to harness more than just the GCC developers, if it's done through the Wiki. (Of course people would need to be attracted to the Wiki to assist in doing this.)
[Bug middle-end/84083] [missed optimization] loop-invariant strlen() not hoisted out of loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84083 --- Comment #4 from Eyal Rozenberg --- (In reply to Richard Biener from comment #3) > Yes, we don't currently implement restrict disambiguation for calls. So, would that account for the different compilation result for test1() and test2() in the following code: #include <string.h> inline size_t my_strlen(const char* __restrict__ s) { const char* p = s; while(*p != '\0') { p++; } return p - s; } size_t test1() { static const char* hw = "Hello, world!"; return my_strlen(hw); } size_t test2() { static const char* hw = "Hello, world!"; return strlen(hw); } where test2() compiles to just returning a fixed value while test1() executes a loop (See https://godbolt.org/g/CvVxru) ?
[Bug rtl-optimization/84083] [missed optimization] loop-invariant strlen() not hoisted out of loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84083 --- Comment #2 from Eyal Rozenberg --- (In reply to Andrew Pinski from comment #1) > I think bar can still change the value of what ss points to. What, you mean, by walking up the stack? I don't see why the compiler should accommodate that by avoiding hoisting. If you mean because ss is also somehow visible to bar(), then - the __restrict__ guarantees we don't have to worry about that. IIANM.
[Bug rtl-optimization/84083] New: [missed optimization] loop-invariant strlen() not hoisted out of loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84083 Bug ID: 84083 Summary: [missed optimization] loop-invariant strlen() not hoisted out of loop Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Consider the following code: #include <string.h> void bar(char c); void foo(const char* __restrict__ ss) { for (int i = 0; i < strlen(ss); ++i) { bar(*ss); } } To my understanding, the fact that ss is __restrict__ed (and the fact that it isn't written through or that it's a const pointer) is sufficient to allow the compiler to assume the memory accessible via ss remains constant, and thus that strlen(ss) will return the same value. But - that's not what happens (with GCC 7.3): .L6: movsx edi, BYTE PTR [rbp+0] mov rbx, r12 call bar(char) .L3: mov rdi, rbp lea r12, [rbx+1] call strlen cmp rax, rbx ja .L6 (obtained with https://godbolt.org/g/vdGSBe ) Now, I'm no compiler expert, so maybe there are considerations I'm ignoring, but it seems to me the compiler should be able to hoist the heavier code up above the loop. Cf. https://stackoverflow.com/q/48482003/1593077
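The hoisting asked for here can be written by hand, as in the sketch below. To keep the example self-contained, `bar` is passed in as a function parameter instead of being an external declaration; that change, and the name `foo_hoisted`, are assumptions made for illustration and do not come from the report.

```cpp
#include <cstddef>
#include <cstring>

// Hand-hoisted variant of foo(): strlen() runs once, before the loop.
// `bar` is a parameter here only so the example links without an
// external definition (an assumption, not part of the report).
void foo_hoisted(const char* __restrict__ ss, void bar(char)) {
    const std::size_t len = std::strlen(ss);  // loop-invariant, computed once
    for (std::size_t i = 0; i < len; ++i) {
        bar(*ss);
    }
}
```

This gives the same sequence of `bar` calls as the original loop as long as `bar` does not modify the string, which is the guarantee the report expects `__restrict__` to provide.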
[Bug rtl-optimization/83952] [missed optimization] difference calculation for floats vs ints in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952 --- Comment #8 from Eyal Rozenberg --- Andrew, Marc: Sorry for the mess with the other bug. If only Bugzilla had an "edit comment" feature I wouldn't have opened this second one.
[Bug rtl-optimization/83952] [missed optimization] difference calculation for floats vs ints in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952 --- Comment #1 from Eyal Rozenberg --- Also seeing this with -O3 -fno-unroll-loops -fno-tree-loop-vectorize : https://godbolt.org/g/r2v7X8
[Bug rtl-optimization/83952] New: [missed optimization] difference calculation for floats vs ints in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952 Bug ID: 83952 Summary: [missed optimization] difference calculation for floats vs ints in a loop Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Created attachment 43195 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43195&action=edit Code exemplifying the issue Consider the following code: template <typename T> void foo(T* __restrict__ a) { int i; T val = 0; for (i = 0; i < 100; i++) { val = 2 * i; a[i] = val; } } template void foo(int* __restrict__ a); template void foo(float* __restrict__ a); (This is based on example 7.26 in Agner Fog's Optimizing Software in C++; but the use of C++ here is immaterial). The int version compiles, with -O2, into: void foo(int*): xor eax, eax .L2: mov DWORD PTR [rdi], eax add eax, 2 add rdi, 4 cmp eax, 200 jne .L2 rep ret One would expect that the float version would compile into something similar, except that instead of rdi we would have a floating-point register, initialized to 0 and incremented by float 2.0 with each iteration. Instead, we get: void foo(float*): xor eax, eax .L6: pxor xmm0, xmm0 add rdi, 4 cvtsi2ss xmm0, eax add eax, 2 movss DWORD PTR [rdi-4], xmm0 cmp eax, 200 jne .L6 rep ret which seems to be much slower. Checked here: https://godbolt.org/g/t8Hvyn
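The code the report expects for the float case corresponds to the following source-level rewrite, in which the stored value is carried in a variable of type `T` and incremented by 2 on each iteration. This is a hand-written sketch with a hypothetical name, `foo_accumulate`. For this loop every value from 0 to 198 is exactly representable as a float, so the results match the original; in general, a floating-point running sum need not equal a converted integer.

```cpp
// Hypothetical strength-reduced variant of foo(): no int-to-float
// conversion inside the loop, just an addition on a value of type T.
template <typename T>
void foo_accumulate(T* __restrict__ a) {
    T val = 0;
    for (int i = 0; i < 100; i++) {
        a[i] = val;  // equals 2 * i for every i in [0, 100)
        val += 2;
    }
}
```

Comparing the assembly of `foo_accumulate<float>` with that of `foo<float>` shows the cost of the per-iteration `cvtsi2ss`.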
[Bug rtl-optimization/83951] [missed optimization] difference calculation for floats vs ints in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951 Eyal Rozenberg changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #2 from Eyal Rozenberg --- Whoops, some typos. Let me redo this. Sorry.
[Bug other/83951] [missed optimization] difference calculation for floats vs ints in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951 --- Comment #1 from Eyal Rozenberg --- Created attachment 43194 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43194&action=edit Source producing the optimized (int) and unoptimized (float) object code
[Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951 Bug ID: 83951 Summary: [missed optimization] difference calculation for floats vs ints in a loop Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Consider the following code: template <typename T> int foo(T* __restrict__ a) { int i; T val = 0; for (i = 0; i < 100; i++) { val = 2 * i; a[i] = val; } } template int foo(int* __restrict__ a); template int foo(float* __restrict__ a); (This is based on example 7.26 in Agner Fog's Optimizing Software in C++; but the use of C++ here is immaterial). The int version compiles, with -O2, into: foo(int*): xor eax, eax .L2: mov DWORD PTR [rdi], eax add eax, 2 add rdi, 4 cmp eax, 200 jne .L2 rep ret One would expect that the float version would compile into something similar, except that instead of rdi we would have a floating-point register, initialized to 0 and incremented by float 2.0 with each iteration. Instead, we get: int foo(float*): xor eax, eax .L6: pxor xmm0, xmm0 add rdi, 4 cvtsi2ss xmm0, eax add eax, 2 movss DWORD PTR [rdi-4], xmm0 cmp eax, 200 jne .L6 rep ret which seems to be much slower. Checked here: https://godbolt.org/g/RVBNyY
[Bug tree-optimization/59970] Bogus Wuninitialized warnings at low optimization levels
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59970 Eyal Rozenberg changed: What|Removed |Added CC||eyalroz at technion dot ac.il --- Comment #6 from Eyal Rozenberg --- Chiming in after having noticed this issue with GCC 5.4.0 20160609 on Linux Mint 18.1 with Boost 1.58.0 (using lexical_cast). Quite annoying...
[Bug rtl-optimization/78963] New: Missed optimization opportunity in copies of small unaligned data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78963 Bug ID: 78963 Summary: Missed optimization opportunity in copies of small unaligned data Product: gcc Version: 6.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: eyalroz at technion dot ac.il Target Milestone: --- Preliminary notes: * This bug report stems from a StackOverflow question I asked: http://stackoverflow.com/q/41407257/1593077 * This bug regards the x86_64 architecture, but may apply elsewhere. * This bug regards -O3 optimizations * Everything described here is about the same for GCC 6.3 and 7 - whatever version of it GodBolt uses. * The entire bug is demonstrated here: https://godbolt.org/g/lDJSRm plus here https://godbolt.org/g/9Y2ebd Consider the task of copying 3-byte values from one place to another. If both those places are in memory, it seems reasonable to do four moves, and indeed GCC compiles this: #include <string.h> typedef struct { unsigned char data[3]; } uint24_t; void f(uint24_t* __restrict__ dest, uint24_t* __restrict__ src) { memcpy(dest,src,3); } into this (clipping the instructions for the return value): f(uint24_t*, uint24_t*): movzx eax, WORD PTR [rsi] mov WORD PTR [rdi], ax movzx eax, BYTE PTR [rsi+2] mov BYTE PTR [rdi+2], al If the source or the destination is a register, two mov's should suffice - either the first two or the second two of the above. 
However, if I write this (perhaps contrived, but likely demonstrative of what could happen with larger programs, especially with multi-translation units, or when the OS gives you a pointer to work with etc): #include <string.h> typedef struct { unsigned char data[3]; } uint24_t; void f(uint24_t* __restrict__ dest, uint24_t* __restrict__ src) { memcpy(dest,src,3); } int main() { uint24_t* p = (uint24_t*) 48; unsigned x; f((uint24_t*) &x,p); x += 1; f(p,(uint24_t*) &x); return 0; } The 3-byte value is "constructed" on the stack rather than in a register (first four mov's), and then one cannot avoid using four more mov's to copy it to the destination: movzx eax, WORD PTR ds:48 mov WORD PTR [rsp-4], ax movzx eax, BYTE PTR ds:50 mov BYTE PTR [rsp-2], al add DWORD PTR [rsp-4], 1 movzx eax, WORD PTR [rsp-4] mov WORD PTR ds:48, ax movzx eax, BYTE PTR [rsp-2] mov BYTE PTR ds:50, al If we do this with 4-byte values, i.e. replace uint24_t with uint32_t, it's a single mov both ways, and in fact it gets further optimized, so that this: #include <stdint.h> #include <string.h> void f(uint32_t* __restrict__ dest, uint32_t* __restrict__ src) { memcpy(dest,src,4); } int main() { uint32_t* p = (uint32_t*) 48; uint32_t x; f(&x,p); x += 1; f(p,&x); return 0; } is compiled into just this: add DWORD PTR ds:48, 1 Now obviously you can't expect to optimize-out _that_ much with a 3-byte value, but 2 mov's in and 2 mov's out should be enough. Indeed, clang (since at least 3.4.1 or so) emits this for the uint24_t code: movzx eax, byte ptr [50] shl eax, 16 movzx ecx, word ptr [48] lea eax, [rcx + rax + 1] mov word ptr [48], ax shr eax, 16 mov byte ptr [50], al which has just four mov's.
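What the clang output does can also be spelled out at the source level: assemble the three bytes into one 32-bit value, operate on it in a register, and split it back into bytes when storing. The sketch below is only an illustration under that reading; `load24` and `store24` are hypothetical helpers, written with shifts so that they do not depend on the host byte order, and whether a given compiler turns them into two moves each way still has to be checked in the assembly.

```cpp
#include <cstdint>

struct uint24_t { unsigned char data[3]; };

// Hypothetical helper: gather three bytes into the low 24 bits of a
// 32-bit value (data[0] is the least significant byte).
inline std::uint32_t load24(const uint24_t* src) {
    return static_cast<std::uint32_t>(src->data[0])
         | (static_cast<std::uint32_t>(src->data[1]) << 8)
         | (static_cast<std::uint32_t>(src->data[2]) << 16);
}

// Hypothetical helper: scatter the low 24 bits of `v` back into three bytes.
inline void store24(uint24_t* dest, std::uint32_t v) {
    dest->data[0] = static_cast<unsigned char>(v & 0xFF);
    dest->data[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
    dest->data[2] = static_cast<unsigned char>((v >> 16) & 0xFF);
}
```

With these, the increment in the example becomes `store24(p, load24(p) + 1)`, which keeps the intermediate value out of memory at the source level.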