[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #8 from Richard Biener ---
Author: rguenth
Date: Mon Aug 26 09:29:07 2019
New Revision: 274922

URL: https://gcc.gnu.org/viewcvs?rev=274922=gcc=rev
Log:
2019-08-26  Richard Biener

        PR tree-optimization/91526
        * passes.def: Note that after late FRE we do TODO_update_address_taken.
        * tree-ssa-sccvn.c (pass_fre::execute): In late mode schedule
        TODO_update_address_taken.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.def
    trunk/gcc/tree-ssa-sccvn.c

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #7 from joseph at codesourcery dot com ---
There's more or less the same ABI question as in bug 91398 about whether there is any constraint on the called function writing to the return value slot in cases where it does not return normally. Supposing the ABI allows the return value slot (register or memory) to be written to by the called function even if it does not end up returning normally, then the optimization in this bug would be valid, while that in bug 91398 would not be valid if non-normal return is a possibility. (The example in the present bug also doesn't allow non-normal return, unless we say longjmp from a SIGFPE handler is OK - is -fnon-call-exceptions only needed for language exceptions or also for longjmp?)

(Validity would also depend on it not affecting the observed address of the variable "result" in such a way as to make it equal to the observed address of some object in a calling function - but I expect the interesting cases for this optimization are where the variable is only stored to, not ones where addresses get compared, if it's even possible for the same return value slot to get used in more than one function on the call stack.)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #6 from Richard Biener ---
Oh, and then, since we vectorized things, we do not NRV because of

      || DECL_ALIGN (found) > DECL_ALIGN (result)

thus we adjusted the VAR_DECL's alignment but the ABI says the return slot isn't appropriately aligned (well, we do not end up returning in memory, but...).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #5 from Richard Biener ---
(In reply to Jakub Jelinek from comment #4)
> > so the C++ FE already elides the return copy by placing 'result' in the
> > return slot while the C FE doesn't do this.
>
> That's because in C++ the language requires NRV to be performed in certain
> cases, while for C there is nothing like that and we do the tree NRV in that
> case only much later (nrv pass).
>
> Joseph, any thoughts whether it would be a valid C FE optimization that
> valid C programs can't observe?

I think we're careful on the caller side not using the destination as return slot in

  aggr = foo ();

already so no need to try to be clever on the callee-side?

Fixing this might also fix some missed tail-calling.

Note in this particular case the return value is returned via xmm0/xmm2 so the extra copy we create during gimplification is even more pointless.

And I guess NRV doesn't do anything because of the CLOBBER?

   = result;
  result ={v} {CLOBBER};
  return ;

or simply because

  /* If this function does not return an aggregate type in memory, then
     there is nothing to do.  */
  if (!aggregate_value_p (result, current_function_decl))
    return 0;

I guess. Or because 'result' ends up as TREE_ADDRESSABLE for some reason!?

create_iv does this, as part of vectorization but after that we never again do update_address_taken ... :/ I guess after late FRE would be a good time.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsm28 at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
> so the C++ FE already elides the return copy by placing 'result' in the
> return slot while the C FE doesn't do this.

That's because in C++ the language requires NRV to be performed in certain cases, while for C there is nothing like that and we do the tree NRV in that case only much later (nrv pass).

Joseph, any thoughts whether it would be a valid C FE optimization that valid C programs can't observe?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #3 from Andrew Pinski ---
> Interestingly, even if the __restrict__ attribute is removed, it still gets
> vectorized. Is this correct behavior?

Yes, as v1->v[0] cannot be the same as v2->v[1] or result->v[1], etc., because the full object *v1 is either a completely different object from *result or the very same object as *result; the two can never partially overlap.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

Joel Yliluoma changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bisqwit at iki dot fi

--- Comment #2 from Joel Yliluoma ---
The theory that it is related to RVO seems to be confirmed by the fact that if the code is changed like this:

struct Vec { float v[8]; };
void multiply(struct Vec* result,
              const struct Vec* __restrict__ v1,
              const struct Vec* __restrict__ v2)
{
    for(unsigned i = 0; i < 8; ++i)
        result->v[i] = v1->v[i] * v2->v[i];
}

then it gets compiled in the shorter and proper form.

Interestingly, even if the __restrict__ attribute is removed, it still gets vectorized. Is this correct behavior?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-08-23
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org
          Component|target                      |c
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
I think I've seen duplicates about this issue where C/C++ differ in the IL presented to the middle-end for the aggregate return stmt which in the end causes us to not elide an aggregate copy. Usually SRA deals with this but it has a hard job with heuristics and arrays...

The C++ FE does

;; Function Vec multiply(const Vec*, const Vec*) (null)
;; enabled by -tree-original

{
  struct Vec result [value-expr: ];
                    ^^^

while the C FE does

;; Function multiply (null)
;; enabled by -tree-original

{
  struct Vec result;

so the C++ FE already elides the return copy by placing 'result' in the return slot while the C FE doesn't do this.

Let's make this a C enhancement request rather than a missed optimization during GIMPLE optimizations (which there are dups for already).

Marek - any chance the C FE could do sth like this? Maybe we can also do this during gimplification, we'd have to see what constraints the C++ FE has for performing this.