[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #8 from Richard Biener  ---
Author: rguenth
Date: Mon Aug 26 09:29:07 2019
New Revision: 274922

URL: https://gcc.gnu.org/viewcvs?rev=274922&root=gcc&view=rev
Log:
2019-08-26  Richard Biener  

PR tree-optimization/91526
* passes.def: Note that after late FRE we do TODO_update_address_taken.
* tree-ssa-sccvn.c (pass_fre::execute): In late mode schedule
TODO_update_address_taken.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/passes.def
trunk/gcc/tree-ssa-sccvn.c

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #7 from joseph at codesourcery dot com  ---
There's more or less the same ABI question as in bug 91398 about whether 
there is any constraint on the called function writing to the return value 
slot in cases where it does not return normally.

Supposing the ABI allows the return value slot (register or memory) to be 
written to by the called function even if it does not end up returning 
normally, then the optimization in this bug would be valid, while that in 
bug 91398 would not be valid if non-normal return is a possibility.  (The 
example in the present bug also doesn't allow non-normal return, unless we 
say longjmp from a SIGFPE handler is OK - is -fnon-call-exceptions only 
needed for language exceptions or also for longjmp?)

(Validity would also depend on it not affecting the observed address of 
the variable "result" in such a way as to make it equal to the observed 
address of some object in a calling function - but I expect the 
interesting cases for this optimization are where the variable is only 
stored to, not ones where addresses get compared, if it's even possible 
for the same return value slot to get used in more than one function on 
the call stack.)
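
One way to picture the non-normal-return concern is the sketch below. It is not taken from either bug report: the names fill_then_maybe_bail, dest and destination_survives_longjmp are hypothetical, and longjmp stands in for any non-normal exit. The callee fills its local "result" and then leaves without returning. In the abstract machine the assignment to "dest" never happens, so the caller's object must keep its old value, whatever the callee wrote to its own return value slot.

```c
#include <assert.h>
#include <setjmp.h>

struct Vec { float v[8]; };

static jmp_buf env;
static struct Vec dest;   /* caller-visible destination object */

/* Hypothetical callee: fills its result, then optionally leaves via
   longjmp instead of returning normally. */
struct Vec fill_then_maybe_bail(int bail)
{
    struct Vec result;
    for (unsigned i = 0; i < 8; ++i)
        result.v[i] = 1.0f;
    if (bail)
        longjmp(env, 1);
    return result;
}

/* Returns 1 if dest is untouched after the callee bails out. */
int destination_survives_longjmp(void)
{
    for (unsigned i = 0; i < 8; ++i)
        dest.v[i] = 0.0f;

    if (setjmp(env) == 0)
        dest = fill_then_maybe_bail(1);   /* the assignment never completes */

    for (unsigned i = 0; i < 8; ++i)
        assert(dest.v[i] == 0.0f);
    return 1;
}
```

If the caller's object were itself used as the return slot, the early stores would become visible here, which is the caller-side situation raised for bug 91398.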

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #6 from Richard Biener  ---
Oh, and then, since we vectorized things, we do not NRV because

 || DECL_ALIGN (found) > DECL_ALIGN (result)

thus we adjusted the VAR_DECL's alignment but the ABI says the return slot
isn't appropriately aligned (well, we do not end up returning in memory,
but...).

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #5 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #4)
> > so the C++ FE already elides the return copy by placing 'result' in the
> > return slot while the C FE doesn't do this.
> 
> That's because in C++ the language requires NRV to be performed in certain
> cases, while for C there is nothing like that and we do the tree NRV in that
> case only much later (nrv pass).
> 
> Joseph, any thoughts whether it would be a valid C FE optimization that
> valid C programs can't observe?

I think we're careful on the caller side not to use the destination as
the return slot in

  aggr = foo ();

already so no need to try to be clever on the callee-side?  Fixing this
might also fix some missed tail-calling.
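
A minimal sketch of why the caller side has to be careful, using hypothetical names (Pair, swapped, caller_keeps_destination_intact) that do not come from the bug report: the callee still reads the caller's object through a pointer while building its result, so the destination can only be overwritten once the call has returned.

```c
#include <assert.h>

struct Pair { int a, b; };

/* Hypothetical callee: builds its result in a local while still
   reading the caller's object through src. */
struct Pair swapped(const struct Pair *src)
{
    struct Pair r;
    r.a = src->b;
    r.b = src->a;   /* must still see the caller's original a */
    return r;
}

/* Returns 1 if the assignment behaves as the C abstract machine says. */
int caller_keeps_destination_intact(void)
{
    struct Pair aggr = { 1, 2 };
    aggr = swapped(&aggr);   /* same shape as "aggr = foo ();" */
    assert(aggr.a == 2 && aggr.b == 1);
    return 1;
}
```

Had "aggr" itself served as the return slot with "r" constructed in place, the second store would have read the already overwritten field and produced { 2, 2 }.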

Note in this particular case the return value is returned via xmm0/xmm2
so the extra copy we create during gimplification is even more pointless.

And I guess NRV doesn't do anything because of the CLOBBER?

  <retval> = result;
  result ={v} {CLOBBER};
  return <retval>;

or simply because

  /* If this function does not return an aggregate type in memory, then
     there is nothing to do.  */
  if (!aggregate_value_p (result, current_function_decl))
    return 0;

I guess.  Or because 'result' ends up as TREE_ADDRESSABLE for some
reason!?  create_iv does this, as part of vectorization but after
that we never again do update_address_taken ... :/  I guess
after late FRE would be a good time.

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsm28 at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
> so the C++ FE already elides the return copy by placing 'result' in the
> return slot while the C FE doesn't do this.

That's because in C++ the language requires NRV to be performed in certain
cases, while for C there is nothing like that and we do the tree NRV in that
case only much later (nrv pass).

Joseph, any thoughts whether it would be a valid C FE optimization that valid C
programs can't observe?

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

--- Comment #3 from Andrew Pinski  ---
>Interestingly, even if the __restrict__ attribute is removed, it still gets 
>vectorized. Is this correct behavior?

Yes: v1->v[0] cannot be the same as v2->v[1] or result->v[1], etc., because
the full object *v1 is either an entirely different object from *result or
exactly the same object; two struct Vec objects cannot partially overlap.
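
A small sketch of the exact-alias case described above, assuming the out-parameter function from comment #2 with __restrict__ dropped (renamed multiply_plain here; the helper exact_alias_is_safe is hypothetical). Each iteration reads element i before writing element i, so passing the same object as both input and output stays well defined, whether or not the loop is vectorized.

```c
#include <assert.h>

struct Vec { float v[8]; };

/* The loop from comment #2, without the restrict qualifiers. */
void multiply_plain(struct Vec *result,
                    const struct Vec *v1,
                    const struct Vec *v2)
{
    for (unsigned i = 0; i < 8; ++i)
        result->v[i] = v1->v[i] * v2->v[i];
}

/* Returns 1 if in-place use (result == v1) gives the expected products. */
int exact_alias_is_safe(void)
{
    struct Vec a, b;
    for (unsigned i = 0; i < 8; ++i) {
        a.v[i] = (float)(i + 1);
        b.v[i] = 3.0f;
    }

    multiply_plain(&a, &a, &b);   /* result and v1 are the same object */
    for (unsigned i = 0; i < 8; ++i)
        assert(a.v[i] == 3.0f * (float)(i + 1));

    multiply_plain(&a, &a, &a);   /* all three the same: squares in place */
    for (unsigned i = 0; i < 8; ++i)
        assert(a.v[i] == 9.0f * (float)((i + 1) * (i + 1)));
    return 1;
}
```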

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

Joel Yliluoma  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bisqwit at iki dot fi

--- Comment #2 from Joel Yliluoma  ---
The theory that it is related to RVO seems to be confirmed by the fact that if
the code is changed like this:

   struct Vec { float v[8]; };
   void multiply(struct Vec* result,
                 const struct Vec* __restrict__ v1,
                 const struct Vec* __restrict__ v2)
   {
       for(unsigned i = 0; i < 8; ++i)
           result->v[i] = v1->v[i] * v2->v[i];
   }

Then it gets compiled in the shorter and proper form. Interestingly, even if
the __restrict__ attribute is removed, it still gets vectorized. Is this
correct behavior?
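
For comparison, the two forms can be put side by side. The original test case is not quoted in this thread, so the by-value function below is a reconstruction inferred from the C++ dump signature "Vec multiply(const Vec*, const Vec*)" and the local "result" mentioned in comment #1. The names multiply_by_value, multiply_out and forms_agree are chosen here for illustration only.

```c
#include <assert.h>

struct Vec { float v[8]; };

/* Assumed shape of the original test case: the result is a local
   aggregate that is returned by value. */
struct Vec multiply_by_value(const struct Vec *v1, const struct Vec *v2)
{
    struct Vec result;
    for (unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}

/* Out-parameter form from comment #2. */
void multiply_out(struct Vec *result,
                  const struct Vec *restrict v1,
                  const struct Vec *restrict v2)
{
    for (unsigned i = 0; i < 8; ++i)
        result->v[i] = v1->v[i] * v2->v[i];
}

/* Returns 1 if both forms compute the same products. */
int forms_agree(void)
{
    struct Vec a, b, out;
    for (unsigned i = 0; i < 8; ++i) {
        a.v[i] = (float)(i + 1);
        b.v[i] = 2.0f;
    }

    struct Vec ret = multiply_by_value(&a, &b);
    multiply_out(&out, &a, &b);

    for (unsigned i = 0; i < 8; ++i) {
        assert(ret.v[i] == out.v[i]);
        assert(out.v[i] == 2.0f * (float)(i + 1));
    }
    return 1;
}
```

Both functions compute the same values; the difference discussed in this bug is only in the code generated for the by-value form when compiled as C.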

[Bug c/91526] Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)

2019-08-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-08-23
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org
          Component|target                      |c
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
I think I've seen duplicates about this issue where C/C++ differ in the IL
presented to the middle-end for the aggregate return stmt which in the
end causes us to not elide an aggregate copy.  Usually SRA deals with
this but it has a hard job with heuristics and arrays...

The C++ FE does

;; Function Vec multiply(const Vec*, const Vec*) (null)
;; enabled by -tree-original


{
  struct Vec result [value-expr: <retval>];
                    ^^^

while the C FE does

;; Function multiply (null)
;; enabled by -tree-original


{
  struct Vec result;

so the C++ FE already elides the return copy by placing 'result' in the
return slot while the C FE doesn't do this.

Let's make this a C enhancement request rather than a missed optimization
during GIMPLE optimizations (which there are dups for already).

Marek - any chance the C FE could do something like this?  Maybe we can also
do this during gimplification; we'd have to see what constraints the C++
FE has for performing this.