[Bug middle-end/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Linus Torvalds  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |torvalds@linux-foundation.org

--- Comment #4 from Linus Torvalds  ---
(In reply to Andrew Pinski from comment #1)
> The loop gets vectorized, I don't see the problem really.


See

   
https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/372

and in particular the comment

   "In the first 8-byte copy, src and dst overlap"

so apparently gcc has decided that they can't overlap, despite the two pointers
being literally generated from the same base pointer.

But I don't read ARC assembly, so I'll have to take Vineet's word for it.

Vineet, have you been able to generate a smaller test-case?
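
To make the overlap concern above concrete, here is a minimal C sketch. It is not taken from the attached source; the helper names and the buffer contents are hypothetical, and it assumes a 2-byte unsigned short. It contrasts an in-order copy done in 2-byte steps with a model of a single 8-byte load followed by a single 8-byte store, for the case where dst is src + 2 (both derived from the same base pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy len bytes in 2-byte
 * steps, strictly in order, so later steps see what earlier steps wrote.
 * Assumes sizeof(unsigned short) == 2. */
static void copy_in_2byte_steps(unsigned char *dst, const unsigned char *src,
                                size_t len)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i + 1 < len; i += 2) {
        unsigned short v;
        memcpy(&v, src + i, sizeof v);   /* 2-byte load  */
        memcpy(dst + i, &v, sizeof v);   /* 2-byte store */
    }
    if (i < len)
        dst[i] = src[i];                 /* odd trailing byte, if any */
}

/* Hypothetical helper: models one wide 8-byte load followed by one wide
 * 8-byte store, i.e. what a widened copy does if overlap is ignored. */
static void copy_as_one_8byte_access(unsigned char *dst,
                                     const unsigned char *src)
{
    unsigned char tmp[8];

    assert(dst != NULL && src != NULL);
    memcpy(tmp, src, sizeof tmp);        /* whole source read first */
    memcpy(dst, tmp, sizeof tmp);        /* then written in one go  */
}
```

Starting from a buffer holding "abXXXXXXXX" and copying 8 bytes from buf to buf + 2, the 2-byte version replicates the pattern and leaves "ababababab", while the single wide access leaves "ababXXXXXX". The two agree only when the regions do not overlap, which is why an unconditional widening would be a problem if the compiler cannot prove the pointers are disjoint.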

[Bug middle-end/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #3 from Vineet Gupta  ---
Created attachment 50723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50723&action=edit
preprocessed source file (with extra nop annotation)

[Bug middle-end/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #2 from Andrew Pinski  ---
Note in the tar file there is only:
inffast2.s  inffast2.s.aarch64.gcc10.O3  inffast2.s.aarch64.gcc9.O3 
inffast2.s.arc.gcc10.O3

[Bug middle-end/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-04-30
             Status|UNCONFIRMED                 |WAITING

--- Comment #1 from Andrew Pinski  ---
The loop gets vectorized, I don't see the problem really.

Also I don't see the preprocessed source.  Can you attach that?

Is the problem that the hardware requires these loads to always be done as
2-byte accesses?
If so, then you need to mark the pointer as volatile.
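
As an illustration of the volatile suggestion, here is a minimal sketch with a hypothetical helper name; it is not the code from the attachment. Accesses through a volatile-qualified lvalue are performed as written, so in practice the compiler keeps one 2-byte load and one 2-byte store per iteration instead of merging them into wider accesses. Whether volatile is the appropriate fix for this report is not established in the thread at this point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy count 16-bit units, one load and one store
 * per iteration. The volatile qualifiers tell the compiler that each
 * access must happen as written, so they are not combined or widened. */
static void copy_u16_volatile(volatile unsigned short *dst,
                              const volatile unsigned short *src,
                              size_t count)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < count; i++)
        dst[i] = src[i];
}
```

With overlapping arguments such as copy_u16_volatile(buf + 1, buf, 4), each store is visible to the next load, so the first element is propagated through the following four. The cost is that the compiler can no longer vectorize the loop at all.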