[Bug tree-optimization/94401] [10 Regression] pr92420.c fails on aarch64 since r10-7415
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94401

Kewen Lin changed:

           What    |Removed  |Added
         Resolution|---      |FIXED
             Status|ASSIGNED |RESOLVED

--- Comment #7 from Kewen Lin ---
Should be fixed now.
[Bug tree-optimization/94401] [10 Regression] pr92420.c fails on aarch64 since r10-7415
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94401

--- Comment #6 from CVS Commits ---
The master branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:81ce375d1fdd99f9d93b00f4895eab74c3d8b54a

commit r10-7519-g81ce375d1fdd99f9d93b00f4895eab74c3d8b54a
Author: Kewen Lin
Date:   Thu Apr 2 08:48:03 2020 -0500

    Fix PR94401 by considering reverse overrun

    The commit r10-7415 brings scalar type consideration
    to eliminate epilogue peeling for gaps, but it exposed
    one problem that the current handling doesn't consider
    the memory access type VMAT_CONTIGUOUS_REVERSE, for
    which the overrun happens on low address side. This
    patch is to make the code take care of it by updating
    the offset and construction element order accordingly.

    Bootstrapped/regtested on powerpc64le-linux-gnu P8
    and aarch64-linux-gnu.

    2020-04-02  Kewen Lin

    gcc/ChangeLog

        PR tree-optimization/94401
        * tree-vect-loop.c (vectorizable_load): Handle VMAT_CONTIGUOUS_REVERSE
        access type when loading halves of vector to avoid peeling for gaps.
[Bug tree-optimization/94401] [10 Regression] pr92420.c fails on aarch64 since r10-7415
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94401

--- Comment #5 from Kewen Lin ---
Created attachment 48150
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48150&action=edit
untested patch

This can fix the REG failures on aarch64.
[Bug tree-optimization/94401] [10 Regression] pr92420.c fails on aarch64 since r10-7415
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94401

Kewen Lin changed:

           What    |Removed |Added
                 CC|        |segher at gcc dot gnu.org,
                   |        |wschmidt at gcc dot gnu.org
             Status|NEW     |ASSIGNED

--- Comment #4 from Kewen Lin ---
My commit extends the existing elimination of scalar epilogue peeling for
gaps, so that this case can now use int for the vector construction. But it
reveals that the existing handling fails to cover the VMAT_CONTIGUOUS_REVERSE
case. It currently assumes the overrun happens at the high address end, which
is true for almost all cases, but in this case the overrun is at the low
address end. So for VMAT_CONTIGUOUS_REVERSE we have to load the high part and
put it in the latter part of the constructed vector.

The IR before and after the commit looks like this:

good:

  vect__9.16_80 = MEM [(int *)vectp_y.14_78];
  vect__9.17_81 = VEC_PERM_EXPR ;
  vect__9.18_82 = VEC_PERM_EXPR ;

bad:

  _30 = MEM[(int *)vectp_y.12_34];
  _20 = {_30, 0};
  vect__9.14_19 = VIEW_CONVERT_EXPR(_20);
  vect__9.15_61 = VEC_PERM_EXPR ;
  vect__9.16_54 = VEC_PERM_EXPR ;
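The difference between the two constructions described in comment #4 can be modelled in plain C. This is only an illustrative sketch, not GCC's implementation: it assumes a vector of four int lanes, and LANES, HALF, and the three helper names are hypothetical. For a forward access the useful half sits at the low addresses and is placed in the low lanes. For a reverse access the useful half sits at the high addresses, so it has to be loaded from an offset of HALF elements and placed in the high lanes before the lanes are reversed.

```c
#include <string.h>

enum { LANES = 4, HALF = LANES / 2 };   /* hypothetical vector shape */

/* Forward access: the needed elements are p[0..HALF-1]; the gap that
   must not be read lies at the high addresses.  Models {half, 0}.  */
static void load_half_forward(const int *p, int vec[LANES])
{
    memset(vec, 0, LANES * sizeof(int));
    memcpy(vec, p, HALF * sizeof(int));
}

/* Reverse access: the needed elements are p[HALF..LANES-1]; the gap
   lies at the low addresses.  Load from p + HALF and place the data
   in the high lanes.  Models {0, half}.  */
static void load_half_reverse(const int *p, int vec[LANES])
{
    memset(vec, 0, LANES * sizeof(int));
    memcpy(vec + HALF, p + HALF, HALF * sizeof(int));
}

/* Models the lane-reversing permute applied after a reverse load.  */
static void reverse_lanes(const int in[LANES], int out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = in[LANES - 1 - i];
}
```

Under these assumptions, calling load_half_reverse followed by reverse_lanes leaves the two needed elements, in reversed order, in the first two lanes. Using load_half_forward on the same pointer would instead read the two low-address elements, which corresponds to the "bad" IR above.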
[Bug tree-optimization/94401] [10 Regression] pr92420.c fails on aarch64 since r10-7415
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94401

Jeffrey A. Law changed:

           What    |Removed     |Added
     Ever confirmed|0           |1
   Last reconfirmed|            |2020-03-30
                 CC|            |law at redhat dot com
             Status|UNCONFIRMED |NEW

--- Comment #3 from Jeffrey A. Law ---
Confirmed. My tester tripped over this as well. No special options are needed
at configure time.
[Bug tree-optimization/94401] [10 Regression] pr92420.c fails on aarch64 since r10-7415
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94401

Richard Biener changed:

           What    |Removed                     |Added
             Target|                            |aarch64
            Version|unknown                     |10.0
   Target Milestone|---                         |10.0
           Keywords|                            |wrong-code
           Priority|P3                          |P1
            Summary|pr92420.c fails on aarch64  |[10 Regression] pr92420.c
                   |since r10-7415              |fails on aarch64 since
                   |                            |r10-7415