[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #44 from Richard Biener --- (In reply to Richard Sandiford from comment #42) > Created attachment 57605 [details] > proof-of-concept patch to suppress peeling for gaps > > How about the attached? It records whether all accesses

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-05 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #43 from rguenther at suse dot de --- On Mon, 4 Mar 2024, rsandifo at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 > > --- Comment #41 from Richard Sandiford --- > (In reply to Richard Biener from

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Richard Sandiford changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #57602|0                           |1
        is obsolete|                            |

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #41 from Richard Sandiford --- (In reply to Richard Biener from comment #40) > So I wonder if we can use "local costing" to decide a gather is always OK > compared to the alternative with peeling for gaps. On x86 gather tends > to

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #40 from Richard Biener --- So I wonder if we can use "local costing" to decide a gather is always OK compared to the alternative with peeling for gaps. On x86 gather tends to be slow compared to open-coding it. In the future we
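
For readers unfamiliar with the term, "peeling for gaps" as discussed in this thread can be pictured with a scalar emulation in C. This is only a sketch under stated assumptions: `VF`, `sum_even_peel_for_gaps` and `sum_even_gather` are hypothetical names that appear neither in the bug report nor in GCC, and the inner loops merely stand in for single vector instructions. The loop reads only the even elements `a[2*i]`, so a contiguous wide load would also touch the unused odd element after the last needed one, which may lie past the end of the array. Leaving at least one iteration to a scalar epilogue avoids that over-read, whereas a gather touches only the addresses it needs.

```c
#include <stddef.h>
#include <string.h>

#define VF 4  /* hypothetical vectorization factor, for illustration only */

/* Contiguous wide loads plus peeling for gaps.
   The array is assumed to hold exactly 2*n - 1 elements. */
long sum_even_peel_for_gaps(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;

    /* Strict '<' keeps at least one iteration for the scalar epilogue,
       so the wide load below never reaches index 2*n - 1. */
    for (; i + VF < n; i += VF) {
        int tmp[2 * VF];
        memcpy(tmp, a + 2 * i, sizeof tmp);   /* stands in for a wide load */
        for (size_t lane = 0; lane < VF; lane++)
            sum += tmp[2 * lane];             /* odd slots are the "gaps" */
    }

    /* Scalar epilogue: touches only the elements that are really needed. */
    for (; i < n; i++)
        sum += a[2 * i];

    return sum;
}

/* Gather-style access: each lane reads only its own address,
   so no iteration has to be peeled because of the gaps. */
long sum_even_gather(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;

    for (; i + VF <= n; i += VF)
        for (size_t lane = 0; lane < VF; lane++)
            sum += a[2 * (i + lane)];         /* stands in for one gather */

    /* Remainder when n is not a multiple of VF. */
    for (; i < n; i++)
        sum += a[2 * i];

    return sum;
}
```

Both functions return the same sum; they differ only in which memory locations the emulated vector accesses touch, which is the trade-off the comment above weighs against the cost of gathers on x86.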

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #39 from Richard Sandiford --- (In reply to Richard Sandiford from comment #38) > (In reply to Richard Biener from comment #37) > > Even more iteration looks bad. I do wonder why when gather can avoid > > peeling for GAPs using

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #38 from Richard Sandiford --- (In reply to Richard Biener from comment #37) > Even more iteration looks bad. I do wonder why when gather can avoid > peeling for GAPs using load-lanes cannot? Like you say, we don't realise that all

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #37 from Richard Biener --- (In reply to Richard Sandiford from comment #36) > Created attachment 57602 [details] > proof-of-concept patch to suppress peeling for gaps > > This patch does what I suggested in the previous comment:

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #36 from Richard Sandiford --- Created attachment 57602 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57602&action=edit proof-of-concept patch to suppress peeling for gaps This patch does what I suggested in the previous comment:

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #35 from Richard Sandiford --- Maybe I've misunderstood the flow of the ticket, but it looks to me like we do still correctly recognise the truncating scatter stores. And, on their own, we would be able to convert them into masked

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-01 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #34 from rguenther at suse dot de --- On Fri, 1 Mar 2024, rsandifo at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 > > --- Comment #33 from Richard Sandiford --- > Can you give me a chance to look

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-01 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #33 from Richard Sandiford --- Can you give me a chance to look at it a bit when I'm back? This doesn't feel like the way to go to me.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-01 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #32 from Richard Biener --- (In reply to Richard Sandiford from comment #31) > (In reply to Tamar Christina from comment #29) > > This works fine for normal gather and scatters but doesn't work for widening > > gathers and narrowing

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-29 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #31 from Richard Sandiford --- (In reply to Tamar Christina from comment #29) > This works fine for normal gather and scatters but doesn't work for widening > gathers and narrowing scatters which only the pattern seems to handle.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #30 from Richard Biener --- The x86 and "emulation" paths handle narrowing/widening during code generation (but yes, the IFN path doesn't). A fix would be to do similar as for the gs_info.decl case in vectorizable_load/store and

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Tamar Christina changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org ---

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-26 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #28 from rguenther at suse dot de --- On Mon, 26 Feb 2024, tnfchris at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 > > --- Comment #27 from Tamar Christina --- > Created attachment 57538 > -->

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #27 from Tamar Christina --- Created attachment 57538 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57538&action=edit proposed1.patch Proposed patch; this gets the gathers and scatters back. Doing a regression run.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Summary|[14 Regression]

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #25 from Tamar Christina --- > > void record_nonwrapping_chrec (tree chrec) > > { > > - CHREC_NOWRAP(chrec) = 1; > > + CHREC_NOWRAP(chrec) = 0; > > > >if (dump_file && (dump_flags & TDF_SCEV)) > > { > > Hmmm. With

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #24 from JuzheZhong --- (In reply to Richard Biener from comment #19) > (In reply to Richard Biener from comment #18) > > (In reply to Tamar Christina from comment #17) > > > Ok, bisected to > > > > > >

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #23 from Tamar Christina --- tamar:~/gcc-dsg/test$ extract-toolchain gcc 2efe3a7de01 A 1514 files D 0 files M 0 files Extracted 'origin/manygcc-basepoints-gcc-14-6292-g2f512f6fcdd:2efe3a7de01' > ./bin/gcc -S -o

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #22 from Tamar Christina --- For me, with `-fno-vect-cost-model` on, without this commit we generate https://gist.github.com/Mistuke/d9252bfcb2aa766327c5f377e162f5b7 for the loop, and with the commit, well... it doesn't fit on the screen

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #21 from Richard Biener --- On aarch64 I can see already GCC 13.2 looking very much different from 12.3, but I can't decipher the code to decide whether 12.3 vectorizes the loop or not. trunk looks similar to 13.2 here, so the

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #20 from Richard Biener --- (In reply to Richard Biener from comment #19) > (In reply to Richard Biener from comment #18) > > (In reply to Tamar Christina from comment #17) > > > Ok, bisected to > > > > > >

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #19 from Richard Biener --- (In reply to Richard Biener from comment #18) > (In reply to Tamar Christina from comment #17) > > Ok, bisected to > > > > g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit > > commit

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #18 from Richard Biener --- (In reply to Tamar Christina from comment #17) > Ok, bisected to > > g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit > commit 2efe3a7de0107618397264017fb045f237764cc7 > Author: Hao Liu

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #17 from Tamar Christina --- Ok, bisected to g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit commit 2efe3a7de0107618397264017fb045f237764cc7 Author: Hao Liu Date: Wed Dec 6 14:52:19 2023 +0800

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #16 from Tamar Christina --- (In reply to rguent...@suse.de from comment #13) > > > You could check if we call this with sane values. > > > > Do you mean it's RISC-V backend cost model issue ? > > I responded to Tamar which means

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #15 from rguenther at suse dot de --- On Tue, 23 Jan 2024, juzhe.zhong at rivai dot ai wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 > > --- Comment #14 from JuzheZhong --- > I just tried again both GCC-13.2 and

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #14 from JuzheZhong --- I just tried again with both GCC-13.2 and GCC-14 with -fno-vect-cost-model. https://godbolt.org/z/enEG3qf5K GCC-14 requires a scalar epilogue loop, whereas GCC-13.2 doesn't. I believe it's not a cost model issue.
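
The two loop shapes being contrasted here can be pictured with a scalar emulation in C. This is a rough sketch, not the testcase behind the Compiler Explorer link: `VF`, `sum_with_epilogue` and `sum_length_controlled` are hypothetical names, and each inner loop stands in for one vector instruction. The first function has a fixed-width vector body followed by a scalar epilogue; the second handles the tail inside the same loop by shrinking the active length, roughly in the spirit of a length-controlled RVV loop.

```c
#include <stddef.h>

#define VF 4  /* hypothetical vectorization factor, for illustration only */

/* Shape 1: fixed-width vector body followed by a scalar epilogue loop. */
long sum_with_epilogue(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;

    /* "Vector" body: handles full chunks of VF elements. */
    for (; i + VF <= n; i += VF)
        for (size_t lane = 0; lane < VF; lane++)
            sum += a[i + lane];

    /* Scalar epilogue: handles the remaining n % VF elements. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}

/* Shape 2: length-controlled loop, no separate epilogue. */
long sum_length_controlled(const int *a, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; ) {
        /* Active length for this iteration: VF, or fewer on the tail. */
        size_t vl = (n - i < VF) ? n - i : VF;
        for (size_t lane = 0; lane < vl; lane++)
            sum += a[i + lane];
        i += vl;
    }

    return sum;
}
```

Both functions compute the same result for any `n`; the difference is purely in code shape, which is what the comparison of the two compiler versions above is about.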

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #13 from rguenther at suse dot de --- On Tue, 23 Jan 2024, juzhe.zhong at rivai dot ai wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 > > --- Comment #12 from JuzheZhong --- > (In reply to Richard Biener from

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #12 from JuzheZhong --- (In reply to Richard Biener from comment #11) > (In reply to Tamar Christina from comment #9) > > There is a weird costing going on in the PHI nodes though: > > > > m_108 = PHI 1 times vector_stmt costs 0

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #11 from Richard Biener --- (In reply to Tamar Christina from comment #9) > There is a weird costing going on in the PHI nodes though: > > m_108 = PHI 1 times vector_stmt costs 0 in body > m_108 = PHI 2 times scalar_to_vec costs

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #10 from JuzheZhong --- (In reply to Tamar Christina from comment #9) > So on SVE the change is cost modelling. > > Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed > the compiler's defaults to using the

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #9 from Tamar Christina --- So on SVE the change is cost modelling. Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed the compiler's defaults to using the new throughput matched cost modelling used by newer

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Richard Biener changed: What|Removed |Added Target Milestone|13.3|14.0 Summary|[13/14

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #1 from JuzheZhong --- GCC trunk RVV also processes 11 elements in a vector: https://godbolt.org/z/q9bb8Gj4G ``` vsetivli zero,11,e32,m1,ta,ma ``` vector codes ``` lh s8,0(t4) lh t4,0(t1) ld