https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #28 from Linus Torvalds ---
(In reply to Roger Sayle from comment #27)
> This should now be fixed on both mainline and the GCC 12 release branch.
Thanks everybody.
Looks like the xchg optimization isn't in the gcc-12 release
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Roger Sayle changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #26 from CVS Commits ---
The master branch has been updated by Roger Sayle:
https://gcc.gnu.org/g:00193676a5a3e7e50e1fa6646bb5abb5a7b2acbb
commit r13-1362-g00193676a5a3e7e50e1fa6646bb5abb5a7b2acbb
Author: Roger Sayle
Date: Thu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #25 from Jakub Jelinek ---
(In reply to Linus Torvalds from comment #23)
> (In reply to Jakub Jelinek from comment #22)
> >
> > If the wider registers are narrowed before register allocation, it is just
> > a pair like (reg:SI 123)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #24 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #23)
>
> And this now brings back my memory of the earlier similar discussion - it
> wasn't about DImode code generation, it was about bitfield code generation
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #23 from Linus Torvalds ---
(In reply to Jakub Jelinek from comment #22)
>
> If the wider registers are narrowed before register allocation, it is just
> a pair like (reg:SI 123) (reg:SI 256) and it can be allowed anywhere.
That
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #22 from Jakub Jelinek ---
(In reply to Linus Torvalds from comment #21)
> Whee.
>
> Why does gcc have that constraint, btw? I tried to look at the clang code
> generation once more, and I don't *think* clang has the same
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #21 from Linus Torvalds ---
(In reply to CVS Commits from comment #20)
>
> One might think
> that splitting early gives the register allocator more freedom to
> use available registers, but in practice the constraint
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #20 from CVS Commits ---
The master branch has been updated by Roger Sayle:
https://gcc.gnu.org/g:3b8794302b52a819ca3ea78238e9b5025d1c56dd
commit r13-1239-g3b8794302b52a819ca3ea78238e9b5025d1c56dd
Author: Roger Sayle
Date: Fri
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Arnd Bergmann changed:
What|Removed |Added
CC||arnd at linaro dot org
--- Comment #19
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #18 from Jakub Jelinek ---
Of course, size comparisons of -O2 code aren't the most important, for -O2 it
is more important how fast the code is.
When comparing -Os -m32 -mno-mmx -mno-sse, the numbers are
sub on %esp 412 2564
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Jakub Jelinek changed:
What|Removed |Added
Keywords|needs-bisection |
--- Comment #17 from Jakub Jelinek
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #16 from Jakub Jelinek ---
Though, ix86_rot{l,r}di3_doubleword define_insn_and_split patterns were split
only after reload both before and after Roger's change, so somehow whether we
emit it as SImode from the beginning or only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #15 from Richard Biener ---
So we feed DImode rotates into RA which constrains register allocation
enough to require spills (all 4 DImode vals are live across the kernel,
not even -fschedule-insn can do anything here). I wonder if
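For readers without the test case at hand, here is a minimal sketch of the kind of kernel this comment describes: four 64-bit values that stay live across a sequence of adds, xors and rotates. It follows the shape of the BLAKE2b mixing step mentioned elsewhere in the thread, but it is not the reproducer from the godbolt link, and the names `rotr64` and `g_mix` are illustrative assumptions.

```c
#include <stdint.h>

/* Rotate a 64-bit value right by n bits; n must be in 1..63. */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* One BLAKE2b-style mixing step over four 64-bit words.
 * Hypothetical helper for illustration, not code from the bug report. */
static void g_mix(uint64_t v[4], uint64_t x, uint64_t y)
{
    uint64_t a = v[0], b = v[1], c = v[2], d = v[3];

    /* All four values are live through every line below. */
    a = a + b + x;  d = rotr64(d ^ a, 32);
    c = c + d;      b = rotr64(b ^ c, 24);
    a = a + b + y;  d = rotr64(d ^ a, 16);
    c = c + d;      b = rotr64(b ^ c, 63);

    v[0] = a; v[1] = b; v[2] = c; v[3] = d;
}
```

On a 32-bit x86 target each `uint64_t` needs a pair of 32-bit registers, so the four live values alone would want eight general-purpose registers, which is more than the target can spare. Compiling a file like this with something along the lines of `gcc -O2 -m32 -S` is one way to inspect the resulting spills; the exact output depends on the GCC version.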
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #14 from Linus Torvalds ---
(In reply to Samuel Neves from comment #13)
> Something simple like this -- https://godbolt.org/z/61orYdjK7 -- already
> exhibits the effect.
Yup.
That's a much better test-case. I think you should
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #13 from Samuel Neves ---
Something simple like this -- https://godbolt.org/z/61orYdjK7 -- already
exhibits the effect.
Furthermore, and this also applies to the full BLAKE2b compression function, if
you replace all the xors in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #12 from Linus Torvalds ---
(In reply to Jakub Jelinek from comment #11)
> Anyway, I think we need to understand what makes it spill that much more,
> and unfortunately the testcase is too large to find that out easily, I think
> we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #10 from Linus Torvalds ---
(In reply to Roger Sayle from comment #7)
> Investigating. Adding -mno-stv the stack size reduces from 2612 to 428 (and
> on godbolt the number of assembler lines reduces from 6952 to 6203).
So now that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #9 from Linus Torvalds ---
Looks like STV is "scalar to vector" and it should have been disabled
automatically by the -mno-avx flag anyway.
And the excessive stack usage was perhaps due to GCC preparing all those stack
slots for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #8 from Linus Torvalds ---
(In reply to Roger Sayle from comment #7)
> Investigating. Adding -mno-stv the stack size reduces from 2612 to 428 (and
> on godbolt the number of assembler lines reduces from 6952 to 6203).
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Roger Sayle changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Samuel Neves changed:
What|Removed |Added
CC||sneves at dei dot uc.pt
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #5 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #4)
>
> I'm not proud of that hacky thing, but since gcc documentation is written
> in sanskrit, and mere mortals can't figure it out, it's the best I could do.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #4 from Linus Torvalds ---
So hey, since you guys use git now, I thought I might as well just bisect this.
Now, I have no idea what the best and most efficient way is to generate only
"cc1", so my bisection run was this unholy mess
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #3 from Linus Torvalds ---
Created attachment 53123
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53123&action=edit
Mindless revert that fixes things for me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #2 from Andrew Pinski ---
thumb1 (which has 16 registers but really only 8 are GPRs) does not have this
issue in GCC 12, so I suspect a target specific change caused this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Andrew Pinski changed:
What|Removed |Added
Keywords||needs-bisection
Target Milestone|---