[Bug rtl-optimization/15792] missed subreg optimization

2023-05-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||

--- Comment #12 from Andrew Pinski  ---
(In reply to Gabriel Ravier from comment #11)
> Seems like the issue is present again, except it's test1 that gets the
> better asm now. Perhaps this should be re-opened ?

This bug was about 32-bit x86, and the code looks good in GCC 9, 10, 11, and 12
and on trunk.
If you were testing on x86_64, you need to use __int128_t to see what the
original issue was about:
void gh();
void test(__int128_t x) {
  long g = (long)x|((long)(x>>64));
  if (g) gh();
}
void test1(__int128_t x) {
  if (x) gh();
}

GCC 4.8+ produces:
test1:
.cfi_startproc
orq %rdi, %rsi
jne .L7
rep ret

That is the output for both functions.  GCC 4.5.0 through 4.7.0 emitted an
extra mov for test, though.  In GCC 4.4.0, test1 was two compare-and-jump pairs
(OK).  GCC 4.1.2 had the bad code generation mentioned in comment #0.
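
As a side note for anyone reproducing this: the two functions are meant to be
semantically identical, which is why identical code is the expected result.
The sketch below is not part of the bug report; it checks that equivalence on a
few values. The helper names (calls, same_behaviour, check_samples) are
hypothetical, gh() is given a counting body for illustration, and long long is
used in place of long so the halves are 64 bits regardless of data model.
__int128_t is a GCC/Clang extension on 64-bit targets.

```c
#include <assert.h>

static int calls;                      /* number of times gh() ran */
static void gh(void) { calls++; }

/* Explicit form: OR the low and high 64-bit halves together. */
static void test(__int128_t x) {
  long long g = (long long)x | (long long)(x >> 64);
  if (g) gh();
}

/* Direct form: let the compiler test the whole 128-bit value. */
static void test1(__int128_t x) {
  if (x) gh();
}

/* Returns 1 if both forms call gh() the same number of times for x. */
int same_behaviour(__int128_t x) {
  int a, b;
  calls = 0; test(x);  a = calls;
  calls = 0; test1(x); b = calls;
  return a == b;
}

/* Checks a few values, including ones where only one half is non-zero. */
void check_samples(void) {
  __int128_t one = 1;
  assert(same_behaviour(0));
  assert(same_behaviour(one));
  assert(same_behaviour(one << 64));       /* low half zero, high half set */
  assert(same_behaviour(-one));            /* all bits set */
  assert(same_behaviour((one << 64) | 5)); /* both halves non-zero */
}
```

To compare the generated code rather than the behaviour, compile the original
two functions from the comment above with -O2 -S and look at test and test1.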

[Bug rtl-optimization/15792] missed subreg optimization

2021-10-14 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #11 from Gabriel Ravier  ---
Seems like the issue is present again, except it's test1 that gets the better
asm now. Perhaps this should be re-opened ?

[Bug rtl-optimization/15792] missed subreg optimization

2007-11-09 Thread rask at gcc dot gnu dot org


--- Comment #10 from rask at gcc dot gnu dot org  2007-11-10 00:15 ---
This was fixed in 4.3.0.


-- 

rask at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
   Keywords||ra
  Known to fail||4.1.2 4.2.0 4.2.1 4.2.2
  Known to work||4.3.0
 Resolution||FIXED





[Bug rtl-optimization/15792] missed subreg optimization

2006-02-07 Thread ian at airs dot com


--- Comment #9 from ian at airs dot com  2006-02-07 08:23 ---
I now have a reasonably simple reload patch which eliminates the unnecessary
move.  For the test case in comment #4, I get this code with -O2
-momit-leaf-frame-pointer:

foo:
movl 12(%esp), %eax
movl 16(%esp), %edx
addl 4(%esp), %eax
adcl 8(%esp), %edx
orl %eax, %edx
jne .L7
rep ; ret
.p2align 4,,7
.L7:
jmp gh
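
For readers less used to x86 assembly, the addl/adcl/orl sequence above can be
modelled in C roughly as follows. This is only an illustrative sketch, not code
from the patch: the function names are hypothetical, and unsigned 64-bit
operands are used (instead of the signed long long of the test case) so that
wraparound is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the asm: add two 64-bit values as 32-bit halves with a carry,
   then OR the result halves together to test the sum for zero. */
int sum_is_nonzero_split(uint64_t y, uint64_t z) {
  uint32_t y_lo = (uint32_t)y, y_hi = (uint32_t)(y >> 32);
  uint32_t z_lo = (uint32_t)z, z_hi = (uint32_t)(z >> 32);

  uint32_t lo = y_lo + z_lo;          /* addl: low halves */
  uint32_t carry = lo < y_lo;         /* carry out of the low add */
  uint32_t hi = y_hi + z_hi + carry;  /* adcl: high halves plus carry */

  return (lo | hi) != 0;              /* orl sets ZF, jne tests it */
}

/* Reference: the same test done directly on the 64-bit sum. */
int sum_is_nonzero_direct(uint64_t y, uint64_t z) {
  return (y + z) != 0;
}

/* Compares both versions on a few carry-sensitive inputs. */
void check_split_add(void) {
  static const uint64_t v[] = {
    0, 1, 0xFFFFFFFFu, 0x100000000u, UINT64_MAX, 0x8000000000000000u
  };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      assert(sum_is_nonzero_split(v[i], v[j]) ==
             sum_is_nonzero_direct(v[i], v[j]));
}
```

The final OR is needed only for its effect on the flags. In the asm above its
result lands in %edx, which already holds the high half of the sum, so no
extra register or move is required.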





[Bug rtl-optimization/15792] missed subreg optimization

2006-02-06 Thread tony dot linthicum at amd dot com


--- Comment #7 from tony dot linthicum at amd dot com  2006-02-06 17:13 ---
So do I, at least for the original code (i.e. test and test1).  I'm curious,
though, whether you've tried the example that I listed above (foo).  I still
get subregs with that one, though I honestly don't recall at the moment whether
it makes the register allocator go wrong (I *think* it does, but I'd have to
check).  Either way, the presence of the subregs provides the needed fodder for
RA badness, so I'm curious whether it's present in what you're working on.





[Bug rtl-optimization/15792] missed subreg optimization

2006-02-06 Thread ian at airs dot com


--- Comment #8 from ian at airs dot com  2006-02-07 00:30 ---
Yes, I still get an unnecessary move in your test case which uses addition.

One reason this happens is that the addition cannot be split until after the
reload pass is complete.  The add relies on the condition code register, and
reload can clobber the condition codes between any arbitrary pair of
instructions.

Another reason is that the compiler knows how to set the condition flags using
a bitwise or, but it does so using a scratch register to hold the destination
of the bitwise or.  The register allocator is not clever enough to see that,
when a DImode pair of registers dies in the insn, it can use the second
register of the pair as the scratch register.  If it saw that, it could avoid
allocating a new scratch register and copying the value into it.





[Bug rtl-optimization/15792] missed subreg optimization

2006-02-02 Thread ian at airs dot com


--- Comment #6 from ian at airs dot com  2006-02-02 18:18 ---
With the version of RTH's subreg lowering pass which I am working on, I get
identical code for both functions:

test1:
movl 8(%esp), %eax
orl 4(%esp), %eax
jne .L7
ret
.p2align 4,,7
.L7:
jmp gh





[Bug rtl-optimization/15792] missed subreg optimization

2006-01-20 Thread tony dot linthicum at amd dot com


--- Comment #4 from tony dot linthicum at amd dot com  2006-01-20 15:48 ---
I've been looking at this a bit, and tried the patch.  It does indeed fix the
problem in test1 above, but it does not appear to be the complete solution. 
The load of 'x' in test1 is actually split fairly early, and from what I can
tell, the superfluous move is actually the result of the register allocator
doing a poor job of live range analysis when confronted with subregs.  I
suspect this is why most things (i.e. those things other than branches) are not
split into subregs until after reload.  Unfortunately, the subreg lowering
won't touch a subreg if it's seen a reference to the inner register so we get
the same unnecessary move if the code looks like:

void gh();

void foo(long long y, long long z)
{
  unsigned long long x;

  x = y + z;
  if (x) gh();
}

I'm going to experiment with moving where the subreg lowering code occurs and
moving up the splitting into subregs and see if I can get the desired results. 
I'm pretty new to GCC, so if any of the above seems like I'm off in the weeds
then please let me know.







[Bug rtl-optimization/15792] missed subreg optimization

2006-01-20 Thread pinskia at gcc dot gnu dot org


--- Comment #5 from pinskia at gcc dot gnu dot org  2006-01-20 15:52 ---
(In reply to comment #4)
> I'm going to experiment with moving where the subreg lowering code occurs and
> moving up the splitting into subregs and see if I can get the desired
> results.
> I'm pretty new to GCC, so if any of the above seems like I'm off in the weeds
> then please let me know.

This seems right, but the other issue is that the register allocator allocates
a DImode value as two consecutive registers treated as one (that might be only
part of the cause).


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||tony dot linthicum at amd
   ||dot com





[Bug rtl-optimization/15792] missed subreg optimization

2006-01-17 Thread pinskia at gcc dot gnu dot org


--- Comment #3 from pinskia at gcc dot gnu dot org  2006-01-18 04:45 ---
The problem here is that we don't split up the subregister early, before
register allocation.
If we split it up before combine, we would be able to combine the or and get
more optimal results.

A patch like
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg00554.html
should help.

