[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b60bc913cca7439d29a7ec9e9a7f448d8841b43c

commit r12-7721-gb60bc913cca7439d29a7ec9e9a7f448d8841b43c
Author: Jakub Jelinek 
Date:   Sat Mar 19 13:53:12 2022 +0100

i386: Don't emit pushf;pop for __builtin_ia32_readeflags_u* with unused lhs
[PR104971]

__builtin_ia32_readeflags_u* aren't marked const or pure I think
intentionally, so that they aren't CSEd from different regions of a
function
etc. because we don't and can't easily track all dependencies between
it and surrounding code (if somebody looks at the condition flags, it is
dependent on the vast majority of instructions).
But the builtin itself doesn't have any side-effects, so if we ignore the
result of the builtin, there is no point to emit anything.

There is a LRA bug that miscompiles the testcase which this patch makes
latent, which is certainly worth fixing too, but IMHO this change
(and maybe ix86_gimple_fold_builtin too which would fold it even earlier
when it looses lhs) is worth it as well.

2022-03-19  Jakub Jelinek  

PR middle-end/104971
* config/i386/i386-expand.cc
(ix86_expand_builtin) : If ignore,
don't push/pop anything and just return const0_rtx.

* gcc.target/i386/pr104971.c: New test.

[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

--- Comment #5 from Jakub Jelinek  ---
Created attachment 52650
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52650&action=edit
gcc12-pr104971.patch

Here is untested patch with the optimization not to expand the pushf/pop at all
if we don't need the result.  Though, the LRA bug will remain latent with it
and should be fixed anyway.

[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-17 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

Marek Polacek  changed:

   What|Removed |Added

   Last reconfirmed||2022-03-17
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||mpolacek at gcc dot gnu.org

--- Comment #4 from Marek Polacek  ---
Confirmed then.

[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-17 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

--- Comment #3 from Andrew Cooper  ---
So yes - my experimentation did start from investigating the memory ordering
behaviour of these builtins, based on a thread on LKML.

The pushf in readflags and popf in writeflags have wildly different ordering
requirements, depending on which flags are wanted/modified.  AC for example
(and IF for kernels) need to not be reordered with respect to any memory
access.

As you observe, readflags in particular needs to not be reordered with any
instruction that modifies the arithmetic flags (which is most of them).

IMO, it would be safe to omit the pushf from readflags if the result is not not
used, because there are no unexpected side effects for pushf.

The same is not true of popf in writeflags, which has side effects even when
written twice with the same value.

[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

Jakub Jelinek  changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Might be useful to drop the builtin on the floor during expansion if it doesn't
have lhs as an optimization (I think it is intentionally not const or pure
because it depends on any code that sets the flags; though it seems rather
poorly defined because we don't have any dependencies on the surrounding code,
but after all, user can't know what arithmetic instruction that modifies flags
we decide to use last before the builtin).
That said, LRA shouldn't optimize away instructions like:
(insn 6 5 0 2 (set (reg:DI 82)
(mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0  S8 A8])) "pr104971.c":3:14 62
{*popdi1}
 (expr_list:REG_UNUSED (reg:DI 82)
(nil)))
just because they are unused, because they have side-effects.

[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|--- |9.5
Summary|Optimisation for|[9/10/11/12 Regression]
   |__builtin_ia32_readeflags   |Optimisation for
   |corrupts the stack  |__builtin_ia32_readeflags
   ||corrupts the stack
 CC||jakub at gcc dot gnu.org
   Priority|P3  |P2

--- Comment #1 from Jakub Jelinek  ---
The builtin has been implemented in
r0-127024-g9bbd48d120d203e8eade09e0bb830370b6d69801
and the pop is optimized away with pushf kept since
r5-4951-g4ab74a01477d4089a3474d52479ed372c9b5ae29
so this is a regression from GCC 4.9.