[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 --- Comment #6 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:b60bc913cca7439d29a7ec9e9a7f448d8841b43c commit r12-7721-gb60bc913cca7439d29a7ec9e9a7f448d8841b43c Author: Jakub Jelinek Date: Sat Mar 19 13:53:12 2022 +0100 i386: Don't emit pushf;pop for __builtin_ia32_readeflags_u* with unused lhs [PR104971] __builtin_ia32_readeflags_u* aren't marked const or pure I think intentionally, so that they aren't CSEd from different regions of a function etc. because we don't and can't easily track all dependencies between it and surrounding code (if somebody looks at the condition flags, it is dependent on the vast majority of instructions). But the builtin itself doesn't have any side-effects, so if we ignore the result of the builtin, there is no point to emit anything. There is a LRA bug that miscompiles the testcase which this patch makes latent, which is certainly worth fixing too, but IMHO this change (and maybe ix86_gimple_fold_builtin too which would fold it even earlier when it looses lhs) is worth it as well. 2022-03-19 Jakub Jelinek PR middle-end/104971 * config/i386/i386-expand.cc (ix86_expand_builtin) : If ignore, don't push/pop anything and just return const0_rtx. * gcc.target/i386/pr104971.c: New test.
[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 --- Comment #5 from Jakub Jelinek --- Created attachment 52650 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52650&action=edit gcc12-pr104971.patch Here is untested patch with the optimization not to expand the pushf/pop at all if we don't need the result. Though, the LRA bug will remain latent with it and should be fixed anyway.
[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 Marek Polacek changed: What|Removed |Added Last reconfirmed||2022-03-17 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC||mpolacek at gcc dot gnu.org --- Comment #4 from Marek Polacek --- Confirmed then.
[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 --- Comment #3 from Andrew Cooper --- So yes - my experimentation did start from investigating the memory ordering behaviour of these builtins, based on a thread on LKML. The pushf in readflags and popf in writeflags have wildly different ordering requirements, depending on which flags are wanted/modified. AC for example (and IF for kernels) need to not be reordered with respect to any memory access. As you observe, readflags in particular needs to not be reordered with any instruction that modifies the arithmetic flags (which is most of them). IMO, it would be safe to omit the pushf from readflags if the result is not not used, because there are no unexpected side effects for pushf. The same is not true of popf in writeflags, which has side effects even when written twice with the same value.
[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 Jakub Jelinek changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- Might be useful to drop the builtin on the floor during expansion if it doesn't have lhs as an optimization (I think it is intentionally not const or pure because it depends on any code that sets the flags; though it seems rather poorly defined because we don't have any dependencies on the surrounding code, but after all, user can't know what arithmetic instruction that modifies flags we decide to use last before the builtin). That said, LRA shouldn't optimize away instructions like: (insn 6 5 0 2 (set (reg:DI 82) (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8])) "pr104971.c":3:14 62 {*popdi1} (expr_list:REG_UNUSED (reg:DI 82) (nil))) just because they are unused, because they have side-effects.
[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 Jakub Jelinek changed: What|Removed |Added Target Milestone|--- |9.5 Summary|Optimisation for|[9/10/11/12 Regression] |__builtin_ia32_readeflags |Optimisation for |corrupts the stack |__builtin_ia32_readeflags ||corrupts the stack CC||jakub at gcc dot gnu.org Priority|P3 |P2 --- Comment #1 from Jakub Jelinek --- The builtin has been implemented in r0-127024-g9bbd48d120d203e8eade09e0bb830370b6d69801 and the pop is optimized away with pushf kept since r5-4951-g4ab74a01477d4089a3474d52479ed372c9b5ae29 so this is a regression from GCC 4.9.