[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #55 from Jeffrey A. Law ---
Per c#54.  If it turns out we're wrong, we can always reopen or file a new
report.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #54 from John David Anglin --- The f-m-o issue is probably fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #53 from GCC Commits ---
The master branch has been updated by John David Anglin:

https://gcc.gnu.org/g:f0fda1aff0b752e4182c009c5526b9306bd35f7c

commit r14-9511-gf0fda1aff0b752e4182c009c5526b9306bd35f7c
Author: John David Anglin
Date:   Mon Mar 18 00:19:36 2024 +

    hppa: Improve handling of REG+D addresses when generating PA 2.0 code

    In looking at PR 112415, it became clear that improvements could be
    made in the handling of loads and stores using REG+D addresses.  A
    change in 2002 conflated two issues:

    1) We can't generate insns with 14-bit displacements before reload
    completes when generating PA 1.x code since floating-point loads and
    stores only support 5-bit offsets in PA 1.x.

    2) The GNU ELF 32-bit linker lacks relocation support for PA 2.0
    floating point instructions with 14-bit displacements.  These
    relocations affect instructions with symbolic references.

    The result of the change was to block creation of PA 2.0 instructions
    with 14-bit REG+D displacements for SImode, DImode, SFmode and DFmode
    on the GNU linux target before reload.  This was unnecessary as these
    instructions don't need relocation.

    This change revises the INT14_OK_STRICT define to allow creation of
    instructions with 14-bit REG+D addresses before reload when generating
    PA 2.0 code.

    2024-03-17  John David Anglin

    gcc/ChangeLog:

            PR rtl-optimization/112415
            * config/pa/pa.cc (pa_emit_move_sequence): Revise condition for
            symbolic memory operands.
            (pa_legitimate_address_p): Revise LO_SUM condition.
            * config/pa/pa.h (INT14_OK_STRICT): Revise define.  Move comment
            about GNU linker to predicates.md.
            * config/pa/predicates.md (floating_point_store_memory_operand):
            Revise condition for symbolic memory operands.  Update comment.
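To make the 5-bit versus 14-bit distinction in the commit message concrete, here is a minimal Python sketch that checks whether a REG+D displacement fits a signed field of a given width. The helper name `fits_signed` is hypothetical and not taken from the GCC sources; the sketch only models the numeric range, not the actual PA-RISC instruction encoding. The offsets 388, 392 and 396 are the ones that appear in the RTL dumps for this PR.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value lies in the range of a signed field of `bits` bits.

    Hypothetical helper for illustration; only the numeric range is modeled.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


# Offsets taken from the RTL dumps in this PR (0x184, 0x188, 0x18c).
offsets = [388, 392, 396]

for d in offsets:
    print(f"{d} ({d:#x}): 5-bit={fits_signed(d, 5)}, 14-bit={fits_signed(d, 14)}")
```

Under these assumptions all three offsets fit a 14-bit displacement but not a 5-bit one, which is consistent with the address computation being split into a separate add whenever only 5-bit displacements are accepted.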
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #52 from Manolis Tsamis ---
(In reply to Sam James from comment #51)
> manolis, did you have a chance to look at the remaining pass issue? You'll
> need to revert Dave's commit locally which made the issue latent for
> building Python.

Hi Sam,

I had to work on some other things so I didn't get to find a fix yet, but I'll
be working on that again now (in light of the new info from PR111601 too).

Thanks for the ping,
Manolis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #51 from Sam James --- manolis, did you have a chance to look at the remaining pass issue? You'll need to revert Dave's commit locally which made the issue latent for building Python.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #50 from CVS Commits ---
The master branch has been updated by John David Anglin:

https://gcc.gnu.org/g:d2934eb6ae92471484469d8ddd039eb34ef400b1

commit r14-5538-gd2934eb6ae92471484469d8ddd039eb34ef400b1
Author: John David Anglin
Date:   Thu Nov 16 17:42:26 2023 +

    hppa: Revise REG+D address support to allow long displacements before reload

    In analyzing PR rtl-optimization/112415, I realized that restricting
    REG+D offsets to 5 bits before reload results in very poor code and
    complexities in optimizing these instructions after reload.  The
    general problem is long displacements are not allowed for floating
    point accesses when generating PA 1.1 code.  Even with PA 2.0, there
    is an ELF linker bug that prevents using long displacements for
    floating point loads and stores.

    In the past, enabling long displacements before reload caused issues
    in reload.  However, there have been fixes in the handling of reloads
    for floating-point accesses.  This change allows long displacements
    before reload and corrects a couple of issues in the constraint
    handling for integer and floating-point accesses.

    2023-11-16  John David Anglin

    gcc/ChangeLog:

            PR rtl-optimization/112415
            * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
            displacements before reload.  Simplify logic flow.  Revise
            comments.
            * config/pa/pa.h (TARGET_ELF64): New define.
            (INT14_OK_STRICT): Update define and comment.
            * config/pa/pa64-linux.h (TARGET_ELF64): Define.
            * config/pa/predicates.md (base14_operand): Don't check
            alignment of short displacements.
            (integer_store_memory_operand): Don't return true when
            reload_in_progress is true.  Remove INT_5_BITS check.
            (floating_point_store_memory_operand): Don't return true when
            reload_in_progress is true.  Use INT14_OK_STRICT to check
            whether long displacements are always okay.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #49 from John David Anglin ---
Created attachment 56576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56576&action=edit
Patch to improve reg+d address handling

This patch revises pa_legitimate_address_p to allow 14-bit displacements for
all memory accesses before reload.  Comments and flow in this routine are
improved.

So far, I haven't seen any issues related to reloading out-of-range
floating-point accesses.

This significantly improves code generation and saves more than two thousand
instructions in compile.s.  I was able to successfully build python with the
patched compiler.

This is version two of the change and it still needs more testing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #48 from Manolis Tsamis ---
(In reply to dave.anglin from comment #47)
> On 2023-11-13 4:33 a.m., manolis.tsamis at vrull dot eu wrote:
> > --- Comment #44 from Manolis Tsamis ---
> > (In reply to John David Anglin from comment #39)
> >> In the f-m-o pass, the following three insns that set call clobbered
> >> registers r20-r22 are pulled from loop:
> >> [...]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #47 from dave.anglin at bell dot net ---
On 2023-11-13 4:33 a.m., manolis.tsamis at vrull dot eu wrote:
> --- Comment #44 from Manolis Tsamis ---
> (In reply to John David Anglin from comment #39)
>> In the f-m-o pass, the following three insns that set call clobbered
>> registers r20-r22 are pulled from loop:
>> [...]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #46 from Manolis Tsamis ---
I have reproduced the segfault with f-m-o limited to only fold insn 272 from
compiler_call_helper.

The exact transformation is:

Memory offset changed from 0 to 388 for instruction:
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(intD.1 *)_107 + 388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2448D.32932 ] [167])) "Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
deferring rescan insn with uid = 273.
Instruction folded:(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "Python/compile.c":5970:20 120 {addsi3}
     (nil))

This instruction is also among the ones that Dave mentioned.

Again, if I'm missing something as to why this transformation is illegal,
please tell me. Given these are also consecutive instructions, I'm just
seeing here that

  %r22 = %r19 + 388
  [%r22] = %r14

is transformed to

  %r22 = %r19
  [%r22 + 388] = %r14

I haven't tracked all other uses of %r22 yet, but in theory if there was any
non-foldable use of that register then the transformation wouldn't be made.

Manolis
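The two-instruction transformation described in this comment can be modeled with a small Python sketch. This is a toy register machine, not anything from GCC: the instruction tuples, the `run` helper and the initial register values are all assumptions made for illustration. It shows why the fold is valid when looked at locally, and why other uses of %r22 matter.

```python
def run(program, regs, mem):
    """Execute a list of toy instructions against register and memory dicts."""
    for op, *args in program:
        if op == "add":      # ("add", dst, src, imm): dst = src + imm
            dst, src, imm = args
            regs[dst] = regs[src] + imm
        elif op == "store":  # ("store", base, off, src): mem[base + off] = src
            base, off, src = args
            mem[regs[base] + off] = regs[src]
    return regs, mem


# Before f-m-o: the offset lives in the add.
before = [("add", "r22", "r19", 388), ("store", "r22", 0, "r14")]
# After f-m-o: the offset is folded into the store, the add becomes a move.
after = [("add", "r22", "r19", 0), ("store", "r22", 388, "r14")]

init = {"r19": 0x1000, "r14": 42, "r22": 0}  # arbitrary illustrative values
regs_b, mem_b = run(before, dict(init), {})
regs_a, mem_a = run(after, dict(init), {})

print(mem_b == mem_a)                # both sequences store to the same address
print(regs_b["r22"], regs_a["r22"])  # but the value left in r22 differs by 388
```

In this model the memory effect is identical, while the value left in %r22 changes. Any other instruction that reads %r22 without having its own offset adjusted would therefore see a different address, which is why the pass has to account for every use of the folded register.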
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #45 from Manolis Tsamis ---
(In reply to Jeffrey A. Law from comment #41)
> I would agree.  In fact, the whole point of the f-m-o pass is to bring those
> immediates into the memory reference.  It'd be really useful to know why
> that isn't happening.
>
> The only thing I can think of would be if multiple instructions needed the
> %r20 in the RTL you attached.  Which might point to a refinement we should
> make in f-m-o, specifically the transformation isn't likely profitable if we
> aren't able to fold away a term or fold a constant term into the actual
> memory reference.

Jeff, I'm confused about "It'd be really useful to know why that isn't
happening.". It can be seen in Dave's dumps that it *is* happening, e.g.:

Memory offset changed from 0 to 396 for instruction:
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 + 396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
Instruction folded:(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120 {addsi3}
     (nil))

If you look at the RTL in f-m-o, all these offsets are indeed moved into the
respective load/store. I don't know if cprop afterwards manages to eliminate
the unwanted move, but f-m-o does what it's supposed to do in this case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #44 from Manolis Tsamis ---
(In reply to John David Anglin from comment #39)
> In the f-m-o pass, the following three insns that set call clobbered
> registers r20-r22 are pulled from loop:
> [...]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #43 from Jeffrey A. Law --- I would expect allowing larger offsets before reload to be a significant problem. The core issue is integer memory operations allow 14 bits while FP only allows 5. During reloading we don't know if any given memory reference is FP or integer. xmpyu plays a role here too since it's going to require FP registers in integer modes. But what I don't understand is why f-m-o fails to push the offset into the memory reference -- it should be conditional on the insn being recognized. And since it's after reload we know if we're doing an FP or integer load.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #42 from John David Anglin --- The problem is we are limiting displacements to five bits in pa_legitimate_address_p. The comment is somewhat confusing but we may have reload issues if we allow 14-bit displacements before reload completes. Testing a patch to see if we can allow 14-bit displacements before reload.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #41 from Jeffrey A. Law --- I would agree. In fact, the whole point of the f-m-o pass is to bring those immediates into the memory reference. It'd be really useful to know why that isn't happening. The only thing I can think of would be if multiple instructions needed the %r20 in the RTL you attached. Which might point to a refinement we should make in f-m-o, specifically the transformation isn't likely profitable if we aren't able to fold away a term or fold a constant term into the actual memory reference.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #40 from John David Anglin ---
Jeff, I don't think these split instructions make a lot of sense on PA-RISC.

(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120 {addsi3}
     (nil))
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 + 396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

They increase code size and register pressure.  That may lead to unnecessary
spills and longer branches.  They increase probability of problems like the
one in this PR.

I suspect the two instructions generated are actually slower than one with a
nonzero memory offset.  It's not clear that memory accesses with a zero offset
are faster than ones with nonzero offsets.

Integer loads and stores on pa support fairly large offsets.  I think we need
to look at why this happens frequently.

Thoughts?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #39 from John David Anglin ---
In the f-m-o pass, the following three insns that set call clobbered
registers r20-r22 are pulled from loop:

(insn 186 183 190 29 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "../Python/compile.c":5964:9 120 {addsi3}
     (nil))
(insn 190 186 187 29 (set (reg/f:SI 21 %r21 [479])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 392 [0x188]))) "../Python/compile.c":5964:9 120 {addsi3}
     (nil))
(insn 194 191 195 29 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5964:9 120 {addsi3}
     (nil))

They are used in the following insns before call to compiler_visit_expr1:

(insn 242 238 258 32 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int *)prephitmp_37 + 388B]+0 S4 A32])
        (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173])) "../Python/compile.c":5968:22 42 {*pa.md:2193}
     (expr_list:REG_DEAD (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173])
        (expr_list:REG_DEAD (reg/f:SI 22 %r22 [478])
            (nil))))
(insn 258 242 246 32 (set (reg:SI 26 %r26)
        (reg/v/f:SI 5 %r5 [orig:198 c ] [198])) "../Python/compile.c":5969:15 42 {*pa.md:2193}
     (nil))
(insn 246 258 250 32 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int *)prephitmp_37 + 392B]+0 S4 A32])
        (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169])) "../Python/compile.c":5968:22 42 {*pa.md:2193}
     (expr_list:REG_DEAD (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169])
        (expr_list:REG_DEAD (reg/f:SI 21 %r21 [479])
            (nil))))
(insn 250 246 254 32 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)prephitmp_37 + 396B]+0 S4 A32])
        (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145])) "../Python/compile.c":5968:22 42 {*pa.md:2193}
     (expr_list:REG_DEAD (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145])
        (expr_list:REG_DEAD (reg/f:SI 20 %r20 [480])
            (nil))))

After the call, we have:

(insn 1241 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])) "../Python/compile.c":5970:20 -1
     (nil))
(insn 273 1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478])
                (const_int 388 [0x184])) [4 MEM[(int *)_107 + 388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 1242 273 277 30 (set (reg/f:SI 21 %r21 [479])
        (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])) "../Python/compile.c":5970:20 -1
     (nil))
(insn 277 1242 1243 30 (set (mem:SI (plus:SI (reg/f:SI 21 %r21 [479])
                (const_int 392 [0x188])) [4 MEM[(int *)_107 + 392B]+0 S4 A32])
        (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 1243 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])) "../Python/compile.c":5970:20 -1
     (nil))
(insn 281 1243 299 30 (set (mem:SI (plus:SI (reg/f:SI 20 %r20 [480])
                (const_int 396 [0x18c])) [4 MEM[(int *)_107 + 396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

We have lost the offsets that were added initially to r20, r21 and r22.

Previous ce3 pass had:

(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "../Python/compile.c":5970:20 120 {addsi3}
     (nil))
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int *)_107 + 388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 276 273 277 30 (set (reg/f:SI 21 %r21 [479])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 392 [0x188]))) "../Python/compile.c":5970:20 120 {addsi3}
     (nil))
(insn 277 276 280 30 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int *)_107 + 392B]+0 S4 A32])
        (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120 {addsi3}
     (nil))
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 + 396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134])) "../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

So, this is an f-m-o bug.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-11

--- Comment #38 from Sam James ---
Confirming since Dave repro'd too.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #37 from John David Anglin ---
(In reply to Sam James from comment #35)
> If you still need dumps off me, please let me know which. I've attached
> those w/ f-m-o on for the fold-mem-offsets pass. If you need others, just
> say.

I have a set of dumps.  The problem is determining where the wrong RTL occurs
in compiler_call_helper.  It changes a lot from pass to pass.  Many of the
changes in f-m-o seem to get destroyed by later transformations.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #36 from John David Anglin ---
Created attachment 56562 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56562&action=edit
fold_mem_offsets, prop_hardreg, rtl_dce and bbro dumps

Comment #33 is wrong. The issue is not reload. It's okay to pick a call clobbered register as the code stands.

The initialization of the register used for the store at offset 392B ends up outside the loop. It ends up in a call clobbered register and is clobbered by the call to compiler_visit_expr1 in the loop. This occurs around the second call to compiler_visit_expr1 in compiler_call_helper.

Various initializations get moved out of the loop between the f-m-o and bbro passes. I think it's the bbro pass that's at fault but it could be something that happens before that causes the initialization to get moved outside the loop.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #35 from Sam James --- If you still need dumps off me, please let me know which. I've attached those w/ f-m-o on for the fold-mem-offsets pass. If you need others, just say.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #34 from John David Anglin --- The same wrong code is generated with an x86-64 cross to hppa-linux-gnu. Thus it seems this bug is not due to gcc being miscompiled.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #33 from John David Anglin ---
Created attachment 56549 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56549&action=edit
ira and reload dumps for compiler_call_helper

The incorrect code for insn 246 in compiler_call_helper appears in the reload pass.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #32 from dave.anglin at bell dot net --- At this point, I don't have gcc-14 builds that bracket the f-m-o change. Maybe Sam can check. I'm trying to determine the RTL pass where things go bad.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #31 from Jeffrey A. Law --- IIRC r21 is call-clobbered. So I guess the question turns into what was the sequence before f-m-o got involved -- was it assuming r21 would be preserved, or did f-m-o make r21 live across the call?
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #30 from John David Anglin ---
   0x0019c684 <+588>: stw r23,0(r22)
=> 0x0019c688 <+592>: stw ret1,0(r21)
   0x0019c68c <+596>: stw r31,0(r20)
   0x0019c690 <+600>: b,l 0x198d58 ,rp
   0x0019c694 <+604>: stw ret0,0(r19)

These instructions are in a loop:

    /* No * or ** args, so can use faster calling sequence */
    for (i = 0; i < nelts; i++) {
        expr_ty elt = asdl_seq_GET(args, i);
        assert(elt->kind != Starred_kind);
        VISIT(c, expr, elt);
    }

r21 is clobbered by VISIT call. Value is okay in first iteration.

The initialization instructions are outside the loop:

   0x0019c638 <+512>: ldo 184(r19),r22
   0x0019c63c <+516>: ldw 184(r19),r14
   0x0019c640 <+520>: ldo 188(r19),r21
   0x0019c644 <+524>: ldw 188(r19),r13
   0x0019c648 <+528>: ldo 18c(r19),r20
   0x0019c64c <+532>: ldw 18c(r19),r12
   0x0019c650 <+536>: ldw 190(r19),r11
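The failure mode described in comment #30 (an address computed once outside the loop into a call-clobbered register, then reused after a call) can be sketched with a toy register machine in C. This is only an illustration under assumptions: `struct machine`, `visit_call` and `first_field_after_loop` are hypothetical names, memory is indexed in 4-byte words (so byte offset 0x188 = 392 becomes word 98), and what the callee leaves in r21 is an arbitrary choice. It is not derived from the actual RTL.

```c
#include <assert.h>
#include <string.h>

enum {
    NREGS = 32, R19 = 19, R21 = 21, MEM_WORDS = 128,
    BASE = 4,               /* toy address of the structure, in words */
    FIELD_WORD = 392 / 4,   /* byte offset 0x188 expressed in words */
    SENTINEL = 7, STORED = 43
};

struct machine {
    unsigned reg[NREGS];
    int mem[MEM_WORDS];
};

/* Stand-in for the VISIT call: the callee is free to overwrite the
   call-clobbered r21. Leaving the base pointer there is an assumption
   made for this toy only. */
void visit_call(struct machine *m)
{
    m->reg[R21] = m->reg[R19];
}

/* Runs the loop and returns the word at offset 0 of the structure.
   hoist_address != 0 computes the store address once, before the loop. */
int first_field_after_loop(int hoist_address, int iterations)
{
    struct machine m;
    memset(&m, 0, sizeof m);
    m.reg[R19] = BASE;
    m.mem[BASE] = SENTINEL;
    assert(BASE + FIELD_WORD < MEM_WORDS);

    if (hoist_address)
        m.reg[R21] = m.reg[R19] + FIELD_WORD;   /* like "ldo 188(r19),r21" */

    for (int i = 0; i < iterations; i++) {
        if (!hoist_address)
            m.reg[R21] = m.reg[R19] + FIELD_WORD;
        m.mem[m.reg[R21]] = STORED;             /* like "stw ...,0(r21)" */
        visit_call(&m);                         /* r21 is dead after this */
    }
    return m.mem[BASE];
}
```

In this model `first_field_after_loop(1, 1)` leaves the offset-0 field intact, matching "Value is okay in first iteration", while a second iteration with the hoisted address overwrites it.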
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #29 from John David Anglin ---
The miscompilation is in compiler_visit_expr:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/dave/debian/python3.11/python3.11-3.11.6/build-static/Programs/_freeze_module importlib._bootstrap ../Lib/importlib/_bootstrap.py Python/frozen_modules/importlib._bootstrap.h
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

Breakpoint 2, compiler_jump_if (c=0xf8f02508, e=0x5763f8, next=0xfaeaa908, cond=0) at ../Python/compile.c:2898
2898    {
(gdb) watch *0xfaea51b8
Watchpoint 3: *0xfaea51b8
(gdb) c
Continuing.

Watchpoint 3: *0xfaea51b8

Old value = -85046408
New value = 43
0x0019c688 in compiler_visit_expr (e=0x576308, c=0xf8f02508) at ../Python/compile.c:5968
5968    SET_LOC(c, e);
(gdb) bt
#0 0x0019c688 in compiler_visit_expr (e=0x576308, c=0xf8f02508) at ../Python/compile.c:5968
#1 compiler_call_helper (c=0xf8f02508, n=0, args=, keywords=0x0) at ../Python/compile.c:5138
#2 0x0019ec70 in compiler_visit_expr (e=, c=0xf8f02508) at ../Python/compile.c:5969
#3 compiler_jump_if (c=0xf8f02508, e=, next=0x0, cond=) at ../Python/compile.c:2988
#4 0x001a0770 in compiler_if (s=0x0, c=0x5763c0) at ../Python/compile.c:3090
#5 compiler_visit_stmt (c=0x5763c0, s=0x0) at ../Python/compile.c:4118
#6 0x001a1378 in compiler_for (s=0x0, c=0x5763c0) at ../Python/compile.c:3124
#7 compiler_visit_stmt (c=0x5763c0, s=0x0) at ../Python/compile.c:4114
#8 0x001a3170 in compiler_function (c=0x2, s=, is_async=) at ../Python/compile.c:2670
#9 0x001a3438 in compiler_body (c=0x0, stmts=0x5763c0) at ../Python/compile.c:2180
#10 0x001a5cdc in compiler_mod (mod=0x0, c=0xf8f02528) at ../Python/compile.c:2197
#11 _PyAST_Compile (mod=0x0, filename=0xf8f02528, flags=, optimize=, arena=) at ../Python/compile.c:581
#12 0x001dea00 in Py_CompileStringObject (optimize=0, flags=0x5763c0, start=0, filename=0x2, str=0x0) at ../Python/pythonrun.c:1799
#13 Py_CompileStringExFlags (str=0x0, filename_str=, start=0,
--Type for more, q to quit, c to continue without paging--
    flags=0x5763c0, optimize=) at ../Python/pythonrun.c:1812
#14 0x000167a4 in compile_and_marshal (text=0x0, name=0x2 ) at ../Programs/_freeze_module.c:125
#15 main (argc=0, argv=) at ../Programs/_freeze_module.c:230
(gdb) diass $pc-16,$pc+16
Undefined command: "diass". Try "help".
(gdb) disass $pc-16,$pc+16
Dump of assembler code from 0x19c678 to 0x19c698:
   0x0019c678 : ldw 14(r25),ret1
   0x0019c67c : ldw 18(r25),r31
   0x0019c680 : ldw 1c(r25),ret0
   0x0019c684 : stw r23,0(r22)
=> 0x0019c688 : stw ret1,0(r21)
   0x0019c68c : stw r31,0(r20)
   0x0019c690 : b,l 0x198d58 ,rp
   0x0019c694 : stw ret0,0(r19)
End of assembler dump.

The code at 0x0019c688 clobbers the value at c->u->u_ste:

(gdb) p/x $r21
$35 = 0xfaea51b8
(gdb) p/x *c
$36 = {c_filename = 0xfaed9480, c_st = 0xfaeafd10, c_future = 0xfaef7030, c_flags = 0xf8f02544, c_optimize = 0x0, c_interactive = 0x0, c_nestlevel = 0x2, c_const_cache = 0xfae81280, u = 0xfaea51b8, c_stack = 0xfae57a88, c_arena = 0xfaec0c90}
(gdb) p/x *c->u
$37 = {u_ste = 0x2b, u_name = 0xfae7ff80, u_qualname = 0xfae7ff80, u_scope_type = 0x2, u_consts = 0xfaeaa7f8, u_names = 0xfaeaa7d0, u_varnames = 0xfaeaa780, u_cellvars = 0xfaeaa7a8, u_freevars = 0xfaeaa758, u_private = 0x0, u_argcount = 0x2, u_posonlyargcount = 0x0, u_kwonlyargcount = 0x0, u_blocks = 0xfaeaa908, u_curblock = 0xfaeaa868, u_nfblocks = 0x1, u_fblock = {{fb_type = 0x1, fb_block = 0xfaeaa840, fb_exit = 0xfaeaa8b8, fb_datum = 0x0}, {fb_type = 0x0, fb_block = 0x0, fb_exit = 0x0, fb_datum = 0x0} }, u_firstlineno = 0x28, u_lineno = 0x2b, u_col_offset = 0xb, u_end_lineno = 0x2b, u_end_col_offset = 0x20, u_need_new_implicit_block = 0x0}
(gdb) p/x $r23
$38 = 0x2b

#define SET_LOC(c, x) \
    (c)->u->u_lineno = (x)->lineno; \
    (c)->u->u_col_offset = (x)->col_offset; \
    (c)->u->u_end_lineno = (x)->end_lineno; \
    (c)->u->u_end_col_offset = (x)->end_col_offset;

(gdb) p/x *e
$40 = {kind = 0x18, v = {BoolOp = {op = 0xfaeb8b60, values = 0x1}, NamedExpr = {target = 0xfaeb8b60, value = 0x1}, BinOp = { left = 0xfaeb8b60, op = 0x1, right = 0x0}, UnaryOp = {op = 0xfaeb8b60, operand = 0x1}, Lambda = {args = 0xfaeb8b60, body = 0x1}, IfExp = { test = 0xfaeb8b60, body = 0x1, orelse = 0x0}, Dict = {keys = 0xfaeb8b60, values = 0x1}, Set = {elts = 0xfaeb8b60}, ListComp = {elt = 0xfaeb8b60, generators = 0x1}, SetComp = {elt = 0xfaeb8b60, generators = 0x1}, DictComp = {key = 0xfaeb8b60, value = 0x1, generators = 0x0}, GeneratorExp = {elt = 0xfaeb8b60, generators = 0x1}, Await = { value = 0xfaeb8b60}, Yield = {value = 0xfaeb8b60}, YieldFrom = {
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #28 from dave.anglin at bell dot net --- On 2023-11-08 7:07 p.m., law at gcc dot gnu.org wrote: > Do we already have a dump for the key function? Presumably f-m-o doesn't > trigger*that* much. And if this is triggering w/o LTO we can probably move > to > cross debugging and analysis of those dump files and assembly code with and > without f-m-o enabled, narrowing our focus on the key function. I tried looking at the difference with and without f-m-o and it was quite large. The difference with and without strict aliasing is much smaller. The main differences that I saw relate to the inlining of compiler_visit_expr and compiler_visit_expr1.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #27 from dave.anglin at bell dot net ---
On 2023-11-08 7:00 p.m., John David Anglin wrote:
> On 2023-11-08 6:51 p.m., sjames at gcc dot gnu.org wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>>
>> --- Comment #23 from Sam James ---
>> (In reply to Andrew Pinski from comment #21)
>>> The other option to try is -fstack-reuse=none. There are definitely known
>>> issues with the code that coalesces stack variables together too (see PR
>>> 111843 for examples).
>> I had a good feeling about this but no, didn't help when applied to
>> compile.o.
> At this point, I don't know whether this is a python or gcc bug. I scanned
> for unions in compile.i that might be problematic but I didn't find
> anything obvious.

Note -fno-strict-aliasing affects the inlining of compiler_visit_expr. It is not inlined with -fno-strict-aliasing.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #26 from Jeffrey A. Law --- As a compiler junkie, I tend to think compiler first until I can prove it otherwise. I wouldn't get too hung up on aliasing issues and such at this point. Do we already have a dump for the key function? Presumably f-m-o doesn't trigger *that* much. And if this is triggering w/o LTO we can probably move to cross debugging and analysis of those dump files and assembly code with and without f-m-o enabled, narrowing our focus on the key function.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #25 from Sam James --- I am having the same thoughts. It would not be the first time Python had something dubious, like... * https://wiki.gentoo.org/wiki/Project:Python/Strict_aliasing -> https://www.python.org/dev/peps/pep-3123/ * https://github.com/python/cpython/issues/78 So far, I did not see this failure on any other target (-> makes me think it's a gcc bug). But also, I didn't yet see any other software break on hppa (-> makes me think it might be a Python bug). I tried ubsan on amd64 with Python 3.12 at least and got a lot of different errors, although ubsan does not diagnose aliasing issues... I am undecided myself still.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #24 from dave.anglin at bell dot net ---
On 2023-11-08 6:51 p.m., sjames at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #23 from Sam James ---
> (In reply to Andrew Pinski from comment #21)
>> The other option to try is -fstack-reuse=none. There are definitely known
>> issues with the code that coalesces stack variables together too (see PR
>> 111843 for examples).
> I had a good feeling about this but no, didn't help when applied to compile.o.

At this point, I don't know whether this is a python or gcc bug. I scanned for unions in compile.i that might be problematic but I didn't find anything obvious.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #23 from Sam James ---
(In reply to Andrew Pinski from comment #21)
> The other option to try is -fstack-reuse=none. There are definitely known
> issues with the code that coalesces stack variables together too (see PR
> 111843 for examples).

I had a good feeling about this but no, didn't help when applied to compile.o.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #22 from John David Anglin ---
Created attachment 56542 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56542&action=edit
Preprocessed source and assembly files for Python/compile.c
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #21 from Andrew Pinski ---
(In reply to dave.anglin from comment #20)
> Both -fno-strict-aliasing and -fno-schedule-insns2 applied to
> compiler_visit_expr()
> work around issue.

The other option to try is -fstack-reuse=none. There are definitely known issues with the code that coalesces stack variables together too (see PR 111843 for examples).
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #20 from dave.anglin at bell dot net ---
On 2023-11-08 2:07 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #18 from Andrew Pinski ---
> I wonder if -fno-strict-aliasing works around the issue too?
> I get the feeling that `fold mem offset pass` allows the aliasing code to have
> a better time with the offset and that might expose more aliasing issues.
>
> The other thing to try is to add `-fno-schedule-insns2 -fno-schedule-insns`
> instead of `-fno-strict-aliasing` as the scheduler is normally where the
> aliasing issues are exposed on the RTL level ...

Both -fno-strict-aliasing and -fno-schedule-insns2 applied to compiler_visit_expr() work around the issue.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #19 from Jeffrey A. Law --- f-m-o runs post-allocation, so the scope of where its behavior can change things is narrower. So testing with -fno-schedule-insns isn't going to be useful, but -fno-schedule-insns2 might. I'm a bit concerned that we can't turn off f-m-o with an attribute. That would indicate something isn't wired up right in the options handling.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #18 from Andrew Pinski --- I wonder if -fno-strict-aliasing works around the issue too? I get the feeling that `fold mem offset pass` allows the aliasing code to have a better time with the offset and that might expose more aliasing issues. The other thing to try is to add `-fno-schedule-insns2 -fno-schedule-insns` instead of `-fno-strict-aliasing` as the scheduler is normally where the aliasing issues are exposed on the RTL level ...
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #17 from dave.anglin at bell dot net ---
On 2023-11-08 9:42 a.m., jeffreyalaw at gmail dot com wrote:
> I'd probably continue with the process of narrowing down what code is
> affected using the attributes. We already know the file, narrowing it
> down to a function might help considerably with the evaluation effort.

The problem seems to be in compiler_visit_expr().

-static int compiler_visit_expr(struct compiler *, expr_ty);
+static int compiler_visit_expr(struct compiler *, expr_ty) __attribute__((optimize("no-inline-small-functions")));

Python builds okay if this function is not inlined, if it is compiled at -O1, or if -fno-inline-small-functions is specified as above.

Can't specify -fno-fold-mem-offsets as a function attribute.
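The per-function narrowing used in comment #17 relies on GCC's `optimize` function attribute. A minimal, self-contained sketch of the same pattern follows. The names `visit_expr`, `visit_all` and the macro `NO_SMALL_INLINE` are hypothetical, and the guard for non-GCC compilers is an addition for portability rather than anything from the Python sources. Note that the GCC manual describes this attribute as intended for debugging rather than production code, which matches how it is used in this report.

```c
#include <assert.h>

/* The optimize attribute is GCC-specific; expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SMALL_INLINE __attribute__((optimize("no-inline-small-functions")))
#else
#define NO_SMALL_INLINE
#endif

/* Same shape as the patched declaration: the attribute follows the
   declarator on the forward declaration. */
static int visit_expr(int depth) NO_SMALL_INLINE;

static int visit_expr(int depth)
{
    return depth <= 0 ? 0 : 1 + visit_expr(depth - 1);
}

/* Caller in the same translation unit, standing in for the loop
   that calls the visitor once per element. */
int visit_all(int n)
{
    int total = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        total += visit_expr(i);
    return total;
}
```

The attribute changes only how the function is optimized, not what it computes, so results such as `visit_all(3)` are the same with or without it; the difference shows up only in the generated code.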
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #16 from Jeffrey A. Law ---
On 11/8/23 03:09, manolis.tsamis at vrull dot eu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #15 from Manolis Tsamis ---
> (In reply to Sam James from comment #13)
>> Created attachment 56527 [details]
>> compile.c.323r.fold_mem_offsets.bad.xz
>>
>> Output from
>> ```
>> hppa2.0-unknown-linux-gnu-gcc -c -DNDEBUG -g -fwrapv -O3 -Wall -O2
>> -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden
>> -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
>> -I/home/sam/git/cpython/Include -DPy_BUILD_CORE -o Python/compile.o
>> /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
>> ```
>>
>> If I instrument certain functions in compile.c with no optimisation
>> attribute or build the file with -fno-fold-mem-offsets, Python works, so I'm
>> reasonably sure this is the relevant object.
>
> Thanks for the dump file! There are 66 folded/eliminated instructions in this
> object file; I did look at each case and there doesn't seem to be anything
> strange. In fact most of the transformations are straightforward:
>
> - All except a couple of cases don't involve any arithmetic, so it's just
> moving a constant around.
> - The majority of the transformations are 'trivial' and consist of a single
> add and then a memory operation: a sequence like X = Y + Const, R = MEM[X + 0]
> is folded to X = Y, R = MEM[X + Const]. I wonder why so many of these exist
> and are not optimized elsewhere.
> - There are some cases with negative offsets, but the calculations look
> correct.
> - There are a few more complicated cases, but I've worked through these on
> paper and they also look correct.

The PA port is "weird". Its addressing modes aren't a good match for GCC (they're not symmetrical across loads vs stores and across fp vs integer) and they have the implicit space register problem.

But I don't immediately recall needing to avoid propagation of constants into memory references or anything like that.

I'd probably continue with the process of narrowing down what code is affected using the attributes. We already know the file, narrowing it down to a function might help considerably with the evaluation effort.

Note that QEMU has a functional PA port. So you might be able to just take a root filesystem, add the tarball referenced earlier and play around to narrow things down further.

I haven't done work on the PA in about 20 years at this point, but I can probably still grok its code. Between David and myself I'm sure we can help interpret what's going on.

Jeff
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #15 from Manolis Tsamis ---
(In reply to Sam James from comment #13)
> Created attachment 56527 [details]
> compile.c.323r.fold_mem_offsets.bad.xz
>
> Output from
> ```
> hppa2.0-unknown-linux-gnu-gcc -c -DNDEBUG -g -fwrapv -O3 -Wall -O2
> -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden
> -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
> -I/home/sam/git/cpython/Include -DPy_BUILD_CORE -o Python/compile.o
> /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
> ```
>
> If I instrument certain functions in compile.c with no optimisation
> attribute or build the file with -fno-fold-mem-offsets, Python works, so I'm
> reasonably sure this is the relevant object.

Thanks for the dump file! There are 66 folded/eliminated instructions in this object file; I did look at each case and there doesn't seem to be anything strange. In fact most of the transformations are straightforward:

- All except a couple of cases don't involve any arithmetic, so it's just moving a constant around.
- The majority of the transformations are 'trivial' and consist of a single add and then a memory operation: a sequence like X = Y + Const, R = MEM[X + 0] is folded to X = Y, R = MEM[X + Const]. I wonder why so many of these exist and are not optimized elsewhere.
- There are some cases with negative offsets, but the calculations look correct.
- There are a few more complicated cases, but I've worked through these on paper and they also look correct.

Of course I could be missing some more complicated effect, but what I want to say is that everything looks sensible in this particular file.

> Thanks! You are very welcome to have access to some HPPA machines for
> this kind of work. Please email me an SSH public key + desired username
> if that sounds helpful.

Yes, since I couldn't find anything interesting in the dump, that would definitely be helpful.

Thanks!
Manolis
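The 'trivial' fold described in comment #15 can be illustrated with a toy model in C. This is a sketch under assumptions, not GCC code: `struct state`, `run_before`, `run_after` and the array `mem` are hypothetical, and word indices stand in for byte offsets. It shows that the loaded value R is the same before and after the fold, while the address register X ends up holding a different value, so such a fold is only sound if every remaining use of X is adjusted as well or X is dead afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory, indexed in words rather than bytes. */
enum { MEM_WORDS = 8 };
static const int mem[MEM_WORDS] = {10, 11, 12, 13, 14, 15, 16, 17};

/* Final values of the address register X and the loaded value R. */
struct state {
    size_t x;
    int r;
};

/* Before the fold: X = Y + Const; R = MEM[X + 0]. */
struct state run_before(size_t y, size_t konst)
{
    struct state s;
    s.x = y + konst;
    assert(s.x < MEM_WORDS);
    s.r = mem[s.x + 0];
    return s;
}

/* After the fold: X = Y; R = MEM[X + Const]. */
struct state run_after(size_t y, size_t konst)
{
    struct state s;
    s.x = y;
    assert(s.x + konst < MEM_WORDS);
    s.r = mem[s.x + konst];
    return s;
}
```

With Y = 2 and Const = 3, both variants load `mem[5]`, but X is 5 in the first and 2 in the second.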
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #14 from dave.anglin at bell dot net ---
On 2023-11-07 8:36 p.m., sjames at gcc dot gnu.org wrote:
> If I instrument certain functions in compile.c with no optimisation attribute
> or build the file with -fno-fold-mem-offsets, Python works, so I'm reasonably
> sure this is the relevant object.

I believe this bug is related to https://gcc.gnu.org/PR97431

I see the same fault when using debian/rules and the -finline-small-functions option. Debian has been building with -fno-inline-small-functions on sh and hppa. This hides the problem.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #13 from Sam James ---
Created attachment 56527 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56527&action=edit
compile.c.323r.fold_mem_offsets.bad.xz

Output from
```
hppa2.0-unknown-linux-gnu-gcc -c -DNDEBUG -g -fwrapv -O3 -Wall -O2 -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I. -I/home/sam/git/cpython/Include -DPy_BUILD_CORE -o Python/compile.o /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
```

If I instrument certain functions in compile.c with no optimisation attribute or build the file with -fno-fold-mem-offsets, Python works, so I'm reasonably sure this is the relevant object.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #12 from Sam James --- (In reply to Manolis Tsamis from comment #11) > Hi all, > > I will also go ahead and try to reproduce that, although it may take me some > time due to my limited experience with HPPA. Once I manage to reproduce, > most f-m-o issues are straightforward to locate by bisecting the transformed > instructions. Thanks! You are very welcome to have access to some HPPA machines for this kind of work. Please email me an SSH public key + desired username if that sounds helpful. > > > I think the key object is Python/compile.o, but not certain yet. > > In this case the dump file of fold-mem-offsets > (-fdump-rtl-fold_mem_offsets-all) could also be useful, as it contains all > the information needed to see whether a transformation is valid. If it would > be easy for anyone to provide the dump file, I could look at it and see if > anything stands out (until I manage to reproduce this). I'll get the dumps in a moment, thanks.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #11 from Manolis Tsamis --- Hi all, I will also go ahead and try to reproduce that, although it may take me some time due to my limited experience with HPPA. Once I manage to reproduce, most f-m-o issues are straightforward to locate by bisecting the transformed instructions. > I think the key object is Python/compile.o, but not certain yet. In this case the dump file of fold-mem-offsets (-fdump-rtl-fold_mem_offsets-all) could also be useful, as it contains all the information needed to see whether a transformation is valid. If it would be easy for anyone to provide the dump file, I could look at it and see if anything stands out (until I manage to reproduce this). Thanks, Manolis
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #10 from dave.anglin at bell dot net --- On 2023-11-06 5:49 p.m., sjames at gcc dot gnu.org wrote: > Program received signal SIGSEGV, Segmentation fault. > 0x412083f0 in _PyST_GetSymbol (name=0xf9a34a00, ste=) at > Python/symtable.c:396 > 396 PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name); > (gdb) x/20i $pc > => 0x412083f0 <_PyST_GetScope+20>: ldw c(r26),r26 r26=0x34, so the ldw will fault. It appears r26 and r25 have been exchanged in the code prior to <_PyST_GetScope+20>. In any case, the problem is with the ste argument passed to _PyST_GetSymbol.
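As a worked check of the fault noted in comment #10: `ldw c(r26),r26` loads from the address in r26 plus the displacement 0xc, so with r26 = 0x34 the effective address is 0x40. The helpers below are hypothetical and only restate that arithmetic. The 4096-byte page size is an assumption made for illustration (the report does not state it), as is the idea that the lowest page is unmapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: base register value plus signed displacement,
   with 32-bit wraparound. */
uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Nonzero if addr lies in the lowest page; page_size must be nonzero. */
int in_page_zero(uint32_t addr, uint32_t page_size)
{
    assert(page_size != 0);
    return addr < page_size;
}
```

Under these assumptions `in_page_zero(effective_address(0x34, 0xc), 4096)` is true, which is consistent with the load faulting, while an ordinary text address such as 0x412083f0 is not in page 0.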
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #9 from Sam James --- I think the key object is Python/compile.o, but not certain yet.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #8 from Sam James --- (In reply to Jeffrey A. Law from comment #6) Program received signal SIGSEGV, Segmentation fault. 0x412083f0 in _PyST_GetSymbol (name=0xf9a34a00, ste=) at Python/symtable.c:396 396 PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name); (gdb) x/20i $pc => 0x412083f0 <_PyST_GetScope+20>: ldw c(r26),r26 0x412083f4 <_PyST_GetScope+24>: movb,= ret0,r26,0x41208414 <_PyST_GetScope+56> 0x412083f8 <_PyST_GetScope+28>: copy r4,r19 0x412083fc <_PyST_GetScope+32>: b,l 0x410d6900 ,rp 0x41208400 <_PyST_GetScope+36>: nop 0x41208404 <_PyST_GetScope+40>: ldw -54(sp),rp 0x41208408 <_PyST_GetScope+44>: extrw,u ret0,20,4,ret0 0x4120840c <_PyST_GetScope+48>: bve (rp) 0x41208410 <_PyST_GetScope+52>: ldw,mb -40(sp),r4 0x41208414 <_PyST_GetScope+56>: copy r26,ret0 0x41208418 <_PyST_GetScope+60>: ldw -54(sp),rp 0x4120841c <_PyST_GetScope+64>: bve (rp) 0x41208420 <_PyST_GetScope+68>: ldw,mb -40(sp),r4 0x41208424 <_Py_SymtableStringObjectFlags>: stw rp,-14(sp) 0x41208428 <_Py_SymtableStringObjectFlags+4>:stw,ma r8,80(sp) 0x4120842c <_Py_SymtableStringObjectFlags+8>:copy r23,r8 0x41208430 <_Py_SymtableStringObjectFlags+12>: stw r7,-7c(sp) 0x41208434 <_Py_SymtableStringObjectFlags+16>: copy r24,r7 0x41208438 <_Py_SymtableStringObjectFlags+20>: stw r6,-78(sp) 0x4120843c <_Py_SymtableStringObjectFlags+24>: copy r25,r6 (gdb) (gdb) i r flags r1 0x411bc688 1092339336 rp 0x412083f7 1092649975 r3 0x1 1 r4 0x4136c000 1094107136 r5 0xf9a34a00 4188228096 r6 0x8d141 r7 0xf7b03b88 4155521928 r8 0xf7b03ba8 4155521960 r9 0xf9953b68 4187306856 r100x0 0 r110x8e142 r120x414e1820 1095637024 r130x414e4490 1095648400 r140xf9a76498 4188497048 r150x1 1 r160xf99bb5e8 4187731432 r170xf9ae11b4 4188934580 r180xf99e3b68 4187896680 r190x4136c000 1094107136 r200x411bc7f0 1092339696 r210x41450268 1095041640 r220x8d141 r230x1 1 r240x1 1 r250xf9a34a00 4188228096 r260x3452 dp 0x4136c000 1094107136 ret0 0xf9964020 4187373600 ret1 
0x8d141 sp 0xf7b04080 4155523200 r310x1 1 sar0x3d61 pcoqh 0x412083f3 1092649971 pcsqh pcoqt 0x410e4c0f 1091456015 pcsqt eiem iir isr ior ipsw 0xeff0f 982799 goto sr4 sr0 sr1 sr2 sr3 sr5 sr6 sr7 cr0 cr8 cr9 ccr cr12 cr13 cr24 cr25 cr26 0xeff0f 982799 mpsfu_high 0xf7afa500 4155483392 mpsfu_low mpsfu_ovflo pad fpsr fpe1 fpe2 fpe3 fpe4 fpe5 fpe6 fpe7 (gdb)
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #7 from dave.anglin at bell dot net ---
On 2023-11-06 5:20 p.m., law at gcc dot gnu.org wrote:
> The biggest concern I'd have with f-m-o on the PA would be the
> implicit segment selection that happens on the base register -- but it would
> only be an issue if we are faulting on an unscaled indexed addressing mode and
> only if the linux-gnu port was actually putting different values into the
> space registers.

The linux-gnu port does not put different values into the space registers.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 --- Comment #6 from Jeffrey A. Law --- Do we have assembly code around the faulting point (x/20i $pc) and a register dump (i r)? The biggest concern I'd have with f-m-o on the PA would be the implicit segment selection that happens on the base register -- but it would only be an issue if we are faulting on an unscaled indexed addressing mode and only if the linux-gnu port was actually putting different values into the space registers. WRT testing -- we did test this on hppa1.1-linux-gnu. Just a bootstrap and regression test of the compiler itself.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415 Sam James changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #5 from Sam James --- Built with 14.0.0 20231029. * https://dev.gentoo.org/~sam/bugs/gcc/gcc-python-hppa/cpython-3.11.6-good.tar.xz * https://dev.gentoo.org/~sam/bugs/gcc/gcc-python-hppa/cpython-3.11.6-bad.tar.xz
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #4 from Sam James ---
Created attachment 56520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56520&action=edit
list_of_differing_files.txt
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #3 from dave.anglin at bell dot net ---
On 2023-11-06 4:00 p.m., sjames at gcc dot gnu.org wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at
> Python/symtable.c:396
> 396         PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
> (gdb) bt
> #0  0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at
> Python/symtable.c:396
> #1  _PyST_GetScope (ste=<optimized out>, name=0xf9a33a60) at
> Python/symtable.c:406

Probably, ste is NULL or in page 0, and it's symtable.c that's miscompiled.

There's not a lot of testing of gcc-14 on hppa yet.
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
   Target Milestone|---                         |14.0
             Target|                            |hppa2.0-unknown-linux-gnu
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #2 from Sam James ---
I'll grab a bad vs good build directory next and upload both, and then try to see which objects differ.

Dave, can you reproduce?
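One way to find which objects differ between a good and a bad build tree is to hash every object file and compare by relative path. The sketch below is an illustration only and is not the script used in this report. The function names and the `.o` suffix are assumptions, and it expects both trees to be unpacked side by side.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_objects(good_dir: str, bad_dir: str, suffix: str = ".o") -> list:
    """List relative paths of object files that differ or are missing in bad_dir."""
    differing = []
    for root, _dirs, files in os.walk(good_dir):
        for name in files:
            if not name.endswith(suffix):
                continue
            good_path = os.path.join(root, name)
            rel_path = os.path.relpath(good_path, good_dir)
            bad_path = os.path.join(bad_dir, rel_path)
            # A file missing from the bad tree is reported as differing too.
            if not os.path.isfile(bad_path):
                differing.append(rel_path)
            elif file_digest(good_path) != file_digest(bad_path):
                differing.append(rel_path)
    return sorted(differing)
```

Calling `differing_objects("good", "bad")` returns a sorted list of relative paths. Objects that embed timestamps or build paths will show up as differing even when the generated code is identical, so the list is a starting point for bisecting rather than a verdict.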
[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 regression] Python 3.11 |[14 regression] Python 3.11
                   |miscompiled with new RTL    |miscompiled on HPPA with
                   |fold mem offset pass, since |new RTL fold mem offset
                   |r14-4664-g04c9cf5c786b94    |pass, since
                   |                            |r14-4664-g04c9cf5c786b94

--- Comment #1 from Sam James ---
Backtrace from the crashing Python:
```
(gdb) r
Starting program: /var/tmp/portage/dev-lang/python-3.11.6/work/Python-3.11.6/_bootstrap_python ./Tools/scripts/deepfreeze.py Python/frozen_modules/importlib._bootstrap.h:importlib._bootstrap Python/frozen_modules/importlib._bootstrap_external.h:importlib._bootstrap_external Python/frozen_modules/zipimport.h:zipimport Python/frozen_modules/abc.h:abc Python/frozen_modules/codecs.h:codecs Python/frozen_modules/io.h:io Python/frozen_modules/_collections_abc.h:_collections_abc Python/frozen_modules/_sitebuiltins.h:_sitebuiltins Python/frozen_modules/genericpath.h:genericpath Python/frozen_modules/ntpath.h:ntpath Python/frozen_modules/posixpath.h:posixpath Python/frozen_modules/os.h:os Python/frozen_modules/site.h:site Python/frozen_modules/stat.h:stat Python/frozen_modules/importlib.util.h:importlib.util Python/frozen_modules/importlib.machinery.h:importlib.machinery Python/frozen_modules/runpy.h:runpy Python/frozen_modules/__hello__.h:__hello__ Python/frozen_modules/__phello__.h:__phello__ Python/frozen_modules/__phello__.ham.h:__phello__.ham Python/frozen_modules/__phello__.ham.eggs.h:__phello__.ham.eggs Python/frozen_modules/__phello__.spam.h:__phello__.spam Python/frozen_modules/frozen_only.h:frozen_only -o Python/deepfreeze/deepfreeze.c
warning: File "/usr/lib/libthread_db.so.1" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/lib/libthread_db.so.1
line to your configuration file "/root/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

Program received signal SIGSEGV, Segmentation fault.
0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at Python/symtable.c:396
396         PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
(gdb) bt
#0  0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at Python/symtable.c:396
#1  _PyST_GetScope (ste=<optimized out>, name=0xf9a33a60) at Python/symtable.c:406
#2  0x411bb8f8 in compiler_nameop (c=0xf7b03b88, name=<optimized out>, ctx=Load) at Python/compile.c:4274
#3  0x411be074 in compiler_visit_expr (c=0x1, e=<optimized out>) at Python/compile.c:5969
#4  0x411bcc88 in compiler_visit_expr1 (c=0xf7b03b88, e=0x1) at Python/compile.c:5915
#5  0x411be074 in compiler_visit_expr (c=0x1, e=<optimized out>) at Python/compile.c:5969
#6  0x411bceac in compiler_call (e=0x1, c=0xf7b03b88) at Python/compile.c:4952
#7  compiler_visit_expr1 (c=0xf7b03b88, e=0x1) at Python/compile.c:5905
#8  0x411c1f34 in compiler_visit_expr (e=<optimized out>, c=0xf9a33a60) at Python/compile.c:5969
#9  compiler_decorators (decos=0x8d, c=0xf9a33a60) at Python/compile.c:2327
#10 compiler_class (c=0xf9a33a60, s=0x414e4490) at Python/compile.c:2702
#11 0x411c566c in compiler_body (c=0xf7b03b88, stmts=0xf9a33a60) at Python/compile.c:2180
#12 0x411c7e98 in compiler_mod (mod=0xf7b03b88, c=0x0) at Python/compile.c:2197
#13 _PyAST_Compile (mod=0xf7b03b88, filename=0x8d, flags=<optimized out>, optimize=<optimized out>, arena=<optimized out>) at Python/compile.c:581
#14 0x411fe7b8 in Py_CompileStringObject (str=0xf7b03b88 "\371\240\277\220\371\236\353`\371\257\221\260\367\260:t", filename=0x8d, start=-139445336, flags=0xf9a33a60, optimize=<optimized out>) at Python/pythonrun.c:1799
#15 0x4119c334 in builtin_compile_impl (module=<optimized out>, feature_version=<optimized out>, optimize=<optimized out>, dont_inherit=<optimized out>, flags=<optimized out>, mode=<optimized out>, filename=0xf998db68, source=0x8d) at Python/bltinmodule.c:831
#16 builtin_compile (module=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Python/clinic/bltinmodule.c.h:328
#17 0x410f3ae4 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0xf9a33a60, args=0x8d, nargsf=<optimized out>, kwnames=<optimized out>) at ./Include/cpython/methodobject.h:52
#18 0x4109fa88 in _PyVectorcall_Call (tstate=0xf7b03b88, func=<optimized out>, callable=0xf9a33a60, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:257
#19 0x4109fd28 in _PyObject_Call (tstate=0xf9a33a60, callable=0x1, args=0xf7b03ba8, kwargs=0x8d) at Objects/call.c:328
#20 0x4109fdb8 in PyObject_Call () at Objects/call.c:352
#21 0x411a47c8 in do_call_core (tstate=0x8d, func=0x1, callargs=0xf9a33a60,