[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2024-03-22 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #55 from Jeffrey A. Law  ---
Per c#54. If it turns out we're wrong, we can always reopen or file a new
report.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2024-03-17 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #54 from John David Anglin  ---
The f-m-o issue is probably fixed.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2024-03-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #53 from GCC Commits  ---
The master branch has been updated by John David Anglin :

https://gcc.gnu.org/g:f0fda1aff0b752e4182c009c5526b9306bd35f7c

commit r14-9511-gf0fda1aff0b752e4182c009c5526b9306bd35f7c
Author: John David Anglin 
Date:   Mon Mar 18 00:19:36 2024 +

hppa: Improve handling of REG+D addresses when generating PA 2.0 code

In looking at PR 112415, it became clear that improvements could be
made in the handling of loads and stores using REG+D addresses.  A
change in 2002 conflated two issues:

1) We can't generate insns with 14-bit displacements before reload
completes when generating PA 1.x code since floating-point loads and
stores only support 5-bit offsets in PA 1.x.

2) The GNU ELF 32-bit linker lacks relocation support for PA 2.0
floating point instructions with 14-bit displacements.  These
relocations affect instructions with symbolic references.

The result of the change was to block creation of PA 2.0 instructions
with 14-bit REG+D displacements for SImode, DImode, SFmode and DFmode
on the GNU/Linux target before reload.  This was unnecessary as these
instructions don't need relocation.

This change revises the INT14_OK_STRICT define to allow creation
of instructions with 14-bit REG+D addresses before reload when
generating PA 2.0 code.

2024-03-17  John David Anglin  

gcc/ChangeLog:

PR rtl-optimization/112415
* config/pa/pa.cc (pa_emit_move_sequence): Revise condition
for symbolic memory operands.
(pa_legitimate_address_p): Revise LO_SUM condition.
* config/pa/pa.h (INT14_OK_STRICT): Revise define.  Move
comment about GNU linker to predicates.md.
* config/pa/predicates.md (floating_point_store_memory_operand):
Revise condition for symbolic memory operands.  Update
comment.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-28 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #52 from Manolis Tsamis  ---
(In reply to Sam James from comment #51)
> manolis, did you have a chance to look at the remaining pass issue? You'll
> need to revert Dave's commit locally which made the issue latent for
> building Python.

Hi Sam, I had to work on some other things so I didn't get to find a fix yet,
but I'll be working on that again now (in light of the new info from PR111601
too). 

Thanks for the ping,
Manolis

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #51 from Sam James  ---
manolis, did you have a chance to look at the remaining pass issue? You'll need
to revert Dave's commit locally which made the issue latent for building
Python.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #50 from CVS Commits  ---
The master branch has been updated by John David Anglin :

https://gcc.gnu.org/g:d2934eb6ae92471484469d8ddd039eb34ef400b1

commit r14-5538-gd2934eb6ae92471484469d8ddd039eb34ef400b1
Author: John David Anglin 
Date:   Thu Nov 16 17:42:26 2023 +

hppa: Revise REG+D address support to allow long displacements before
reload

In analyzing PR rtl-optimization/112415, I realized that restricting
REG+D offsets to 5 bits before reload results in very poor code and
complexities in optimizing these instructions after reload.  The
general problem is that long displacements are not allowed for
floating-point accesses when generating PA 1.1 code.  Even with PA 2.0,
there is an ELF linker bug that prevents using long displacements for
floating-point loads and stores.

In the past, enabling long displacements before reload caused issues
in reload.  However, there have been fixes in the handling of reloads
for floating-point accesses.  This change allows long displacements
before reload and corrects a couple of issues in the constraint
handling for integer and floating-point accesses.

2023-11-16  John David Anglin  

gcc/ChangeLog:

PR rtl-optimization/112415
* config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
displacements before reload.  Simplify logic flow.  Revise
comments.
* config/pa/pa.h (TARGET_ELF64): New define.
(INT14_OK_STRICT): Update define and comment.
* config/pa/pa64-linux.h (TARGET_ELF64): Define.
* config/pa/predicates.md (base14_operand): Don't check
alignment of short displacements.
(integer_store_memory_operand): Don't return true when
reload_in_progress is true.  Remove INT_5_BITS check.
(floating_point_store_memory_operand): Don't return true when
reload_in_progress is true.  Use INT14_OK_STRICT to check
whether long displacements are always okay.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-13 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #49 from John David Anglin  ---
Created attachment 56576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56576&action=edit
Patch to improve reg+d address handling

This patch revises pa_legitimate_address_p to allow 14-bit displacements
for all memory accesses before reload.  Comments and flow in this routine
are improved.

So far, I haven't seen any issues related to reloading out-of-range
floating-point accesses.

This significantly improves code generation and saves more than two
thousand instructions in compile.s.  I was able to successfully build
python with the patched compiler.

This is version two of the change and it still needs more testing.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-13 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #48 from Manolis Tsamis  ---
(In reply to dave.anglin from comment #47)
> [snip: full quote of the RTL dumps from comment #39]

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-13 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #47 from dave.anglin at bell dot net ---
On 2023-11-13 4:33 a.m., manolis.tsamis at vrull dot eu wrote:
> [snip: full quote of the RTL dumps from comment #39]

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-13 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #46 from Manolis Tsamis  ---
I have reproduced the segfault with f-m-o limited to only fold insn 272 from
compiler_call_helper. The exact transformation is:

Memory offset changed from 0 to 388 for instruction:
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(intD.1
*)_107 + 388B]+0 S4 A32])
(reg:SI 14 %r14 [orig:167 vect_pretmp_36.2448D.32932 ] [167]))
"Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))
deferring rescan insn with uid = 273.
Instruction folded:(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 388 [0x184]))) "Python/compile.c":5970:20 120 {addsi3}
 (nil))

This instruction is also among the ones that Dave mentioned. Again, if
I'm missing something as to why this transformation is illegal, please tell me.
Given that these are also consecutive instructions, all I'm seeing here is that

%r22 = %r19 + 388
[%r22] = %r14

is transformed to

%r22 = %r19
[%r22 + 388] = %r14

I haven't tracked all other uses of %r22 yet, but in theory if there was any
non-foldable use of that register then the transformation wouldn't be made.

Manolis
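
To make the local argument above concrete, here is a minimal sketch in Python that executes both two-instruction sequences on a toy register file. The `run` helper, the tuple encoding of instructions, and the initial register values are all hypothetical and only serve the illustration; this is not how f-m-o represents or checks anything. It shows that the store hits the same address either way, while the value left in %r22 differs by the folded constant, which is why every other use of %r22 has to be foldable too.

```python
def run(program, regs):
    """Execute toy instructions; return (final registers, {address: value})."""
    regs = dict(regs)
    mem = {}
    for op, *args in program:
        if op == "add":        # add dst, src, imm  ->  dst = src + imm
            dst, src, imm = args
            regs[dst] = regs[src] + imm
        elif op == "store":    # store base, disp, src  ->  mem[base + disp] = src
            base, disp, src = args
            mem[regs[base] + disp] = regs[src]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs, mem


# Hypothetical starting state; only the relationships matter.
initial = {"r19": 0x1000, "r14": 42, "r22": 0}

# %r22 = %r19 + 388 ; [%r22] = %r14
before = [("add", "r22", "r19", 388), ("store", "r22", 0, "r14")]
# %r22 = %r19 ; [%r22 + 388] = %r14
after = [("add", "r22", "r19", 0), ("store", "r22", 388, "r14")]

regs_before, mem_before = run(before, initial)
regs_after, mem_after = run(after, initial)

print(mem_before == mem_after)                   # the store itself is unchanged
print(regs_before["r22"] - regs_after["r22"])    # %r22 now differs by the offset
```

Taken in isolation the two sequences are equivalent for the store; the difference in %r22 only matters if some other instruction reads that register expecting the old value.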

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-13 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #45 from Manolis Tsamis  ---
(In reply to Jeffrey A. Law from comment #41)
> I would agree.  In fact, the whole point of the f-m-o pass is to bring those
> immediates into the memory reference.  It'd be really useful to know why
> that isn't happening.
> 
> The only thing I can think of would be if multiple instructions needed the
> %r20 in the RTL you attached.  Which might point to a refinement we should
> make in f-m-o, specifically the transformation isn't likely profitable if we
> aren't able to fold away a term or fold a constant term into the actual
> memory reference.

Jeff,

I'm confused about "It'd be really useful to know why that isn't happening.".
It can be seen in Dave's dumps that it *is* happening, e.g.:

Memory offset changed from 0 to 396 for instruction:
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
(reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))

Instruction folded:(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
 (nil))

If you look at the RTL in f-m-o, all these offsets are indeed moved into the
respective load/store. I don't know if cprop afterwards manages to eliminate
the unwanted move, but f-m-o does what it's supposed to do in this case.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-13 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #44 from Manolis Tsamis  ---
(In reply to John David Anglin from comment #39)
> [snip: full quote of the RTL dumps from comment #39]

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #43 from Jeffrey A. Law  ---
I would expect allowing larger offsets before reload to be a significant
problem.

The core issue is integer memory operations allow 14 bits while FP only allows
5.  During reloading we don't know if any given memory reference is FP or
integer.  xmpyu plays a role here too since it's going to require FP registers
in integer modes.

But what I don't understand is why f-m-o fails to push the offset into the
memory reference -- it should be conditional on the insn being recognized.  And
since it's after reload we know if we're doing an FP or integer load.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-12 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #42 from John David Anglin  ---
The problem is we are limiting displacements to five bits in
pa_legitimate_address_p.  The comment is somewhat confusing but
we may have reload issues if we allow 14-bit displacements before
reload completes.  Testing a patch to see if we can allow 14-bit
displacements before reload.
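
As a rough illustration of the five-bit versus 14-bit limits discussed here, the sketch below checks whether the offsets from the RTL dumps fit in each field. It assumes the displacement fields are signed two's-complement values; `fits_signed` is a hypothetical helper written for this example, not a GCC macro.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's-complement field."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


# Offsets taken from the addsi3 insns quoted in this PR.
for offset in (388, 392, 396):
    print(offset, "5-bit:", fits_signed(offset, 5), "14-bit:", fits_signed(offset, 14))
```

Under that assumption none of the three offsets fits in five bits, while all of them fit comfortably in 14 bits, which is consistent with the address being split into a separate add when only short displacements are accepted.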

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #41 from Jeffrey A. Law  ---
I would agree.  In fact, the whole point of the f-m-o pass is to bring those
immediates into the memory reference.  It'd be really useful to know why that
isn't happening.

The only thing I can think of would be if multiple instructions needed the %r20
in the RTL you attached.  Which might point to a refinement we should make in
f-m-o, specifically the transformation isn't likely profitable if we aren't
able to fold away a term or fold a constant term into the actual memory
reference.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-12 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #40 from John David Anglin  ---
Jeff,

I don't think these split instructions make a lot of sense on PA-RISC.

(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
 (nil))
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
(reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))

They increase code size and register pressure.  That may lead to unnecessary
spills and longer branches.  They increase the probability of problems like
the one in this PR.

I suspect the two instructions generated are actually slower than one with a
nonzero memory offset.  It's not clear that memory accesses with a zero offset
are faster than ones with nonzero offsets.

Integer loads and stores on PA support fairly large offsets.

I think we need to look at why this happens frequently.

Thoughts?

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-11 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #39 from John David Anglin  ---
In the f-m-o pass, the following three insns that set call-clobbered
registers r20-r22 are pulled from the loop:

(insn 186 183 190 29 (set (reg/f:SI 22 %r22 [478])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 388 [0x184]))) "../Python/compile.c":5964:9 120 {addsi3}
 (nil))
(insn 190 186 187 29 (set (reg/f:SI 21 %r21 [479])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 392 [0x188]))) "../Python/compile.c":5964:9 120 {addsi3}
 (nil))
(insn 194 191 195 29 (set (reg/f:SI 20 %r20 [480])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 396 [0x18c]))) "../Python/compile.c":5964:9 120 {addsi3}
 (nil))

They are used in the following insns before the call to compiler_visit_expr1:

(insn 242 238 258 32 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int
*)prephitmp_37 + 388B]+0 S4 A32])
(reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173]))
"../Python/compile.c":5968:22 42 {*pa.md:2193}
 (expr_list:REG_DEAD (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173])
(expr_list:REG_DEAD (reg/f:SI 22 %r22 [478])
(nil))))
(insn 258 242 246 32 (set (reg:SI 26 %r26)
(reg/v/f:SI 5 %r5 [orig:198 c ] [198])) "../Python/compile.c":5969:15
42 {*pa.md:2193}
 (nil))
(insn 246 258 250 32 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int
*)prephitmp_37 + 392B]+0 S4 A32])
(reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169]))
"../Python/compile.c":5968:22 42 {*pa.md:2193}
 (expr_list:REG_DEAD (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169])
(expr_list:REG_DEAD (reg/f:SI 21 %r21 [479])
(nil))))
(insn 250 246 254 32 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int
*)prephitmp_37 + 396B]+0 S4 A32])
(reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145]))
"../Python/compile.c":5968:22 42 {*pa.md:2193}
 (expr_list:REG_DEAD (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145])
(expr_list:REG_DEAD (reg/f:SI 20 %r20 [480])
(nil))))

After the call, we have:

(insn 1241 269 273 30 (set (reg/f:SI 22 %r22 [478])
(reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
"../Python/compile.c":5970:20 -1
 (nil))
(insn 273 1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478])
(const_int 388 [0x184])) [4 MEM[(int *)_107 + 388B]+0 S4 A32])
(reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))
(insn 1242 273 277 30 (set (reg/f:SI 21 %r21 [479])
(reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
"../Python/compile.c":5970:20 -1
 (nil))
(insn 277 1242 1243 30 (set (mem:SI (plus:SI (reg/f:SI 21 %r21 [479])
(const_int 392 [0x188])) [4 MEM[(int *)_107 + 392B]+0 S4 A32])
(reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))
(insn 1243 277 281 30 (set (reg/f:SI 20 %r20 [480])
(reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
"../Python/compile.c":5970:20 -1
 (nil))
(insn 281 1243 299 30 (set (mem:SI (plus:SI (reg/f:SI 20 %r20 [480])
(const_int 396 [0x18c])) [4 MEM[(int *)_107 + 396B]+0 S4 A32])
(reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))

We have lost the offsets that were added initially to r20, r21 and r22.

Previous ce3 pass had:

(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 388 [0x184]))) "../Python/compile.c":5970:20 120
{addsi3}
 (nil))
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int *)_107 +
388B]+0 S4 A32])
(reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))
(insn 276 273 277 30 (set (reg/f:SI 21 %r21 [479])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 392 [0x188]))) "../Python/compile.c":5970:20 120
{addsi3}
 (nil))
(insn 277 276 280 30 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int *)_107 +
392B]+0 S4 A32])
(reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))
(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
(plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
(const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
 (nil))
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
(reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
 (nil))

So, this is a f-m-o bug.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-11 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-11-11

--- Comment #38 from Sam James  ---
Confirming since Dave repro'd too.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-11 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #37 from John David Anglin  ---
(In reply to Sam James from comment #35)
> If you still need dumps off me, please let me know which. I've attached
> those w/ f-m-o on for the fold-mem-offsets pass. If you need others, just
> say.

I have a set of dumps.  The problem is determining where the wrong RTL
occurs in compiler_call_helper.  It changes a lot from pass to pass.

Many of the changes in f-m-o seem to get destroyed by later transformations.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-11 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #36 from John David Anglin  ---
Created attachment 56562
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56562&action=edit
fold_mem_offsets, prop_hardreg, rtl_dce and bbro dumps

Comment #33 is wrong.  The issue is not reload.  It's okay to pick a
call clobbered register as the code stands.

The initialization of the register used for the store at
offset 392B ends up outside the loop.  It ends up in a call-clobbered
register and is clobbered by the call to compiler_visit_expr1 in the loop.
This occurs around the second call to compiler_visit_expr1 in
compiler_call_helper.

Various initializations get moved out of the loop between the f-m-o and bbro
passes.  I think it's the bbro pass that's at fault but it could be something
that happens before that causes the initialization to get moved outside the
loop.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-11 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #35 from Sam James  ---
If you still need dumps off me, please let me know which. I've attached those
w/ f-m-o on for the fold-mem-offsets pass. If you need others, just say.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-11 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #34 from John David Anglin  ---
The same wrong code is generated with an x86-64 cross to hppa-linux-gnu.  Thus,
it seems this bug is not due to gcc being miscompiled.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-09 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #33 from John David Anglin  ---
Created attachment 56549
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56549&action=edit
ira and reload dumps for compiler_call_helper

The incorrect code for insn 246 in compiler_call_helper appears in the reload
pass.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-09 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #32 from dave.anglin at bell dot net ---
At this point, I don't have gcc-14 builds that bracket the f-m-o change.  Maybe
Sam can check.

I'm trying to determine the RTL pass where things go bad.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-09 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #31 from Jeffrey A. Law  ---
IIRC r21 is call-clobbered.  So I guess the question turns into what was the
sequence before f-m-o got involved -- was it assuming r21 would be preserved,
or did f-m-o make r21 live across the call?

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-09 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #30 from John David Anglin  ---
   0x0019c684 <+588>:   stw r23,0(r22)
=> 0x0019c688 <+592>:   stw ret1,0(r21)
   0x0019c68c <+596>:   stw r31,0(r20)
   0x0019c690 <+600>:   b,l 0x198d58 ,rp
   0x0019c694 <+604>:   stw ret0,0(r19)

These instructions are in a loop:

/* No * or ** args, so can use faster calling sequence */
for (i = 0; i < nelts; i++) {
expr_ty elt = asdl_seq_GET(args, i);
assert(elt->kind != Starred_kind);
VISIT(c, expr, elt);
}

r21 is clobbered by VISIT call.  Value is okay in first iteration.

The initialization instructions are outside the loop:

   0x0019c638 <+512>:   ldo 184(r19),r22
   0x0019c63c <+516>:   ldw 184(r19),r14
   0x0019c640 <+520>:   ldo 188(r19),r21
   0x0019c644 <+524>:   ldw 188(r19),r13
   0x0019c648 <+528>:   ldo 18c(r19),r20
   0x0019c64c <+532>:   ldw 18c(r19),r12
   0x0019c650 <+536>:   ldw 190(r19),r11
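
To make the failure mode concrete, here is a toy C model of it. This is only
a sketch, not the generated code: the `machine` struct, `visit_call` and the
two `run_*` helpers are hypothetical names, memory is word-indexed, and the
model assumes the base register (r19) survives the call while r21 does not.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine; register indices mirror the PA register numbers above. */
enum { R19 = 19, R21 = 21, NREGS = 32, NWORDS = 256 };

typedef struct {
    uint32_t reg[NREGS];
    uint32_t mem[NWORDS];   /* word-indexed toy memory */
} machine;

static void machine_init(machine *m, uint32_t base) {
    memset(m, 0, sizeof *m);
    m->reg[R19] = base;     /* assumed to survive calls in this model */
}

/* Hypothetical callee: free to overwrite the caller-saved r21. */
static void visit_call(machine *m) {
    m->reg[R21] = 0;
}

/* Buggy shape: r21 = r19 + off is set up once, outside the loop. */
static void run_hoisted(machine *m, uint32_t off, int iters) {
    m->reg[R21] = m->reg[R19] + off;
    for (int i = 0; i < iters; i++) {
        m->mem[m->reg[R21]] = 43;   /* like: stw ret1,0(r21) */
        visit_call(m);              /* r21 no longer holds the address */
    }
}

/* Safe shape: r21 is recomputed from r19 on every iteration. */
static void run_recomputed(machine *m, uint32_t off, int iters) {
    for (int i = 0; i < iters; i++) {
        m->reg[R21] = m->reg[R19] + off;
        m->mem[m->reg[R21]] = 43;
        visit_call(m);
    }
}
```

With two iterations, `run_hoisted` stores to the intended word in the first
pass and to whatever r21 happens to hold in the second, which matches "value
is okay in first iteration" above. `run_recomputed` only ever touches the
intended word.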

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-09 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #29 from John David Anglin  ---
The miscompilation is in compiler_visit_expr:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program:
/home/dave/debian/python3.11/python3.11-3.11.6/build-static/Programs/_freeze_module
importlib._bootstrap ../Lib/importlib/_bootstrap.py
Python/frozen_modules/importlib._bootstrap.h
warning: Unable to find libthread_db matching inferior's thread library, thread
debugging will not be available.

Breakpoint 2, compiler_jump_if (c=0xf8f02508, e=0x5763f8, next=0xfaeaa908,
cond=0) at ../Python/compile.c:2898
2898    {
(gdb) watch *0xfaea51b8
Watchpoint 3: *0xfaea51b8
(gdb) c
Continuing.

Watchpoint 3: *0xfaea51b8

Old value = -85046408
New value = 43
0x0019c688 in compiler_visit_expr (e=0x576308, c=0xf8f02508) at
../Python/compile.c:5968
5968    SET_LOC(c, e);
(gdb) bt
#0  0x0019c688 in compiler_visit_expr (e=0x576308, c=0xf8f02508)
at ../Python/compile.c:5968
#1  compiler_call_helper (c=0xf8f02508, n=0, args=,
keywords=0x0) at ../Python/compile.c:5138
#2  0x0019ec70 in compiler_visit_expr (e=, c=0xf8f02508)
at ../Python/compile.c:5969
#3  compiler_jump_if (c=0xf8f02508, e=, next=0x0,
cond=) at ../Python/compile.c:2988
#4  0x001a0770 in compiler_if (s=0x0, c=0x5763c0) at ../Python/compile.c:3090
#5  compiler_visit_stmt (c=0x5763c0, s=0x0) at ../Python/compile.c:4118
#6  0x001a1378 in compiler_for (s=0x0, c=0x5763c0) at ../Python/compile.c:3124
#7  compiler_visit_stmt (c=0x5763c0, s=0x0) at ../Python/compile.c:4114
#8  0x001a3170 in compiler_function (c=0x2, s=,
is_async=) at ../Python/compile.c:2670
#9  0x001a3438 in compiler_body (c=0x0, stmts=0x5763c0)
at ../Python/compile.c:2180
#10 0x001a5cdc in compiler_mod (mod=0x0, c=0xf8f02528)
at ../Python/compile.c:2197
#11 _PyAST_Compile (mod=0x0, filename=0xf8f02528, flags=,
optimize=, arena=) at ../Python/compile.c:581
#12 0x001dea00 in Py_CompileStringObject (optimize=0, flags=0x5763c0, start=0,
filename=0x2, str=0x0) at ../Python/pythonrun.c:1799
#13 Py_CompileStringExFlags (str=0x0, filename_str=, start=0,
flags=0x5763c0, optimize=) at ../Python/pythonrun.c:1812
#14 0x000167a4 in compile_and_marshal (text=0x0,
name=0x2 )
at ../Programs/_freeze_module.c:125
#15 main (argc=0, argv=) at ../Programs/_freeze_module.c:230
(gdb) diass $pc-16,$pc+16
Undefined command: "diass".  Try "help".
(gdb) disass $pc-16,$pc+16
Dump of assembler code from 0x19c678 to 0x19c698:
   0x0019c678 :   ldw 14(r25),ret1
   0x0019c67c :   ldw 18(r25),r31
   0x0019c680 :   ldw 1c(r25),ret0
   0x0019c684 :   stw r23,0(r22)
=> 0x0019c688 :   stw ret1,0(r21)
   0x0019c68c :   stw r31,0(r20)
   0x0019c690 :   b,l 0x198d58 ,rp
   0x0019c694 :   stw ret0,0(r19)
End of assembler dump.

The code at 0x0019c688 clobbers the value at c->u->u_ste:
(gdb) p/x $r21
$35 = 0xfaea51b8
(gdb) p/x *c
$36 = {c_filename = 0xfaed9480, c_st = 0xfaeafd10, c_future = 0xfaef7030,
  c_flags = 0xf8f02544, c_optimize = 0x0, c_interactive = 0x0,
  c_nestlevel = 0x2, c_const_cache = 0xfae81280, u = 0xfaea51b8,
  c_stack = 0xfae57a88, c_arena = 0xfaec0c90}
(gdb) p/x *c->u
$37 = {u_ste = 0x2b, u_name = 0xfae7ff80, u_qualname = 0xfae7ff80,
  u_scope_type = 0x2, u_consts = 0xfaeaa7f8, u_names = 0xfaeaa7d0,
  u_varnames = 0xfaeaa780, u_cellvars = 0xfaeaa7a8, u_freevars = 0xfaeaa758,
  u_private = 0x0, u_argcount = 0x2, u_posonlyargcount = 0x0,
  u_kwonlyargcount = 0x0, u_blocks = 0xfaeaa908, u_curblock = 0xfaeaa868,
  u_nfblocks = 0x1, u_fblock = {{fb_type = 0x1, fb_block = 0xfaeaa840,
  fb_exit = 0xfaeaa8b8, fb_datum = 0x0}, {fb_type = 0x0, fb_block = 0x0,
  fb_exit = 0x0, fb_datum = 0x0} },
  u_firstlineno = 0x28, u_lineno = 0x2b, u_col_offset = 0xb,
  u_end_lineno = 0x2b, u_end_col_offset = 0x20,
  u_need_new_implicit_block = 0x0}
(gdb) p/x $r23
$38 = 0x2b

#define SET_LOC(c, x)   \
(c)->u->u_lineno = (x)->lineno; \
(c)->u->u_col_offset = (x)->col_offset; \
(c)->u->u_end_lineno = (x)->end_lineno; \
(c)->u->u_end_col_offset = (x)->end_col_offset;

(gdb) p/x *e
$40 = {kind = 0x18, v = {BoolOp = {op = 0xfaeb8b60, values = 0x1},
NamedExpr = {target = 0xfaeb8b60, value = 0x1}, BinOp = {
  left = 0xfaeb8b60, op = 0x1, right = 0x0}, UnaryOp = {op = 0xfaeb8b60,
  operand = 0x1}, Lambda = {args = 0xfaeb8b60, body = 0x1}, IfExp = {
  test = 0xfaeb8b60, body = 0x1, orelse = 0x0}, Dict = {keys = 0xfaeb8b60,
  values = 0x1}, Set = {elts = 0xfaeb8b60}, ListComp = {elt = 0xfaeb8b60,
  generators = 0x1}, SetComp = {elt = 0xfaeb8b60, generators = 0x1},
DictComp = {key = 0xfaeb8b60, value = 0x1, generators = 0x0},
GeneratorExp = {elt = 0xfaeb8b60, generators = 0x1}, Await = {
  value = 0xfaeb8b60}, Yield = {value = 0xfaeb8b60}, YieldFrom = {
  

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #28 from dave.anglin at bell dot net ---
On 2023-11-08 7:07 p.m., law at gcc dot gnu.org wrote:
> Do we already have a dump for the key function?  Presumably f-m-o doesn't
> trigger*that*  much.  And if this is triggering w/o LTO we can probably move 
> to
> cross debugging and analysis of those dump files and assembly code with and
> without f-m-o enabled, narrowing our focus on the key function.
I tried looking at the difference with and without f-m-o and it was quite
large.  The difference with and without strict aliasing is much smaller.  The
main differences that I saw relate to the inlining of compiler_visit_expr and
compiler_visit_expr1.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #27 from dave.anglin at bell dot net ---
On 2023-11-08 7:00 p.m., John David Anglin wrote:
> On 2023-11-08 6:51 p.m., sjames at gcc dot gnu.org wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>>
>> --- Comment #23 from Sam James  ---
>> (In reply to Andrew Pinski from comment #21)
>>> The other option to try is -fstack-reuse=none. There is definitely known
>>> issues with the code that coalesces stack variables together too (see PR
>>> 111843 for examples).
>> I had a good feeling about this but no, didn't help when applied to 
>> compile.o.
> At this point, I don't know whether this is a python or gcc bug. I scanned 
> for unions in compile.i
> that might be problematic but I didn't find anything obvious.
Note that -fno-strict-aliasing affects the inlining of compiler_visit_expr.  It
is not inlined with -fno-strict-aliasing.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #26 from Jeffrey A. Law  ---
As a compiler junkie, I tend to think compiler first until I can prove it
otherwise.  I wouldn't get too hung up on aliasing issues and such at this
point.

Do we already have a dump for the key function?  Presumably f-m-o doesn't
trigger *that* much.  And if this is triggering w/o LTO we can probably move to
cross debugging and analysis of those dump files and assembly code with and
without f-m-o enabled, narrowing our focus on the key function.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #25 from Sam James  ---
I am having the same thoughts. It would not be the first time Python had
something dubious, like...
* https://wiki.gentoo.org/wiki/Project:Python/Strict_aliasing ->
https://www.python.org/dev/peps/pep-3123/
* https://github.com/python/cpython/issues/78

So far, I did not see this failure on any other target (-> makes me think it's
a gcc bug). But also, I didn't yet see any other software break on hppa (->
makes me think it might be a Python bug).

I tried ubsan on amd64 with Python 3.12 at least and got a lot of different
errors, although ubsan does not diagnose aliasing issues...

I am undecided myself still.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #24 from dave.anglin at bell dot net ---
On 2023-11-08 6:51 p.m., sjames at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #23 from Sam James  ---
> (In reply to Andrew Pinski from comment #21)
>> The other option to try is -fstack-reuse=none. There is definitely known
>> issues with the code that coalesces stack variables together too (see PR
>> 111843 for examples).
> I had a good feeling about this but no, didn't help when applied to compile.o.
At this point, I don't know whether this is a python or gcc bug.  I scanned for
unions in compile.i that might be problematic but I didn't find anything
obvious.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #23 from Sam James  ---
(In reply to Andrew Pinski from comment #21)
> The other option to try is -fstack-reuse=none. There is definitely known
> issues with the code that coalesces stack variables together too (see PR
> 111843 for examples).

I had a good feeling about this but no, didn't help when applied to compile.o.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #22 from John David Anglin  ---
Created attachment 56542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56542&action=edit
Preprocessed source and assembly files for Python/compile.c

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #21 from Andrew Pinski  ---
(In reply to dave.anglin from comment #20)
> Both -fno-strict-aliasing and -fno-schedule-insns2 applied to
> compiler_visit_expr()
> work around issue.

The other option to try is -fstack-reuse=none. There are definitely known
issues with the code that coalesces stack variables together too (see PR 111843
for examples).

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #20 from dave.anglin at bell dot net ---
On 2023-11-08 2:07 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #18 from Andrew Pinski  ---
> I wonder if -fno-strict-aliasing works around the issue too?
> I get the feeling that `fold mem offset pass` allows the aliasing code to have
> a better time with the offset and that might be expose more aliasing issues.
>
> The other thing to try is add `-fno-schedule-insns2 -fno-schedule-insns`
> instead of `-fno-strict-aliasing` as the scheduler is normally where the
> aliasing issues are exposed on the RTL level ...
Both -fno-strict-aliasing and -fno-schedule-insns2 applied to
compiler_visit_expr() work around the issue.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #19 from Jeffrey A. Law  ---
f-m-o runs post-allocation, so the scope of where its behavior can change
things is narrower.  So testing with -fno-schedule-insns isn't going to be
useful, but -fno-schedule-insns2 might.

I'm a bit concerned that we can't turn off f-m-o with an attribute.  That would
indicate something isn't wired up right in the options handling.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #18 from Andrew Pinski  ---
I wonder if -fno-strict-aliasing works around the issue too?
I get the feeling that `fold mem offset pass` allows the aliasing code to have
a better time with the offset and that might expose more aliasing issues.

The other thing to try is add `-fno-schedule-insns2 -fno-schedule-insns`
instead of `-fno-strict-aliasing` as the scheduler is normally where the
aliasing issues are exposed on the RTL level ...

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #17 from dave.anglin at bell dot net ---
On 2023-11-08 9:42 a.m., jeffreyalaw at gmail dot com wrote:
> I'd probably continue with the process of narrowing down what code is
> affected using the attributes.  We already know the file, narrowing it
> down to a function might help considerably with the evaluation effort.
The problem seems to be in compiler_visit_expr().

-static int compiler_visit_expr(struct compiler *, expr_ty);
+static int compiler_visit_expr(struct compiler *, expr_ty)
__attribute__((optimize("no-inline-small-functions")));

Python builds okay if this function is not inlined, if it is compiled at -O1,
or if -fno-inline-small-functions is specified as above.  Can't specify
-fno-fold-mem-offsets as a function attribute.
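
For anyone wanting to repeat this kind of per-function narrowing, a minimal
sketch of the GCC `optimize` function attribute follows. The function names
and the two wrapper macros are made up for illustration, and the guard exists
only because other compilers may not accept the attribute. Exactly which
options take effect through the attribute is something to check against the
dumps rather than assume.

```c
/* GCC-specific per-function optimization overrides, for bisecting only. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SMALL_INLINE __attribute__((optimize("no-inline-small-functions")))
#define FORCE_O1        __attribute__((optimize("O1")))
#else
#define NO_SMALL_INLINE
#define FORCE_O1
#endif

/* Hypothetical suspect #1: same form as the compiler_visit_expr change. */
static int sum_to(int n) NO_SMALL_INLINE;

/* Hypothetical suspect #2: compile just this function at -O1. */
static int sum_squares(int n) FORCE_O1;

static int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

static int sum_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i * i;
    return total;
}
```

The attribute goes on the declaration, so the function body itself stays
untouched while candidates are toggled one at a time.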

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread jeffreyalaw at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #16 from Jeffrey A. Law  ---
On 11/8/23 03:09, manolis.tsamis at vrull dot eu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
> 
> --- Comment #15 from Manolis Tsamis  ---
> (In reply to Sam James from comment #13)
>> Created attachment 56527 [details]
>> compile.c.323r.fold_mem_offsets.bad.xz
>>
>> Output from
>> ```
>> hppa2.0-unknown-linux-gnu-gcc -c  -DNDEBUG -g -fwrapv -O3 -Wall -O2
>> -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden
>> -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
>> -I/home/sam/git/cpython/Include-DPy_BUILD_CORE -o Python/compile.o
>> /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
>> ```
>>
>> If I instrument certain functions in compile.c with no optimisation
>> attribuet or build the file with -fno-fold-mem-offsets, Python works, so I'm
>> reasonably sure this is the relevant object.
> 
> Thanks for the dump file! There are 66 folded/eliminated instructions in this
> object file; I did look at each case and there doesn't seem to be anything
> strange. In fact most of the transformations are straightforward:
> 
>   - All except a couple of cases don't involve any arithmetic, so it's just
> moving a constant around.
>   - The majority of the transformations are 'trivial' and consist of a single
> add and then a memory operation: a sequence like X = Y + Const, R = MEM[X + 0]
> is folded to X = Y, R = MEM[X + Const]. I wonder why so many of these exist 
> and
> are not optimized elsewhere.
>   - There are some cases with negative offsets, but the calculations look
> correct.
>   - There are few more complicated cases, but I've done these on paper and 
> also
> look correct.
The PA port is "weird".  Its addressing modes aren't a good match for 
GCC (they're not symmetrical across loads vs stores and across fp vs 
integer) and they have the implicit space register problem.  But I don't 
immediately recall needing to avoid propagation of constants into memory 
references or anything like that.

I'd probably continue with the process of narrowing down what code is 
affected using the attributes.  We already know the file, narrowing it 
down to a function might help considerably with the evaluation effort.

Note that QEMU has a functional PA port.  So you might be able to just 
take a root filesystem, add the tarball referenced earlier and play 
around to narrow things down further.

I haven't done work on the PA in about 20 years at this point, but I can 
probably still grok its code.  Between David and myself I'm sure we can 
help interpret what's going on.


Jeff

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #15 from Manolis Tsamis  ---
(In reply to Sam James from comment #13)
> Created attachment 56527 [details]
> compile.c.323r.fold_mem_offsets.bad.xz
> 
> Output from
> ```
> hppa2.0-unknown-linux-gnu-gcc -c  -DNDEBUG -g -fwrapv -O3 -Wall -O2  
> -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden 
> -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
> -I/home/sam/git/cpython/Include-DPy_BUILD_CORE -o Python/compile.o
> /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
> ```
> 
> If I instrument certain functions in compile.c with no optimisation
> attribute or build the file with -fno-fold-mem-offsets, Python works, so I'm
> reasonably sure this is the relevant object.

Thanks for the dump file! There are 66 folded/eliminated instructions in this
object file; I did look at each case and there doesn't seem to be anything
strange. In fact most of the transformations are straightforward:

 - All except a couple of cases don't involve any arithmetic, so it's just
moving a constant around.
 - The majority of the transformations are 'trivial' and consist of a single
add and then a memory operation: a sequence like X = Y + Const, R = MEM[X + 0]
is folded to X = Y, R = MEM[X + Const]. I wonder why so many of these exist and
are not optimized elsewhere.
 - There are some cases with negative offsets, but the calculations look
correct.
 - There are a few more complicated cases, but I've worked these out on paper
and they also look correct.

Of course I could be missing some more complicated effect, but what I want to
say is that everything looks sensible in this particular file.
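
As a sanity check on the 'trivial' case described above, here is a small C
model of the arithmetic. It is a sketch only and says nothing about how the
pass is actually implemented: the `add_mem_pair` type and the three helpers
are hypothetical names, and addresses are plain 32-bit integers.

```c
#include <stdint.h>

/* Models: X = Y + add_const; then an access to MEM[X + mem_off]. */
typedef struct {
    int32_t add_const;  /* constant in the add that defines X */
    int32_t mem_off;    /* displacement in the memory reference through X */
} add_mem_pair;

/* Fold the add's constant into the displacement: X = Y, MEM[X + (off + C)]. */
static add_mem_pair fold_offset(add_mem_pair p) {
    add_mem_pair q = { 0, p.mem_off + p.add_const };
    return q;
}

/* Address the memory reference touches, given the value of Y. */
static uint32_t effective_address(uint32_t y, add_mem_pair p) {
    uint32_t x = y + (uint32_t)p.add_const;
    return x + (uint32_t)p.mem_off;
}

/* Value that any other user of X would observe. */
static uint32_t value_of_x(uint32_t y, add_mem_pair p) {
    return y + (uint32_t)p.add_const;
}
```

The effective address is the same before and after `fold_offset`, including
for negative constants, but `value_of_x` changes. So the fold is only safe in
this model when every use of X is a memory reference that receives the
constant; a use that is not rewritten sees Y instead of Y + Const.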

> Thanks! You are very welcome to have access to some HPPA machines for
> this kind of work. Please email me an SSH public key + desired username
> if that sounds helpful.

Yes, since I couldn't find anything interesting in the dump, that would
definitely be helpful. Thanks!

Manolis

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-07 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #14 from dave.anglin at bell dot net ---
On 2023-11-07 8:36 p.m., sjames at gcc dot gnu.org wrote:
> If I instrument certain functions in compile.c with no optimisation attribute
> or build the file with -fno-fold-mem-offsets, Python works, so I'm reasonably
> sure this is the relevant object.
I believe this bug is related to https://gcc.gnu.org/PR97431
I see the same fault when using debian/rules with the -finline-small-functions
option.

Debian has been building with -fno-inline-small-functions on sh and hppa.  This
hides the problem.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-07 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #13 from Sam James  ---
Created attachment 56527
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56527&action=edit
compile.c.323r.fold_mem_offsets.bad.xz

Output from
```
hppa2.0-unknown-linux-gnu-gcc -c -DNDEBUG -g -fwrapv -O3 -Wall -O2 -std=c11
-Werror=implicit-function-declaration -fvisibility=hidden
-I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
-I/home/sam/git/cpython/Include -DPy_BUILD_CORE -o Python/compile.o
/home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
```

If I instrument certain functions in compile.c with no optimisation attribute
or build the file with -fno-fold-mem-offsets, Python works, so I'm reasonably
sure this is the relevant object.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-07 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #12 from Sam James  ---
(In reply to Manolis Tsamis from comment #11)
> Hi all,
> 
> I will also go ahead and try to reproduce that, although it may take me some
> time due to my limited experience with HPPA. Once I manage to reproduce,
> most f-m-o issues are straightforward to locate by bisecting the transformed
> instructions.

Thanks! You are very welcome to have access to some HPPA machines for this kind
of work. Please email me an SSH public key + desired username if that sounds
helpful.

> 
> > I think the key object is Python/compile.o, but not certain yet.
> 
> In this case the dump file of fold-mem-offsets
> (-fdump-rtl-fold_mem_offsets-all) could also be useful, as it contains all
> the information needed to see whether a transformation is valid. If it would
> be easy for anyone to provide the dump file, I could look at it and see if
> anything stands out (until I manage to reproduce this).

I'll get the dumps in a moment, thanks.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-07 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #11 from Manolis Tsamis  ---
Hi all,

I will also go ahead and try to reproduce that, although it may take me some
time due to my limited experience with HPPA. Once I manage to reproduce, most
f-m-o issues are straightforward to locate by bisecting the transformed
instructions.

> I think the key object is Python/compile.o, but not certain yet.

In this case the dump file of fold-mem-offsets
(-fdump-rtl-fold_mem_offsets-all) could also be useful, as it contains all the
information needed to see whether a transformation is valid. If it would be
easy for anyone to provide the dump file, I could look at it and see if
anything stands out (until I manage to reproduce this).

Thanks,
Manolis

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #10 from dave.anglin at bell dot net ---
On 2023-11-06 5:49 p.m., sjames at gcc dot gnu.org wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x412083f0 in _PyST_GetSymbol (name=0xf9a34a00, ste=) at
> Python/symtable.c:396
> 396 PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
> (gdb) x/20i $pc
> => 0x412083f0 <_PyST_GetScope+20>:  ldw c(r26),r26
r26=0x34, so the ldw will fault.  It appears r26 and r25 have been exchanged in
the code prior to <_PyST_GetScope+20>.  In any case, the problem is with the
ste argument passed to _PyST_GetSymbol.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #9 from Sam James  ---
I think the key object is Python/compile.o, but not certain yet.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #8 from Sam James  ---
(In reply to Jeffrey A. Law from comment #6)

Program received signal SIGSEGV, Segmentation fault.
0x412083f0 in _PyST_GetSymbol (name=0xf9a34a00, ste=) at
Python/symtable.c:396
396 PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
(gdb) x/20i $pc
=> 0x412083f0 <_PyST_GetScope+20>:  ldw c(r26),r26
   0x412083f4 <_PyST_GetScope+24>:  movb,= ret0,r26,0x41208414
<_PyST_GetScope+56>
   0x412083f8 <_PyST_GetScope+28>:  copy r4,r19
   0x412083fc <_PyST_GetScope+32>:  b,l 0x410d6900 ,rp
   0x41208400 <_PyST_GetScope+36>:  nop
   0x41208404 <_PyST_GetScope+40>:  ldw -54(sp),rp
   0x41208408 <_PyST_GetScope+44>:  extrw,u ret0,20,4,ret0
   0x4120840c <_PyST_GetScope+48>:  bve (rp)
   0x41208410 <_PyST_GetScope+52>:  ldw,mb -40(sp),r4
   0x41208414 <_PyST_GetScope+56>:  copy r26,ret0
   0x41208418 <_PyST_GetScope+60>:  ldw -54(sp),rp
   0x4120841c <_PyST_GetScope+64>:  bve (rp)
   0x41208420 <_PyST_GetScope+68>:  ldw,mb -40(sp),r4
   0x41208424 <_Py_SymtableStringObjectFlags>:  stw rp,-14(sp)
   0x41208428 <_Py_SymtableStringObjectFlags+4>:  stw,ma r8,80(sp)
   0x4120842c <_Py_SymtableStringObjectFlags+8>:  copy r23,r8
   0x41208430 <_Py_SymtableStringObjectFlags+12>:  stw r7,-7c(sp)
   0x41208434 <_Py_SymtableStringObjectFlags+16>:  copy r24,r7
   0x41208438 <_Py_SymtableStringObjectFlags+20>:  stw r6,-78(sp)
   0x4120843c <_Py_SymtableStringObjectFlags+24>:  copy r25,r6
(gdb)

(gdb) i r
flags  
r1     0x411bc688  1092339336
rp     0x412083f7  1092649975
r3     0x1         1
r4     0x4136c000  1094107136
r5     0xf9a34a00  4188228096
r6     0x8d        141
r7     0xf7b03b88  4155521928
r8     0xf7b03ba8  4155521960
r9     0xf9953b68  4187306856
r10    0x0         0
r11    0x8e        142
r12    0x414e1820  1095637024
r13    0x414e4490  1095648400
r14    0xf9a76498  4188497048
r15    0x1         1
r16    0xf99bb5e8  4187731432
r17    0xf9ae11b4  4188934580
r18    0xf99e3b68  4187896680
r19    0x4136c000  1094107136
r20    0x411bc7f0  1092339696
r21    0x41450268  1095041640
r22    0x8d        141
r23    0x1         1
r24    0x1         1
r25    0xf9a34a00  4188228096
r26    0x34        52
dp     0x4136c000  1094107136
ret0   0xf9964020  4187373600
ret1   0x8d        141
sp     0xf7b04080  4155523200
r31    0x1         1
sar    0x3d        61
pcoqh  0x412083f3  1092649971
pcsqh  
pcoqt  0x410e4c0f  1091456015
pcsqt  
eiem   
iir
isr
ior
ipsw   0xeff0f 982799
goto   
sr4
sr0
sr1
sr2
sr3
sr5
sr6
sr7
cr0
cr8
cr9
ccr
cr12   
cr13   
cr24   
cr25   
cr26   0xeff0f 982799
mpsfu_high 0xf7afa500  4155483392
mpsfu_low  
mpsfu_ovflo
pad
fpsr   
fpe1   
fpe2   
fpe3   
fpe4   
fpe5   
fpe6   
fpe7   
(gdb)

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #7 from dave.anglin at bell dot net ---
On 2023-11-06 5:20 p.m., law at gcc dot gnu.org wrote:
> The biggest concern I'd have with f-m-o on the PA would be the
> implicit segment selection that happens on the base register -- but it would
> only be an issue if we are faulting on an unscaled indexed addressing mode and
> only if the linux-gnu port was actually putting different values into the space
> registers.
The linux-gnu port does not put different values into the space registers.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #6 from Jeffrey A. Law  ---
Do we have assembly code around the faulting point (x/20i $pc) and a register
dump (i r)?  The biggest concern I'd have with f-m-o on the PA would be the
implicit segment selection that happens on the base register -- but it would
only be an issue if we are faulting on an unscaled indexed addressing mode and
only if the linux-gnu port was actually putting different values into the space
registers.

WRT testing -- we did test this on hppa1.1-linux-gnu.  Just a bootstrap and
regression test of the compiler itself.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #5 from Sam James  ---
Built with 14.0.0 20231029.

* https://dev.gentoo.org/~sam/bugs/gcc/gcc-python-hppa/cpython-3.11.6-good.tar.xz
* https://dev.gentoo.org/~sam/bugs/gcc/gcc-python-hppa/cpython-3.11.6-bad.tar.xz

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #4 from Sam James  ---
Created attachment 56520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56520&action=edit
list_of_differing_files.txt

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #3 from dave.anglin at bell dot net ---
On 2023-11-06 4:00 p.m., sjames at gcc dot gnu.org wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=) at
> Python/symtable.c:396
> 396 PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
> (gdb) bt
> #0  0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=) at
> Python/symtable.c:396
> #1  _PyST_GetScope (ste=, name=0xf9a33a60) at
> Python/symtable.c:406
Probably, ste is NULL or in page 0, and it's symtable.c that's miscompiled.

There's not a lot of testing of gcc-14 on hppa yet.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |14.0
 Target||hppa2.0-unknown-linux-gnu

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #2 from Sam James  ---
I'll grab a bad vs good build directory next and upload both, and then try to
see which objects differ.

Dave, can you reproduce?
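
One way the good-vs-bad object comparison mentioned in this comment could be done is sketched below. This is not the script used in the report; the function name and the `.o` suffix default are assumptions for illustration. It walks the good build tree and reports every object file whose bytes differ in the bad tree or that is missing there:

```python
import filecmp
from pathlib import Path


def differing_objects(good_dir, bad_dir, suffix=".o"):
    """Return relative paths of `suffix` files that differ between two build trees.

    A file counts as differing if it is missing from the bad tree or if its
    contents are not byte-for-byte identical.
    """
    good_root, bad_root = Path(good_dir), Path(bad_dir)
    differing = []
    for good_file in sorted(good_root.rglob("*" + suffix)):
        if not good_file.is_file():
            continue
        rel = good_file.relative_to(good_root)
        bad_file = bad_root / rel
        # shallow=False forces a content comparison instead of trusting
        # file size and modification time.
        if not bad_file.is_file() or not filecmp.cmp(good_file, bad_file, shallow=False):
            differing.append(str(rel))
    return differing
```

Objects can also differ for reasons unrelated to code generation (embedded build paths, for example), so a list like this only narrows down the candidates; the suspect files still need to be inspected or swapped between builds one at a time.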

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James  changed:

   What|Removed |Added

Summary|[14 regression] Python 3.11 |[14 regression] Python 3.11
   |miscompiled with new RTL|miscompiled on HPPA with
   |fold mem offset pass, since |new RTL fold mem offset
   |r14-4664-g04c9cf5c786b94|pass, since
   ||r14-4664-g04c9cf5c786b94

--- Comment #1 from Sam James  ---
Backtrace from the crashing Python:
```
(gdb) r
Starting program:
/var/tmp/portage/dev-lang/python-3.11.6/work/Python-3.11.6/_bootstrap_python
./Tools/scripts/deepfreeze.py
Python/frozen_modules/importlib._bootstrap.h:importlib._bootstrap
Python/frozen_modules/importlib._bootstrap_external.h:importlib._bootstrap_external
Python/frozen_modules/zipimport.h:zipimport Python/frozen_modules/abc.h:abc
Python/frozen_modules/codecs.h:codecs Python/frozen_modules/io.h:io
Python/frozen_modules/_collections_abc.h:_collections_abc
Python/frozen_modules/_sitebuiltins.h:_sitebuiltins
Python/frozen_modules/genericpath.h:genericpath
Python/frozen_modules/ntpath.h:ntpath
Python/frozen_modules/posixpath.h:posixpath Python/frozen_modules/os.h:os
Python/frozen_modules/site.h:site Python/frozen_modules/stat.h:stat
Python/frozen_modules/importlib.util.h:importlib.util
Python/frozen_modules/importlib.machinery.h:importlib.machinery
Python/frozen_modules/runpy.h:runpy Python/frozen_modules/__hello__.h:__hello__
Python/frozen_modules/__phello__.h:__phello__
Python/frozen_modules/__phello__.ham.h:__phello__.ham
Python/frozen_modules/__phello__.ham.eggs.h:__phello__.ham.eggs
Python/frozen_modules/__phello__.spam.h:__phello__.spam
Python/frozen_modules/frozen_only.h:frozen_only -o
Python/deepfreeze/deepfreeze.c
warning: File "/usr/lib/libthread_db.so.1" auto-loading has been declined by
your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /usr/lib/libthread_db.so.1
line to your configuration file "/root/.config/gdb/gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/root/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
info "(gdb)Auto-loading safe path"
warning: Unable to find libthread_db matching inferior's thread library, thread
debugging will not be available.

Program received signal SIGSEGV, Segmentation fault.
0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=) at
Python/symtable.c:396
396 PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
(gdb) bt
#0  0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=) at
Python/symtable.c:396
#1  _PyST_GetScope (ste=, name=0xf9a33a60) at
Python/symtable.c:406
#2  0x411bb8f8 in compiler_nameop (c=0xf7b03b88, name=,
ctx=Load) at Python/compile.c:4274
#3  0x411be074 in compiler_visit_expr (c=0x1, e=) at
Python/compile.c:5969
#4  0x411bcc88 in compiler_visit_expr1 (c=0xf7b03b88, e=0x1) at
Python/compile.c:5915
#5  0x411be074 in compiler_visit_expr (c=0x1, e=) at
Python/compile.c:5969
#6  0x411bceac in compiler_call (e=0x1, c=0xf7b03b88) at Python/compile.c:4952
#7  compiler_visit_expr1 (c=0xf7b03b88, e=0x1) at Python/compile.c:5905
#8  0x411c1f34 in compiler_visit_expr (e=, c=0xf9a33a60) at
Python/compile.c:5969
#9  compiler_decorators (decos=0x8d, c=0xf9a33a60) at Python/compile.c:2327
#10 compiler_class (c=0xf9a33a60, s=0x414e4490) at Python/compile.c:2702
#11 0x411c566c in compiler_body (c=0xf7b03b88, stmts=0xf9a33a60) at
Python/compile.c:2180
#12 0x411c7e98 in compiler_mod (mod=0xf7b03b88, c=0x0) at Python/compile.c:2197
#13 _PyAST_Compile (mod=0xf7b03b88, filename=0x8d, flags=,
optimize=, arena=) at Python/compile.c:581
#14 0x411fe7b8 in Py_CompileStringObject (str=0xf7b03b88
"\371\240\277\220\371\236\353`\371\257\221\260\367\260:t", filename=0x8d,
start=-139445336, flags=0xf9a33a60, optimize=)
at Python/pythonrun.c:1799
#15 0x4119c334 in builtin_compile_impl (module=,
feature_version=, optimize=,
dont_inherit=, flags=, mode=,
filename=0xf998db68, source=0x8d) at Python/bltinmodule.c:831
#16 builtin_compile (module=, args=,
nargs=, kwnames=) at
Python/clinic/bltinmodule.c.h:328
#17 0x410f3ae4 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0xf9a33a60,
args=0x8d, nargsf=, kwnames=) at
./Include/cpython/methodobject.h:52
#18 0x4109fa88 in _PyVectorcall_Call (tstate=0xf7b03b88, func=,
callable=0xf9a33a60, tuple=, kwargs=) at
Objects/call.c:257
#19 0x4109fd28 in _PyObject_Call (tstate=0xf9a33a60, callable=0x1,
args=0xf7b03ba8, kwargs=0x8d) at Objects/call.c:328
#20 0x4109fdb8 in PyObject_Call () at Objects/call.c:352
#21 0x411a47c8 in do_call_core (tstate=0x8d, func=0x1, callargs=0xf9a33a60,