[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #16 from Andreas Krebbel  ---
(In reply to Aleksei Nikiforov from comment #15)
> I think fixing compiled code should be possible. I'm not sure if this bug
> should be just closed.

In addition to fixing the PyTorch usage of the builtin, I also plan to change
GCC to the "alias everything" approach now. Although the documentation does not
strictly require us to, it prevents other users from falling into the same
trap and makes GCC match what Clang already does. The documentation discourages
everyone from using these builtins anyway, so it should not be a big deal
if we sacrifice a bit of performance to make it more robust.
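
At the source level, the "alias everything" behaviour roughly corresponds to
accessing memory through a type that carries GCC's may_alias attribute. The
following is only a minimal sketch under that assumption, not the planned
backend change: aliasing_float and store_floats_aliasing are hypothetical
names, plain scalar stores stand in for the vector builtin, and float and int
are assumed to have the same size.

```c
#include <stddef.h>

/* Hypothetical typedef: accesses through this type are exempt from
   type-based alias analysis, much like accesses through char.  */
typedef float __attribute__((may_alias)) aliasing_float;

/* Store n floats into memory that the caller declared as int.
   Because dst has a may_alias type, the compiler has to assume that
   these stores can modify the int objects.  */
void
store_floats_aliasing (int *mem, const float *src, size_t n)
{
  aliasing_float *dst = (aliasing_float *) mem;
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}
```

A caller can then read the int array afterwards without the stores being
considered dead on type grounds.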

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-17 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #13 from Andreas Krebbel  ---
We will go and fix PyTorch instead. Although it is not clearly documented, the
way PyTorch uses the builtin right now is probably not what was intended. It is
pretty clear that the element type pointer needs to alias vectors of the same
element type, but nothing says that it has to alias everything.

I'm just wondering how to improve the diagnostics in our backend to catch this.
The example below is similar to what PyTorch does today. Casting mem to
(float*) prevents our builtin code from complaining about the type mismatch and
thereby opens the door for the much harder to debug TBAA problem.

#include <vecintrin.h>

void __attribute__((noinline)) foo (int *mem)
{
  vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem);
}

int
main ()
{
  int m[4] = { 0 };
  foo (m);
  if (m[3] == 0)
    __builtin_abort ();
  return 0;
}
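
The type mismatch hidden by the cast is a general strict-aliasing hazard and
can be illustrated without the vector builtin. As a hedged, portable sketch
(not the actual PyTorch fix; store_floats_bytewise is a hypothetical helper),
copying the object representation with memcpy gives the compiler no
type-based reason to treat the stores as unrelated to later int loads:

```c
#include <string.h>

/* Copy the bytes of n floats into memory declared as int.  memcpy
   accesses the destination as raw bytes, so a later read of mem[i]
   as int has to observe the copied representation.  */
void
store_floats_bytewise (int *mem, const float *src, size_t n)
{
  memcpy (mem, src, n * sizeof *src);
}
```

With this helper in place of the cast, the check on m[3] in a main like the
one above can no longer be invalidated by type-based alias analysis.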

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #11 from Andreas Krebbel  ---
The documentation of vec_xl and vec_xst doesn't seem to mention anything
special with regard to that. So I understand the memory is only accessed
through pointers which are compatible with the ones used when invoking the
builtin.

That particular usage within pytorch looks ok to me.

I'm already testing a patch which matches what you are proposing. I hope to be
able to reduce the testcase somewhat.

Thanks for your help!

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #8 from Andreas Krebbel  ---
Apparently, I already decided to go with a MEM_REF for the load variant of the
builtin - vec_xl. I have to check whether there was any reason not to do this
for vec_xst as well.

Making it a pointer which aliases everything might be too big of a hammer, I
guess?!

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #33 from Andreas Krebbel  ---
(In reply to Andrew Pinski from comment #26)
...
> I suspect if we change the s390 backend just slightly to set the cost when
> there is an index to the address to 1 for the MEM, combine won't be acting
> up here.
> Basically putting in sync the 2 cost methods.

I've tried that but it didn't change anything. As you expected, the
problem goes away when s390_address_cost always returns 0.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #32 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #25)
> So this testcase compiles on powerpc64-linux (-O2) in about 34s.  Is s390x
> way worse, or is this in line with what you are seeing?

Way worse. See #c22 : 20s before your commit and 5min with it.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #23 from Andreas Krebbel  ---
Created attachment 57646
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57646&action=edit
Testcase for comment #22

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #22 from Andreas Krebbel  ---
I did a git bisect which ended up pointing at this commit, somewhere between
GCC 8 and 9:

commit c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306 (bad)
Author: Segher Boessenkool 
Date:   Mon Jul 30 15:18:17 2018 +0200

combine: Allow combining two insns to two insns

This patch allows combine to combine two insns into two.  This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen.  PR85160 is a typical example of code
that it can improve.

This patch does not allow such combinations if either of the original
instructions was a simple move instruction.  In those cases combining
the two instructions increases register pressure without improving the
code.  With this move test register pressure does no longer increase
noticably as far as I can tell.

(At first I also didn't allow either of the resulting insns to be a
move instruction.  But that is actually a very good thing to have, as
should have been obvious).

With this command line:
cc1plus -O2 -march=z196 -fpreprocessed Q111-8.ii -quiet

before:   20s compile-time and    21846 total combine attempts
after: > 5min compile-time and 43175686 total combine attempts

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #21 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #16)
...
> When some insns have changed (or might have changed, combine does not always
> know
> the details), combinations of the insn with later insns are tried again. 
> Sometimes
> this finds new combination opportunities.
> 
> Not retrying combinations after one of the insns has changed would be a
> regression.

Wouldn't it in this particular case be possible to recognize already in
try_combine that separating the move out of the parallel cannot lead to
additional optimization opportunities? To me it looks like we are just
recreating the situation we had before merging the INSNs into a parallel. Is
there a situation where this could lead to any improvement in the end?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #20 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #17)
...
> So what is really happening?  And, when did this start, anyway, because
> apparently at some point in time all was fine?

Due to the C++ constructs used the testcase doesn't compile with much older
GCCs. However, I can confirm that the problem can already be reproduced with
GCC 11.1.0.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #19 from Andreas Krebbel  ---
(In reply to Sarah Julia Kriesch from comment #15)
> (In reply to Segher Boessenkool from comment #13)
> > (In reply to Sarah Julia Kriesch from comment #12)
> > A bigger case of what?  What do you mean?
> Not only one software package is affected by this bug. "Most" software
> builds are affected. As Andreas mentioned correctly, the fix is also
> beneficial for other projects/target software.

I don't think we have any evidence yet that this is the problem which also hits
us with other package builds. If you have other cases, please open separate BZs
for them and we will try to figure out whether they are actually DUPs of this
one.

With "targets" I meant other GCC build targets. This pattern doesn't look
s390x-specific to me, although I haven't tried to reproduce it somewhere else.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #14 from Andreas Krebbel  ---
If my analysis from comment #1 is correct, combine does superfluous steps here.
Getting rid of this should not cause any harm, but should be beneficial for
other targets as well. I agree that the patch I've proposed is kind of a hack.
Do you think this could be turned into a proper fix?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #10 from Andreas Krebbel  ---
Created attachment 57599
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57599&action=edit
Testcase - somewhat reduced from libecpint

Verified with rev 146f16c97f6

cc1plus -O2 t.cc

try_combine invocations:
x86:
3
27262
27603

s390x:
8
40439657
40440339

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-02-23 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefansf at linux dot ibm.com

--- Comment #4 from Andreas Krebbel  ---
Hi Segher, any guidance on how to proceed with that? This recently was brought
up by distro people again because it is causing actual problems in their build
setups.

[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values

2023-12-13 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shinwogud12 at gmail dot com

--- Comment #7 from Andreas Krebbel  ---
*** Bug 112665 has been marked as a duplicate of this bug. ***

[Bug target/112665] I am getting incorrect output values at optimization level 2 in GCC for the s390x architecture.

2023-12-13 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112665

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
                 CC|                            |krebbel at gcc dot gnu.org

--- Comment #2 from Andreas Krebbel  ---
I can confirm this when running the program with qemu but not on real hardware.
The code also uses the chrl instruction, so I guess this is another instance
of PR112986.

*** This bug has been marked as a duplicate of bug 112986 ***

[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values

2023-12-13 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Andreas Krebbel  ---
No problem. Thanks for testing s390x!
I've requested the qemu fix to be included into Ubuntu 22.04. Closing the BZ
now.

[Bug target/112996] Improperly evaluated value of the s390x conditional expression

2023-12-13 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112996

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |iii at linux dot ibm.com
         Resolution|---                         |DUPLICATE

--- Comment #2 from Andreas Krebbel  ---
Same as with PR112986. Confirmed with qemu. Runs fine on real hardware.

*** This bug has been marked as a duplicate of bug 112986 ***

[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values

2023-12-13 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986

--- Comment #3 from Andreas Krebbel  ---
*** Bug 112996 has been marked as a duplicate of this bug. ***

[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values

2023-12-13 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-12-13
                 CC|                            |iii at linux dot ibm.com

--- Comment #2 from Andreas Krebbel  ---
I can confirm the failure when running the binaries with qemu.

However, the binaries run as expected on real hardware. So this is more likely
a qemu issue.

@Ilya: Ubuntu 22.04 is using qemu 6.2.0. Is this perhaps something you have
fixed already?

[Bug pch/112319] New: segfault with pch and #pragma GCC diagnostic

2023-10-31 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112319

            Bug ID: 112319
           Summary: segfault with pch and #pragma GCC diagnostic
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: pch
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

touch s.h
touch u.h

main.cpp:
#include "s.h"
#include "u.h"

g++ s.h
g++ -c main.cpp --save-temps
In file included from main.cpp:2:
u.h:1:9: internal compiler error: Segmentation fault
1 | #pragma GCC diagnostic
  | ^~~
0x1764fbf crash_signal
/home/andreas/build/../gcc/gcc/toplev.cc:314
0x10201cd maybe_read_tokens_for_pragma_lex
/home/andreas/build/../gcc/gcc/cp/parser.cc:49713
0x10201cd pragma_lex(tree_node**, unsigned int*)
/home/andreas/build/../gcc/gcc/cp/parser.cc:49735
0x11a962c pragma_diagnostic_lex
/home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:851
0x11a9d78 handle_pragma_diagnostic_impl
/home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:879
0x11a9d78 handle_pragma_diagnostic_early_pp
/home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:1039
0x11abdbd c_pp_invoke_early_pragma_handler(unsigned int)
/home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:1769
0x11a8180 token_streamer::stream(cpp_reader*, cpp_token const*, unsigned int)
/home/andreas/build/../gcc/gcc/c-family/c-ppoutput.cc:293
0x11a8461 scan_translation_unit
/home/andreas/build/../gcc/gcc/c-family/c-ppoutput.cc:351
0x11a8461 preprocess_file(cpp_reader*)
/home/andreas/build/../gcc/gcc/c-family/c-ppoutput.cc:106
0x11a693d c_common_init()
/home/andreas/build/../gcc/gcc/c-family/c-opts.cc:1236
0xfa35be cxx_init()
/home/andreas/build/../gcc/gcc/cp/lex.cc:338
0xe94153 lang_dependent_init
/home/andreas/build/../gcc/gcc/toplev.cc:1816
0xe94153 do_compile
/home/andreas/build/../gcc/gcc/toplev.cc:2111
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Can be reproduced on x86 and s390x since:

e664ea960a200aac88ffc3c7fb9fe55ea4df2011 is the first bad commit
commit e664ea960a200aac88ffc3c7fb9fe55ea4df2011 
Author: Lewis Hyatt   
Date:   Fri Jun 30 18:23:24 2023 -0400  

c-family: Implement pragma_lex () for preprocess-only mode

[Bug tree-optimization/111039] New: Unable to coalesce ssa_names

2023-08-16 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111039

            Bug ID: 111039
           Summary: Unable to coalesce ssa_names
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

compiler_corruption_function(flags) {
  int nowait = flags & 1048576, isexpand = flags & 8388608;
  abcd();
  _setjmp(flags);
  if (nowait && isexpand)
    flags &= 0;
  abcde();
}

gcc -mbranch-cost=0 -O t.c
verified with recent GCC: 02ecc9a2632

t.c: In function ‘compiler_corruption_function’:
t.c:1:1: internal compiler error: SSA corruption
1 | compiler_corruption_function(flags) {
  | ^~~~
0x15a657c fail_abnormal_edge_coalesce
/home/andreas/build/../gcc/gcc/tree-ssa-coalesce.cc:1003
0x15a657c coalesce_partitions
/home/andreas/build/../gcc/gcc/tree-ssa-coalesce.cc:1425
0x15a657c coalesce_ssa_name(_var_map*)
/home/andreas/build/../gcc/gcc/tree-ssa-coalesce.cc:1755
0x153d6cf remove_ssa_form
/home/andreas/build/../gcc/gcc/tree-outof-ssa.cc:1065
0x153d6cf rewrite_out_of_ssa(ssaexpand*)
/home/andreas/build/../gcc/gcc/tree-outof-ssa.cc:1323
0xf42073 execute
/home/andreas/build/../gcc/gcc/cfgexpand.cc:6610

This very much looks like another instance of PR71020.

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|13.0                        |12.2.1

--- Comment #16 from Andreas Krebbel  ---
The testcase fails on GCC 12.2.1 as well. Should we apply the patch there too
after giving it some time in mainline?

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Andreas Krebbel  ---
Your patch fixes the problem for me. Thanks for the quick fix!

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2022-12-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #7 from Andreas Krebbel  ---
(In reply to Andrew Pinski from comment #6)
> (In reply to Andreas Krebbel from comment #5)
> > In:
> > 
> >   _1 = src_6(D)->a;
> >   dst$val_9 = _1;
> >   _2 = BIT_FIELD_REF ;
> >   _3 = _2 & 64;
> >   if (_3 != 0)
> 
> There is only 2 accesses going on in the above IR because SRA removed the
> 3rd when it replaced the access of dst.val with dst$val but didn't update
> BIT_FIELD_REF to remove the byteswap ...

Ok, got it. It isn't the removal of the assignment. As you say it happens in
early SRA when changing dst.val to dst$val and with that going from the union
with the storage order marker to a long int without it. The marker on the
BIT_FIELD_REF needs to be in sync with the marker on its inner reference.
Dropping one without adjusting the other is the problem here. Thanks for the
pointer!

The following change helps with that testcase:

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 8dfc923ed7e..6b1ce6e8b4a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3815,8 +3815,13 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi,
bool write)
}
}
   else
-   *expr = repl;
-  sra_stats.exprs++;
+   {
+ if (bfr && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (*expr)))
+   REF_REVERSE_STORAGE_ORDER (bfr) = 0;
+
+ *expr = repl;
+ sra_stats.exprs++;
+   }
 }
   else if (write && access->grp_to_be_debug_replaced)
 {

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2022-12-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #5 from Andreas Krebbel  ---
In:

  _1 = src_6(D)->a;
  dst$val_9 = _1;
  _2 = BIT_FIELD_REF ;
  _3 = _2 & 64;
  if (_3 != 0)

src, dst and the BIT_FIELD_REF carry storage order flags which result in either
bswaps being emitted or, in the case of the bitfield, the constant for the
compare being adjusted. So from reading "src" to evaluating "_2", three
"bswaps" are applied. After dropping the assignment to dst only two remain,
which cancel each other out. So in the end we access the value without any
adjustments.

Just to check I did:

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index b36dd97802b..b858194a432 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1820,6 +1820,7 @@ handle_scalar_storage_order_attribute (tree *node, tree
name, tree args,
}

   TYPE_REVERSE_STORAGE_ORDER (type) = reverse;
+  TYPE_VOLATILE (type) = reverse;
   return NULL_TREE;
 }

As expected this "fixes" the problem, but it is probably too big of a hammer
here since it gives up much of the attribute's main advantage, which is that
many of the bswaps can be folded away.

[Bug tree-optimization/108199] Bitfields and storage_order_attribute

2022-12-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #3 from Andreas Krebbel  ---
Moving the local definition of dst out of the function to global scope prevents
the store from getting eliminated.

union DST dst;

As expected the store is still in the FRE dump:

  _1 = src_6(D)->a;
  dst.val = _1;    <---
  _2 = BIT_FIELD_REF ;
  _3 = _2 & 64;
  if (_3 != 0)
...

and the first byte is accessed:

bar:
        movq    (%rdi), %rax
        movq    %rax, dst(%rip)
        testb   $64, %al
        jne     .L11

[Bug tree-optimization/108199] Bitfields and storage_order_attribute

2022-12-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64
              Build|                            |x86_64
           Keywords|                            |wrong-code
               Host|                            |x86_64

--- Comment #2 from Andreas Krebbel  ---
The testcase does an assignment between two structs whose endianness differs
from the host endianness (assumed to be little). Here the required byteswaps
are supposed to cancel each other out. After that a bitfield comparison on the
target struct is done. This comparison uses the wrong byte offset into the
bitfield:

        testb   $64, 7(%rdi)
        jne     .L11

On a big endian target the first bits in the bitfield are supposed to reside in
the first bytes in memory.

The problem appears to get introduced when dead store elimination removes the
assignment to the target struct in FRE.

Before FRE we have the following:

  _1 = src_6(D)->a;   bswap
  dst$val_9 = _1; bswap
  _2 = BIT_FIELD_REF ;  bswap
  _3 = _2 & 64;
  if (_3 != 0)
...
This would result in 3 bswaps chained to each other. However, after FRE we have
only two because the dead store to dst$val is removed.

  _1 = src_6(D)->a;
  _2 = BIT_FIELD_REF <_1, 8, 0>;
  _3 = _2 & 64;
  if (_3 != 0)

Now we have only two, which cancel each other out.

Looks like we have to prevent dependent stores/loads with different endianness
from getting removed - perhaps by making them also volatile? I think we have to
keep the number of memory accesses with foreign endianness constant over the
optimizations.
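
The parity argument can be checked in isolation. This is only a sketch:
apply_bswaps and low_byte_bit64_set are hypothetical helpers, the 64-bit width
is an assumption, and __builtin_bswap64 (a GCC/Clang builtin) stands in for
the swaps implied by the storage order flags. An odd number of swaps behaves
like a single swap, an even number like none, so dropping one of three changes
which byte the "& 64" test looks at.

```c
#include <stdint.h>

/* Apply n byte swaps in a row, the way chained accesses with reverse
   storage order would.  */
uint64_t
apply_bswaps (uint64_t value, unsigned n)
{
  while (n-- > 0)
    value = __builtin_bswap64 (value);
  return value;
}

/* The test from the dump: bit 0x40 of the lowest-order byte.  */
int
low_byte_bit64_set (uint64_t value)
{
  return (value & 64) != 0;
}
```

For a value whose bit 0x40 sits in the highest-order byte, the test succeeds
after three swaps but fails after two, which matches the wrong-code symptom
described above.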

[Bug tree-optimization/108199] Bitfields and storage_order_attribute

2022-12-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #1 from Andreas Krebbel  ---
Created attachment 54150
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54150&action=edit
Testcase

[Bug tree-optimization/108199] New: Bitfields and storage_order_attribute

2022-12-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

            Bug ID: 108199
           Summary: Bitfields and storage_order_attribute
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

[Bug c++/107632] New: has_facet does not work with -mlong-double-64

2022-11-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107632

            Bug ID: 107632
           Summary: has_facet does not work with -mlong-double-64
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

#include 
#include 

using namespace std;

int main(int argc, char *argv[]) {

  locale oGlobalLocale;

  if (!has_facet< num_get > >
>( oGlobalLocale ))
    __builtin_abort ();
}

g++ t.cpp -o t && ./t   -> works as expected
g++ t.cpp -o t -mlong-double-64 && ./t   -> aborts

[Bug tree-optimization/107372] Loop distribution create memcpy between structs with different storage order

2022-10-24 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107372

--- Comment #1 from Andreas Krebbel  ---
Created attachment 53764
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53764&action=edit
Experimental Fix

Looks like the error while analyzing the data ref is not propagated to the
upper layers to actually prevent the optimization. This patch fixes this for
me.

[Bug tree-optimization/107372] New: Loop distribution create memcpy between structs with different storage order

2022-10-24 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107372

            Bug ID: 107372
           Summary: Loop distribution create memcpy between structs with
                    different storage order
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

For t.c with "gcc -O3 t.c":

struct L
{
  unsigned int val[256];
} __attribute__((scalar_storage_order ("little-endian")));

struct B
{
  unsigned int val[256];
} __attribute__((scalar_storage_order ("big-endian")));

void
foo (struct L *restrict l, struct B *restrict b)
{
  int i;
  for (i = 0; i < 256; i++)
    l->val[i] = b->val[i];
}


The loop distribution pass currently generates a memcpy although it recognizes
correctly that both sides of the assignment have different storage order:

Analyzing # of iterations of loop 1
  exit condition [255, + , 4294967295] != 0
  bounds on difference of bases: -255 ... -255
  result:
# of iterations 255, bounded by 255
Creating dr for *b_5(D).val[i_11]
analyze_innermost: t.c:16:23: missed: failed: reverse storage order.

...

void foo (struct L * restrict l, struct B * restrict b)
{
  int i;

   [local count: 10737416]:
  __builtin_memcpy (l_6(D), b_5(D), 1024);
  return;

}
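
Why a plain memcpy is wrong here can be shown with a hand-written emulation of
the two storage orders. This is a sketch only and not how GCC implements the
attribute: load_be32, store_le32 and copy_be_to_le are hypothetical helpers
working on raw byte buffers. A value-preserving copy has to reverse the bytes
of every element, whereas memcpy would keep the byte sequence unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value stored most significant byte first.  */
uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Write a 32-bit value least significant byte first.  */
void
store_le32 (unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v & 0xff);
  p[1] = (unsigned char) ((v >> 8) & 0xff);
  p[2] = (unsigned char) ((v >> 16) & 0xff);
  p[3] = (unsigned char) ((v >> 24) & 0xff);
}

/* Element-wise copy that preserves the values, like the loop in foo
   is supposed to do.  */
void
copy_be_to_le (unsigned char *dst, const unsigned char *src, size_t count)
{
  for (size_t i = 0; i < count; i++)
    store_le32 (dst + 4 * i, load_be32 (src + 4 * i));
}
```

After such a copy the destination bytes of each element are the reverse of the
source bytes, so the result differs from what a byte-for-byte memcpy produces
for any element that is not a byte palindrome.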

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #22 from Andreas Krebbel  ---
The longer I have been looking at these STRICT_LOW_PART issues the more I think
that STRICT_LOW_PART is an awful way to express what we need:

- the information needed to understand what it is doing is distributed across 3
RTXs (strict_low_part (subreg:mode1 (reg:mode2 xx) OFS))
- the big problems arise since the involved RTXs are separately optimized and
we might end up with partial information without a clear definition of how to
deal with that
- actually it is really hard to handle the RTXs as one unit. Recursively
walking RTXs needs to record whether we are in a STRICT_LOW_PART or not.


I think it might make sense to explore other ways to express this:

1. SUBREG flag - Looks easy, but it would be hard to catch all places which
should care about that flag.

2. Introduce a new RTX code which has a mode and an offset attached but does
not require an additional SUBREG anymore.

3. Since a STRICT_LOW_PART is essentially a bit insertion operation we could
always express it with a ZERO_EXTRACT target operand and get rid of
STRICT_LOW_PART entirely. A ZERO_EXTRACT would be somewhat more cumbersome to
deal with, since it would always require checking the bit width and offset,
even for all the cases which just use mode boundaries. But at least most passes
already know how to deal with them.
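
To make the "bit insertion" view from option 3 concrete, here is a small model
in C. It is a sketch under stated assumptions and not RTL semantics as defined
by GCC: insert_bits is a hypothetical helper, the register is modelled as a
64-bit integer, pos is assumed to be less than 64, and bit positions are
counted from the least significant end.

```c
#include <stdint.h>

/* Replace the width-bit field starting at bit pos in reg with the low
   bits of value and leave every other bit of reg untouched.  */
uint64_t
insert_bits (uint64_t reg, uint64_t value, unsigned width, unsigned pos)
{
  /* Avoid shifting by 64, which would be undefined behaviour.  */
  uint64_t field = (width >= 64) ? ~UINT64_C (0)
                                 : ((UINT64_C (1) << width) - 1);
  uint64_t mask = field << pos;
  return (reg & ~mask) | ((value << pos) & mask);
}
```

In this model a 16-bit strict-low-part style store is insert_bits (reg, value,
16, 0): the low 16 bits change and everything above them is preserved. Width
and position are explicit here, which is the extra checking mentioned above.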

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #21 from Andreas Krebbel  ---
I have committed a patch now which accepts only SUBREGs before reload and then
also REGs to deal with how LRA operates right now.

I've continued a bit with the patch from Comment 18. It bootstraps on s390x and
x86-64. On s390x also the testsuite is clean. However, I see a few failures in
the arch specific tests on x86-64. The cases I looked at so far are the result
of several peepholes and splitters not being triggered anymore. I've fixed most
of them I think but there are also cases where I'm not sure what to do exactly.

In the case of a matching constraint between a strict_low_part operand and a
normal operand, reload now (with the patch from Comment 18) would remove the
subreg on the operand with the matching constraint and would leave it in for
the strict_low_part operand.

(insn 9 8 16 2 (parallel [
(set (strict_low_part (subreg:QI (reg/v:SI 0 ax [orig:86 a ] [86])
0))
(and:QI (reg:QI 0 ax [orig:86 a ] [86])
(reg:SI 4 si [92])))
(clobber (reg:CC 17 flags))
]) "/home/andreas/gcc/gcc/testsuite/gcc.target/i386/pr91188-1a.c":20:10
553 {*andqi_1_slp}
 (nil))

I think this should be addressed separately. Once we have solved it I will
adjust the s390x backend again if necessary.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-24 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #18 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #17)
...
> Yes, but that says the high 48 bits of the hardware reg are untouched, which
> is not true (only the high 16 of the low 32 are guaranteed unmodified).

Right, if the original register mode does not match the mode of the full
hardreg, we continue to need that mode as the upper bound. So with the subreg
folding in reload we appear to lose information we need to interpret the
STRICT_LOW_PART correctly.

I'm testing the following patch in combination with my other fix now:

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 4ddbe477d92..9c125a9ce38 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -855,6 +855,7 @@ lra_final_code_change (void)

  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator)
+   && ! static_id->operand[i].strict_low
&& alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
  {
lra_update_dup (id, i);

With that change the SUBREG folding from comment #11 happens later in final
(cleanup_subreg_operands). I'm not sure whether we would have to prevent it
there as well?!

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #16 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #15)
> (In reply to Andreas Krebbel from comment #14)
> > > So you are suggesting that every strict_low_part after reload can just be
> > > removed?  If that is true, should we not just do exactly that then?
> > 
> > I think we have 3 options:
> > (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
> > (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
> > (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 
...
> > (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode
> > for UNITS_PER_WORD) which are not covered by MODE are not touched by the 
> > SET.
> 
> But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware
> is 64-bit.  That only means the low 32 bits of the reg aren't clobbered, the
> high 32 bits are fair game.  That does not agree with your proposed
> semantics.

In that case I would have expected reload to turn this into 
(strict_low_part (reg:HI xx))
already.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #14 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #13)
> (Sorry I missed this)
> 
> (In reply to Andreas Krebbel from comment #11)
> > I've tried to change our movstrict backend patterns to use a predicate on
> > the dest operand which enforces a subreg. However, since reload strips the
> > subreg away when assigning hard regs we end up with a STRICT_LOW_PART of a
> > reg again. At least after reload something like this should be acceptable -
> > right?
> > 
> > 298r.ira:
> > (insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4))
> > (const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi}
> >  (nil))
> > 
> > 299r.reload:
> > (insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64]))
> > (mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32]))
> > "t.cc":37:17 1485 {movstrictsi}
> >  (nil))
> 
> So you are suggesting that every strict_low_part after reload can just be
> removed?  If that is true, should we not just do exactly that then?

I think we have 3 options:
(1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
(2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
(3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 

(1) For that, all passes after reload must be able to deal with these SUBREGs.
Since SUBREGs are rare after reload it is hard to say how robust that handling
is right now.

(2) Here the question to me is which passes after reload currently do something
with the strict-low-part info. Clearly a non-option if we would lose any
optimizations with that.

(3) E.g. it means that the bits of hardreg x in its hardware mode (the mode for
UNITS_PER_WORD) which are not covered by MODE are not touched by the SET.
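
As an illustration only, the meaning proposed in (3) can be sketched in plain C,
assuming a 64-bit hard register and a 32-bit MODE. The function names are made
up for this sketch and are not part of GCC.

```c
#include <stdint.h>

/* Hypothetical model of (set (strict_low_part (reg:SI x)) val) under
   option (3): only the low 32 bits of the 64-bit hard register change,
   all remaining bits keep their previous value.  */
static uint64_t
model_strict_low_part_si (uint64_t hardreg, uint32_t val)
{
  return (hardreg & ~UINT64_C (0xffffffff)) | (uint64_t) val;
}

/* For contrast: a plain SImode set makes no promise about the upper
   bits.  This model simply clears them; that choice is an assumption
   of the sketch, not something the RTL guarantees.  */
static uint64_t
model_plain_set_si (uint64_t hardreg, uint32_t val)
{
  (void) hardreg;
  return (uint64_t) val;
}
```

With hardreg = 0x1122334455667788 and val = 0 the first model keeps the upper
half and yields 0x1122334400000000.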

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-07-19 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #11 from Andreas Krebbel  ---
I've tried to change our movstrict backend patterns to use a predicate on the
dest operand which enforces a subreg. However, since reload strips the subreg
away when assigning hard regs we end up with a STRICT_LOW_PART of a reg again.
At least after reload something like this should be acceptable - right?

298r.ira:
(insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4))
(const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi}
 (nil))

299r.reload:
(insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64]))
(mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32]))
"t.cc":37:17 1485 {movstrictsi}
 (nil))

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-07-14 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #10 from Andreas Krebbel  ---
We generate the movstrict target operand with gen_lowpart. If the operand for
gen_lowpart is already a paradoxical subreg the two subregs cancel each other
out and we end up with a plain reg. I'm testing the following patch right now.
It falls back to a normal move in that case and fixes the testcase:

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5aaf76a9490..d90ec1a6de1 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -6523,6 +6523,14 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
  rtx low_dest = gen_lowpart (smode, dest);
  rtx low_src = gen_lowpart (smode, src);

+ /* In case two subregs cancelled each other out, do a normal
+move.  */
+ if (!SUBREG_P (low_dest))
+   {
+ emit_move_insn (low_dest, low_src);
+ return true;
+   }
+
  switch (smode)
{
case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src));
return true;
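
The cancellation described above can be mimicked with a toy model. This is a
sketch under loud assumptions: the toy_* types and functions are invented for
illustration and only imitate the one gen_lowpart behaviour relevant here; they
are not GCC's rtx representation.

```c
#include <stdbool.h>

enum toy_mode { TOY_SI = 4, TOY_DI = 8 };	/* value = size in bytes */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  const struct toy_rtx *inner;	/* only used for TOY_SUBREG */
};

/* Toy stand-in for gen_lowpart.  Asking for the MODE lowpart of a
   subreg whose inner reg already has MODE returns the inner reg, so
   both subregs vanish.  Otherwise X is wrapped in a new subreg that
   lives in *STORAGE.  */
static const struct toy_rtx *
toy_gen_lowpart (enum toy_mode mode, const struct toy_rtx *x,
		 struct toy_rtx *storage)
{
  if (x->mode == mode)
    return x;
  if (x->code == TOY_SUBREG && x->inner->mode == mode)
    return x->inner;
  storage->code = TOY_SUBREG;
  storage->mode = mode;
  storage->inner = x;
  return storage;
}

/* Mirrors the check added by the patch: use movstrict only while the
   destination is still a subreg, else fall back to a normal move.  */
static bool
toy_use_movstrict (const struct toy_rtx *low_dest)
{
  return low_dest->code == TOY_SUBREG;
}
```

For a paradoxical (subreg:DI (reg:SI)) destination the toy lowpart is the plain
SImode reg, so toy_use_movstrict returns false and a normal move is chosen.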

[Bug tree-optimization/105175] [12 Regression] Pointless warning about missed vector optimization

2022-04-06 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105175

--- Comment #2 from Andreas Krebbel  ---
I would expect the vectorizer to only generate vector modes which would fit
into word mode if no hardware vector support is available. E.g. for:

struct {
  unsigned a, b, c, d;
} s;
foo() {
  s.a &= 42;
  s.b &= 42;
  s.c &= 42;
  s.d &= 42;
}

I see two "vector 2 unsigned" operations being generated when compiling with
-mno-sse but with sse I get a 4 element vector as expected.

[Bug rtl-optimization/105175] New: [12 Regression] Pointless warning about missed vector optimization

2022-04-06 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105175

Bug ID: 105175
   Summary: [12 Regression] Pointless warning about missed vector
optimization
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

For this code snippet extracted from Qemu source:

enum { QEMU_MIGRATION_COOKIE_PERSISTENT = 1 };
struct {
  unsigned flags;
  unsigned flagsMandatory
} qemuMigrationCookieGetPersistent_mig;
qemuMigrationCookieGetPersistent() {
  qemuMigrationCookieGetPersistent_mig.flags &=
  QEMU_MIGRATION_COOKIE_PERSISTENT;
  qemuMigrationCookieGetPersistent_mig.flagsMandatory &=
  QEMU_MIGRATION_COOKIE_PERSISTENT;
}

cc1 -O3 -mno-sse t.c -Wvector-operation-performance

gives me:

t.c: In function ‘qemuMigrationCookieGetPersistent’:
t.c:7:46: warning: vector operation will be expanded with a single scalar
operation [-Wvector-operation-performance]
7 |   qemuMigrationCookieGetPersistent_mig.flags &=

The generated code actually looks quite decent. Both integer AND operations are
merged into a 64 bit AND since
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=f31da42e047e8018ca6ad9809273bc7efb6ffcaf

This appears to be a nice optimization to me. However, in tree-vect-generic.cc
we then complain about this being implemented with just a scalar instruction.
Apart from this being pretty confusing for the programmer who never requested
anything to be vectorized I also don't see why it is a bad thing to implement a
vector operation with a scalar operation as long as it is able to cover the
entire vector with that.

With GCC 12 we have auto-vectorization enabled already with -O2, so I expect
this warning to surface much more frequently now. In particular on targets like
s390 where older distros still have to build everything without hardware vector
support this might be annoying. Also I'm not sure whether this warning ever
points at an actual problem. To me it looks like we should just drop it
altogether.
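
To illustrate why a single scalar operation can cover the whole two-element
vector, here is a minimal C sketch. It assumes that unsigned is 32 bits wide
and that the struct has no padding; the struct and function names are made up
for the example.

```c
#include <stdint.h>
#include <string.h>

struct two_flags
{
  unsigned flags;
  unsigned flagsMandatory;
};

/* The sketch only works if both fields fit exactly into 64 bits.  */
_Static_assert (sizeof (struct two_flags) == sizeof (uint64_t),
		"two 32-bit fields without padding expected");

/* Two separate 32-bit ANDs, as written in the source.  */
static void
and_fieldwise (struct two_flags *s, unsigned mask)
{
  s->flags &= mask;
  s->flagsMandatory &= mask;
}

/* The same effect with one 64-bit AND.  The mask is duplicated into
   both halves, so the result does not depend on endianness.  */
static void
and_merged (struct two_flags *s, unsigned mask)
{
  uint64_t bits, wide_mask;
  unsigned masks[2] = { mask, mask };

  memcpy (&wide_mask, masks, sizeof wide_mask);
  memcpy (&bits, s, sizeof bits);
  bits &= wide_mask;
  memcpy (s, &bits, sizeof bits);
}
```

Both functions leave the struct in the same state, which is the sense in which
the scalar AND implements the vector operation completely.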

[Bug target/104327] [12 Regression] Inlining error on s390x since r12-1039

2022-02-03 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104327

--- Comment #8 from Andreas Krebbel  ---
I will work on a patch. Thanks for the hint!

I agree for HTM. VX is an ABI switch since it changes the calling conventions
for vector types.

[Bug target/104327] [12 Regression] Inlining error on s390x since r12-1039

2022-02-02 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104327

--- Comment #5 from Andreas Krebbel  ---
Yes, that's the right fix I think. Thanks!
MVCLE is a shorter version of a loop doing MVCs but has some startup overhead.

[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so

2022-01-17 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

Andreas Krebbel  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

[Bug rtl-optimization/104034] Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x

2022-01-14 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034

Andreas Krebbel  changed:

   What|Removed |Added

   Last reconfirmed||2022-01-14
 Ever confirmed|0   |1
   Priority|P3  |P2
   Keywords||wrong-code
   Host||s390x
 Status|UNCONFIRMED |NEW

[Bug rtl-optimization/104034] New: Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x

2022-01-14 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034

Bug ID: 104034
   Summary: Miscompilation of LLVM on s390x with -march=z13
-mtune=z14 in GCC 8.x
   Product: gcc
   Version: 8.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 52194
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52194&action=edit
Testcase

Initial analysis done by Jakub Jelinek as part of:
https://bugzilla.redhat.com/show_bug.cgi?id=2028609

The following testcase is miscompiled on s390x with
g++ -fPIC -fvisibility-inlines-hidden -ffunction-sections -fdata-sections -O2
-fPIC  -fno-exceptions -fno-rtti -std=c++14 -mlong-double-128 -march=z13
-mtune=z14
both with the RHEL gcc 8.x and with upstream 8.5.0.
When miscompiled, it prints something like
__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0xdeadbeefcafebabe
0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
rather than
__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe

The interesting part is below, .cfi_* directives removed for brevity.
On entry, this function has 3 pointers in %r2, %r3 and %r4 registers, and
%r5 is a pointer to the 16-byte function_ref object, a trivially copyable
class containing two 8-byte members.
_ZSt24__merge_sort_with_bufferIPPvS1_N4llvm12function_refIFbS0_S0_vT_S6_T0_T1_:
stmg %r6,%r15,48(%r15)
lgr %r14,%r15
lay %r15,-248(%r15)
aghi %r14,-32
std %f8,0(%r14)
std %f12,8(%r14)
std %f14,16(%r14)
std %f9,24(%r14)
sgrk %r11,%r3,%r2
lgr %r1,%r4
srag %r13,%r11,3
agr %r1,%r11
lmg %r8,%r9,0(%r5)
stmg %r8,%r9,160(%r15)
! The above stores the whole 16-byte function_ref correctly to %r15+160
cgijle %r11,48,.L13
vlvgp %v0,%r8,%r9
ldgr %f9,%r1
ldgr %f12,%r4
la %r1,200(%r15)
lgr %r10,%r3
stg %r11,176(%r15)
ldgr %f8,%r2
lgr %r6,%r9
vlgvg %r7,%v0,1
stmg %r8,%r9,184(%r15)
! So does the above
lgr %r8,%r1
.L14:
la %r11,56(%r2)
lgr %r4,%r8
lgr %r3,%r11
stmg %r6,%r7,200(%r15)
! But this one actually stores both 8-byte words the same to %r15+160, and
! %r15+200 is passed as %r4 to the function
brasl %r14,_ZSt16__insertion_sortIPPvN4llvm12function_refIFbS0_S0_vT_S6_T0_@PLT

In *.postreload, we have still correct:
(insn 16 12 166 2 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
(reg:TI 8 %r8)) 1268 {movti}
 (nil))
...
(insn 137 136 140 3 (set (reg/v:TI 6 %r6 [orig:69 __comp ] [69])
(reg/v:TI 16 %f0 [orig:69 __comp ] [69])) 1268 {movti}
 (nil))
The code spills it to 128-bit %f0 register and loads it back from it.
Next, split2 pass splits the latter (but not the former) into:
(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
(reg:DI 16 %f0)) 1269 {*movdi_64}
 (nil))
(insn 168 167 140 3 (set (reg:DI 7 %r7 [orig:69 __comp+8 ] [69])
(unspec:DI [
(reg:V2DI 16 %f0)
(const_int 1 [0x1])
] UNSPEC_VEC_EXTRACT)) 402 {*vec_extractv2di}
 (nil))
and finally cprop_hardreg seeing
(insn 187 188 186 3 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
(reg:TI 8 %r8)) 1268 {movti}
 (nil))
changes insn 167 to:
(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
(reg:DI 9 %r9 [16])) 1269 {*movdi_64}
 (nil))
I'm not sure if this is a bug in the
; Split a VR -> GPR TImode move into 2 vector load GR from VR element.
; For the higher order bits we do simply a DImode move while the
; second part is done via vec extract.  Both will end up as vlgvg.
(define_split
  [(set (match_operand:TI 0 "register_operand" "")
(match_operand:TI 1 "register_operand" ""))]
  "TARGET_VX && reload_completed
   && GENERAL_REG_P (operands[0])
   && VECTOR_REG_P (operands[1])"
  [(set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)]
 UNSPEC_VEC_EXTRACT))]
{
  operands[2] = operand_subword (operands[0], 0, 0, TImode);
  operands[3] = operand_subword (operands[0], 1, 0, TImode);
  operands[4] = gen_rtx_REG (DImode, REGNO (operands[1]));
  operands[5] = gen_rtx_REG (V2DImode, REGNO (operands[1]));
})
splitter, in cprop_hardreg or the s390x representation of those TImodes in
floating point registers.
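
The effect of the rewritten insn 167 can be replayed with a toy register file.
This is only a sketch: the struct and the toy_* helpers are invented for
illustration, and it models nothing beyond the three steps visible in the
assembly above (vlvgp, the split, and the cprop_hardreg change).

```c
#include <stdint.h>

/* Toy register file holding only the registers mentioned above.  */
struct toy_regs
{
  uint64_t r6, r7, r8, r9;
  uint64_t v0[2];		/* v0[0] is the leftmost doubleword */
};

/* vlvgp %v0,%r8,%r9  */
static void
toy_vlvgp (struct toy_regs *r)
{
  r->v0[0] = r->r8;
  r->v0[1] = r->r9;
}

/* The split as emitted by split2: insn 167 reads the DImode view of
   %f0 (the higher order bits), insn 168 extracts element 1.  */
static void
toy_split_as_emitted (struct toy_regs *r)
{
  r->r6 = r->v0[0];
  r->r7 = r->v0[1];
}

/* The same pair after cprop_hardreg changed insn 167 to read %r9.  */
static void
toy_split_after_cprop (struct toy_regs *r)
{
  r->r6 = r->r9;
  r->r7 = r->v0[1];
}
```

Starting from %r8 = 0x10006b8 and %r9 = 0xdeadbeefcafebabe, the first variant
gives %r6 = 0x10006b8 and %r7 = 0xdeadbeefcafebabe, while the second leaves
both registers at 0xdeadbeefcafebabe, matching the miscompiled output.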

In GCC 9 it got "fixed" with https://gcc.gnu.org/r9-3763-gef976be1a23a517 but
that just means it went lat

[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so

2021-12-06 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

--- Comment #22 from Andreas Krebbel  ---
(In reply to Sarah Julia Kriesch from comment #21)
> Did you use a mainframe as a local system?

I ran these commands on a z15 LPAR with Fedora 33 installed.

[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so

2021-12-06 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

--- Comment #20 from Andreas Krebbel  ---
(In reply to Sarah Julia Kriesch from comment #18)
...
> sudo zypper in osc build obs-service-format_spec_file bsdtar #also possible
> with other Linux distributions
> osc co openSUSE:Factory:zSystems/postgresql14
> cd openSUSE\:Factory\:zSystems/postgresql14/
> osc build --vm-type=kvm --vm-memory=4G

Tried with these commands. The build fails due to the OOM killer with 4GB and
8GB. The package builds fine starting with 12GB. In none of the cases did I get
the ld error.

[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so

2021-12-03 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

--- Comment #17 from Andreas Krebbel  ---
(In reply to Sarah Julia Kriesch from comment #12)
> that is happening during the build process in OBS with a really minimal
> openSUSE Tumbleweed. We are using VMs using QEMU and with 4GB of memory.

Why only 4GB? Isn't this way too low for building things like rust with lto and
everything?

I've successfully built rust1.54 and postgresql14 several times in an opensuse
tumbleweed container. So I would suspect either the kernel or the guest setup
you are using. Could it perhaps be that ld processes got oom killed and have
left half-complete binaries which triggered the error then?

In the current logs I don't see the ld issue anymore. Apparently you already
gave it more memory and the behavior changed due to that?

[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so

2021-12-02 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

--- Comment #15 from Andreas Krebbel  ---
(In reply to Sarah Julia Kriesch from comment #0)
...
> Full PostgreSQL log:
> https://build.opensuse.org/build/openSUSE:Factory:zSystems/standard/s390x/
> postgresql14/_log
> 
> Full Rust log:
> https://build.opensuse.org/build/openSUSE:Factory:zSystems/standard/s390x/
> rust1.54/_log

No access

[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so

2021-12-02 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

--- Comment #11 from Andreas Krebbel  ---
Could you please provide the steps to reproduce the issue? I just tried it
quickly with a container image and couldn't reproduce it.

[Bug target/103028] ICE in extract_constrain_insn, at recog.c:2670

2021-11-04 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103028

--- Comment #3 from Andreas Krebbel  ---
So I think what is needed is something like this:

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79a..1f5b9476ac2e 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -4341,7 +4341,8 @@ find_if_header (basic_block test_bb, int pass)
   && cond_exec_find_if_block (&ce_info))
 goto success;

-  if (targetm.have_trap ()
+  if (!reload_completed
+  && targetm.have_trap ()
   && optab_handler (ctrap_optab, word_mode) != CODE_FOR_nothing
   && find_cond_trap (test_bb, then_edge, else_edge))
 goto success;

[Bug target/103028] ICE in extract_constrain_insn, at recog.c:2670

2021-11-03 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103028

--- Comment #2 from Andreas Krebbel  ---
IF-convert generates the compare *after* reload. The operands get checked for
validity only by invoking the predicates. That means everything which is
accepted by TARGET_LEGITIMATE_CONSTANT_P is ok for a general_operand. However,
we have several patterns where the union of all constraints would accept fewer
operands than the predicate, assuming that reload is able to sort this out. The
ICE is triggered by emitting a pattern which actually would need to be fixed by
reload.

The problem could easily be avoided by e.g. enforcing operand1 to satisfy the
constraint used in the pattern. However, I'm wondering how this is supposed to
work in general. Couldn't this trigger all sorts of problems? Are we the only
backend relying on LRA sorting out these kinds of issues for us?

Btw. I couldn't trigger the problem without -fharden-conditional-branches so
far.
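
The gap between predicate and constraints can be pictured with a toy check.
Everything here is hypothetical: the toy_* functions and the 16-bit immediate
range are assumptions chosen for the sketch and do not correspond to a
particular s390 pattern.

```c
#include <stdbool.h>

/* Hypothetical predicate: accepts any constant.  */
static bool
toy_predicate_ok (long value)
{
  (void) value;
  return true;
}

/* Hypothetical constraint: only a signed 16-bit immediate.  */
static bool
toy_constraint_ok (long value)
{
  return value >= -32768 && value <= 32767;
}

/* Before reload a predicate match is enough, because reload can still
   fix up the operand.  After reload nothing repairs it anymore, so the
   constraint has to match as well.  */
static bool
toy_insn_acceptable (long value, bool reload_completed)
{
  if (!toy_predicate_ok (value))
    return false;
  return !reload_completed || toy_constraint_ok (value);
}
```

A constant like 100000 passes before reload but not afterwards, which is the
situation an insn emitted by a post-reload pass runs into.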

[Bug target/102222] ICE on s390 (internal compiler error: in extract_insn, at recog.c:2770)

2021-09-14 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10

--- Comment #6 from Andreas Krebbel  ---
(insn 9 8 10 2 (set (strict_low_part (reg:SI 66))
(mem/c:SI (plus:SI (reg/f:SI 64)
(const_int 4 [0x4])) [1 read_inode_val+0 S4 A32]))

With -mesa this should be a simple move. However, in that case it apparently is
emitted via insv.

[Bug target/102222] ICE on s390 (internal compiler error: in extract_insn, at recog.c:2770)

2021-09-14 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10

Andreas Krebbel  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |krebbel at gcc dot 
gnu.org

--- Comment #5 from Andreas Krebbel  ---
Created attachment 51461
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51461&action=edit
Experimental patch

[Bug target/96127] ICE in extract_insn, at recog.c:2294

2021-09-06 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96127

--- Comment #4 from Andreas Krebbel  ---
The testcase does not appear to fail on current GCC 10 branch. So I would just
close it as fixed in GCC 11.

[Bug rtl-optimization/101523] Huge number of combine attempts

2021-07-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #2 from Andreas Krebbel  ---
Created attachment 51174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51174&action=edit
Experimental Fix

With that patch the number of combine attempts goes back to normal.

[Bug rtl-optimization/101523] Huge number of combine attempts

2021-07-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #1 from Andreas Krebbel  ---
This appears to be triggered by try_combine unnecessarily setting back the
position by returning the i2 insn.

When 866 is inserted into 973, 866 still needs to be kept around for other
users. So try_combine first merges the two sets into a parallel and immediately
notices that this can't be recognized. Because none of the sets is a trivial
move it is split again into two separate insns. Although the new i2 pattern
exactly matches the input i2, combine considers this to be a new insn and
triggers all the scanning and log link creation and eventually returns it,
which lets combine start all over at 866.

Due to that combine tries many of the substitutions more than 400x.

Trying 866 -> 973:
  866: r22393:DI=r22391:DI+r22392:DI
  973: r22499:DF=r22498:DF*[r22393:DI]
  REG_DEAD r22498:DF
Failed to match this instruction:
(parallel [
(set (reg:DF 22499)
(mult:DF (reg:DF 22498)
(mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
(reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
(set (reg/f:DI 22393 [ _85087 ])
(plus:DI (reg/f:DI 22391 [ _85085 ])
(reg:DI 22392 [ _85086 ])))
])
Failed to match this instruction:
(parallel [
(set (reg:DF 22499)
(mult:DF (reg:DF 22498)
(mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
(reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
(set (reg/f:DI 22393 [ _85087 ])
(plus:DI (reg/f:DI 22391 [ _85085 ])
(reg:DI 22392 [ _85086 ])))
])
Successfully matched this instruction:
(set (reg/f:DI 22393 [ _85087 ])
(plus:DI (reg/f:DI 22391 [ _85085 ])
(reg:DI 22392 [ _85086 ])))
Successfully matched this instruction:
(set (reg:DF 22499)
(mult:DF (reg:DF 22498)
(mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
(reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
allowing combination of insns 866 and 973
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
modifying insn i2   866: r22393:DI=r22391:DI+r22392:DI
deferring rescan insn with uid = 866.
modifying insn i3   973: r22499:DF=r22498:DF*[r22391:DI+r22392:DI]
  REG_DEAD r22498:DF
deferring rescan insn with uid = 973.

[Bug rtl-optimization/101523] New: Huge number of combine attempts

2021-07-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Bug ID: 101523
   Summary: Huge number of combine attempts
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Compiling the attached testcase on s390x with:

cc1plus -fpreprocessed t.ii -quiet -march=z196 -g -O2 -std=c++11

produces a huge number of combine attempts compared to x86, consuming more than
11GB of memory:

x86:  27264 combine attempts for 170631 insns
s390x: 40009540 combine attempts for 164327 insns

gcc g:6d4da4aeef5b20f7f9693ddc27d26740d0dbe36c

[Bug target/86681] ICE in extract_insn, at recog.c:2304 on s390x

2021-07-16 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86681

--- Comment #6 from Andreas Krebbel  ---
Do you have the command line for the tattr-1.c test? The verbose options line
appears to contain the options for a different test. I could not reproduce the
problem with these options.

[Bug rtl-optimization/101426] Wrong code redirecting IPA thunk parms to tail-call

2021-07-12 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101426

--- Comment #1 from Andreas Krebbel  ---
Created attachment 51136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51136&action=edit
Experimental Fix

With this patch the address is copied to a pseudo first. That way the register
allocator will sort out the dependencies resulting in the following code being
generated:

lgr %r2,%r3
lgr %r3,%r4
lgr %r4,%r5
jg 
_ZN1r6NSPACE6AShrOp5buildERNS_9OpBuilderERNS_14OperationStateENS_10ValueRangeEN6nspace8ArrayRefISt4pairINS_10IdentifierENS_9Attribute.constprop.0

[Bug rtl-optimization/101426] New: Wrong code redirecting IPA thunk parms to tail-call

2021-07-12 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101426

Bug ID: 101426
   Summary: Wrong code redirecting IPA thunk parms to tail-call
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 51135
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51135&action=edit
Testcase

Building the attached testcase with:

cc1plus -fpreprocessed t.cc -quiet -m64 -mzarch -O2

Produces wrong code with GCC commits up to g:9725df0233b

A specialized clone of AShrOp::build without the first parameter is created.

The calls in the foo* functions to the clone have to shift the function
parameters one hard reg down to fit the signature of the clone:

void r::NSPACE::foo1 (struct OpBuilder & D.2900, struct OperationState & state,
struct ValueRange operands, struct ArrayRef attributes)
{
   [local count: 1073741824]:
  r::NSPACE::AShrOp::build.constprop (state_3(D), operands, attributes); [tail
call]
  return;

}

The generated code overwrites the second and the third parameter with the
fourth parameter of the caller:

lgr %r2,%r3
lgr %r4,%r5
lgr %r3,%r5
jg 
_ZN1r6NSPACE6AShrOp5buildERNS_9OpBuilderERNS_14OperationStateENS_10ValueRangeEN6nspace8ArrayRefISt4pairINS_10IdentifierENS_9Attribute.constprop.0

The problem does not occur on head and 10.3 after commit g:defafb78cbc.
With this commit the parameters are always copied.

The fix was done for PR90448 to fix an ICE triggered when building an address
operand based on the DECL_RTL of the parameter which wasn't addressable at that
point. I think the situation is a bit different here. The code wires up the
incoming hardregs with the callee parms without considering that the resulting
moves might affect each other.
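
How the moves affect each other can be replayed with a toy register array. This
is a sketch for illustration: regs[n] stands for %rn, and the helper names are
made up. The two sequences are the ones from the wrong code in this report and
from the code generated with the experimental fix.

```c
/* Toy model: regs[n] stands for %rn; lgr copies SRC into DST.  */
static void
toy_lgr (long *regs, int dst, int src)
{
  regs[dst] = regs[src];
}

/* Sequence from the wrong code: %r3 receives the caller's fourth
   argument instead of the third.  */
static void
toy_shift_args_wrong (long *regs)
{
  toy_lgr (regs, 2, 3);
  toy_lgr (regs, 4, 5);
  toy_lgr (regs, 3, 5);
}

/* Sequence generated with the fix: each source register is read
   before it gets overwritten.  */
static void
toy_shift_args_fixed (long *regs)
{
  toy_lgr (regs, 2, 3);
  toy_lgr (regs, 3, 4);
  toy_lgr (regs, 4, 5);
}
```

With the incoming arguments 10, 20, 30, 40 in %r2 to %r5, the fixed sequence
leaves 20, 30, 40 in %r2 to %r4, while the wrong one leaves 20, 40, 40.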

[Bug middle-end/100908] asan clobberes register asm variables

2021-06-04 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100908

--- Comment #1 from Andreas Krebbel  ---
https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html

[Bug middle-end/100908] New: asan clobberes register asm variables

2021-06-04 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100908

Bug ID: 100908
   Summary: asan clobberes register asm variables
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50933&action=edit
Testcase

Compiling the testcase with either:
gcc -O3 t1.c -o t -fsanitize=address --param
asan-instrumentation-with-call-threshold=0
or
gcc -O3 t1.c -o t -fsanitize=kernel-address -lasan

aborts because dereferencing y triggers the address sanitizer to
introduce a function call.

That a function call might clobber registers assigned with register asm
is a documented limitation of the register asm construct:
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html

However, in combination with the address sanitizer this becomes even
less obvious making even the most experienced kernel developers trip
over it:
https://lkml.org/lkml/2020/10/23/908

For IBM Z quite a few cases like this have been reported to me. Here just
one I could find quickly:
https://lore.kernel.org/patchwork/patch/1413907/


Btw. clang appears to handle this more gracefully and preserves the
value of the variable around function calls. The attached testcase
works fine with clang.


I think it would be much better to find a solution which allows naming
hard registers directly as inline assembly constraints.  I'll
post an RFC on the mailing list.

[Bug c++/100281] ICE with SImode pointer assignment in C++

2021-04-27 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100281

Andreas Krebbel  changed:

   What|Removed |Added

  Attachment #50685|0   |1
is obsolete||

--- Comment #5 from Andreas Krebbel  ---
Created attachment 50689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50689&action=edit
Fixed patch with testcase

[Bug c++/100281] ICE with SImode pointer assignment in C++

2021-04-27 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100281

--- Comment #3 from Andreas Krebbel  ---
This is a hard requirement for the z/TPF operating system supported as part of
our IBM Z backend. It has worked for many years already and they make
extensive use of it.

[Bug c++/100281] New: ICE with SImode pointer assignment in C++

2021-04-27 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100281

Bug ID: 100281
   Summary: ICE with SImode pointer assignment in C++
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50685&action=edit
Experimental Fix

typedef void * __attribute__((mode (SI))) __ptr32_t;

void foo(){
  unsigned int b = 100;
  __ptr32_t a;
  a = b;
}

Building with "cc1plus t.cpp" ICEs on s390x:

 void foo()
in strip_typedefs, at cp/tree.c:1770
6 |   a = b;
  |   ^
0x156f731 strip_typedefs(tree_node*, bool*, unsigned int)
/home2/andreas/build/../gcc/gcc/cp/tree.c:1770
0x135c827 type_to_string
/home2/andreas/build/../gcc/gcc/cp/error.c:3298
0x136c723 cxx_format_postprocessor::handle(pretty_printer*)
/home2/andreas/build/../gcc/gcc/cp/error.c:4242
0x291f171 pp_format(pretty_printer*, text_info*)
/home2/andreas/build/../gcc/gcc/pretty-print.c:1496
0x28ffecb diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
/home2/andreas/build/../gcc/gcc/diagnostic.c:1244
0x2902cef diagnostic_impl
/home2/andreas/build/../gcc/gcc/diagnostic.c:1406
0x2902cef permerror(rich_location*, char const*, ...)
/home2/andreas/build/../gcc/gcc/diagnostic.c:1688
0x12441f7 convert_like_internal
/home2/andreas/build/../gcc/gcc/cp/call.c:7581
0x12460e1 convert_like
/home2/andreas/build/../gcc/gcc/cp/call.c:8114
0x12463b3 convert_like
/home2/andreas/build/../gcc/gcc/cp/call.c:8126
0x12463b3 perform_implicit_conversion_flags(tree_node*, tree_node*, int, int)
/home2/andreas/build/../gcc/gcc/cp/call.c:12303
0x1599687 cp_build_modify_expr(unsigned int, tree_node*, tree_code, tree_node*,
int)
/home2/andreas/build/../gcc/gcc/cp/typeck.c:8887
0x159b66d build_x_modify_expr(unsigned int, tree_node*, tree_code, tree_node*,
int)
/home2/andreas/build/../gcc/gcc/cp/typeck.c:8978
0x1435d8d cp_parser_assignment_expression
/home2/andreas/build/../gcc/gcc/cp/parser.c:10184
0x1437661 cp_parser_expression
/home2/andreas/build/../gcc/gcc/cp/parser.c:10313
0x143b5c1 cp_parser_expression_statement
/home2/andreas/build/../gcc/gcc/cp/parser.c:12041
0x1449a71 cp_parser_statement
/home2/andreas/build/../gcc/gcc/cp/parser.c:11837
0x144bac7 cp_parser_statement_seq_opt
/home2/andreas/build/../gcc/gcc/cp/parser.c:12189
0x144bbc7 cp_parser_compound_statement
/home2/andreas/build/../gcc/gcc/cp/parser.c:12138
0x146ef03 cp_parser_function_body
/home2/andreas/build/../gcc/gcc/cp/parser.c:24080
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


The problem appears to be triggered by two locations in the front-end where
non-POINTER_SIZE pointers aren't handled right now.

1. An assertion in strip_typedefs is triggered because the alignments of the
types don't match. This in turn is caused by creating the new type with
build_pointer_type instead of taking the type of the original pointer into
account.
2. An assertion in cp_convert_to_pointer is triggered which expects the target
type to always have POINTER_SIZE.

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #5 from Andreas Krebbel  ---
Created attachment 50132
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50132&action=edit
RTL dump from store motion pass

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #4 from Andreas Krebbel  ---
The update of the global variable c is moved out of the loop. As a result, c
stays at 8 although it should be counted down to 2.

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #3 from Andreas Krebbel  ---
Created attachment 50131
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50131&action=edit
RTL GCSE dump without -fgcse-sm

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #2 from Andreas Krebbel  ---
Created attachment 50130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50130&action=edit
RTL GCSE dump with -fgcse-sm

[Bug rtl-optimization/98973] New: [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

            Bug ID: 98973
           Summary: [11 regression] Wrong code with gcse store motion pass
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

This test aborts when compiled on IBM Z with:
gcc -O3 t.c -o t -fgcse-sm

It succeeds with -O2 or without -fgcse-sm.

Tested with Commit ID: 072f20c5559
It works with GCC 10 branch: eb15f761bc7

long a;
int b, c;
short d;
int e[] = { 1, 1, 0, 1, 1, 1, 1, 1, 1, 1 };

void
f ()
{
g:
  c = 9;
  for (; c >= 3; c--)
    {
      int h[5];
      for (; d; d--)
        ;
      for (; a;)
        if (e[c])
          b = h[4];
      if (e[c])
        continue;
      goto g;
    }
}

int
main ()
{
  f ();
  if (c != 2)
    __builtin_abort();
}

[Bug inline-asm/98847] Miscompilation with c++17, templates, and register keyword

2021-01-28 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98847

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krebbel at gcc dot gnu.org

--- Comment #6 from Andreas Krebbel  ---
Thanks for fixing this. When I had a look at it in 2015, I found that template
instantiation explicitly zeroes out the asm name. The solution for me was to
prevent that for hard reg decls. Not sure what approach is preferable here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33661#c13

[Bug tree-optimization/98736] Wrong partition order generated in loop distribution pass

2021-01-18 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
           Priority|P3                          |P2
             Target|                            |s390x

[Bug tree-optimization/98736] New: Wrong partition order generated in loop distribution pass

2021-01-18 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

            Bug ID: 98736
           Summary: Wrong partition order generated in loop distribution
                    pass
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

int a[6];
char b, c;
int main() {
  int d[4] = {0, 0, 0, 0};
  for (c = 0; c <= 5; c++) {
    for (b = 2; b != 0; b++)
      a[c] = 8;
    a[c] = d[3];
  }
  if (a[0] != 0)
    __builtin_abort();
}

Aborts when compiled with:
gcc -Os -march=z13 t.c -o t

Succeeds with:
gcc -O3 -march=z13 t.c -o t

The outer loop is recognized as a clearmem partition. Unfortunately, it is
emitted before the partition containing the inner loop body.

[Bug target/98550] [11 Regression] ICE in exact_div, at poly-int.h:2219 on s390x-linux-gnu

2021-01-12 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98550

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #5 from Andreas Krebbel  ---
With the patch, the sign extension of 6 shift count operands from int to long
int is now marked as vect_external_def. This causes the vectype field in the
SLP node to be bumped from "vector 2 int" to "vector 4 int" in:
vectorizable_conversion->vect_maybe_update_slp_op_vectype

This then triggers the ICE when trying to divide vf*group_size (which is 1*6
here) by the number of elements in the vector type (now 4) in
vect_slp_analyze_node_operations.

Is changing the vectype field of an slp node to a type with a different number
of elements actually valid?


slp1:


  bb$dh_5 = D.4123.dh;
  _10 = MEM[(int *)bb$dh_5];
  pretmp_62 = a.cp[1];
  pretmp_79 = a.cp[2];
  pretmp_31 = a.cp[3];
  pretmp_39 = a.cp[4];
  pretmp_16 = a.cp[5];
  pretmp_19 = a.cp[6];
  goto ; [100.00%]

   [local count: 1014686041]:
  _20 = prephitmp_78 >> _10;
  a.cp[1] = _20;
  _22 = prephitmp_80 >> _10;
  a.cp[2] = _22;
  _24 = prephitmp_32 >> _10;
  a.cp[3] = _24;
  _26 = prephitmp_40 >> _10;
  a.cp[4] = _26;
  _28 = prephitmp_17 >> _10;
  a.cp[5] = _28;
  _30 = prephitmp_11 >> _10;
  a.cp[6] = _30;
  cn ={v} {CLOBBER};

[Bug target/98550] [11 Regression] ICE in exact_div, at poly-int.h:2219 on s390x-linux-gnu

2021-01-12 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98550

--- Comment #4 from Andreas Krebbel  ---
The problem occurs starting with:

commit 1e1e1edf88a7c40ae4ae0de9e6077179e13ccf6d
Author: Richard Biener 
Date:   Thu Oct 29 08:48:15 2020 +0100

More BB vectorization tweaks

This tweaks the op build from splats to allow loads marked as not
vectorizable.  It also amends some dump prints with the address of
the SLP node or the instance to better be able to debug things.

2020-10-29  Richard Biener  

* tree-vect-slp.c (vect_build_slp_tree_2): Allow splatting
not vectorizable loads.
(vect_build_slp_instance): Amend dumping with address.
(vect_slp_convert_to_external): Likewise.

* gcc.dg/vect/bb-slp-pr65935.c: Adjust.

[Bug target/98550] [11 Regression] ICE in exact_div, at poly-int.h:2219 on s390x-linux-gnu

2021-01-12 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98550

--- Comment #3 from Andreas Krebbel  ---
Created attachment 49944
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49944&action=edit
Reduced testcase

This testcase fails on bcb3065b2ba with
cc1plus t.cpp -march=z13 -O3

[Bug rtl-optimization/78559] [7 Regression] wrong code due to tree if-conversion?

2020-12-15 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78559

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stli at linux dot ibm.com

--- Comment #15 from Andreas Krebbel  ---
*** Bug 98269 has been marked as a duplicate of this bug. ***

[Bug c/98269] gcc 6.5.0 __builtin_add_overflow() with small uint32_t values incorrectly detects overflow

2020-12-15 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98269

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
                 CC|                            |krebbel at gcc dot gnu.org

--- Comment #4 from Andreas Krebbel  ---
The problem is a CC mode mismatch generated by combine. After splitting, the
add insn 135 generates a CCL1mode CC while the conditional jump consumes it as
CCUmode. This leads to the wrong condition code mask being generated in the
end.

(insn 135 56 136 7 (parallel [
(set (reg:CCL1 33 %cc)
(compare:CCL1 (plus:SI (reg:SI 108)
(mem:SI (plus:DI (reg:DI 88 [ ivtmp.10 ])
(const_int 12 [0xc])) [3 MEM[base: previous_25,
offset: 12B]+0 S4 A32]))
(reg:SI 108)))
(set (reg:SI 109)
(plus:SI (reg:SI 108)
(mem:SI (plus:DI (reg:DI 88 [ ivtmp.10 ])
(const_int 12 [0xc])) [3 MEM[base: previous_25,
offset: 12B]+0 S4 A32])))
]) t.c:31 1358 {*addsi3_carry1_cc}
 (expr_list:REG_DEAD (reg:SI 108)
(nil)))
(note 136 135 64 7 NOTE_INSN_DELETED)
(insn 64 136 65 7 (set (mem:SI (plus:DI (reg:DI 88 [ ivtmp.10 ])
(const_int 28 [0x1c])) [3 MEM[base: previous_25, offset: 28B]+0
S4 A32])
(reg:SI 109)) t.c:31 1077 {*movsi_zarch}
 (nil))
(note 65 64 66 7 NOTE_INSN_DELETED)
(jump_insn 66 65 67 7 (set (pc)
(if_then_else (geu (reg:CCU 33 %cc)
(const_int 0 [0]))
(label_ref 78)
(pc))) t.c:31 1661 {*cjump_64}
 (int_list:REG_BR_PROB 9500 (expr_list:REG_DEAD (reg:CCZ 33 %cc)
(nil)))


The failure disappears with:

commit bf7499197fbb065123257c374064f6bb715c951b
Author: Dominik Vogt 
Date:   Mon Jul 4 14:25:22 2016 +

S/390: Add support for z13 instructions lochi and locghi.

The attached patch adds patterns to make use of the z13 LOCHI and
LOCGHI instructions.
...


But that one only hides the problem. The mere presence of the lochi
alternatives leads to different RTL being emitted (although the alternative is
not enabled for -march=z196). The split then doesn't happen anymore.

After reverting the patch and continuing to bisect, the failure finally
disappears with:

3f54004b095d1cd513e63753ee0f8f9f13698347 is the first bad commit
commit 3f54004b095d1cd513e63753ee0f8f9f13698347
Author: Bin Cheng 
Date:   Fri Jan 27 14:42:23 2017 +

re PR rtl-optimization/78559 (wrong code due to tree if-conversion?)

PR rtl-optimization/78559
* combine.c (try_combine): Discard REG_EQUAL and REG_EQUIV for
other_insn in combine.


This looks like the actual fix to me. The wrong CC mode survives as part of a
REG_EQUAL note:

Successfully matched this instruction:
(set (reg:SI 93 [ _27+4 ])
(if_then_else:SI (geu (reg:CCL1 33 %cc)
(const_int 0 [0]))
(reg:SI 93 [ _27+4 ])
(reg:SI 118)))
allowing combination of insns 56 and 135
original costs 4 + 4 = 16
replacement cost 8
deferring deletion of insn with uid = 56.
modifying other_insn   136: r93:SI={(geu(%cc:CCL1,0))?r93:SI:r118:SI}
  REG_DEAD %cc:CCU
  REG_EQUAL ltu(%cc:CCU,0)
deferring rescan insn with uid = 136.
modifying insn i3   135:
{%cc:CCL1=cmp(r108:SI+[r88:DI+0xc],r108:SI);r109:SI=r108:SI+[r88:DI+0xc];}
  REG_DEAD r108:SI
deferring rescan insn with uid = 135.

So we should probably mark it as a duplicate of PR78559.

*** This bug has been marked as a duplicate of bug 78559 ***

[Bug tree-optimization/98221] [10/11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c

2020-12-10 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98221

--- Comment #3 from Andreas Krebbel  ---
tree-vect-loop-manip.c: vect_maybe_permute_loop_masks also emits
VEC_UNPACKS_HI/LO dependent on BYTES_BIG_ENDIAN.

What is the expectation wrt the meaning of hi/lo in RTL standard names? I
couldn't find it clearly documented for this either. Well, for things like
'smulm3_highpart' we say it is about the 'most significant half' but I don't
see anything for the vector hi/lo.

[Bug tree-optimization/98221] [11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c

2020-12-10 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98221

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
           Keywords|                            |wrong-code
             Target|                            |s390x

[Bug tree-optimization/98221] New: [11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c

2020-12-10 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98221

            Bug ID: 98221
           Summary: [11 regression] Wrong unpack operation emitted in
                    tree-ssa-forwprop.c
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49728
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49728&action=edit
Fix

The vec-abi-varargs-1.c testcase on IBM Z currently fails.

While adding an SI mode vector to a DI mode vector the first is unpacked using:

  _28 = BIT_INSERT_EXPR <{ 0, 0, 0, 0 }, _2, 0>;
  _34 = [vec_unpack_lo_expr] _28;

However, on big endian targets lo refers to the right hand side of the vector -
in this case the zeroes.


This appears to be triggered by this patch:


commit 78307657cf9675bc4aa2e77561c823834714b4c8 
Author: Richard Biener   
Date:   Thu Nov 28 12:22:04 2019 +  

re PR tree-optimization/92645 (Hand written vector code is 450 times slower
when compiled with GCC compared to Clang)   

2019-11-28  Richard Biener   

PR tree-optimization/92645  
* tree-ssa-forwprop.c (get_bit_field_ref_def): Also handle  
conversions inside a mode class.  Remove restriction on 
preserving the element size.
(simplify_vector_constructor): Deal with the above and for  
identity permutes also try using VEC_UNPACK_[FLOAT_]LO_EXPR 
and VEC_PACK_TRUNC_EXPR.

* gcc.target/i386/pr92645-4.c: New testcase.

[Bug target/98124] Z: Load and test LTDBR instruction gets not used for comparison against 0.0

2020-12-03 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98124

--- Comment #2 from Andreas Krebbel  ---
(In reply to Andreas Krebbel from comment #1)
> LTDBR turns SNaNs into QNaNs and that's not supposed to happen in your
> testcase. We emit LTDBR only with -fno-trapping-math

... or if the result of LTDBR isn't used.

[Bug target/98124] Z: Load and test LTDBR instruction gets not used for comparison against 0.0

2020-12-03 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98124

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krebbel at gcc dot gnu.org
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andreas Krebbel  ---
LTDBR turns SNaNs into QNaNs and that's not supposed to happen in your
testcase. We emit LTDBR only with -fno-trapping-math

[Bug target/97326] [11 Regression] s390: ICE in do_store_flag after 10843f830350

2020-11-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #4 from Andreas Krebbel  ---
Fixed

[Bug target/97326] [11 Regression] s390: ICE in do_store_flag after 10843f830350

2020-11-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #5 from Andreas Krebbel  ---
closing

[Bug middle-end/97326] [11 Regression] s390: ICE in do_store_flag after 10843f830350

2020-11-09 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326

--- Comment #2 from Andreas Krebbel  ---
Probably my fault. I forgot to support floats in vec_cmp. I'm testing a
patch.

[Bug target/96456] [10/11 Regression] ICE in expand_insn, at optabs.c:7511 on s390x-linux-gnu

2020-10-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96456

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #5 from Andreas Krebbel  ---
closing

[Bug target/96456] [10/11 Regression] ICE in expand_insn, at optabs.c:7511 on s390x-linux-gnu

2020-10-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96456

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Andreas Krebbel  ---
Fixed for trunk and GCC 10

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-10-21
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #7 from Andreas Krebbel  ---
The vec_cmp* expanders in vx-builtins.md are only supposed to be used for
expanding the builtins. Unfortunately the names appear to collide with the rtx
standard names to some degree. I will try to implement the standard name
patterns and direct builtin expansion to them instead.

[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers

2020-10-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #6 from Andreas Krebbel  ---
Alternatively I could also mark r12 as preserved across function calls for
-fpic in the backend. In fact all the bits we care about are preserved. Since
the register is fixed all the accesses do come from the backend itself.

That's similar to what I was trying with the fixed_regs hack. But I agree that
this might not be correct in general.

The full fix is probably to track the exact parts of partially clobbered regs
which stay live, but this would be a major change.

[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers

2020-10-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #4 from Andreas Krebbel  ---
Reading from symbol t uses the GOT pointer in r12. The call then partially
clobbers r12 but does not affect the lower 32 bits where the GOT pointer
resides. So the GOT pointer stays in fact live across the call.

The way this is currently handled in gcse, the second access of t reads a
different value than the first, and this is wrong I think. This leads to a
disagreement between the pre_delete_map and the insert locations. The later
read of t is removed because it is an anticipated use, but no copy of the
value is inserted after the first read of t because the expression is not
considered to be available anymore at the second location. The availability of
the entire expression is broken by the set of r12 at the call insns.

I didn't know how to solve this without being able to keep track of what parts
of hard regs are clobbered.

The idea behind the fixed_reg check is to trust the backend that it does not
emit uninitialized uses of hard regs in the first place.

t.c.250r.cprop1:

(insn 6 3 7 2 (set (reg/f:SI 65)
(mem/u/c:SI (plus:SI (reg:SI 12 %r12)
(const:SI (unspec:SI [
(symbol_ref:SI ("t") [flags 0x6c0]  )
] UNSPEC_GOT))) [0  S4 A8])) "t.c":8:3 1387
{*movsi_zarch}
 (nil))
(insn 7 6 8 2 (set (reg:SI 3 %r3)
(mem/f/c:SI (reg/f:SI 65) [1 t+0 S4 A32])) "t.c":8:3 1387
{*movsi_zarch}
 (expr_list:REG_DEAD (reg/f:SI 65)
(nil)))
(insn 8 7 9 2 (set (reg:SI 2 %r2)
(const_int 1 [0x1])) "t.c":8:3 1387 {*movsi_zarch}
 (nil))
(call_insn 9 8 10 2 (parallel [
(call (mem:QI (const:SI (unspec:SI [
(symbol_ref:SI ("bar") [flags 0x41] 
)
] UNSPEC_PLT)) [0 bar S1 A8])
(const_int 0 [0]))
(clobber (reg:SI 14 %r14))
]) "t.c":8:3 2053 {*brasl}
 (expr_list:REG_DEAD (reg:SI 3 %r3)
(expr_list:REG_DEAD (reg:SI 2 %r2)
(expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41] 
)
(nil
(expr_list (use (reg:SI 12 %r12))
(expr_list:SI (use (reg:SI 2 %r2))
(expr_list:SI (use (reg:SI 3 %r3))
(nil)


...

(insn 13 12 14 4 (set (reg/f:SI 66)
(mem/u/c:SI (plus:SI (reg:SI 12 %r12)
(const:SI (unspec:SI [
(symbol_ref:SI ("t") [flags 0x6c0]  )
] UNSPEC_GOT))) [0  S4 A8])) "t.c":10:5 1387
{*movsi_zarch}
 (nil))

[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers

2020-10-19 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #3 from Andreas Krebbel  ---
Created attachment 49405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49405&action=edit
testcase

[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers

2020-10-19 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #1 from Andreas Krebbel  ---
Created attachment 49402
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49402&action=edit
Proposed fix

With the patch, only regs which aren't "fixed" are considered, assuming that
for fixed_regs the backend takes care of only actually using the well-defined
part of the hard regs.

[Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers

2020-10-19 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

            Bug ID: 97497
           Summary: gcse wrong code generation with partial register
                    clobbers
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Compiling the attached testcase produces wrong code on IBM Z:

cc1 t.c -m31 -mzarch -march=z900 -O2 -fpic -o t.s

foo:
        stm     %r11,%r15,44(%r15)
        larl    %r12,_GLOBAL_OFFSET_TABLE_
        lr      %r11,%r2
        l       %r1,t@GOT(%r12)
        ahi     %r15,-96
        lhi     %r2,1
        l       %r3,0(%r1)
        brasl   %r14,bar@PLT
        ltr     %r11,%r11
        jne     .L8
        lhi     %r2,1
        l       %r3,0           <--- dereference address 0
        brasl   %r14,bar@PLT
        l       %r4,152(%r15)
        lm      %r11,%r15,140(%r15)
        br      %r4
.L8:
        lhi     %r3,1
        l       %r2,0           <--- dereference address 0
        brasl   %r14,baz@PLT
        lhi     %r2,1
        l       %r3,0
        brasl   %r14,bar@PLT
        l       %r4,152(%r15)
        lm      %r11,%r15,140(%r15)
        br      %r4


gcse decides to remove the load from t in the subsequent bbs but does not
generate the load into a temp reg in the first bb, leaving the bbs loading from
an uninitialized pseudo.

With -mzarch -m31 we have the GOT pointer marked as partially clobbered. The
loads from t use the GOT pointer explicitly in the RTX. Since the following
patch, r12 is considered to be fully clobbered by call insns:

commit a4dfaad2e5594d871fe00a1116005e28f95d644e (refs/bisect/bad)
Author: Richard Sandiford 
Date:   Mon Sep 30 16:20:44 2019 +

Remove global call sets: gcse.c

This is another case in which we can conservatively treat partial
kills as full kills.  Again this is in principle a bug fix for
TARGET_HARD_REGNO_CALL_PART_CLOBBERED targets, but in practice
it probably doesn't make a difference.

2019-09-30  Richard Sandiford  

gcc/
* gcse.c: Include function-abi.h.
(compute_hash_table_work): Use insn_callee_abi to get the ABI of
the call insn target.  Invalidate partially call-clobbered
registers as well as fully call-clobbered ones.

From-SVN: r276323

Now the RTX for t which references r12 is considered to be not available
anymore in the later bbs due to r12 being clobbered by the calls. Hence no load
of the original expression is being emitted.

[Bug rtl-optimization/97439] Wrong min value generated for DFP numbers

2020-10-15 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97439

--- Comment #1 from Andreas Krebbel  ---
Created attachment 49375
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49375&action=edit
Fix

decimal_real_maxval fails to set the sign flag in the REAL_VALUE_TYPE.

[Bug rtl-optimization/97439] New: Wrong min value generated for DFP numbers

2020-10-15 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97439

            Bug ID: 97439
           Summary: Wrong min value generated for DFP numbers
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49374
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49374&action=edit
Testcase

The attached testcase aborts on IBM Z when compiled with:
gcc -O1 t.c -o t

commit id: c1c62aec6751

The x > -Inf comparison is folded by match.pd to x >= DFP128 MIN.

However, the sign bit is lost when generating DFP128 MIN in decimal_real_maxval.

[Bug target/96456] [10/11 Regression] ICE in expand_insn, at optabs.c:7511 on s390x-linux-gnu

2020-08-07 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96456

--- Comment #1 from Andreas Krebbel  ---
Confirmed for current gcc 10 branch. Does not appear to happen on trunk though.
I'll have a look.
