[Committed] IBM Z: Fix ICE in expand_perm_as_replicate

2024-06-10 Thread Andreas Krebbel via Gcc
The current implementation assumes to always be invoked with register
operands. For memory operands we even have an instruction
though (vlrep). With the patch we try this first and only if it fails
force the input into a register and continue.

vec_splats generation fails for single element 128bit types which are
allowed for vec_splat. This is something to sort out with another
patch I guess.

Bootstrapped and regtested on IBM Z. Committed to mainline. Needs to
be committed to GCC 14 branch as well.

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_as_replicate): Handle memory
operands.
* config/s390/vx-builtins.md (vec_splats): Turn into 
parameterized expander.
(@vec_splats): New expander.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vshuf-mem.C: New test.
---
 gcc/config/s390/s390.cc  | 17 +--
 gcc/config/s390/vx-builtins.md   |  2 +-
 gcc/testsuite/g++.dg/torture/vshuf-mem.C | 27 
 3 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/vshuf-mem.C

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index fa517bd3e77..ec836ec3cd4 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17940,7 +17940,8 @@ expand_perm_as_replicate (const struct 
expand_vec_perm_d &d)
   unsigned char i;
   unsigned char elem;
   rtx base = d.op0;
-  rtx insn;
+  rtx insn = NULL_RTX;
+
   /* Needed to silence maybe-uninitialized warning.  */
   gcc_assert (d.nelt > 0);
   elem = d.perm[0];
@@ -17954,7 +17955,19 @@ expand_perm_as_replicate (const struct 
expand_vec_perm_d &d)
  base = d.op1;
  elem -= d.nelt;
}
-  insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+  if (memory_operand (base, d.vmode))
+   {
+ /* Try to use vector load and replicate.  */
+ rtx new_base = adjust_address (base, GET_MODE_INNER (d.vmode),
+elem * GET_MODE_UNIT_SIZE (d.vmode));
+ insn = maybe_gen_vec_splats (d.vmode, d.target, new_base);
+   }
+  if (insn == NULL_RTX)
+   {
+ base = force_reg (d.vmode, base);
+ insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+   }
+
   if (insn == NULL_RTX)
return false;
   emit_insn (insn);
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 93c0d408a43..bb271c09a7d 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -145,7 +145,7 @@
   DONE;
 })
 
-(define_expand "vec_splats"
+(define_expand "@vec_splats"
   [(set (match_operand:VEC_HW  0 "register_operand" "")
(vec_duplicate:VEC_HW (match_operand: 1 "general_operand"  
"")))]
   "TARGET_VX")
diff --git a/gcc/testsuite/g++.dg/torture/vshuf-mem.C 
b/gcc/testsuite/g++.dg/torture/vshuf-mem.C
new file mode 100644
index 000..5f1ebf65665
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vshuf-mem.C
@@ -0,0 +1,27 @@
+// { dg-options "-std=c++11" }
+// { dg-do run }
+// { dg-additional-options "-march=z14" { target s390*-*-* } }
+
+/* This used to trigger (2024-05-28) the vectorize_vec_perm_const
+   backend hook to be invoked with a MEM source operand.  Extracted
+   from onnxruntime's mlas library.  */
+
+typedef float V4SF __attribute__((vector_size (16)));
+typedef int V4SI __attribute__((vector_size (16)));
+
+template < unsigned I0, unsigned I1, unsigned I2, unsigned I3 > V4SF
+MlasShuffleFloat32x4 (V4SF Vector)
+{
+  return __builtin_shuffle (Vector, Vector, V4SI{I0, I1, I2, I3});
+}
+
+int
+main ()
+{
+  V4SF f = { 1.0f, 2.0f, 3.0f, 4.0f };
+  if (MlasShuffleFloat32x4 < 1, 1, 1, 1 > (f)[3] != 2.0f)
+__builtin_abort ();
+  if (MlasShuffleFloat32x4 < 3, 3, 3, 3 > (f)[1] != 4.0f)
+__builtin_abort ();
+  return 0;
+}
-- 
2.45.1



Re: CFI for saved argument registers

2022-05-31 Thread Andreas Krebbel via Gcc
On 5/16/22 08:29, Andreas Krebbel via Gcc wrote:
> Hi,
> 
> I'm trying to provide a simple dwarf unwinder with access to the
> argument register content. The goal is to make this information
> available for optimized code without having to access debug
> information for things like call site args. The extra overhead
> of saving the values to the stack is acceptable in that case.
> 
> For that purpose I save the argument registers to the stack as we
> would do for a variable argument lists. But this time I also provide
> the CFI to allow the unwinder to locate the save slots.  Since I never
> actually intend to restore the content there is no matching
> cfi_restore for the cfi_offset and dwarf2cfi complains about the
> traces being inconsistent because of that. I couldn't find a way to
> prevent this.
> 
> The only way I see right now is adding a new reg note to invalidate
> the save information in the reg_save array in dwarf2cfi.
> 
> Would this be acceptable? Is there perhaps an easier way to achieve that?

Attached is a small patch adding a new reg note REG_CFA_NORESTORE.

I would attach that new note to the register save insn to indicate that this 
register will not be
restored. I'll have to emit the REG_CFA_OFFSET note as well then.

An insn saving r2-r6 without restoring them would look like this then:

(insn/f 31 21 32 2 (parallel [
(set/f (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 16 [0x10])) [4  S8 A8])
(reg:DI 2 %r2))
(set/f (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 24 [0x18])) [4  S8 A8])
(reg:DI 3 %r3))
(set/f (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 32 [0x20])) [4  S8 A8])
(reg:DI 4 %r4))
(set/f (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 40 [0x28])) [4  S8 A8])
(reg:DI 5 %r5))
(set/f (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 48 [0x30])) [4  S8 A8])
(reg:DI 6 %r6))
]) "/home/andreas/preserveargs/reduce2/t.c":2:51 -1
 (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 48 [0x30])) [4  S8 A8])
(reg:DI 6 %r6))
(expr_list:REG_CFA_NO_RESTORE (reg:DI 6 %r6)
(expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 40 [0x28])) [4  S8 A8])
(reg:DI 5 %r5))
(expr_list:REG_CFA_NO_RESTORE (reg:DI 5 %r5)
(expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 
15 %r15)
(const_int 32 [0x20])) [4  S8 A8])
(reg:DI 4 %r4))
(expr_list:REG_CFA_NO_RESTORE (reg:DI 4 %r4)
(expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI 
(reg/f:DI 15 %r15)
(const_int 24 [0x18])) [4  S8 A8])
(reg:DI 3 %r3))
(expr_list:REG_CFA_NO_RESTORE (reg:DI 3 %r3)
(expr_list:REG_CFA_OFFSET (set (mem/c:DI 
(plus:DI (reg/f:DI 15 %r15)
(const_int 16 [0x10])) [4  
S8 A8])
(reg:DI 2 %r2))
(expr_list:REG_CFA_NO_RESTORE (reg:DI 2 
%r2)
(nil

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 362ff3fdac27..2cbc2465c3a7 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1346,7 +1346,7 @@ dwarf2out_frame_debug_cfa_val_expression (rtx set)
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */

 static void
-dwarf2out_frame_debug_cfa_restore (rtx reg)
+dwarf2out_frame_debug_cfa_restore (rtx reg, bool emit_cfi)
 {
   gcc_assert (REG_P (reg));

@@ -1354,7 +1354,8 @@ dwarf2out_frame_debug_cfa_restore (rtx reg)
   if (!span)
 {
   unsigned int regno = dwf_regno (reg);
-  add_cfi_restore (regno);
+  if (emit_cfi)
+   add_cfi_restore (regno);
   update_row_reg_save (cur_row, regno, NULL);
 }
   else
@@ -1369,7 +1370,8 @@ dwarf2out_frame_debug_cfa_restore (rtx reg)
  reg = XVECEXP (span, 0, par_index);
  gcc_assert (REG_P (reg));
  unsigned int regno = dwf_regno (reg);
- add_cfi_restore (regno);
+ if (emit_cfi)
+   add_cfi_restore (regno);
  update_row_reg_save (cur_row, regno, NULL);
}
 }
@@ -2155,6 +2157,7 @@ dwarf2out_frame_debug (rtx_insn *insn)
break;

   case REG_CFA_RESTORE:
+  case REG_CFA_NO_RESTORE:
n = XEXP (note, 0);
if (n == NULL)
  {
@@ -2163,7 +2166,7 @@ dwarf2out_fram

Re: CFI for saved argument registers

2022-05-16 Thread Andreas Krebbel via Gcc
On 5/16/22 16:39, Andreas Schwab wrote:
> On Mai 16 2022, Andreas Krebbel via Gcc wrote:
> 
>> The only way I see right now is adding a new reg note to invalidate
>> the save information in the reg_save array in dwarf2cfi.
>>
>> Would this be acceptable? Is there perhaps an easier way to achieve that?
> 
> Doesn't it work to use .cfi_remember_state/.cfi_restore_state?
> 
This might work but I think also for that I would have to implement means to 
trigger it explicitely
with an RTX somehow.

Also I rather would avoid emitting special CFI for that purpose. I think the 
CFI I currently have
(with more .cfi_offset's than .cfi_restore's) is valid. I only have to convince 
the validation step
in dwarf2cfi to accept this. Therefore the idea was to introduce a new regnote 
to say that at a
certain point it is intentional that there is no .cfi_restore for a register. 
The action in
dwarf2cfi should only be to remove the reg from reg_save and be done with it.

Bye,

Andreas



CFI for saved argument registers

2022-05-15 Thread Andreas Krebbel via Gcc
Hi,

I'm trying to provide a simple dwarf unwinder with access to the
argument register content. The goal is to make this information
available for optimized code without having to access debug
information for things like call site args. The extra overhead
of saving the values to the stack is acceptable in that case.

For that purpose I save the argument registers to the stack as we
would do for a variable argument lists. But this time I also provide
the CFI to allow the unwinder to locate the save slots.  Since I never
actually intend to restore the content there is no matching
cfi_restore for the cfi_offset and dwarf2cfi complains about the
traces being inconsistent because of that. I couldn't find a way to
prevent this.

The only way I see right now is adding a new reg note to invalidate
the save information in the reg_save array in dwarf2cfi.

Would this be acceptable? Is there perhaps an easier way to achieve that?

Bye,

Andreas


Re: GCC 11.2.1 Status Report (2022-04-13), branch frozen for release

2022-04-14 Thread Andreas Krebbel via Gcc
On 4/13/22 09:30, Richard Biener via Gcc wrote:
> 
> Status
> ==
> 
> The gcc-11 branch is now frozen in preparation for a GCC 11.3 release
> candidate and the GCC 11.3 release next week.  All changes now require
> release manager approval.

Hi,

I would like to push:

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593103.html

to GCC 11 branch before 11.3 release. Ok?

Bye,

Andreas


Re: Urgent GCC ABI backend maintainer ping re zero width bitfield passing (PR102024)

2022-03-21 Thread Andreas Krebbel via Gcc
On 3/21/22 17:28, Jakub Jelinek wrote:
> Hi!
> 
> I'd like to ping port maintainers about
> https://gcc.gnu.org/PR102024
> 
> As I wrote, the int : 0 bitfields are present early in the TYPE_FIELDS
> during structure layout and intentionally affect the layout.
> We had some code to remove those from TYPE_FIELDS chains in the C and C++
> FEs, but for C that removal never worked correctly (never removed any)
> and the non-working removal was eventually removed.  For C++ it
> didn't initially work either, but for GCC 4.5 that was fixed in PR42217,
> so on various backends where TYPE_FIELDS are analyzed for how to pass or
> return certain aggregates starting with GCC 4.5 the C++ and C ABI diverged.
> In August, I have removed that zero width bitfield removal from C++ FE
> as the FE needs to take those bitfields into account later on as well.
> 
> The x86_64 backend was changed in r12-6418-g3159da6c46 to match recently
> approved clarification of the x86-64 psABI and the zero width bitfields
> are now ignored for both C and C++ (so an ABI change for C from 11.x and
> earlier to 12.x and for C++ from GCC 4.4 and earlier to 4.5 and later)
> with a -Wpsabi diagnostics about it.
> 
> The rs6000 backend was changed in r12-3843-g16e3d6b8b2 to never ignore
> those bitfields (so no ABI change for C, for C++ ...-4.4 and 12+ are
> ABI incompatible with 4.5 through 11.x; note, it affects I think just
> ppc64le ABI, which didn't really exist before 4.8 I think) and diagnostics
> has been added about the ABI change.
> 
> As I wrote in the PR, I believe most of the GCC backends are unaffected,
> x86_64 and rs6000 are handled, riscv was changed already in GCC 10 to
> ignore those bitfields and emit a -Wpsabi diagnostics.
> 
> I can see code-generation differences certainly on armv7hl and aarch64.
> ia64, iq2000, mips, s390 and sparc are maybe affected, haven't checked.

On s390 we walk the field chain to figure out whether it is a struct with a 
single floating point
field. Consequently I see different code being generated by g++ for { float a; 
int :0; } which is
passed in a floating point register with g++ 11 and in memory with g++ 12.

I'm not sure what is best for our target. I'll try to come back with an answer 
soon.

Bye,

Andreas

> 
> Simple testcase could be e.g.:
> struct S { float a; int : 0; float b; };
> 
> __attribute__((noipa)) struct S
> foo (struct S x)
> {
>   return x;
> }
> 
> void
> bar (void)
> {
>   struct S s = { 0.0f, 0.0f };
>   foo (s);
> }
> where one should look at the argument and return value passing
> in GCC 11 C, GCC 11 C++, GCC trunk C, GCC trunk C++.
> 
> The FE now sets bits on the bitfields that make it possible to
> differentiate between the different cases, so each port may decide to do
> one of the 3 things:
> 1) keep ABI exactly compatible between GCC 11 and 12, which means
>C and C++ will continue to be incompatible
> 2) keep the G++ 4.5 through 11 ABI of ignoring zero width bitfields and
>change C ABI
> 3) keep the GCC < 11 C ABI of not ignoring zero width bitfields and
>change the C++ ABI (which means restoring ABI compatibility in
>this regard between G++ 4.4 and earlier with G++ 12 and later)
> Furthermore, it would be very nice to emit -Wpsabi diagnostics for the
> changed ABI unless 1) is decided.
> One should take into account psABI as well as what other compilers do.
> 
> The current state of GCC trunk is that 3) is done except that x86_64
> did 2) and riscv did 2 already for GCC 10 and all of x86_64, riscv and
> rs6000 emit -Wpsabi diagnostics (though I think rs6000 doesn't guard
> it with -Wpsabi).
> 
> I can help with the backend implementations if needed, but I can't
> decide which possibility you want to choose for each backend.
> It would be really nice to decide about this soon, because changing
> the ABI in GCC 12 only to change it again in GCC 13 doesn't look much
> desirable and even if 3) is the choice, it is really nice to have
> some diagnostics about ABI changes.
> 
> Thanks
> 
>   Jakub
> 



Re: RFC: New mechanism for hard reg operands to inline asm

2021-06-04 Thread Andreas Krebbel via Gcc
On 6/4/21 8:18 PM, Paul Koning wrote:
...
> Yes, I would think this should be made a general mechanism that any target 
> could use.
> 
> I wonder if instead of creating a new mechanism you could do this simply by 
> creating new constraint names, where each name matches exactly one hard 
> register.  That's roughly what this amounts to, isn't it? 

I thought about this as well but I'm not sure this would work well without 
changing LRA. It would
mean to introduce many register classes with single registers. The presence of 
many "sparse"
register classes to my understanding would affect register allocation.

Perhaps it would work to just emit all the moves into or out of hard regs at 
expand time.

Andreas


RFC: New mechanism for hard reg operands to inline asm

2021-06-04 Thread Andreas Krebbel via Gcc
Hi,

I wonder if we could replace the register asm construct for
inline assemblies with something a bit nicer and more obvious.
E.g. turning this (real world example from IBM Z kernel code):

int diag8_response(int cmdlen, char *response, int *rlen)
{
register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
register unsigned long reg3 asm ("3") = (addr_t) response;
register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
register unsigned long reg5 asm ("5") = *rlen; /* <-- */
asm volatile(
"   diag%2,%0,0x8\n"
"   brc 8,1f\n"
"   agr %1,%4\n"
"1:\n"
: "+d" (reg4), "+d" (reg5)
: "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
*rlen = reg5;
return reg4;
}

into this:

int diag8_response(int cmdlen, char *response, int *rlen)
{
unsigned long len = cmdlen | 0x4000L;

asm volatile(
"   diag%2,%0,0x8\n"
"   brc 8,1f\n"
"   agr %1,%4\n"
"1:\n"
: "+{r4}" (len), "+{r5}" (*rlen)
: "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" 
(*rlen): "cc");
return len;
}

Apart from being much easier to read because the hard regs become part
of the inline assembly it solves also a couple of other issues:

- function calls might clobber register asm variables see BZ100908
- the constraints for the register asm operands are superfluous
- one register asm variable cannot be used for 2 different inline
  assemblies if the value is expected in different hard regs

I've started with a hackish implementation for IBM Z using the
TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
skip over the {} parts.  But perhaps it would be useful to make this a
generic mechanism for all targets?!

Andreas