Various bswapsi tests are failing for rv64. More importantly, we're generating crappy code.

Let's take the first test from bswapsi-1.c as an example.

typedef unsigned int uint32_t;

#define __const_swab32(x) ((uint32_t)(                                \
        (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
        (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
        (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
        (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))

/* This byte swap implementation is used by the Linux kernel and the
   GNU C library.  */

uint32_t
swap32_a (uint32_t in)
{
  return __const_swab32 (in);
}





We generate this for rv64gc_zba_zbb_zbs:

        srliw   a1,a0,24
        slliw   a5,a0,24
        slliw   a3,a0,8
        li      a2,16711680
        li      a4,65536
        or      a5,a5,a1
        and     a3,a3,a2
        addi    a4,a4,-256
        srliw   a0,a0,8
        or      a5,a5,a3
        and     a0,a0,a4
        or      a0,a5,a0
        ret
Urgh!

After this patch we generate:

        rev8    a0,a0
        srai    a0,a0,32
        ret

Clearly better.


The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the RTL expanders already know how to widen a bswap, which is definitely true. But failing to expose bswapsi prevents the 32bit bswap optimizations in gimple store merging from triggering. Thus we get crappy code.

To fix this we expose bswapsi on TARGET_64BIT. The gimple store-merging pass then detects the 32bit bswap idioms and generates suitable __builtin calls. The expander will "FAIL" expansion for TARGET_64BIT, which forces the generic expander code to synthesize the operation (we could synthesize it here, but that would result in duplicate code).

Tested on rv64gc_zba_zbb_zbs; this fixes all the bswapsi failures in the testsuite without any regressions.

Pushed to the trunk,

Jeff


commit fbc01748ba46eb26074388a8fb7b44d25a414a72
Author: Jeff Law <j...@ventanamicro.com>
Date:   Tue Sep 5 15:39:16 2023 -0600

    RISC-V: Expose bswapsi for TARGET_64BIT
    
    Various bswapsi tests are failing for rv64.  More importantly, we're
    generating crappy code.
    
    Let's take the first test from bswapsi-1.c as an example.
    
    > typedef unsigned int uint32_t;
    >
    > #define __const_swab32(x) ((uint32_t)(                                \
    >         (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
    >         (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
    >         (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
    >         (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
    >
    > /* This byte swap implementation is used by the Linux kernel and the
    >    GNU C library.  */
    >
    > uint32_t
    > swap32_a (uint32_t in)
    > {
    >   return __const_swab32 (in);
    > }
    >
    
    We generate this for rv64gc_zba_zbb_zbs:
    
    >         srliw   a1,a0,24
    >         slliw   a5,a0,24
    >         slliw   a3,a0,8
    >         li      a2,16711680
    >         li      a4,65536
    >         or      a5,a5,a1
    >         and     a3,a3,a2
    >         addi    a4,a4,-256
    >         srliw   a0,a0,8
    >         or      a5,a5,a3
    >         and     a0,a0,a4
    >         or      a0,a5,a0
    >         ret
    
    Urgh!
    
    After this patch we generate:
    
    >         rev8    a0,a0
    >         srai    a0,a0,32
    >         ret
    
    Clearly better.
    
    The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that
    the RTL expanders already know how to widen a bswap, which is definitely
    true.  But failing to expose bswapsi prevents the 32bit bswap optimizations
    in gimple store merging from triggering.  Thus we get crappy code.
    
    To fix this we expose bswapsi on TARGET_64BIT.  The gimple store-merging
    pass then detects the 32bit bswap idioms and generates suitable __builtin
    calls.  The expander will "FAIL" expansion for TARGET_64BIT, which forces
    the generic expander code to synthesize the operation (we could synthesize
    it here, but that would result in duplicate code).
    
    Tested on rv64gc_zba_zbb_zbs; this fixes all the bswapsi failures in the
    testsuite without any regressions.
    
    gcc/
            * config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7b55528ee49..1544ef4e125 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -454,7 +454,16 @@ (define_expand "bswapdi2"
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "register_operand")
        (bswap:SI (match_operand:SI 1 "register_operand")))]
-  "(!TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)) || TARGET_XTHEADBB")
+  "TARGET_ZBB || TARGET_ZBKB || TARGET_XTHEADBB"
+{
+  /* Expose bswapsi2 on TARGET_64BIT so that the gimple store
+     merging pass will create suitable bswap insns.  We can actually
+     just FAIL that case when generating RTL and let the generic code
+     handle it.  */
+  if (TARGET_64BIT && !TARGET_XTHEADBB)
+    FAIL;
+})
+
 
 (define_insn "*bswap<mode>2"
   [(set (match_operand:X 0 "register_operand" "=r")
