Re: Rework constant subreg folds and handle more variable-length cases
Jeff Law writes: > On 7/12/19 1:44 AM, Richard Sandiford wrote: >> Richard Sandiford writes: >>> This patch rewrites the way simplify_subreg handles constants. >>> It uses similar native_encode/native_decode routines to the >>> tree-level handling of VIEW_CONVERT_EXPR, meaning that we can >>> move between rtx constants and the target memory image of them. >>> >>> The main point of this patch is to support subregs of constant-length >>> vectors for VLA vectors, beyond the very simple cases that were already >>> handled. Many of the new tests failed before the patch for variable- >>> length vectors. >>> >>> The boolean side is tested more by the upcoming SVE ACLE work. >>> >>> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. >>> OK to install? >> I made a last-minute change after testing, to use uintNN_t types >> for target_unit rather than the original unsigned char/short/int. >> Of course, that doesn't survive a libgcc build since >> isn't included there. >> >> Fixed below, and posted as tested this time. >> >> Richard >> >> >> 2019-07-12 Richard Sandiford >> >> gcc/ >> * defaults.h (TARGET_UNIT): New macro. >> (target_unit): New type. >> * rtl.h (native_encode_rtx, native_decode_rtx) >> (native_decode_vector_rtx, subreg_size_lsb): Declare. >> (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. >> * rtlanal.c (subreg_lsb_1): Delete. >> (subreg_size_lsb): New function. >> * simplify-rtx.c: Include rtx-vector-builder.h >> (simplify_immed_subreg): Delete. >> (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) >> (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New >> functions. >> (simplify_subreg): Use them. >> (test_vector_subregs_modes, test_vector_subregs_repeating) >> (test_vector_subregs_fore_back, test_vector_subregs_stepped) >> (test_vector_subregs): New functions. >> (test_vector_ops): Call test_vector_subregs for integer vector >> modes with at least 2 elements. > This just turns out to be amazingly painful to work through and I don't > particularly see any good breakdown which would make it obvious where > the behavioral changes are vs just refactoring. > > Given your long history with GCC and your expertise in RTL as well as > the SVE space I'm inclined to say go for it and we'll cope with any fallout. Thanks. Here's what I (very) belatedly applied. Further cross-target testing showed I needed some tweaks: (1) Keep: /* Some ports misuse CCmode. */ if (GET_MODE_CLASS (outermode) == MODE_CC && CONST_INT_P (op)) return op; which unfortunately is still needed. (2) The old version filled the undefined upper bits of a paradoxical subreg with zeros, but some ports expected it to be sign-extended for integers. (3) In the self tests, skip over non-IEEE floating-point modes, since bitcasting to and from others can drop bits. Richard 2019-09-19 Richard Sandiford gcc/ * defaults.h (TARGET_UNIT): New macro. (target_unit): New type. * rtl.h (native_encode_rtx, native_decode_rtx) (native_decode_vector_rtx, subreg_size_lsb): Declare. (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. * rtlanal.c (subreg_lsb_1): Delete. (subreg_size_lsb): New function. * simplify-rtx.c: Include rtx-vector-builder.h (simplify_immed_subreg): Delete. (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New functions. (simplify_subreg): Use them. (test_vector_subregs_modes, test_vector_subregs_repeating) (test_vector_subregs_fore_back, test_vector_subregs_stepped) (test_vector_subregs): New functions. (test_vector_ops): Call test_vector_subregs for integer vector modes with at least 2 elements. Index: gcc/defaults.h === *** gcc/defaults.h 2019-07-12 08:53:06.0 +0100 --- gcc/defaults.h 2019-09-19 09:56:43.873352025 +0100 *** #define TARGET_VTABLE_USES_DESCRIPTORS 0 *** 1459,1462 --- 1459,1476 #define DWARF_GNAT_ENCODINGS_DEFAULT DWARF_GNAT_ENCODINGS_GDB #endif + #ifndef USED_FOR_TARGET + /* Done this way to keep gengtype happy. */ + #if BITS_PER_UNIT == 8 + #define TARGET_UNIT uint8_t + #elif BITS_PER_UNIT == 16 + #define TARGET_UNIT uint16_t + #elif BITS_PER_UNIT == 32 + #define TARGET_UNIT uint32_t + #else + #error Unknown BITS_PER_UNIT + #endif + typedef TARGET_UNIT target_unit; + #endif + #endif /* ! GCC_DEFAULTS_H */ Index: gcc/rtl.h === *** gcc/rtl.h 2019-09-12 10:52:56.0 +0100 --- gcc/rtl.h 2019-09-19 09:56:43.877351995 +0100 *** extern int rtx_cost (rtx, machine_mode, ***
Re: Rework constant subreg folds and handle more variable-length cases
On 7/12/19 1:44 AM, Richard Sandiford wrote: > Richard Sandiford writes: >> This patch rewrites the way simplify_subreg handles constants. >> It uses similar native_encode/native_decode routines to the >> tree-level handling of VIEW_CONVERT_EXPR, meaning that we can >> move between rtx constants and the target memory image of them. >> >> The main point of this patch is to support subregs of constant-length >> vectors for VLA vectors, beyond the very simple cases that were already >> handled. Many of the new tests failed before the patch for variable- >> length vectors. >> >> The boolean side is tested more by the upcoming SVE ACLE work. >> >> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. >> OK to install? > I made a last-minute change after testing, to use uintNN_t types > for target_unit rather than the original unsigned char/short/int. > Of course, that doesn't survive a libgcc build since > isn't included there. > > Fixed below, and posted as tested this time. > > Richard > > > 2019-07-12 Richard Sandiford > > gcc/ > * defaults.h (TARGET_UNIT): New macro. > (target_unit): New type. > * rtl.h (native_encode_rtx, native_decode_rtx) > (native_decode_vector_rtx, subreg_size_lsb): Declare. > (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. > * rtlanal.c (subreg_lsb_1): Delete. > (subreg_size_lsb): New function. > * simplify-rtx.c: Include rtx-vector-builder.h > (simplify_immed_subreg): Delete. > (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) > (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New > functions. > (simplify_subreg): Use them. > (test_vector_subregs_modes, test_vector_subregs_repeating) > (test_vector_subregs_fore_back, test_vector_subregs_stepped) > (test_vector_subregs): New functions. > (test_vector_ops): Call test_vector_subregs for integer vector > modes with at least 2 elements. This just turns out to be amazingly painful to work through and I don't particularly see any good breakdown which would make it obvious where the behavioral changes are vs just refactoring. Given your long history with GCC and your expertise in RTL as well as the SVE space I'm inclined to say go for it and we'll cope with any fallout. Jeff
Re: Rework constant subreg folds and handle more variable-length cases
Richard Sandiford writes: > This patch rewrites the way simplify_subreg handles constants. > It uses similar native_encode/native_decode routines to the > tree-level handling of VIEW_CONVERT_EXPR, meaning that we can > move between rtx constants and the target memory image of them. > > The main point of this patch is to support subregs of constant-length > vectors for VLA vectors, beyond the very simple cases that were already > handled. Many of the new tests failed before the patch for variable- > length vectors. > > The boolean side is tested more by the upcoming SVE ACLE work. > > Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. > OK to install? I made a last-minute change after testing, to use uintNN_t types for target_unit rather than the original unsigned char/short/int. Of course, that doesn't survive a libgcc build since isn't included there. Fixed below, and posted as tested this time. Richard 2019-07-12 Richard Sandiford gcc/ * defaults.h (TARGET_UNIT): New macro. (target_unit): New type. * rtl.h (native_encode_rtx, native_decode_rtx) (native_decode_vector_rtx, subreg_size_lsb): Declare. (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. * rtlanal.c (subreg_lsb_1): Delete. (subreg_size_lsb): New function. * simplify-rtx.c: Include rtx-vector-builder.h (simplify_immed_subreg): Delete. (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New functions. (simplify_subreg): Use them. (test_vector_subregs_modes, test_vector_subregs_repeating) (test_vector_subregs_fore_back, test_vector_subregs_stepped) (test_vector_subregs): New functions. (test_vector_ops): Call test_vector_subregs for integer vector modes with at least 2 elements. Index: gcc/defaults.h === *** gcc/defaults.h 2019-07-12 08:39:57.0 +0100 --- gcc/defaults.h 2019-07-12 08:40:15.788314307 +0100 *** #define TARGET_VTABLE_USES_DESCRIPTORS 0 *** 1459,1462 --- 1459,1476 #define DWARF_GNAT_ENCODINGS_DEFAULT DWARF_GNAT_ENCODINGS_GDB #endif + #ifndef USED_FOR_TARGET + /* Done this way to keep gengtype happy. */ + #if BITS_PER_UNIT == 8 + #define TARGET_UNIT uint8_t + #elif BITS_PER_UNIT == 16 + #define TARGET_UNIT uint16_t + #elif BITS_PER_UNIT == 32 + #define TARGET_UNIT uint32_t + #else + #error Unknown BITS_PER_UNIT + #endif + typedef TARGET_UNIT target_unit; + #endif + #endif /* ! GCC_DEFAULTS_H */ Index: gcc/rtl.h === *** gcc/rtl.h 2019-07-12 08:39:57.0 +0100 --- gcc/rtl.h 2019-07-12 08:40:15.788314307 +0100 *** extern int rtx_cost (rtx, machine_mode, *** 2400,2411 extern int address_cost (rtx, machine_mode, addr_space_t, bool); extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int, struct full_rtx_costs *); extern poly_uint64 subreg_lsb (const_rtx); ! extern poly_uint64 subreg_lsb_1 (machine_mode, machine_mode, poly_uint64); extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64, poly_uint64); extern bool read_modify_subreg_p (const_rtx); /* Return the subreg byte offset for a subreg whose outer mode is OUTER_MODE, whose inner mode is INNER_MODE, and where there are LSB_SHIFT *bits* between the lsb of the outer value and the lsb of --- 2400,2429 extern int address_cost (rtx, machine_mode, addr_space_t, bool); extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int, struct full_rtx_costs *); + extern bool native_encode_rtx (machine_mode, rtx, vec &, + unsigned int, unsigned int); + extern rtx native_decode_rtx (machine_mode, vec, + unsigned int); + extern rtx native_decode_vector_rtx (machine_mode, vec, +unsigned int, unsigned int, unsigned int); extern poly_uint64 subreg_lsb (const_rtx); ! extern poly_uint64 subreg_size_lsb (poly_uint64, poly_uint64, poly_uint64); extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64, poly_uint64); extern bool read_modify_subreg_p (const_rtx); + /* Given a subreg's OUTER_MODE, INNER_MODE, and SUBREG_BYTE, return the +bit offset at which the subreg begins (counting from the least significant +bit of the operand). */ + + inline poly_uint64 + subreg_lsb_1 (machine_mode outer_mode, machine_mode inner_mode, + poly_uint64 subreg_byte) + { + return subreg_size_lsb (GET_MODE_SIZE (outer_mode), + GET_MODE_SIZE (inner_mode), subreg_byte); + } +
Re: Rework constant subreg folds and handle more variable-length cases
Richard Biener writes: > On Thu, Jul 11, 2019 at 10:03 AM Richard Sandiford > wrote: >> >> This patch rewrites the way simplify_subreg handles constants. >> It uses similar native_encode/native_decode routines to the >> tree-level handling of VIEW_CONVERT_EXPR, meaning that we can >> move between rtx constants and the target memory image of them. >> >> The main point of this patch is to support subregs of constant-length >> vectors for VLA vectors, beyond the very simple cases that were already >> handled. Many of the new tests failed before the patch for variable- >> length vectors. >> >> The boolean side is tested more by the upcoming SVE ACLE work. >> >> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. >> OK to install? > > Hmm. So is subreg [offset] defined in terms of memory order or in > terms of register order? Memory order, except for the special case that paradoxical subregs always have an offset of zero (rather than the natural negative value for big-endian). > I wonder if you need to handle FLOAT_WORDS_BIG_ENDIAN, > REG_WORDS_BIG_ENDIAN and whether BYTES/WORDS_BIG_ENDIAN have any > meaning here at all? > > I'm always struggling with this when working on BIT_FIELD_REFs > on GIMPLE [registers]... Yeah. Yhe subreg_size_lsb stuff copes with BYTES/WORDS_BIG_ENDIAN. I used that in the patch exactly because I didn't want to have to redo the logic here :-) Because the offset is (mostly) the memory offset, REG_WORDS_BIG_ENDIAN doesn't affect this code. It instead affects how we divide up a multi-register hard register. (See subreg_get_info.) FLOAT_WORDS_BIG_ENDIAN is explicitly documented as not affecting subregs: @cindex @code{FLOAT_WORDS_BIG_ENDIAN}, (lack of) effect on @code{subreg} On a few targets, @code{FLOAT_WORDS_BIG_ENDIAN} disagrees with @code{WORDS_BIG_ENDIAN}. However, most parts of the compiler treat floating point values as if they had the same endianness as integer values. This works because they handle them solely as a collection of integer values, with no particular numerical value. Only real.c and the runtime libraries care about @code{FLOAT_WORDS_BIG_ENDIAN}. Thanks, Richard
Re: Rework constant subreg folds and handle more variable-length cases
On Thu, Jul 11, 2019 at 10:03 AM Richard Sandiford wrote: > > This patch rewrites the way simplify_subreg handles constants. > It uses similar native_encode/native_decode routines to the > tree-level handling of VIEW_CONVERT_EXPR, meaning that we can > move between rtx constants and the target memory image of them. > > The main point of this patch is to support subregs of constant-length > vectors for VLA vectors, beyond the very simple cases that were already > handled. Many of the new tests failed before the patch for variable- > length vectors. > > The boolean side is tested more by the upcoming SVE ACLE work. > > Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. > OK to install? Hmm. So is subreg [offset] defined in terms of memory order or in terms of register order? I wonder if you need to handle FLOAT_WORDS_BIG_ENDIAN, REG_WORDS_BIG_ENDIAN and whether BYTES/WORDS_BIG_ENDIAN have any meaning here at all? I'm always struggling with this when working on BIT_FIELD_REFs on GIMPLE [registers]... Richard. > Richard > > > 2019-07-11 Richard Sandiford > > gcc/ > * defaults.h (TARGET_UNIT): New macro. > (target_unit): New type. > * rtl.h (native_encode_rtx, native_decode_rtx) > (native_decode_vector_rtx, subreg_size_lsb): Declare. > (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. > * rtlanal.c (subreg_lsb_1): Delete. > (subreg_size_lsb): New function. > * simplify-rtx.c: Include rtx-vector-builder.h > (simplify_immed_subreg): Delete. > (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) > (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New > functions. > (simplify_subreg): Use them. > (test_vector_subregs_modes, test_vector_subregs_repeating) > (test_vector_subregs_fore_back, test_vector_subregs_stepped) > (test_vector_subregs): New functions. > (test_vector_ops): Call test_vector_subregs for integer vector > modes with at least 2 elements. > > Index: gcc/defaults.h > === > *** gcc/defaults.h 2019-07-11 08:33:57.0 +0100 > --- gcc/defaults.h 2019-07-11 08:33:58.069250175 +0100 > *** #define TARGET_VTABLE_USES_DESCRIPTORS 0 > *** 1459,1462 > --- 1459,1474 > #define DWARF_GNAT_ENCODINGS_DEFAULT DWARF_GNAT_ENCODINGS_GDB > #endif > > + /* Done this way to keep gengtype happy. */ > + #if BITS_PER_UNIT == 8 > + #define TARGET_UNIT uint8_t > + #elif BITS_PER_UNIT == 16 > + #define TARGET_UNIT uint16_t > + #elif BITS_PER_UNIT == 32 > + #define TARGET_UNIT uint32_t > + #else > + #error Unknown BITS_PER_UNIT > + #endif > + typedef TARGET_UNIT target_unit; > + > #endif /* ! GCC_DEFAULTS_H */ > Index: gcc/rtl.h > === > *** gcc/rtl.h 2019-07-11 08:33:57.0 +0100 > --- gcc/rtl.h 2019-07-11 08:33:58.069250175 +0100 > *** extern int rtx_cost (rtx, machine_mode, > *** 2400,2411 > extern int address_cost (rtx, machine_mode, addr_space_t, bool); > extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int, >struct full_rtx_costs *); > extern poly_uint64 subreg_lsb (const_rtx); > ! extern poly_uint64 subreg_lsb_1 (machine_mode, machine_mode, poly_uint64); > extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64, > poly_uint64); > extern bool read_modify_subreg_p (const_rtx); > > /* Return the subreg byte offset for a subreg whose outer mode is > OUTER_MODE, whose inner mode is INNER_MODE, and where there are > LSB_SHIFT *bits* between the lsb of the outer value and the lsb of > --- 2400,2429 > extern int address_cost (rtx, machine_mode, addr_space_t, bool); > extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int, >struct full_rtx_costs *); > + extern bool native_encode_rtx (machine_mode, rtx, vec &, > + unsigned int, unsigned int); > + extern rtx native_decode_rtx (machine_mode, vec, > + unsigned int); > + extern rtx native_decode_vector_rtx (machine_mode, vec, > +unsigned int, unsigned int, unsigned > int); > extern poly_uint64 subreg_lsb (const_rtx); > ! extern poly_uint64 subreg_size_lsb (poly_uint64, poly_uint64, poly_uint64); > extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64, > poly_uint64); > extern bool read_modify_subreg_p (const_rtx); > > + /* Given a subreg's OUTER_MODE, INNER_MODE, and SUBREG_BYTE, return the > +bit offset at which the subreg begins (counting from the least > significant > +bit of the operand). */ > + > + inline
Rework constant subreg folds and handle more variable-length cases
This patch rewrites the way simplify_subreg handles constants. It uses similar native_encode/native_decode routines to the tree-level handling of VIEW_CONVERT_EXPR, meaning that we can move between rtx constants and the target memory image of them. The main point of this patch is to support subregs of constant-length vectors for VLA vectors, beyond the very simple cases that were already handled. Many of the new tests failed before the patch for variable- length vectors. The boolean side is tested more by the upcoming SVE ACLE work. Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. OK to install? Richard 2019-07-11 Richard Sandiford gcc/ * defaults.h (TARGET_UNIT): New macro. (target_unit): New type. * rtl.h (native_encode_rtx, native_decode_rtx) (native_decode_vector_rtx, subreg_size_lsb): Declare. (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. * rtlanal.c (subreg_lsb_1): Delete. (subreg_size_lsb): New function. * simplify-rtx.c: Include rtx-vector-builder.h (simplify_immed_subreg): Delete. (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New functions. (simplify_subreg): Use them. (test_vector_subregs_modes, test_vector_subregs_repeating) (test_vector_subregs_fore_back, test_vector_subregs_stepped) (test_vector_subregs): New functions. (test_vector_ops): Call test_vector_subregs for integer vector modes with at least 2 elements. Index: gcc/defaults.h === *** gcc/defaults.h 2019-07-11 08:33:57.0 +0100 --- gcc/defaults.h 2019-07-11 08:33:58.069250175 +0100 *** #define TARGET_VTABLE_USES_DESCRIPTORS 0 *** 1459,1462 --- 1459,1474 #define DWARF_GNAT_ENCODINGS_DEFAULT DWARF_GNAT_ENCODINGS_GDB #endif + /* Done this way to keep gengtype happy. */ + #if BITS_PER_UNIT == 8 + #define TARGET_UNIT uint8_t + #elif BITS_PER_UNIT == 16 + #define TARGET_UNIT uint16_t + #elif BITS_PER_UNIT == 32 + #define TARGET_UNIT uint32_t + #else + #error Unknown BITS_PER_UNIT + #endif + typedef TARGET_UNIT target_unit; + #endif /* ! GCC_DEFAULTS_H */ Index: gcc/rtl.h === *** gcc/rtl.h 2019-07-11 08:33:57.0 +0100 --- gcc/rtl.h 2019-07-11 08:33:58.069250175 +0100 *** extern int rtx_cost (rtx, machine_mode, *** 2400,2411 extern int address_cost (rtx, machine_mode, addr_space_t, bool); extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int, struct full_rtx_costs *); extern poly_uint64 subreg_lsb (const_rtx); ! extern poly_uint64 subreg_lsb_1 (machine_mode, machine_mode, poly_uint64); extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64, poly_uint64); extern bool read_modify_subreg_p (const_rtx); /* Return the subreg byte offset for a subreg whose outer mode is OUTER_MODE, whose inner mode is INNER_MODE, and where there are LSB_SHIFT *bits* between the lsb of the outer value and the lsb of --- 2400,2429 extern int address_cost (rtx, machine_mode, addr_space_t, bool); extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int, struct full_rtx_costs *); + extern bool native_encode_rtx (machine_mode, rtx, vec &, + unsigned int, unsigned int); + extern rtx native_decode_rtx (machine_mode, vec, + unsigned int); + extern rtx native_decode_vector_rtx (machine_mode, vec, +unsigned int, unsigned int, unsigned int); extern poly_uint64 subreg_lsb (const_rtx); ! extern poly_uint64 subreg_size_lsb (poly_uint64, poly_uint64, poly_uint64); extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64, poly_uint64); extern bool read_modify_subreg_p (const_rtx); + /* Given a subreg's OUTER_MODE, INNER_MODE, and SUBREG_BYTE, return the +bit offset at which the subreg begins (counting from the least significant +bit of the operand). */ + + inline poly_uint64 + subreg_lsb_1 (machine_mode outer_mode, machine_mode inner_mode, + poly_uint64 subreg_byte) + { + return subreg_size_lsb (GET_MODE_SIZE (outer_mode), + GET_MODE_SIZE (inner_mode), subreg_byte); + } + /* Return the subreg byte offset for a subreg whose outer mode is OUTER_MODE, whose inner mode is INNER_MODE, and where there are LSB_SHIFT *bits* between the lsb of the outer value and the lsb of Index: gcc/rtlanal.c === *** gcc/rtlanal.c 2019-07-11