Re: RFA: tweak integer type used for memcpy folding
On Tue, Apr 22, 2014 at 11:51 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Tue, Apr 22, 2014 at 11:15 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On Tue, Apr 22, 2014 at 10:43 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford >> wrote: >>> wide-int fails to build libitm because of a bad interaction between: >>> >>> /* Keep the OI and XI modes from confusing the compiler into thinking >>>that these modes could actually be used for computation. They are >>>only holders for vectors during data movement. */ >>> #define MAX_BITSIZE_MODE_ANY_INT (128) >>> >>> and the memcpy folding code: >>> >>> /* Make sure we are not copying using a floating-point mode or >>> a type whose size possibly does not match its precision. */ >>> if (FLOAT_MODE_P (TYPE_MODE (desttype)) >>> || TREE_CODE (desttype) == BOOLEAN_TYPE >>> || TREE_CODE (desttype) == ENUMERAL_TYPE) >>> { >>> /* A more suitable int_mode_for_mode would return a vector >>> integer mode for a vector float mode or a integer complex >>> mode for a float complex mode if there isn't a regular >>> integer mode covering the mode of desttype. */ >>> enum machine_mode mode = int_mode_for_mode (TYPE_MODE >>> (desttype)); >>> if (mode == BLKmode) >>> desttype = NULL_TREE; >>> else >>> desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE >>> (mode), >>>1); >>> } >>> if (FLOAT_MODE_P (TYPE_MODE (srctype)) >>> || TREE_CODE (srctype) == BOOLEAN_TYPE >>> || TREE_CODE (srctype) == ENUMERAL_TYPE) >>> { >>> enum machine_mode mode = int_mode_for_mode (TYPE_MODE >>> (srctype)); >>> if (mode == BLKmode) >>> srctype = NULL_TREE; >>> else >>> srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE >>> (mode), >>> 1); >>> } >>> >>> The failure occurs for complex long double, which we try to copy as >>> a 256-bit integer type (OImode). >>> >>> This patch tries to do what the comment suggests by introducing a new >>> form of int_mode_for_mode that replaces vector modes with vector modes >>> and complex modes with complex modes. The fallback case of using a >>> MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above >>> 128 bits on x86_64. >>> >>> The question then is what to do about 128-bit types for i386. >>> MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be >>> used for optimisation. However, gcc.target/i386/pr49168-1.c only passes >>> for -m32 -msse2 because we use int128_t to copy a float128_t. >>> >>> I handled that by allowing MODE_VECTOR_INT to be used instead of >>> MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, >>> even if the original type wasn't a vector. >> >> Hmm. Sounds reasonable unless there are very weird targets that >> cannot efficiently load/store vectors unaligned but can handle >> efficient load/store of unaligned scalars. > > Yeah, in general there's no guarantee that even int_mode_for_mode > will return a mode with the same alignment as the original. Callers > need to check that (like the memcpy folder does). > >>> It might be that other callers to int_mode_for_mode should use >>> the new function too, but I'll look at that separately. >>> >>> I used the attached testcase (with printfs added to gcc) to check that >>> the right modes and types were being chosen. The patch fixes the >>> complex float and complex double cases, since the integer type that we >>> previously picked had a larger alignment than the original complex type. >> >> As of complex int modes - are we positively sure that targets even >> try to do sth "optimal" for loads/stores of those? > > Complex modes usually aren't handled directly by .md patterns, > either int or float. They're really treated as a pair of values. > So IMO it still makes sense to fold this case. > >>> One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) >>> is that vectors are copied as integer vectors if the target supports >>> them directly but are copied as float vectors otherwise, since in the >>> latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the >>> test are copied as vector floats and vector doubles both before and >>> after the patch. >> >> That wasn't intended ... the fo
Re: RFA: tweak integer type used for memcpy folding
Richard Biener writes: > On Tue, Apr 22, 2014 at 11:15 AM, Richard Sandiford > wrote: >> Richard Biener writes: >>> On Tue, Apr 22, 2014 at 10:43 AM, Richard Sandiford >>> wrote: Richard Biener writes: > On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford > wrote: >> wide-int fails to build libitm because of a bad interaction between: >> >> /* Keep the OI and XI modes from confusing the compiler into thinking >>that these modes could actually be used for computation. They are >>only holders for vectors during data movement. */ >> #define MAX_BITSIZE_MODE_ANY_INT (128) >> >> and the memcpy folding code: >> >> /* Make sure we are not copying using a floating-point mode or >> a type whose size possibly does not match its precision. */ >> if (FLOAT_MODE_P (TYPE_MODE (desttype)) >> || TREE_CODE (desttype) == BOOLEAN_TYPE >> || TREE_CODE (desttype) == ENUMERAL_TYPE) >> { >> /* A more suitable int_mode_for_mode would return a vector >> integer mode for a vector float mode or a integer complex >> mode for a float complex mode if there isn't a regular >> integer mode covering the mode of desttype. */ >> enum machine_mode mode = int_mode_for_mode (TYPE_MODE >> (desttype)); >> if (mode == BLKmode) >> desttype = NULL_TREE; >> else >> desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE >> (mode), >>1); >> } >> if (FLOAT_MODE_P (TYPE_MODE (srctype)) >> || TREE_CODE (srctype) == BOOLEAN_TYPE >> || TREE_CODE (srctype) == ENUMERAL_TYPE) >> { >> enum machine_mode mode = int_mode_for_mode (TYPE_MODE >> (srctype)); >> if (mode == BLKmode) >> srctype = NULL_TREE; >> else >> srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE >> (mode), >> 1); >> } >> >> The failure occurs for complex long double, which we try to copy as >> a 256-bit integer type (OImode). >> >> This patch tries to do what the comment suggests by introducing a new >> form of int_mode_for_mode that replaces vector modes with vector modes >> and complex modes with complex modes. The fallback case of using a >> MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above >> 128 bits on x86_64. >> >> The question then is what to do about 128-bit types for i386. >> MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be >> used for optimisation. However, gcc.target/i386/pr49168-1.c only passes >> for -m32 -msse2 because we use int128_t to copy a float128_t. >> >> I handled that by allowing MODE_VECTOR_INT to be used instead of >> MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, >> even if the original type wasn't a vector. > > Hmm. Sounds reasonable unless there are very weird targets that > cannot efficiently load/store vectors unaligned but can handle > efficient load/store of unaligned scalars. Yeah, in general there's no guarantee that even int_mode_for_mode will return a mode with the same alignment as the original. Callers need to check that (like the memcpy folder does). >> It might be that other callers to int_mode_for_mode should use >> the new function too, but I'll look at that separately. >> >> I used the attached testcase (with printfs added to gcc) to check that >> the right modes and types were being chosen. The patch fixes the >> complex float and complex double cases, since the integer type that we >> previously picked had a larger alignment than the original complex type. > > As of complex int modes - are we positively sure that targets even > try to do sth "optimal" for loads/stores of those? Complex modes usually aren't handled directly by .md patterns, either int or float. They're really treated as a pair of values. So IMO it still makes sense to fold this case. >> One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) >> is that vectors are copied as integer vectors if the target supports >> them directly but are copied as float vectors otherwise, since in the >> latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the >> test are copied as vector floats and vector doubles both before and >> after the patch. > > That wasn't intended ... the folding should have failed if we can't > copy using an integer mode ... Does that mean that the fold give up if TYPE_MODE is BLKmode? I can do t
Re: RFA: tweak integer type used for memcpy folding
On Tue, Apr 22, 2014 at 11:15 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Tue, Apr 22, 2014 at 10:43 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford wrote: > wide-int fails to build libitm because of a bad interaction between: > > /* Keep the OI and XI modes from confusing the compiler into thinking >that these modes could actually be used for computation. They are >only holders for vectors during data movement. */ > #define MAX_BITSIZE_MODE_ANY_INT (128) > > and the memcpy folding code: > > /* Make sure we are not copying using a floating-point mode or > a type whose size possibly does not match its precision. */ > if (FLOAT_MODE_P (TYPE_MODE (desttype)) > || TREE_CODE (desttype) == BOOLEAN_TYPE > || TREE_CODE (desttype) == ENUMERAL_TYPE) > { > /* A more suitable int_mode_for_mode would return a vector > integer mode for a vector float mode or a integer complex > mode for a float complex mode if there isn't a regular > integer mode covering the mode of desttype. */ > enum machine_mode mode = int_mode_for_mode (TYPE_MODE > (desttype)); > if (mode == BLKmode) > desttype = NULL_TREE; > else > desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE > (mode), >1); > } > if (FLOAT_MODE_P (TYPE_MODE (srctype)) > || TREE_CODE (srctype) == BOOLEAN_TYPE > || TREE_CODE (srctype) == ENUMERAL_TYPE) > { > enum machine_mode mode = int_mode_for_mode (TYPE_MODE > (srctype)); > if (mode == BLKmode) > srctype = NULL_TREE; > else > srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE > (mode), > 1); > } > > The failure occurs for complex long double, which we try to copy as > a 256-bit integer type (OImode). > > This patch tries to do what the comment suggests by introducing a new > form of int_mode_for_mode that replaces vector modes with vector modes > and complex modes with complex modes. The fallback case of using a > MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above > 128 bits on x86_64. > > The question then is what to do about 128-bit types for i386. > MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be > used for optimisation. However, gcc.target/i386/pr49168-1.c only passes > for -m32 -msse2 because we use int128_t to copy a float128_t. > > I handled that by allowing MODE_VECTOR_INT to be used instead of > MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, > even if the original type wasn't a vector. Hmm. Sounds reasonable unless there are very weird targets that cannot efficiently load/store vectors unaligned but can handle efficient load/store of unaligned scalars. >>> >>> Yeah, in general there's no guarantee that even int_mode_for_mode >>> will return a mode with the same alignment as the original. Callers >>> need to check that (like the memcpy folder does). >>> > It might be that other callers to int_mode_for_mode should use > the new function too, but I'll look at that separately. > > I used the attached testcase (with printfs added to gcc) to check that > the right modes and types were being chosen. The patch fixes the > complex float and complex double cases, since the integer type that we > previously picked had a larger alignment than the original complex type. As of complex int modes - are we positively sure that targets even try to do sth "optimal" for loads/stores of those? >>> >>> Complex modes usually aren't handled directly by .md patterns, >>> either int or float. They're really treated as a pair of values. >>> So IMO it still makes sense to fold this case. >>> > One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) > is that vectors are copied as integer vectors if the target supports > them directly but are copied as float vectors otherwise, since in the > latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the > test are copied as vector floats and vector doubles both before and > after the patch. That wasn't intended ... the folding should have failed if we can't copy using an integer mode ... >>> >>> Does that mean that the fold give up if TYPE_MODE is BLKmode? >>> I can do that as a separate patch if so. >> >> Looking at the code again it should always choose an integer mode/type >> via setting destty
Re: RFA: tweak integer type used for memcpy folding
Richard Biener writes: > On Tue, Apr 22, 2014 at 10:43 AM, Richard Sandiford > wrote: >> Richard Biener writes: >>> On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford >>> wrote: wide-int fails to build libitm because of a bad interaction between: /* Keep the OI and XI modes from confusing the compiler into thinking that these modes could actually be used for computation. They are only holders for vectors during data movement. */ #define MAX_BITSIZE_MODE_ANY_INT (128) and the memcpy folding code: /* Make sure we are not copying using a floating-point mode or a type whose size possibly does not match its precision. */ if (FLOAT_MODE_P (TYPE_MODE (desttype)) || TREE_CODE (desttype) == BOOLEAN_TYPE || TREE_CODE (desttype) == ENUMERAL_TYPE) { /* A more suitable int_mode_for_mode would return a vector integer mode for a vector float mode or a integer complex mode for a float complex mode if there isn't a regular integer mode covering the mode of desttype. */ enum machine_mode mode = int_mode_for_mode (TYPE_MODE (desttype)); if (mode == BLKmode) desttype = NULL_TREE; else desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1); } if (FLOAT_MODE_P (TYPE_MODE (srctype)) || TREE_CODE (srctype) == BOOLEAN_TYPE || TREE_CODE (srctype) == ENUMERAL_TYPE) { enum machine_mode mode = int_mode_for_mode (TYPE_MODE (srctype)); if (mode == BLKmode) srctype = NULL_TREE; else srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1); } The failure occurs for complex long double, which we try to copy as a 256-bit integer type (OImode). This patch tries to do what the comment suggests by introducing a new form of int_mode_for_mode that replaces vector modes with vector modes and complex modes with complex modes. The fallback case of using a MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above 128 bits on x86_64. The question then is what to do about 128-bit types for i386. MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be used for optimisation. However, gcc.target/i386/pr49168-1.c only passes for -m32 -msse2 because we use int128_t to copy a float128_t. I handled that by allowing MODE_VECTOR_INT to be used instead of MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, even if the original type wasn't a vector. >>> >>> Hmm. Sounds reasonable unless there are very weird targets that >>> cannot efficiently load/store vectors unaligned but can handle >>> efficient load/store of unaligned scalars. >> >> Yeah, in general there's no guarantee that even int_mode_for_mode >> will return a mode with the same alignment as the original. Callers >> need to check that (like the memcpy folder does). >> It might be that other callers to int_mode_for_mode should use the new function too, but I'll look at that separately. I used the attached testcase (with printfs added to gcc) to check that the right modes and types were being chosen. The patch fixes the complex float and complex double cases, since the integer type that we previously picked had a larger alignment than the original complex type. >>> >>> As of complex int modes - are we positively sure that targets even >>> try to do sth "optimal" for loads/stores of those? >> >> Complex modes usually aren't handled directly by .md patterns, >> either int or float. They're really treated as a pair of values. >> So IMO it still makes sense to fold this case. >> One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) is that vectors are copied as integer vectors if the target supports them directly but are copied as float vectors otherwise, since in the latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the test are copied as vector floats and vector doubles both before and after the patch. >>> >>> That wasn't intended ... the folding should have failed if we can't >>> copy using an integer mode ... >> >> Does that mean that the fold give up if TYPE_MODE is BLKmode? >> I can do that as a separate patch if so. > > Looking at the code again it should always choose an integer mode/type > via setting desttype/srctype to NULL for BLKmode and > > if (!srctype) > srctype = desttype; > if (!desttype) > desttype = srctype; > if (!srctype) >
Re: RFA: tweak integer type used for memcpy folding
On Tue, Apr 22, 2014 at 10:43 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford >> wrote: >>> wide-int fails to build libitm because of a bad interaction between: >>> >>> /* Keep the OI and XI modes from confusing the compiler into thinking >>>that these modes could actually be used for computation. They are >>>only holders for vectors during data movement. */ >>> #define MAX_BITSIZE_MODE_ANY_INT (128) >>> >>> and the memcpy folding code: >>> >>> /* Make sure we are not copying using a floating-point mode or >>> a type whose size possibly does not match its precision. */ >>> if (FLOAT_MODE_P (TYPE_MODE (desttype)) >>> || TREE_CODE (desttype) == BOOLEAN_TYPE >>> || TREE_CODE (desttype) == ENUMERAL_TYPE) >>> { >>> /* A more suitable int_mode_for_mode would return a vector >>> integer mode for a vector float mode or a integer complex >>> mode for a float complex mode if there isn't a regular >>> integer mode covering the mode of desttype. */ >>> enum machine_mode mode = int_mode_for_mode (TYPE_MODE (desttype)); >>> if (mode == BLKmode) >>> desttype = NULL_TREE; >>> else >>> desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE >>> (mode), >>>1); >>> } >>> if (FLOAT_MODE_P (TYPE_MODE (srctype)) >>> || TREE_CODE (srctype) == BOOLEAN_TYPE >>> || TREE_CODE (srctype) == ENUMERAL_TYPE) >>> { >>> enum machine_mode mode = int_mode_for_mode (TYPE_MODE (srctype)); >>> if (mode == BLKmode) >>> srctype = NULL_TREE; >>> else >>> srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE >>> (mode), >>> 1); >>> } >>> >>> The failure occurs for complex long double, which we try to copy as >>> a 256-bit integer type (OImode). >>> >>> This patch tries to do what the comment suggests by introducing a new >>> form of int_mode_for_mode that replaces vector modes with vector modes >>> and complex modes with complex modes. The fallback case of using a >>> MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above >>> 128 bits on x86_64. >>> >>> The question then is what to do about 128-bit types for i386. >>> MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be >>> used for optimisation. However, gcc.target/i386/pr49168-1.c only passes >>> for -m32 -msse2 because we use int128_t to copy a float128_t. >>> >>> I handled that by allowing MODE_VECTOR_INT to be used instead of >>> MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, >>> even if the original type wasn't a vector. >> >> Hmm. Sounds reasonable unless there are very weird targets that >> cannot efficiently load/store vectors unaligned but can handle >> efficient load/store of unaligned scalars. > > Yeah, in general there's no guarantee that even int_mode_for_mode > will return a mode with the same alignment as the original. Callers > need to check that (like the memcpy folder does). > >>> It might be that other callers to int_mode_for_mode should use >>> the new function too, but I'll look at that separately. >>> >>> I used the attached testcase (with printfs added to gcc) to check that >>> the right modes and types were being chosen. The patch fixes the >>> complex float and complex double cases, since the integer type that we >>> previously picked had a larger alignment than the original complex type. >> >> As of complex int modes - are we positively sure that targets even >> try to do sth "optimal" for loads/stores of those? > > Complex modes usually aren't handled directly by .md patterns, > either int or float. They're really treated as a pair of values. > So IMO it still makes sense to fold this case. > >>> One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) >>> is that vectors are copied as integer vectors if the target supports >>> them directly but are copied as float vectors otherwise, since in the >>> latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the >>> test are copied as vector floats and vector doubles both before and >>> after the patch. >> >> That wasn't intended ... the folding should have failed if we can't >> copy using an integer mode ... > > Does that mean that the fold give up if TYPE_MODE is BLKmode? > I can do that as a separate patch if so. Looking at the code again it should always choose an integer mode/type via setting desttype/srctype to NULL for BLKmode and if (!srctype) srctype = desttype; if (!desttype) desttype = srctype; if (!srctype) return NULL_TREE; no? Thus if we can't get a integer type for either src or dest then we fail. But we should never end up with srctype or dest
Re: RFA: tweak integer type used for memcpy folding
Richard Biener writes: > On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford > wrote: >> wide-int fails to build libitm because of a bad interaction between: >> >> /* Keep the OI and XI modes from confusing the compiler into thinking >>that these modes could actually be used for computation. They are >>only holders for vectors during data movement. */ >> #define MAX_BITSIZE_MODE_ANY_INT (128) >> >> and the memcpy folding code: >> >> /* Make sure we are not copying using a floating-point mode or >> a type whose size possibly does not match its precision. */ >> if (FLOAT_MODE_P (TYPE_MODE (desttype)) >> || TREE_CODE (desttype) == BOOLEAN_TYPE >> || TREE_CODE (desttype) == ENUMERAL_TYPE) >> { >> /* A more suitable int_mode_for_mode would return a vector >> integer mode for a vector float mode or a integer complex >> mode for a float complex mode if there isn't a regular >> integer mode covering the mode of desttype. */ >> enum machine_mode mode = int_mode_for_mode (TYPE_MODE (desttype)); >> if (mode == BLKmode) >> desttype = NULL_TREE; >> else >> desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE >> (mode), >>1); >> } >> if (FLOAT_MODE_P (TYPE_MODE (srctype)) >> || TREE_CODE (srctype) == BOOLEAN_TYPE >> || TREE_CODE (srctype) == ENUMERAL_TYPE) >> { >> enum machine_mode mode = int_mode_for_mode (TYPE_MODE (srctype)); >> if (mode == BLKmode) >> srctype = NULL_TREE; >> else >> srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE >> (mode), >> 1); >> } >> >> The failure occurs for complex long double, which we try to copy as >> a 256-bit integer type (OImode). >> >> This patch tries to do what the comment suggests by introducing a new >> form of int_mode_for_mode that replaces vector modes with vector modes >> and complex modes with complex modes. The fallback case of using a >> MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above >> 128 bits on x86_64. >> >> The question then is what to do about 128-bit types for i386. >> MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be >> used for optimisation. However, gcc.target/i386/pr49168-1.c only passes >> for -m32 -msse2 because we use int128_t to copy a float128_t. >> >> I handled that by allowing MODE_VECTOR_INT to be used instead of >> MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, >> even if the original type wasn't a vector. > > Hmm. Sounds reasonable unless there are very weird targets that > cannot efficiently load/store vectors unaligned but can handle > efficient load/store of unaligned scalars. Yeah, in general there's no guarantee that even int_mode_for_mode will return a mode with the same alignment as the original. Callers need to check that (like the memcpy folder does). >> It might be that other callers to int_mode_for_mode should use >> the new function too, but I'll look at that separately. >> >> I used the attached testcase (with printfs added to gcc) to check that >> the right modes and types were being chosen. The patch fixes the >> complex float and complex double cases, since the integer type that we >> previously picked had a larger alignment than the original complex type. > > As of complex int modes - are we positively sure that targets even > try to do sth "optimal" for loads/stores of those? Complex modes usually aren't handled directly by .md patterns, either int or float. They're really treated as a pair of values. So IMO it still makes sense to fold this case. >> One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) >> is that vectors are copied as integer vectors if the target supports >> them directly but are copied as float vectors otherwise, since in the >> latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the >> test are copied as vector floats and vector doubles both before and >> after the patch. > > That wasn't intended ... the folding should have failed if we can't > copy using an integer mode ... Does that mean that the fold give up if TYPE_MODE is BLKmode? I can do that as a separate patch if so. Thanks, Richard
Re: RFA: tweak integer type used for memcpy folding
On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford wrote: > wide-int fails to build libitm because of a bad interaction between: > > /* Keep the OI and XI modes from confusing the compiler into thinking >that these modes could actually be used for computation. They are >only holders for vectors during data movement. */ > #define MAX_BITSIZE_MODE_ANY_INT (128) > > and the memcpy folding code: > > /* Make sure we are not copying using a floating-point mode or > a type whose size possibly does not match its precision. */ > if (FLOAT_MODE_P (TYPE_MODE (desttype)) > || TREE_CODE (desttype) == BOOLEAN_TYPE > || TREE_CODE (desttype) == ENUMERAL_TYPE) > { > /* A more suitable int_mode_for_mode would return a vector > integer mode for a vector float mode or a integer complex > mode for a float complex mode if there isn't a regular > integer mode covering the mode of desttype. */ > enum machine_mode mode = int_mode_for_mode (TYPE_MODE (desttype)); > if (mode == BLKmode) > desttype = NULL_TREE; > else > desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE > (mode), >1); > } > if (FLOAT_MODE_P (TYPE_MODE (srctype)) > || TREE_CODE (srctype) == BOOLEAN_TYPE > || TREE_CODE (srctype) == ENUMERAL_TYPE) > { > enum machine_mode mode = int_mode_for_mode (TYPE_MODE (srctype)); > if (mode == BLKmode) > srctype = NULL_TREE; > else > srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), > 1); > } > > The failure occurs for complex long double, which we try to copy as > a 256-bit integer type (OImode). > > This patch tries to do what the comment suggests by introducing a new > form of int_mode_for_mode that replaces vector modes with vector modes > and complex modes with complex modes. The fallback case of using a > MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above > 128 bits on x86_64. > > The question then is what to do about 128-bit types for i386. > MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be > used for optimisation. However, gcc.target/i386/pr49168-1.c only passes > for -m32 -msse2 because we use int128_t to copy a float128_t. > > I handled that by allowing MODE_VECTOR_INT to be used instead of > MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, > even if the original type wasn't a vector. Hmm. Sounds reasonable unless there are very weird targets that cannot efficiently load/store vectors unaligned but can handle efficient load/store of unaligned scalars. > It might be that other callers to int_mode_for_mode should use > the new function too, but I'll look at that separately. > > I used the attached testcase (with printfs added to gcc) to check that > the right modes and types were being chosen. The patch fixes the > complex float and complex double cases, since the integer type that we > previously picked had a larger alignment than the original complex type. As of complex int modes - are we positively sure that targets even try to do sth "optimal" for loads/stores of those? > One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) > is that vectors are copied as integer vectors if the target supports > them directly but are copied as float vectors otherwise, since in the > latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the > test are copied as vector floats and vector doubles both before and > after the patch. That wasn't intended ... the folding should have failed if we can't copy using an integer mode ... Richard. > > Tested against trunk with x86_64-linux-gnu {,-m32}. OK to install? > > Thanks, > Richard > > > gcc/ > * machmode.h (bitwise_mode_for_size): Declare. > * stor-layout.h (bitwise_type_for_mode): Likewise. > * stor-layout.c (bitwise_mode_for_size): New function. > (bitwise_type_for_mode): Likewise. > * builtins.c (fold_builtin_memory_op): Use it instead of > int_mode_for_mode and build_nonstandard_integer_type. > > gcc/testsuite/ > * gcc.dg/memcpy-5.c: New test. > > Index: gcc/machmode.h > === > --- gcc/machmode.h 2014-04-18 11:16:12.706092658 +0100 > +++ gcc/machmode.h 2014-04-18 11:16:38.179299261 +0100 > @@ -253,6 +253,8 @@ extern enum machine_mode smallest_mode_f > > extern enum machine_mode int_mode_for_mode (enum machine_mode); > > +extern enum machine_mode bitwise_mode_for_size (enum machine_mode); > + > /* Return a mode that is suitable for representing a vector, > or BLKmode on failure. */ > > Index: gcc/stor-layout.h > ==
Re: RFA: tweak integer type used for memcpy folding
Richard Sandiford writes: > wide-int fails to build libitm because of a bad interaction between: > > /* Keep the OI and XI modes from confusing the compiler into thinking >that these modes could actually be used for computation. They are >only holders for vectors during data movement. */ > #define MAX_BITSIZE_MODE_ANY_INT (128) > > and the memcpy folding code: > > /* Make sure we are not copying using a floating-point mode or > a type whose size possibly does not match its precision. */ > if (FLOAT_MODE_P (TYPE_MODE (desttype)) > || TREE_CODE (desttype) == BOOLEAN_TYPE > || TREE_CODE (desttype) == ENUMERAL_TYPE) > { > /* A more suitable int_mode_for_mode would return a vector >integer mode for a vector float mode or a integer complex >mode for a float complex mode if there isn't a regular >integer mode covering the mode of desttype. */ > enum machine_mode mode = int_mode_for_mode (TYPE_MODE (desttype)); > if (mode == BLKmode) > desttype = NULL_TREE; > else > desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), > 1); > } > if (FLOAT_MODE_P (TYPE_MODE (srctype)) > || TREE_CODE (srctype) == BOOLEAN_TYPE > || TREE_CODE (srctype) == ENUMERAL_TYPE) > { > enum machine_mode mode = int_mode_for_mode (TYPE_MODE (srctype)); > if (mode == BLKmode) > srctype = NULL_TREE; > else > srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), > 1); > } > > The failure occurs for complex long double, which we try to copy as > a 256-bit integer type (OImode). > > This patch tries to do what the comment suggests by introducing a new > form of int_mode_for_mode that replaces vector modes with vector modes > and complex modes with complex modes. The fallback case of using a > MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above > 128 bits on x86_64. > > The question then is what to do about 128-bit types for i386. > MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be > used for optimisation. However, gcc.target/i386/pr49168-1.c only passes > for -m32 -msse2 because we use int128_t to copy a float128_t. > > I handled that by allowing MODE_VECTOR_INT to be used instead of > MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, > even if the original type wasn't a vector. > > It might be that other callers to int_mode_for_mode should use > the new function too, but I'll look at that separately. > > I used the attached testcase (with printfs added to gcc) to check that > the right modes and types were being chosen. The patch fixes the > complex float and complex double cases, since the integer type that we > previously picked had a larger alignment than the original complex type. > > One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) > is that vectors are copied as integer vectors if the target supports > them directly but are copied as float vectors otherwise, since in the > latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the > test are copied as vector floats and vector doubles both before and > after the patch. > > Tested against trunk with x86_64-linux-gnu {,-m32}. OK to install? There was a typo in the declaration of the mode->mode function, should have been as follows. Thanks, Richard gcc/ * machmode.h (bitwise_mode_for_mode): Declare. * stor-layout.h (bitwise_type_for_mode): Likewise. * stor-layout.c (bitwise_mode_for_mode): New function. (bitwise_type_for_mode): Likewise. * builtins.c (fold_builtin_memory_op): Use it instead of int_mode_for_mode and build_nonstandard_integer_type. gcc/testsuite/ * gcc.dg/memcpy-5.c: New test. Index: gcc/machmode.h === --- gcc/machmode.h 2014-04-21 10:35:17.611603989 +0100 +++ gcc/machmode.h 2014-04-21 13:58:59.403884452 +0100 @@ -253,6 +253,8 @@ extern enum machine_mode smallest_mode_f extern enum machine_mode int_mode_for_mode (enum machine_mode); +extern enum machine_mode bitwise_mode_for_mode (enum machine_mode); + /* Return a mode that is suitable for representing a vector, or BLKmode on failure. */ Index: gcc/stor-layout.h === --- gcc/stor-layout.h 2014-04-21 10:35:17.611603989 +0100 +++ gcc/stor-layout.h 2014-04-21 13:58:59.405878960 +0100 @@ -98,6 +98,8 @@ extern tree make_unsigned_type (int); mode_for_size, but is passed a tree. */ extern enum machine_mode mode_for_size_tree (const_tree, enum mode_class, int); +extern tree bitwise_type_for_mode (enum machine_mode); + /* Given a VAR_DECL, PARM_DECL or RESULT_DECL, clears the resul
RFA: tweak integer type used for memcpy folding
wide-int fails to build libitm because of a bad interaction between: /* Keep the OI and XI modes from confusing the compiler into thinking that these modes could actually be used for computation. They are only holders for vectors during data movement. */ #define MAX_BITSIZE_MODE_ANY_INT (128) and the memcpy folding code: /* Make sure we are not copying using a floating-point mode or a type whose size possibly does not match its precision. */ if (FLOAT_MODE_P (TYPE_MODE (desttype)) || TREE_CODE (desttype) == BOOLEAN_TYPE || TREE_CODE (desttype) == ENUMERAL_TYPE) { /* A more suitable int_mode_for_mode would return a vector integer mode for a vector float mode or a integer complex mode for a float complex mode if there isn't a regular integer mode covering the mode of desttype. */ enum machine_mode mode = int_mode_for_mode (TYPE_MODE (desttype)); if (mode == BLKmode) desttype = NULL_TREE; else desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1); } if (FLOAT_MODE_P (TYPE_MODE (srctype)) || TREE_CODE (srctype) == BOOLEAN_TYPE || TREE_CODE (srctype) == ENUMERAL_TYPE) { enum machine_mode mode = int_mode_for_mode (TYPE_MODE (srctype)); if (mode == BLKmode) srctype = NULL_TREE; else srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1); } The failure occurs for complex long double, which we try to copy as a 256-bit integer type (OImode). This patch tries to do what the comment suggests by introducing a new form of int_mode_for_mode that replaces vector modes with vector modes and complex modes with complex modes. The fallback case of using a MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above 128 bits on x86_64. The question then is what to do about 128-bit types for i386. MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be used for optimisation. However, gcc.target/i386/pr49168-1.c only passes for -m32 -msse2 because we use int128_t to copy a float128_t. I handled that by allowing MODE_VECTOR_INT to be used instead of MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, even if the original type wasn't a vector. It might be that other callers to int_mode_for_mode should use the new function too, but I'll look at that separately. I used the attached testcase (with printfs added to gcc) to check that the right modes and types were being chosen. The patch fixes the complex float and complex double cases, since the integer type that we previously picked had a larger alignment than the original complex type. One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) is that vectors are copied as integer vectors if the target supports them directly but are copied as float vectors otherwise, since in the latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the test are copied as vector floats and vector doubles both before and after the patch. Tested against trunk with x86_64-linux-gnu {,-m32}. OK to install? Thanks, Richard gcc/ * machmode.h (bitwise_mode_for_size): Declare. * stor-layout.h (bitwise_type_for_mode): Likewise. * stor-layout.c (bitwise_mode_for_size): New function. (bitwise_type_for_mode): Likewise. * builtins.c (fold_builtin_memory_op): Use it instead of int_mode_for_mode and build_nonstandard_integer_type. gcc/testsuite/ * gcc.dg/memcpy-5.c: New test. Index: gcc/machmode.h === --- gcc/machmode.h 2014-04-18 11:16:12.706092658 +0100 +++ gcc/machmode.h 2014-04-18 11:16:38.179299261 +0100 @@ -253,6 +253,8 @@ extern enum machine_mode smallest_mode_f extern enum machine_mode int_mode_for_mode (enum machine_mode); +extern enum machine_mode bitwise_mode_for_size (enum machine_mode); + /* Return a mode that is suitable for representing a vector, or BLKmode on failure. */ Index: gcc/stor-layout.h === --- gcc/stor-layout.h 2014-04-18 11:16:12.707092667 +0100 +++ gcc/stor-layout.h 2014-04-18 11:16:12.830093665 +0100 @@ -98,6 +98,8 @@ extern tree make_unsigned_type (int); mode_for_size, but is passed a tree. */ extern enum machine_mode mode_for_size_tree (const_tree, enum mode_class, int); +extern tree bitwise_type_for_mode (enum machine_mode); + /* Given a VAR_DECL, PARM_DECL or RESULT_DECL, clears the results of a previous call to layout_decl and calls it again. */ extern void relayout_decl (tree); Index: gcc/stor-layout.c === --- gcc/stor-