Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
On 21 Feb 18:35, Uros Bizjak wrote:
> On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
>>>> The latest version of the AVX512 spec
>>>> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>>>> has a few changes.
>>>> 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1. We can either support the new CPUID bit, or stop generating PREFETCHWT1 (without removing the code) and enable it in 4.9.1/the latest version. I am not sure that adding a new -m flag and related stuff this late is a good idea. Should I still add it?
>>>
>>> Please submit the patch anyway. We can relax release constraints on a non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.
>>
>> The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction. Bootstraps and passes testing. OK for trunk?
>>
>> * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
>
> Please also add the new switch to gcc.target/i386/sse-{12,13,14}.c and g++.dg/other/i386-{2,3}, and the new options to gcc.target/i386/sse-{22,23}.c. Please re-test with the new additions and repost the patch.

I've added the new switch to those tests. However, when I add prefetchwt1 to #pragma GCC target (sse), the sse-22a.c test fails with:

pmmintrin.h: In function ‘_mm_loaddup_pd’:
emmintrin.h:119:1: error: inlining failed in call to always_inline ‘_mm_load1_pd’: target specific option mismatch

I've checked, and this isn't a problem with prefetchwt1: I get the same error when I add any other option (e.g. sha) to #pragma GCC target (sse). So I haven't added anything there. As that was the only failure, I'm reposting this patch.

ChangeLog for GCC:

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET), (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New. (ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1. (PTA_PREFETCHWT1): New. (ix86_option_override_internal): Handle PTA_PREFETCHWT1. (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P): New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1. (*prefetch_avx512pf_<mode>_): Change into ... (*prefetch_prefetchwt1_<mode>): This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1. (_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

ChangeLog for tests:

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* g++.dg/other/i386-2.C: Add new option.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
* gcc.target/i386/sse-22.c: Add new option.
* gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.

---
 gcc/common/config/i386/i386-common.c | 15 +++
 gcc/config/i386/cpuid.h | 4
 gcc/config/i386/driver-i386.c | 7 +--
 gcc/config/i386/i386-c.c | 2 ++
 gcc/config/i386/i386.c | 6 ++
 gcc/config/i386/i386.h | 2 ++
 gcc/config/i386/i386.md | 13 ++---
 gcc/config/i386/i386.opt | 4
 gcc/config/i386/xmmintrin.h | 6 --
 gcc/doc/invoke.texi | 4 +++-
 gcc/testsuite/g++.dg/other/i386-2.C | 2 +-
 gcc/testsuite/g++.dg/other/i386-3.C | 2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-12.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c | 4 ++--
 gcc/testsuite/gcc.target/i386/sse-14.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c | 4 ++--
 19 files changed, 75 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
On Tue, Feb 25, 2014 at 10:13 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
>>>>> The latest version of the AVX512 spec
>>>>> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>>>>> has a few changes.
>>>>> 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1. We can either support the new CPUID bit, or stop generating PREFETCHWT1 (without removing the code) and enable it in 4.9.1/the latest version. I am not sure that adding a new -m flag and related stuff this late is a good idea. Should I still add it?
>>>>
>>>> Please submit the patch anyway. We can relax release constraints on a non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.
>>>
>>> The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction. Bootstraps and passes testing. OK for trunk?
>>>
>>> * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
>>
>> Please also add the new switch to gcc.target/i386/sse-{12,13,14}.c and g++.dg/other/i386-{2,3}, and the new options to gcc.target/i386/sse-{22,23}.c. Please re-test with the new additions and repost the patch.
>
> I've added the new switch to those tests. However, when I add prefetchwt1 to #pragma GCC target (sse), the sse-22a.c test fails with:
>
> pmmintrin.h: In function '_mm_loaddup_pd':
> emmintrin.h:119:1: error: inlining failed in call to always_inline '_mm_load1_pd': target specific option mismatch
>
> I've checked, and this isn't a problem with prefetchwt1: I get the same error when I add any other option (e.g. sha) to #pragma GCC target (sse). So I haven't added anything there. As that was the only failure, I'm reposting this patch.
>
> ChangeLog for GCC:
>
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET), (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New. (ix86_handle_option): Handle OPT_mprefetchwt1.
> * config/i386/cpuid.h (bit_PREFETCHWT1): New.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect PREFETCHWT1 CPUID.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle OPTION_MASK_ISA_PREFETCHWT1.
> * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1. (PTA_PREFETCHWT1): New. (ix86_option_override_internal): Handle PTA_PREFETCHWT1. (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
> * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P): New.
> * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1. (*prefetch_avx512pf_<mode>_): Change into ... (*prefetch_prefetchwt1_<mode>): This.
> * config/i386/i386.opt (mprefetchwt1): New.
> * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1. (_mm_prefetch): Handle intent to write.
> * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.
>
> ChangeLog for tests:
>
> * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
> * gcc.target/i386/prefetchwt1-1.c: New.
> * g++.dg/other/i386-2.C: Add new option.
> * g++.dg/other/i386-3.C: Ditto.
> * gcc.target/i386/sse-12.c: Ditto.
> * gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
> * gcc.target/i386/sse-22.c: Add new option.
> * gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.

The patch is OK for mainline.

Thanks,
Uros.
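As a side note on the "Detect PREFETCHWT1 CPUID" entry above: a minimal sketch of what such a check could look like from user code, assuming the bit position given in the Intel manuals (CPUID leaf 7, sub-leaf 0, ECX bit 0) and the `__get_cpuid_count` helper from the compiler's `<cpuid.h>`. The names `HYP_BIT_PREFETCHWT1` and `has_prefetchwt1` are made up for illustration; this is not the code from the patch.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical name; assumed to correspond to
   CPUID.(EAX=7,ECX=0):ECX bit 0. */
#define HYP_BIT_PREFETCHWT1 (1u << 0)

/* Returns 1 if the CPU reports PREFETCHWT1, 0 otherwise
   (including on non-x86 hosts or when leaf 7 is unavailable). */
static int has_prefetchwt1(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & HYP_BIT_PREFETCHWT1) != 0;
#else
    return 0;
#endif
}
```

The preprocessor guard only keeps the sketch buildable on non-x86 hosts; a driver-side check inside GCC would not need it.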
Re: [PATCH][i386][AVX512] Match latest spec.
On 20 Feb 17:23, Uros Bizjak wrote:
> On Thu, Feb 20, 2014 at 4:39 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
>> The latest version of the AVX512 spec
>> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>> has a few changes.
>> 2) Currently, for the scatter/gather prefetch intrinsics we accept 1 as a possible hint parameter. This is consistent with ICC. However, as GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC does (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefetches are inconsistent with normal prefetches, as they won't accept _MM_HINT_T0 as a hint. We can either change gather prefetches to accept 3 instead of 1 and hope that everyone will use _MM_HINT_T0 and not the raw value, or we can change _MM_HINT_T0 to be consistent with ICC. Which solution do you prefer?
>
> Builtins, including __builtin_prefetch, are considered an internal implementation detail, so we can pass them whatever we like. The published interface is in the *.h files, and this includes _MM_HINT_T0. For now, I suggest changing the prefetches so that they accept _MM_HINT_T0, as this is the least invasive change.

The patch below changes the prefetches to accept 3 (_MM_HINT_T0), and replaces all hint values in the tests with the corresponding _MM_HINT. Testing passes. OK for trunk?

ChangeLog:

2014-02-25 Ilya Tocar ilya.to...@intel.com

* common/config/i386/predicates.md (const1256_operand): Remove. (const2356_operand): New. (const_1_to_2_operand): Remove.
* config/i386/sse.md (avx512pf_gatherpf<mode>sf): Change hint value.
(*avx512pf_gatherpf<mode>sf_mask): Ditto.
(*avx512pf_gatherpf<mode>sf): Ditto.
(avx512pf_gatherpf<mode>df): Ditto.
(*avx512pf_gatherpf<mode>df_mask): Ditto.
(*avx512pf_gatherpf<mode>df): Ditto.
(avx512pf_scatterpf<mode>sf): Ditto.
(*avx512pf_scatterpf<mode>sf_mask): Ditto.
(*avx512pf_scatterpf<mode>sf): Ditto.
(avx512pf_scatterpf<mode>df): Ditto.
(*avx512pf_scatterpf<mode>df_mask): Ditto.
(*avx512pf_scatterpf<mode>df): Ditto.
* common/config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0.

And for tests:

2014-02-25 Ilya Tocar ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in __builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd, __builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd, __builtin_ia32_scatterpfqpd.
* gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead of raw ints.
* gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/config/i386/predicates.md | 11 ++
 gcc/config/i386/sse.md | 40 +++---
 gcc/config/i386/xmmintrin.h | 1 +
 gcc/testsuite/gcc.target/i386/avx-1.c | 16 -
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0dps-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0qps-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1dps-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1qps-1.c | 2 +-
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c | 4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c | 4 +--
 gcc/testsuite/gcc.target/i386/sse-14.c | 16 -
Re: [PATCH][i386][AVX512] Match latest spec.
On Tue, Feb 25, 2014 at 5:04 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
>>> The latest version of the AVX512 spec
>>> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>>> has a few changes.
>>> 2) Currently, for the scatter/gather prefetch intrinsics we accept 1 as a possible hint parameter. This is consistent with ICC. However, as GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC does (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefetches are inconsistent with normal prefetches, as they won't accept _MM_HINT_T0 as a hint. We can either change gather prefetches to accept 3 instead of 1 and hope that everyone will use _MM_HINT_T0 and not the raw value, or we can change _MM_HINT_T0 to be consistent with ICC. Which solution do you prefer?
>>
>> Builtins, including __builtin_prefetch, are considered an internal implementation detail, so we can pass them whatever we like. The published interface is in the *.h files, and this includes _MM_HINT_T0. For now, I suggest changing the prefetches so that they accept _MM_HINT_T0, as this is the least invasive change.
>
> The patch below changes the prefetches to accept 3 (_MM_HINT_T0), and replaces all hint values in the tests with the corresponding _MM_HINT. Testing passes. OK for trunk?
>
> ChangeLog:
>
> 2014-02-25 Ilya Tocar ilya.to...@intel.com
>
> * common/config/i386/predicates.md (const1256_operand): Remove. (const2356_operand): New. (const_1_to_2_operand): Remove.
> * config/i386/sse.md (avx512pf_gatherpf<mode>sf): Change hint value.
> (*avx512pf_gatherpf<mode>sf_mask): Ditto.
> (*avx512pf_gatherpf<mode>sf): Ditto.
> (avx512pf_gatherpf<mode>df): Ditto.
> (*avx512pf_gatherpf<mode>df_mask): Ditto.
> (*avx512pf_gatherpf<mode>df): Ditto.
> (avx512pf_scatterpf<mode>sf): Ditto.
> (*avx512pf_scatterpf<mode>sf_mask): Ditto.
> (*avx512pf_scatterpf<mode>sf): Ditto.
> (avx512pf_scatterpf<mode>df): Ditto.
> (*avx512pf_scatterpf<mode>df_mask): Ditto.
> (*avx512pf_scatterpf<mode>df): Ditto.
> * common/config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0.
>
> And for tests:
>
> 2014-02-25 Ilya Tocar ilya.to...@intel.com
>
> * gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in __builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd, __builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd, __builtin_ia32_scatterpfqpd.
> * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead of raw ints.
> * gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto.
> * gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto.
> * gcc.target/i386/sse-14.c: Ditto.
> * gcc.target/i386/sse-22.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.

OK for mainline with a small change below.

> --- a/gcc/config/i386/xmmintrin.h
> +++ b/gcc/config/i386/xmmintrin.h
> @@ -55,6 +55,7 @@ enum _mm_hint
>  {
>    /* _MM_HINT_ET is _MM_HINT_T with set 3rd bit. */
>    _MM_HINT_ET1 = 6,
> +  _MM_HINT_ET0 = 5,

Please put the new hint above HINT_ET1, to be consistent with the part below.

>    _MM_HINT_T0 = 3,
>    _MM_HINT_T1 = 2,
>    _MM_HINT_T2 = 1,

Uros.
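The comment in the quoted hunk says that an _MM_HINT_ET value is the matching _MM_HINT_T value with the third bit set. A minimal sketch of how a hint could be split into a write-intent flag and a locality level under that reading follows. The numeric values for T0, T1, T2 and ET1 are the ones quoted in this thread; the `HYP_`-prefixed enum and the two helper functions are hypothetical names used only for illustration, not the contents of xmmintrin.h.

```c
#include <assert.h>

/* Hint values as quoted in the thread. The HYP_ prefix marks these as
   illustrative stand-ins, not the enum from xmmintrin.h. */
enum hyp_hint {
    HYP_HINT_ET1 = 6,  /* T1 with the third bit (value 4) set */
    HYP_HINT_T0 = 3,
    HYP_HINT_T1 = 2,
    HYP_HINT_T2 = 1
};

/* Third bit set is taken to mean "prefetch with intent to write". */
static int hint_is_write(int hint) { return (hint & 0x4) >> 2; }

/* The low two bits are taken to carry the locality level. */
static int hint_locality(int hint) { return hint & 0x3; }

/* Small self-check of the decoding on the values from the thread. */
static void hint_self_check(void)
{
    assert(hint_is_write(HYP_HINT_ET1) == 1);
    assert(hint_locality(HYP_HINT_ET1) == HYP_HINT_T1);
    assert(hint_is_write(HYP_HINT_T0) == 0);
    assert(hint_locality(HYP_HINT_T0) == 3);
}
```

Since `__builtin_prefetch` expects compile-time constants for its second and third arguments, a header would presumably do this split inside a macro rather than through function calls as above.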
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
>> The latest version of the AVX512 spec
>> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>> has a few changes.
>> 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1. We can either support the new CPUID bit, or stop generating PREFETCHWT1 (without removing the code) and enable it in 4.9.1/the latest version. I am not sure that adding a new -m flag and related stuff this late is a good idea. Should I still add it?
>
> Please submit the patch anyway. We can relax release constraints on a non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.

The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction. Bootstraps and passes testing. OK for trunk?

ChangeLog:

2014-02-21 Ilya Tocar ilya.to...@intel.com

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET), (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New. (ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1. (PTA_PREFETCHWT1): New. (ix86_option_override_internal): Handle PTA_PREFETCHWT1. (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P): New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1. (*prefetch_avx512pf_<mode>_): Change into ... (*prefetch_prefetchwt1_<mode>): This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1. (_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

And for tests:

2014-02-22 Ilya Tocar ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/common/config/i386/i386-common.c | 15 +++
 gcc/config/i386/cpuid.h | 4
 gcc/config/i386/driver-i386.c | 7 +--
 gcc/config/i386/i386-c.c | 2 ++
 gcc/config/i386/i386.c | 6 ++
 gcc/config/i386/i386.h | 2 ++
 gcc/config/i386/i386.md | 13 ++---
 gcc/config/i386/i386.opt | 4
 gcc/config/i386/xmmintrin.h | 6 --
 gcc/doc/invoke.texi | 4 +++-
 gcc/testsuite/gcc.target/i386/avx-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c | 2 +-
 14 files changed, 68 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1

 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2. */
@@ -154,6 +155,7 @@ along with GCC; see the file COPYING3. If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1

 /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -757,6 +759,19 @@ ix86_handle_option (struct gcc_options *opts,
         }
       return true;

+    case OPT_mprefetchwt1:
+      if (value)
+        {
+          opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+          opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+        }
+      else
+        {
+          opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+          opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+        }
+      return true;
+
   /* Comes from final.c -- no real reason to change it. */
 #define MAX_CODE_ALIGN 16

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
---
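The option handling in the last hunk follows a set/clear pattern on two bitmasks: one recording which ISA extensions are enabled, and one recording which extensions the user mentioned explicitly. A minimal standalone sketch of that pattern is below. The struct, the mask value and the function name are hypothetical stand-ins, and the bit position is arbitrary; none of them are GCC's own definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for an ISA option mask; the bit position is arbitrary. */
#define HYP_MASK_PREFETCHWT1 (UINT64_C(1) << 5)

/* Hypothetical stand-in for the relevant fields of the options structure. */
struct hyp_opts {
    uint64_t isa_flags;           /* which ISA extensions are enabled */
    uint64_t isa_flags_explicit;  /* which ones the user mentioned at all */
};

/* value != 0 models -mprefetchwt1, value == 0 models -mno-prefetchwt1. */
static void hyp_handle_prefetchwt1(struct hyp_opts *opts, int value)
{
    if (value)
        opts->isa_flags |= HYP_MASK_PREFETCHWT1;
    else
        opts->isa_flags &= ~HYP_MASK_PREFETCHWT1;

    /* Either way the user made an explicit choice, so record it. */
    opts->isa_flags_explicit |= HYP_MASK_PREFETCHWT1;
}
```

Keeping the "explicit" mask separate is what would let later defaulting logic (for example an -march selection) avoid overriding a flag the user switched off on purpose.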
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
>>> The latest version of the AVX512 spec
>>> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>>> has a few changes.
>>> 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1. We can either support the new CPUID bit, or stop generating PREFETCHWT1 (without removing the code) and enable it in 4.9.1/the latest version. I am not sure that adding a new -m flag and related stuff this late is a good idea. Should I still add it?
>>
>> Please submit the patch anyway. We can relax release constraints on a non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.
>
> The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction. Bootstraps and passes testing. OK for trunk?
>
> ChangeLog:
>
> 2014-02-21 Ilya Tocar ilya.to...@intel.com
>
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET), (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New. (ix86_handle_option): Handle OPT_mprefetchwt1.
> * config/i386/cpuid.h (bit_PREFETCHWT1): New.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect PREFETCHWT1 CPUID.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle OPTION_MASK_ISA_PREFETCHWT1.
> * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1. (PTA_PREFETCHWT1): New. (ix86_option_override_internal): Handle PTA_PREFETCHWT1. (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
> * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P): New.
> * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1. (*prefetch_avx512pf_<mode>_): Change into ... (*prefetch_prefetchwt1_<mode>): This.
> * config/i386/i386.opt (mprefetchwt1): New.
> * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1. (_mm_prefetch): Handle intent to write.
> * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.
>
> And for tests:
>
> 2014-02-22 Ilya Tocar ilya.to...@intel.com
>
> * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
> * gcc.target/i386/prefetchwt1-1.c: New.
> * gcc.target/i386/sse-13.c: Update __builtin_prefetch.
> * gcc.target/i386/sse-23.c: Ditto.

Please also add the new switch to gcc.target/i386/sse-{12,13,14}.c and g++.dg/other/i386-{2,3}, and the new options to gcc.target/i386/sse-{22,23}.c. Please re-test with the new additions and repost the patch.

> @@ -17867,8 +17867,8 @@
>       supported by SSE counterpart or the SSE prefetch is not available
>       (K6 machines). Otherwise use SSE prefetch as it allows specifying
>       of locality. */
> -  if (TARGET_AVX512PF && write)
> -    operands[2] = const1_rtx;
> +  if (TARGET_PREFETCHWT1 && write)
> +    operands[2] = GEN_INT (2);

You can use const2_rtx here.

Uros.
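The hunk above concerns a prefetch with write intent, for which the locality operand is set to 2. Seen from user code, a request of that shape could be written with `__builtin_prefetch (addr, rw, locality)`, as in the minimal sketch below. The function name, the loop and the `PREFETCH_AHEAD` distance are illustrative assumptions. Whether the compiler turns this into a prefetchwt1 instruction depends on the compiler version and on flags such as -mprefetchwt1; the sketch does not verify that.

```c
#include <stddef.h>

/* Arbitrary illustrative prefetch distance, in elements. */
#define PREFETCH_AHEAD 16

/* Scales buf in place, hinting that a later element will be written. */
static void scale_in_place(float *buf, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 1 (intent to write), locality = 2 */
            __builtin_prefetch(&buf[i + PREFETCH_AHEAD], 1, 2);
        }
        buf[i] *= k;
    }
}
```

A prefetch is only a hint, so the result of the loop is the same with or without it.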
[PATCH][i386][AVX512] Match latest spec.
Hi,

The latest version of the AVX512 spec
http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
has a few changes. This patch fixes the first of them: the vptestnmd and vptestnmq instructions now have CPUID AVX512F instead of AVX512CD. This patch changes their CPUID accordingly.

However, I have a question about the other changes:

1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1. We can either support the new CPUID bit, or stop generating PREFETCHWT1 (without removing the code) and enable it in 4.9.1/the latest version. I am not sure that adding a new -m flag and related stuff this late is a good idea. Should I still add it?

2) Currently, for the scatter/gather prefetch intrinsics we accept 1 as a possible hint parameter. This is consistent with ICC. However, as GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC does (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefetches are inconsistent with normal prefetches, as they won't accept _MM_HINT_T0 as a hint. We can either change gather prefetches to accept 3 instead of 1 and hope that everyone will use _MM_HINT_T0 and not the raw value, or we can change _MM_HINT_T0 to be consistent with ICC. Which solution do you prefer?

The patch below changes the CPUID of vptestnmq/vptestnmd and changes some bogus %v to v. Bootstraps, passes make check. OK for trunk?

ChangeLog:

2014-02-20 Ilya Tocar ilya.to...@intel.com

* config/i386/avx512fintrin.h (_mm512_testn_epi32_mask), (_mm512_mask_testn_epi32_mask), (_mm512_testn_epi64_mask), (_mm512_mask_testn_epi64_mask): Move to ...
* config/i386/avx512cdintrin.h: Here.
* config/i386/i386.c (bdesc_args): Change MASK_ISA for testnm.
* config/i386/sse.md (avx512f_vmscalef<mode><round_name>): Remove %.
(avx512f_scalef<mode><mask_name><round_name>): Ditto.
(avx512f_testnm<mode>3<mask_scalar_merge_name>): Change condition to TARGET_AVX512F from TARGET_AVX512CD.

And for testsuite:

2014-02-20 Ilya Tocar ilya.to...@intel.com

* gcc.target/i386/avx512cd-vptestnmd-1.c: Change into ...
* gcc.target/i386/avx512f-vptestnmd-1.c: This.
* gcc.target/i386/avx512cd-vptestnmq-1.c: Change into ...
* gcc.target/i386/avx512f-vptestnmq-1.c: This.
* gcc.target/i386/avx512cd-vptestnmd-2.c: Change into ...
* gcc.target/i386/avx512f-vptestnmd-2.c: This.
* gcc.target/i386/avx512cd-vptestnmq-2.c: Change into ...
* gcc.target/i386/avx512f-vptestnmq-2.c: This.

---
 gcc/config/i386/avx512cdintrin.h | 34 --
 gcc/config/i386/avx512fintrin.h | 34 ++
 gcc/config/i386/i386.c | 4 +-
 gcc/config/i386/sse.md | 8 ++--
 .../gcc.target/i386/avx512cd-vptestnmd-1.c | 16 ---
 .../gcc.target/i386/avx512cd-vptestnmd-2.c | 52 --
 .../gcc.target/i386/avx512cd-vptestnmq-1.c | 16 ---
 .../gcc.target/i386/avx512cd-vptestnmq-2.c | 52 --
 .../gcc.target/i386/avx512f-vptestnmd-1.c | 16 +++
 .../gcc.target/i386/avx512f-vptestnmd-2.c | 52 ++
 .../gcc.target/i386/avx512f-vptestnmq-1.c | 16 +++
 .../gcc.target/i386/avx512f-vptestnmq-2.c | 52 ++
 12 files changed, 176 insertions(+), 176 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmd-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmd-2.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmq-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmq-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmq-2.c

diff --git a/gcc/config/i386/avx512cdintrin.h b/gcc/config/i386/avx512cdintrin.h
index 3935b77..a4939f7a 100644
--- a/gcc/config/i386/avx512cdintrin.h
+++ b/gcc/config/i386/avx512cdintrin.h
@@ -176,40 +176,6 @@ _mm512_broadcastmw_epi32 (__mmask16 __A)
   return (__m512i) __builtin_ia32_broadcastmw512 (__A);
 }

-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
-{
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-                                                 (__v16si) __B,
-                                                 (__mmask16) -1);
-}
-
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
-{
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-                                                 (__v16si) __B, __U);
-}
-
-extern __inline __mmask8
-__attribute__
Re: [PATCH][i386][AVX512] Match latest spec.
On Thu, Feb 20, 2014 at 4:39 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
> The latest version of the AVX512 spec
> http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
> has a few changes. This patch fixes the first of them: the vptestnmd and vptestnmq instructions now have CPUID AVX512F instead of AVX512CD. This patch changes their CPUID accordingly.
>
> However, I have a question about the other changes:
>
> 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1. We can either support the new CPUID bit, or stop generating PREFETCHWT1 (without removing the code) and enable it in 4.9.1/the latest version. I am not sure that adding a new -m flag and related stuff this late is a good idea. Should I still add it?

Please submit the patch anyway. We can relax release constraints on a non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.

> 2) Currently, for the scatter/gather prefetch intrinsics we accept 1 as a possible hint parameter. This is consistent with ICC. However, as GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC does (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefetches are inconsistent with normal prefetches, as they won't accept _MM_HINT_T0 as a hint. We can either change gather prefetches to accept 3 instead of 1 and hope that everyone will use _MM_HINT_T0 and not the raw value, or we can change _MM_HINT_T0 to be consistent with ICC. Which solution do you prefer?

Builtins, including __builtin_prefetch, are considered an internal implementation detail, so we can pass them whatever we like. The published interface is in the *.h files, and this includes _MM_HINT_T0. For now, I suggest changing the prefetches so that they accept _MM_HINT_T0, as this is the least invasive change.

FWIW, we can change _MM_HINT_T0 in the future, as intrinsic headers correspond to the compiler, but it will raise the maintenance burden (you can't just recompile sources involving builtins with different versions of the compiler anymore, due to the difference in constant arguments).

> The patch below changes the CPUID of vptestnmq/vptestnmd and changes some bogus %v to v. Bootstraps, passes make check. OK for trunk?
>
> ChangeLog:
>
> 2014-02-20 Ilya Tocar ilya.to...@intel.com
>
> * config/i386/avx512fintrin.h (_mm512_testn_epi32_mask), (_mm512_mask_testn_epi32_mask), (_mm512_testn_epi64_mask), (_mm512_mask_testn_epi64_mask): Move to ...
> * config/i386/avx512cdintrin.h: Here.
> * config/i386/i386.c (bdesc_args): Change MASK_ISA for testnm.
> * config/i386/sse.md (avx512f_vmscalef<mode><round_name>): Remove %.
> (avx512f_scalef<mode><mask_name><round_name>): Ditto.
> (avx512f_testnm<mode>3<mask_scalar_merge_name>): Change condition to TARGET_AVX512F from TARGET_AVX512CD.
>
> And for testsuite:
>
> 2014-02-20 Ilya Tocar ilya.to...@intel.com
>
> * gcc.target/i386/avx512cd-vptestnmd-1.c: Change into ...
> * gcc.target/i386/avx512f-vptestnmd-1.c: This.
> * gcc.target/i386/avx512cd-vptestnmq-1.c: Change into ...
> * gcc.target/i386/avx512f-vptestnmq-1.c: This.
> * gcc.target/i386/avx512cd-vptestnmd-2.c: Change into ...
> * gcc.target/i386/avx512f-vptestnmd-2.c: This.
> * gcc.target/i386/avx512cd-vptestnmq-2.c: Change into ...
> * gcc.target/i386/avx512f-vptestnmq-2.c: This.

This is OK for mainline.

Thanks,
Uros.