Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-25 Thread Ilya Tocar
On 21 Feb 18:35, Uros Bizjak wrote:
 On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
   Latest version of AVX512 spec
   http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
   Has a few changes.
  
   1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
   We can either support new CPUID or disable PREFETCHWT1 from generating,
   without removing code, and enable it in 4.9.1/latest version.
   I am not sure that adding new -m flag and related stuff this late
   is a good idea. Should still add it?
 
  Please submit the patch anyway. We can relax release constraints on
  non-algorithmic patch a bit, weighting in benefits of having gcc
  release that fully conforms to some published specification.
 
  Patch bellow add -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
  and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
  Ok for trunk?
 

  * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
 
 Please also add new switch to gcc-target/i386/sse-{12,13,14}.c and
 g++.dg/other/i386-{2,3} and new options to
 gcc.tatget/i386/sse-{22,23}.c. Please re-test with new additions and
 repost the patch.


I've added new switch to those tests. However when I add prefetchwt1
to pragma GCC target (sse) sse-22a.c test fails with:
pmmintrin.h: In function ‘_mm_loaddup_pd’:
emmintrin.h:119:1: error: inlining failed in call to always_inline
‘_mm_load1_pd’: target specific option mismatch

I've checked and this isn't a problem with prefetchwt1. I get the same
error when I add any other option (e. g. sha) to #pragma GCC target (sse).
So I haven't added anything there. As that was the only fail,
I'm reposting this patch.

ChangeLog for GCC:

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
(ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
(PTA_PREFETCHWT1): New.
(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
  New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
(*prefetch_avx512pf_mode_: Change into ...
 (*prefetch_prefetchwt1_mode: This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
(_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Doccument.

ChangeLog for tests:

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* g++.dg/other/i386-2.C: Add new option.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
* gcc.target/i386/sse-22.c: Add new option.
* gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.

---
 gcc/common/config/i386/i386-common.c  | 15 +++
 gcc/config/i386/cpuid.h   |  4 
 gcc/config/i386/driver-i386.c |  7 +--
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 13 ++---
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/xmmintrin.h   |  6 --
 gcc/doc/invoke.texi   |  4 +++-
 gcc/testsuite/g++.dg/other/i386-2.C   |  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |  2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-12.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  4 ++--
 gcc/testsuite/gcc.target/i386/sse-14.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  4 ++--
 19 files changed, 75 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET 

Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-25 Thread Uros Bizjak
On Tue, Feb 25, 2014 at 10:13 AM, Ilya Tocar tocarip.in...@gmail.com wrote:

   Latest version of AVX512 spec
   http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
   Has a few changes.
  
   1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
   We can either support new CPUID or disable PREFETCHWT1 from generating,
   without removing code, and enable it in 4.9.1/latest version.
   I am not sure that adding new -m flag and related stuff this late
   is a good idea. Should still add it?
 
  Please submit the patch anyway. We can relax release constraints on
  non-algorithmic patch a bit, weighting in benefits of having gcc
  release that fully conforms to some published specification.
 
  Patch bellow add -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
  and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
  Ok for trunk?
 

  * gcc.target/i386/avx-1.c: Update __builtin_prefetch.

 Please also add new switch to gcc-target/i386/sse-{12,13,14}.c and
 g++.dg/other/i386-{2,3} and new options to
 gcc.tatget/i386/sse-{22,23}.c. Please re-test with new additions and
 repost the patch.


 I've added new switch to those tests. However when I add prefetchwt1
 to pragma GCC target (sse) sse-22a.c test fails with:
 pmmintrin.h: In function '_mm_loaddup_pd':
 emmintrin.h:119:1: error: inlining failed in call to always_inline
 '_mm_load1_pd': target specific option mismatch

 I've checked and this isn't a problem with prefetchwt1. I get the same
 error when I add any other option (e. g. sha) to #pragma GCC target (sse).
 So I haven't added anything there. As that was the only fail,
 I'm reposting this patch.

 ChangeLog for GCC:

 * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
 (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
 (ix86_handle_option): Handle OPT_mprefetchwt1.
 * config/i386/cpuid.h (bit_PREFETCHWT1): New.
 * config/i386/driver-i386.c (host_detect_local_cpu): Detect
 PREFETCHWT1 CPUID.
 * config/i386/i386-c.c (ix86_target_macros_internal): Handle
 OPTION_MASK_ISA_PREFETCHWT1.
 * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
 (PTA_PREFETCHWT1): New.
 (ix86_option_override_internal): Handle PTA_PREFETCHWT1.
 (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
 * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
   New.
 * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
 (*prefetch_avx512pf_mode_: Change into ...
  (*prefetch_prefetchwt1_mode: This.
 * config/i386/i386.opt (mprefetchwt1): New.
 * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
 (_mm_prefetch): Handle intent to write.
 * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Doccument.

 ChangeLog for tests:

 * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
 * gcc.target/i386/prefetchwt1-1.c: New.
 * g++.dg/other/i386-2.C: Add new option.
 * g++.dg/other/i386-3.C: Ditto.
 * gcc.target/i386/sse-12.c: Ditto.
 * gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
 * gcc.target/i386/sse-22.c: Add new option.
 * gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.

The patch is OK for mainline.

Thanks,
Uros.


Re: [PATCH][i386][AVX512] Match latest spec.

2014-02-25 Thread Ilya Tocar
On 20 Feb 17:23, Uros Bizjak wrote:
 On Thu, Feb 20, 2014 at 4:39 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.
 
  2)Currently for scatter/gather prefetches intrinsics we accept 1 as
  possible hint parameter. This is consistent with ICC. However as
  GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC
  (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefethces
  are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as
  hint. We can either change gather prefetches to accept 1 instead of 3 and
  hope that everyone will use _MM_HINT_T0 and not the raw value, or we can
  change _MM_HINT_T0 to be consistent with ICC. What solution do you
  prefer?
 
 Builtins, including __builtin_prefetch, are considered as internal
 implementation detail, so we can pass to them wharever we like. The
 published interface is in *.h files, and this includes _MM_HINT_T0.
 For now, I suggest to change prefetches, so they will accept
 _MM_HINT_T0, as this is the least invasive change.

Patch bellow changes prefetches to accept 3 (_MM_HINT_T0),
and replaces all hint's values in tests with corresponding _MM_HINT.
Testing passes. Ok for trunk?

ChangeLog:

2014-02-25  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/predicates.md (const1256_operand): Remove.
(const2356_operand): New.
(const_1_to_2_operand): Remove.
* config/i386/sse.md (avx512pf_gatherpfmodesf): Change hint value.
(*avx512pf_gatherpfmodesf_mask): Ditto.
(*avx512pf_gatherpfmodesf): Ditto.
(avx512pf_gatherpfmodedf): Ditto.
(*avx512pf_gatherpfmodedf_mask): Ditto.
(*avx512pf_gatherpfmodedf): Ditto.
(avx512pf_scatterpfmodesf): Ditto.
(*avx512pf_scatterpfmodesf_mask): Ditto.
(*avx512pf_scatterpfmodesf): Ditto.
(avx512pf_scatterpfmodedf): Ditto.
(*avx512pf_scatterpfmodedf_mask): Ditto.
(*avx512pf_scatterpfmodedf): Ditto.
* common/config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0.

And for tests:

2014-02-25  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in 
__builtin_ia32_gatherpfdps,
__builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps,
__builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd,
__builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd,
__builtin_ia32_scatterpfqpd.
* gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead
of raw ints.
* gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/config/i386/predicates.md  | 11 ++
 gcc/config/i386/sse.md | 40 +++---
 gcc/config/i386/xmmintrin.h|  1 +
 gcc/testsuite/gcc.target/i386/avx-1.c  | 16 -
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0dps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0qps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1dps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1qps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c|  4 +--
 gcc/testsuite/gcc.target/i386/sse-14.c | 16 -
 

Re: [PATCH][i386][AVX512] Match latest spec.

2014-02-25 Thread Uros Bizjak
On Tue, Feb 25, 2014 at 5:04 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.

  2)Currently for scatter/gather prefetches intrinsics we accept 1 as
  possible hint parameter. This is consistent with ICC. However as
  GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC
  (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefethces
  are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as
  hint. We can either change gather prefetches to accept 1 instead of 3 and
  hope that everyone will use _MM_HINT_T0 and not the raw value, or we can
  change _MM_HINT_T0 to be consistent with ICC. What solution do you
  prefer?

 Builtins, including __builtin_prefetch, are considered as internal
 implementation detail, so we can pass to them wharever we like. The
 published interface is in *.h files, and this includes _MM_HINT_T0.
 For now, I suggest to change prefetches, so they will accept
 _MM_HINT_T0, as this is the least invasive change.

 Patch bellow changes prefetches to accept 3 (_MM_HINT_T0),
 and replaces all hint's values in tests with corresponding _MM_HINT.
 Testing passes. Ok for trunk?

 ChangeLog:

 2014-02-25  Ilya Tocar  ilya.to...@intel.com

 * common/config/i386/predicates.md (const1256_operand): Remove.
 (const2356_operand): New.
 (const_1_to_2_operand): Remove.
 * config/i386/sse.md (avx512pf_gatherpfmodesf): Change hint value.
 (*avx512pf_gatherpfmodesf_mask): Ditto.
 (*avx512pf_gatherpfmodesf): Ditto.
 (avx512pf_gatherpfmodedf): Ditto.
 (*avx512pf_gatherpfmodedf_mask): Ditto.
 (*avx512pf_gatherpfmodedf): Ditto.
 (avx512pf_scatterpfmodesf): Ditto.
 (*avx512pf_scatterpfmodesf_mask): Ditto.
 (*avx512pf_scatterpfmodesf): Ditto.
 (avx512pf_scatterpfmodedf): Ditto.
 (*avx512pf_scatterpfmodedf_mask): Ditto.
 (*avx512pf_scatterpfmodedf): Ditto.
 * common/config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0.

 And for tests:

 2014-02-25  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in 
 __builtin_ia32_gatherpfdps,
 __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps,
 __builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd,
 __builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd,
 __builtin_ia32_scatterpfqpd.
 * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead
 of raw ints.
 * gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto.
 * gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto.
 * gcc.target/i386/sse-14.c: Ditto.
 * gcc.target/i386/sse-22.c: Ditto.
 * gcc.target/i386/sse-23.c: Ditto.

OK for mainline with a small change below.

 --- a/gcc/config/i386/xmmintrin.h
 +++ b/gcc/config/i386/xmmintrin.h
 @@ -55,6 +55,7 @@ enum _mm_hint
  {
/* _MM_HINT_ET is _MM_HINT_T with set 3rd bit.  */
_MM_HINT_ET1 = 6,
 +  _MM_HINT_ET0 = 5,

Please put new hint above HINT_ET1, to be consistent with the part below.

_MM_HINT_T0 = 3,
_MM_HINT_T1 = 2,
_MM_HINT_T2 = 1,

Uros.


Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Ilya Tocar
  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.
 
  1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
  We can either support new CPUID or disable PREFETCHWT1 from generating,
  without removing code, and enable it in 4.9.1/latest version.
  I am not sure that adding new -m flag and related stuff this late
  is a good idea. Should still add it?
 
 Please submit the patch anyway. We can relax release constraints on
 non-algorithmic patch a bit, weighting in benefits of having gcc
 release that fully conforms to some published specification.

Patch bellow add -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
Ok for trunk?

ChangeLog:

2014-02-21  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
(ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
(PTA_PREFETCHWT1): New.
(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
  New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
(*prefetch_avx512pf_mode_: Change into ...
 (*prefetch_prefetchwt1_mode: This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
(_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Doccument. 

And for tests:

2014-02-22  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch.
* gcc.target/i386/sse-23.c: Ditto. 

---
 gcc/common/config/i386/i386-common.c  | 15 +++
 gcc/config/i386/cpuid.h   |  4 
 gcc/config/i386/driver-i386.c |  7 +--
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 13 ++---
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/xmmintrin.h   |  6 --
 gcc/doc/invoke.texi   |  4 +++-
 gcc/testsuite/gcc.target/i386/avx-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  2 +-
 14 files changed, 68 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -154,6 +155,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
as -mno-sse4.1. */
@@ -757,6 +759,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+case OPT_mprefetchwt1:
+  if (value)
+   {
+ opts-x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+ opts-x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+   }
+  else
+   {
+ opts-x_ix86_isa_flags = ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+ opts-x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+   }
+  return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
--- 

Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Uros Bizjak
On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.
 
  1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
  We can either support new CPUID or disable PREFETCHWT1 from generating,
  without removing code, and enable it in 4.9.1/latest version.
  I am not sure that adding new -m flag and related stuff this late
  is a good idea. Should still add it?

 Please submit the patch anyway. We can relax release constraints on
 non-algorithmic patch a bit, weighting in benefits of having gcc
 release that fully conforms to some published specification.

 Patch bellow add -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
 and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
 Ok for trunk?

 ChangeLog:

 2014-02-21  Ilya Tocar  ilya.to...@intel.com

 * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
 (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
 (ix86_handle_option): Handle OPT_mprefetchwt1.
 * config/i386/cpuid.h (bit_PREFETCHWT1): New.
 * config/i386/driver-i386.c (host_detect_local_cpu): Detect
 PREFETCHWT1 CPUID.
 * config/i386/i386-c.c (ix86_target_macros_internal): Handle
 OPTION_MASK_ISA_PREFETCHWT1.
 * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
 (PTA_PREFETCHWT1): New.
 (ix86_option_override_internal): Handle PTA_PREFETCHWT1.
 (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
 * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
   New.
 * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
 (*prefetch_avx512pf_mode_: Change into ...
  (*prefetch_prefetchwt1_mode: This.
 * config/i386/i386.opt (mprefetchwt1): New.
 * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
 (_mm_prefetch): Handle intent to write.
 * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Doccument.

 And for tests:

 2014-02-22  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
 * gcc.target/i386/prefetchwt1-1.c: New.
 * gcc.target/i386/sse-13.c: Update __builtin_prefetch.
 * gcc.target/i386/sse-23.c: Ditto.

Please also add new switch to gcc-target/i386/sse-{12,13,14}.c and
g++.dg/other/i386-{2,3} and new options to
gcc.tatget/i386/sse-{22,23}.c. Please re-test with new additions and
repost the patch.

 @@ -17867,8 +17867,8 @@
   supported by SSE counterpart or the SSE prefetch is not available
   (K6 machines).  Otherwise use SSE prefetch as it allows specifying
   of locality.  */
 -  if (TARGET_AVX512PF  write)
 -operands[2] = const1_rtx;
 +  if (TARGET_PREFETCHWT1  write)
 +operands[2] = GEN_INT (2);

you can use const2_rtx here.

Uros.


[PATCH][i386][AVX512] Match latest spec.

2014-02-20 Thread Ilya Tocar
Hi,
Latest version of AVX512 spec
http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
Has a few changes.
This patch fixes first of them:
Vptestnmd and vptestnmq instructions now have CPUID AVX512F instead of
AVX512CD. This path changes thier CPUID accordingly.
However I have a question about other changes:

1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
We can either support new CPUID or disable PREFETCHWT1 from generating,
without removing code, and enable it in 4.9.1/latest version.
I am not sure that adding new -m flag and related stuff this late
is a good idea. Should still add it?

2)Currently for scatter/gather prefetches intrinsics we accept 1 as
possible hint parameter. This is consistent with ICC. However as
GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefethces
are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as
hint. We can either change gather prefetches to accept 1 instead of 3 and
hope that everyone will use _MM_HINT_T0 and not the raw value, or we can
change _MM_HINT_T0 to be consistent with ICC. What solution do you
prefer?

Patch bellow changes CPUID of vptestnmq/vptestnmd and changes some bogus
%v to v. Bootstraps, passes make check. Ok for trunk?

ChangeLog

2014-02-20  Ilya Tocar  ilya.to...@intel.com
 
* config/i386/avx512fintrin.h (_mm512_testn_epi32_mask),
(_mm512_mask_testn_epi32_mask), (_mm512_testn_epi64_mask),
(_mm512_mask_testn_epi64_mask): Move to ...
* config/i386/avx512cdintrin.h: Here.
* config/i386/i386.c (bdesc_args): Change MASK_ISA for testnm.
* config/i386/sse.md (avx512f_vmscalefmoderound_name): Remove %.
(avx512f_scalefmodemask_nameround_name): Ditto.
(avx512f_testnmmode3mask_scalar_merge_name): Change conditon to
TARGET_AVX512F from TARGET_AVX512CD.

And for testsuite

2014-02-20  Ilya Tocar  ilya.to...@intel.com
 
* gcc.target/i386/avx512cd-vptestnmd-1.c: Change into ...
* gcc.target/i386/avx512f-vptestnmd-1.c: This.
* gcc.target/i386/avx512cd-vptestnmq-1.c: Change into ...
* gcc.target/i386/avx512f-vptestnmq-1.c: This.
* gcc.target/i386/avx512cd-vptestnmd-2.c: Change into ...
* gcc.target/i386/avx512f-vptestnmd-2.c: This.
* gcc.target/i386/avx512cd-vptestnmq-2.c: Change into ...
* gcc.target/i386/avx512f-vptestnmq-2.c: This.


---
 gcc/config/i386/avx512cdintrin.h   | 34 --
 gcc/config/i386/avx512fintrin.h| 34 ++
 gcc/config/i386/i386.c |  4 +-
 gcc/config/i386/sse.md |  8 ++--
 .../gcc.target/i386/avx512cd-vptestnmd-1.c | 16 ---
 .../gcc.target/i386/avx512cd-vptestnmd-2.c | 52 --
 .../gcc.target/i386/avx512cd-vptestnmq-1.c | 16 ---
 .../gcc.target/i386/avx512cd-vptestnmq-2.c | 52 --
 .../gcc.target/i386/avx512f-vptestnmd-1.c  | 16 +++
 .../gcc.target/i386/avx512f-vptestnmd-2.c  | 52 ++
 .../gcc.target/i386/avx512f-vptestnmq-1.c  | 16 +++
 .../gcc.target/i386/avx512f-vptestnmq-2.c  | 52 ++
 12 files changed, 176 insertions(+), 176 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmd-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmd-2.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmq-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmq-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmq-2.c

diff --git a/gcc/config/i386/avx512cdintrin.h b/gcc/config/i386/avx512cdintrin.h
index 3935b77..a4939f7a 100644
--- a/gcc/config/i386/avx512cdintrin.h
+++ b/gcc/config/i386/avx512cdintrin.h
@@ -176,40 +176,6 @@ _mm512_broadcastmw_epi32 (__mmask16 __A)
   return (__m512i) __builtin_ia32_broadcastmw512 (__A);
 }
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
-{
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-(__v16si) __B,
-(__mmask16) -1);
-}
-
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
-{
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-(__v16si) __B, __U);
-}
-
-extern __inline __mmask8
-__attribute__ 

Re: [PATCH][i386][AVX512] Match latest spec.

2014-02-20 Thread Uros Bizjak
On Thu, Feb 20, 2014 at 4:39 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 Latest version of AVX512 spec
 http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
 Has a few changes.
 This patch fixes first of them:
 Vptestnmd and vptestnmq instructions now have CPUID AVX512F instead of
 AVX512CD. This path changes thier CPUID accordingly.
 However I have a question about other changes:

 1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
 We can either support new CPUID or disable PREFETCHWT1 from generating,
 without removing code, and enable it in 4.9.1/latest version.
 I am not sure that adding new -m flag and related stuff this late
 is a good idea. Should still add it?

Please submit the patch anyway. We can relax release constraints on
non-algorithmic patch a bit, weighting in benefits of having gcc
release that fully conforms to some published specification.

 2)Currently for scatter/gather prefetches intrinsics we accept 1 as
 possible hint parameter. This is consistent with ICC. However as
 GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC
 (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefethces
 are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as
 hint. We can either change gather prefetches to accept 1 instead of 3 and
 hope that everyone will use _MM_HINT_T0 and not the raw value, or we can
 change _MM_HINT_T0 to be consistent with ICC. What solution do you
 prefer?

Builtins, including __builtin_prefetch, are considered as internal
implementation detail, so we can pass to them wharever we like. The
published interface is in *.h files, and this includes _MM_HINT_T0.
For now, I suggest to change prefetches, so they will accept
_MM_HINT_T0, as this is the least invasive change.

FWIW, we can change _MM_HINT_T0 in the future, as intrinsic headers
correspond to the compiler, but it will raise maintenance burden (you
can't just recompile sources involving builtins with different
versions of the compiler anymore due to difference in constant
arguments).

 Patch bellow changes CPUID of vptestnmq/vptestnmd and changes some bogus
 %v to v. Bootstraps, passes make check. Ok for trunk?

 ChangeLog

 2014-02-20  Ilya Tocar  ilya.to...@intel.com

 * config/i386/avx512fintrin.h (_mm512_testn_epi32_mask),
 (_mm512_mask_testn_epi32_mask), (_mm512_testn_epi64_mask),
 (_mm512_mask_testn_epi64_mask): Move to ...
 * config/i386/avx512cdintrin.h: Here.
 * config/i386/i386.c (bdesc_args): Change MASK_ISA for testnm.
 * config/i386/sse.md (avx512f_vmscalefmoderound_name): Remove %.
 (avx512f_scalefmodemask_nameround_name): Ditto.
 (avx512f_testnmmode3mask_scalar_merge_name): Change conditon to
 TARGET_AVX512F from TARGET_AVX512CD.

 And for testsuite

 2014-02-20  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/avx512cd-vptestnmd-1.c: Change into ...
 * gcc.target/i386/avx512f-vptestnmd-1.c: This.
 * gcc.target/i386/avx512cd-vptestnmq-1.c: Change into ...
 * gcc.target/i386/avx512f-vptestnmq-1.c: This.
 * gcc.target/i386/avx512cd-vptestnmd-2.c: Change into ...
 * gcc.target/i386/avx512f-vptestnmd-2.c: This.
 * gcc.target/i386/avx512cd-vptestnmq-2.c: Change into ...
 * gcc.target/i386/avx512f-vptestnmq-2.c: This.

This is OK for mainline.

Thanks,
Uros.