from:"Ilya Tocar"

Re: [PATCH, PR65915] Fix float conversion split.

2015-05-05 Thread Ilya Tocar

  +++ b/gcc/testsuite/gcc.target/i386/pr65915.c
  @@ -0,0 +1,6 @@
  +/* { dg-do run } */
  +/* { dg-options "-O2 -mavx512f -fpic -mcmodel=medium" } */
  +/* { dg-require-effective-target avx512f } */
  +/* { dg-require-effective-target lp64 } */
  +
  +#include "avx512f-vrndscalepd-2.c"
 
  Missing testcases for
 
  FAIL: gcc.target/i386/avx512f-vrndscaleps-2.c (test for excess errors)
  FAIL: gcc.target/i386/avx512vl-vrndscaleps-2.c (internal compiler error)
 
 The attached test is OK, since these two would test for the same problem.
 
  as well as ChangeLog entries.
 
 ChangeLog is missing. Please add PR number and describe *each* change
 accurately. You can say (vector convert to float splitter) for this
 particular nameless splitter.
 
 Please repost the patch with updated ChangeLog.


ChangeLog

PR c/65915
* config/i386/i386.md (vector convert to float splitter): Check for
xmm16+, when splitting scalar float conversion.
* config/i386/sse.md (sse2_cvtsi2sd): Support EVEX version.

And for tests

PR c/65915
* gcc.target/i386/pr65915.c: New.


Reposted patch below.

---
 gcc/config/i386/i386.md                 | 8 ++++++--
 gcc/config/i386/sse.md                  | 6 +++---
 gcc/testsuite/gcc.target/i386/pr65915.c | 6 ++++++
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr65915.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 937871a..af1cd9b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4897,7 +4897,9 @@
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
    && reload_completed && SSE_REG_P (operands[0])
-   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)"
+   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (<ssevecmode>mode, operands[0],
@@ -4921,7 +4923,9 @@
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_SSE_PARTIAL_REG_DEPENDENCY
    && optimize_function_for_speed_p (cfun)
-   && reload_completed && SSE_REG_P (operands[0])"
+   && reload_completed && SSE_REG_P (operands[0])
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
   [(const_int 0)]
 {
   const machine_mode vmode = <MODEF:ssevecmode>mode;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9b7009a..c61098d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4258,11 +4258,11 @@
    (set_attr "mode" "TI")])
 
 (define_insn "sse2_cvtsi2sd"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
        (vec_merge:V2DF
          (vec_duplicate:V2DF
            (float:DF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
-         (match_operand:V2DF 1 "register_operand" "0,0,x")
+         (match_operand:V2DF 1 "register_operand" "0,0,v")
          (const_int 1)))]
   "TARGET_SSE2"
   "@
@@ -4275,7 +4275,7 @@
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "double,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "DF")])
 
 (define_insn "sse2_cvtsi2sdq<round_name>"
diff --git a/gcc/testsuite/gcc.target/i386/pr65915.c b/gcc/testsuite/gcc.target/i386/pr65915.c
new file mode 100644
index 0000000..990c5aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65915.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -fpic -mcmodel=medium" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target lp64 } */
+
+#include "avx512f-vrndscalepd-2.c"
-- 
1.8.3.1

Re: [PATCH, PR65915] Fix float conversion split.

2015-04-30 Thread Ilya Tocar

 Hi,
 
 Looks like I missed some splits, which caused PR65915.
 Patch below fixes it.
 Ok for trunk?
 
 2015-04-28  Ilya Tocar  ilya.to...@intel.com
 
   * config/i386/i386.md (define_split): Check for xmm16+,
   when splitting scalar float conversion.
 
 
 ---
  gcc/config/i386/i386.md | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index 937871a..af1cd9b 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -4897,7 +4897,9 @@
    "TARGET_SSE2 && TARGET_SSE_MATH
     && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
     && reload_completed && SSE_REG_P (operands[0])
 -   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)"
 +   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
 +   && (!EXT_REX_SSE_REG_P (operands[0])
 +       || TARGET_AVX512VL)"
    [(const_int 0)]
  {
    operands[3] = simplify_gen_subreg (<ssevecmode>mode, operands[0],
 @@ -4921,7 +4923,9 @@
    "TARGET_SSE2 && TARGET_SSE_MATH
     && TARGET_SSE_PARTIAL_REG_DEPENDENCY
     && optimize_function_for_speed_p (cfun)
 -   && reload_completed && SSE_REG_P (operands[0])"
 +   && reload_completed && SSE_REG_P (operands[0])
 +   && (!EXT_REX_SSE_REG_P (operands[0])
 +       || TARGET_AVX512VL)"
    [(const_int 0)]
  {
    const machine_mode vmode = <MODEF:ssevecmode>mode;
 -- 
 1.8.3.1


Updated version below (now with test).

---
 gcc/config/i386/i386.md                 | 8 ++++++--
 gcc/config/i386/sse.md                  | 6 +++---
 gcc/testsuite/gcc.target/i386/pr65915.c | 6 ++++++
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr65915.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 937871a..af1cd9b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4897,7 +4897,9 @@
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
    && reload_completed && SSE_REG_P (operands[0])
-   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)"
+   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (<ssevecmode>mode, operands[0],
@@ -4921,7 +4923,9 @@
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_SSE_PARTIAL_REG_DEPENDENCY
    && optimize_function_for_speed_p (cfun)
-   && reload_completed && SSE_REG_P (operands[0])"
+   && reload_completed && SSE_REG_P (operands[0])
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
   [(const_int 0)]
 {
   const machine_mode vmode = <MODEF:ssevecmode>mode;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9b7009a..c61098d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4258,11 +4258,11 @@
    (set_attr "mode" "TI")])
 
 (define_insn "sse2_cvtsi2sd"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
        (vec_merge:V2DF
          (vec_duplicate:V2DF
            (float:DF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
-         (match_operand:V2DF 1 "register_operand" "0,0,x")
+         (match_operand:V2DF 1 "register_operand" "0,0,v")
          (const_int 1)))]
   "TARGET_SSE2"
   "@
@@ -4275,7 +4275,7 @@
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "double,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "DF")])
 
 (define_insn "sse2_cvtsi2sdq<round_name>"
diff --git a/gcc/testsuite/gcc.target/i386/pr65915.c b/gcc/testsuite/gcc.target/i386/pr65915.c
new file mode 100644
index 0000000..990c5aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65915.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -fpic -mcmodel=medium" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target lp64 } */
+
+#include "avx512f-vrndscalepd-2.c"
-- 
1.8.3.1

[PATCH, PR65915] Fix float conversion split.

2015-04-28 Thread Ilya Tocar

  I've renamed EXT_SSE_REG_P into EXT_REX_SSE_REG_P for consistency.
  Ok for stage1?
 Patch is OK for stage1.
 
 --
 Thanks, K
 
 
  On 19 Mar 12:24, Ilya Tocar wrote:
   Hi,
   
   There was some discussion about x constraints being too conservative
   for some patterns in i386.md.
   Patch below fixes it. This is probably stage1 material.
   
   ChangeLog:
   
   gcc/
   
  2015-03-23  Ilya Tocar  ilya.to...@intel.com
  
  * config/i386/i386.h (EXT_REX_SSE_REG_P): New.
  * config/i386/i386.md (*cmpi<FPCMP:unord><MODEF:mode>_mixed): Use v
  constraint.
  (*cmpi<FPCMP:unord><MODEF:mode>_sse): Ditto.
  (*movxi_internal_avx512f): Ditto.
  (define_split): Check for xmm16+, when splitting scalar float_extend.
  (*extendsfdf2_mixed): Use v constraint.
  (*extendsfdf2_sse): Ditto.
  (define_split): Check for xmm16+, when splitting scalar float_truncate.
  (*truncdfsf_fast_sse): Use v constraint.
  (fix_trunc<MODEF:mode><SWI48:mode>_sse): Ditto.
  (*float<SWI48:mode><MODEF:mode>2_sse): Ditto.
  (define_peephole2): Check for xmm16+, when converting scalar
  float_truncate.
  (define_peephole2): Check for xmm16+, when converting scalar
  float_extend.
  (*fop_<mode>_comm_mixed): Use v constraint.
  (*fop_<mode>_comm_sse): Ditto.
  (*fop_<mode>_1_mixed): Ditto.
  (*sqrt<mode>2_sse): Ditto.
  (*ieee_s<ieee_maxmin><mode>3): Ditto.
  
 

Hi,

Looks like I missed some splits, which caused PR65915.
Patch below fixes it.
Ok for trunk?

2015-04-28  Ilya Tocar  ilya.to...@intel.com

* config/i386/i386.md (define_split): Check for xmm16+,
when splitting scalar float conversion.
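
The condition added to both splitters can be read as a small predicate.
The sketch below is only an illustration in plain C, not GCC code: the
names is_ext_rex_sse_reg and may_use_vector_convert are hypothetical, and
registers are modelled by their xmm index rather than by GCC's hard
register numbers. It assumes the usual encoding constraint that 128-bit
vector instructions can address xmm16-xmm31 only when AVX512VL is
available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a register is identified by its xmm index (0..31).  */
static bool
is_ext_rex_sse_reg (int xmm_index)
{
  /* xmm16..xmm31 are the registers reachable only through EVEX.  */
  return xmm_index >= 16 && xmm_index <= 31;
}

/* Mirrors the shape of the added split condition:
   !EXT_REX_SSE_REG_P (operands[0]) || TARGET_AVX512VL.  */
static bool
may_use_vector_convert (int dest_xmm_index, bool have_avx512vl)
{
  return !is_ext_rex_sse_reg (dest_xmm_index) || have_avx512vl;
}
```

With this model, a destination in xmm3 may always be split into the
vector form, while a destination in xmm17 may be split only when
AVX512VL is enabled.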


---
 gcc/config/i386/i386.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 937871a..af1cd9b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4897,7 +4897,9 @@
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
    && reload_completed && SSE_REG_P (operands[0])
-   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)"
+   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (<ssevecmode>mode, operands[0],
@@ -4921,7 +4923,9 @@
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_SSE_PARTIAL_REG_DEPENDENCY
    && optimize_function_for_speed_p (cfun)
-   && reload_completed && SSE_REG_P (operands[0])"
+   && reload_completed && SSE_REG_P (operands[0])
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
   [(const_int 0)]
 {
   const machine_mode vmode = <MODEF:ssevecmode>mode;
-- 
1.8.3.1

Re: [PATCH] Make wider use of v constraint in i386.md

2015-04-27 Thread Ilya Tocar

On 17 Apr 10:09, Uros Bizjak wrote:
 On Thu, Mar 19, 2015 at 10:24 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  There was some discussion about x constraints being too conservative
  for some patterns in i386.md.
  Patch below fixes it. This is probably stage1 material.
 
  ChangeLog:
 
  gcc/
 
  2015-03-19  Ilya Tocar  ilya.to...@intel.com
 
  * config/i386/i386.h (EXT_SSE_REG_P): New.
  * config/i386/i386.md (*cmpi<FPCMP:unord><MODEF:mode>_mixed): Use v
  constraint.
  (*cmpi<FPCMP:unord><MODEF:mode>_sse): Ditto.
  (*movxi_internal_avx512f): Ditto.
  (define_split): Check for xmm16+, when splitting scalar float_extend.
  (*extendsfdf2_mixed): Use v constraint.
  (*extendsfdf2_sse): Ditto.
  (define_split): Check for xmm16+, when splitting scalar float_truncate.
  (*truncdfsf_fast_sse): Use v constraint.
  (fix_trunc<MODEF:mode><SWI48:mode>_sse): Ditto.
  (*float<SWI48:mode><MODEF:mode>2_sse): Ditto.
  (define_peephole2): Check for xmm16+, when converting scalar
  float_truncate.
  (define_peephole2): Check for xmm16+, when converting scalar
  float_extend.
  (*fop_<mode>_comm_mixed): Use v constraint.
  (*fop_<mode>_comm_sse): Ditto.
  (*fop_<mode>_1_mixed): Ditto.
  (*sqrt<mode>2_sse): Ditto.
  (*ieee_s<ieee_maxmin><mode>3): Ditto.
 
 I wonder if there are also changes needed in mmx.md. There are a
 couple of patterns that operate on xmm registers, so they should be
 reviewed if they need to change their constraint to v to accept
 extended xmm register set.


Doesn't look like it. At first glance, the non-v patterns in mmx.md aren't
even AVX-enabled, so v is not relevant. Moreover, we don't allow MMX modes
(V2SF etc.) in xmm16+ via ix86_hard_regno_mode_ok.

[PATCH, i386] PR63211 broken type-punning in avx* tests.

2015-04-03 Thread Ilya Tocar

Hi,

I've looked into avx* tests and many of them (even those that don't fail
in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type
punning. Properly fixing them looks like a lot of work, so I propose
just adding -fno-strict-aliasing to them.
This patch was obtained by running
sed -i 's/-O2/-O2 -fno-strict-aliasing/g' ../gcc/testsuite/gcc.target/i386/avx*-2.c
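
For readers unfamiliar with the issue, here is a minimal sketch of the
kind of type punning involved. It is not taken from the avx* tests, and
float_bits is a hypothetical helper. Reading a float through a uint32_t
pointer violates the aliasing rules that -O2 relies on, whereas copying
the bytes with memcpy is well defined (assuming float is a 32-bit IEEE
type, as on x86).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Invalid under strict aliasing (shown only as a comment):
     uint32_t u = *(uint32_t *) &f;
   The conforming alternative copies the object representation.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

Compilers typically turn the memcpy into a single register move, so the
conforming form should not cost anything at -O2.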

Ok for stage1?

Changelog below:

testsuite/

2015-04-03  Ilya Tocar  ilya.to...@intel.com

PR target/63211
* gcc.target/i386/avx-cmpsd-2.c: Update test.
* gcc.target/i386/avx-cmpss-2.c: Update test.
* gcc.target/i386/avx-vbroadcastf128-256-2.c: Update test.
* gcc.target/i386/avx-vbroadcastss-2.c: Update test.
* gcc.target/i386/avx-vcomisd-2.c: Update test.
* gcc.target/i386/avx-vcomiss-2.c: Update test.
* gcc.target/i386/avx-vcvtsd2si-2.c: Update test.
* gcc.target/i386/avx-vcvtsi2sd-2.c: Update test.
* gcc.target/i386/avx-vcvtsi2ss-2.c: Update test.
* gcc.target/i386/avx-vcvtss2si-2.c: Update test.
* gcc.target/i386/avx-vcvttsd2si-2.c: Update test.
* gcc.target/i386/avx-vcvttss2si-2.c: Update test.
* gcc.target/i386/avx-vdppd-2.c: Update test.
* gcc.target/i386/avx-vdpps-2.c: Update test.
* gcc.target/i386/avx-vextractf128-256-2.c: Update test.
* gcc.target/i386/avx-vinsertf128-256-2.c: Update test.
* gcc.target/i386/avx-vinsertps-2.c: Update test.
* gcc.target/i386/avx-vmaskmovpd-2.c: Update test.
* gcc.target/i386/avx-vmaskmovpd-256-2.c: Update test.
* gcc.target/i386/avx-vmaskmovps-2.c: Update test.
* gcc.target/i386/avx-vmaskmovps-256-2.c: Update test.
* gcc.target/i386/avx-vmovapd-2.c: Update test.
* gcc.target/i386/avx-vmovapd-256-2.c: Update test.
* gcc.target/i386/avx-vmovaps-2.c: Update test.
* gcc.target/i386/avx-vmovaps-256-2.c: Update test.
* gcc.target/i386/avx-vmovd-2.c: Update test.
* gcc.target/i386/avx-vmovdqa-2.c: Update test.
* gcc.target/i386/avx-vmovdqa-256-2.c: Update test.
* gcc.target/i386/avx-vmovdqu-2.c: Update test.
* gcc.target/i386/avx-vmovdqu-256-2.c: Update test.
* gcc.target/i386/avx-vmovhpd-2.c: Update test.
* gcc.target/i386/avx-vmovhps-2.c: Update test.
* gcc.target/i386/avx-vmovlpd-2.c: Update test.
* gcc.target/i386/avx-vmovq-2.c: Update test.
* gcc.target/i386/avx-vmovsd-2.c: Update test.
* gcc.target/i386/avx-vmovss-2.c: Update test.
* gcc.target/i386/avx-vmovupd-2.c: Update test.
* gcc.target/i386/avx-vmovupd-256-2.c: Update test.
* gcc.target/i386/avx-vmovups-2.c: Update test.
* gcc.target/i386/avx-vmovups-256-2.c: Update test.
* gcc.target/i386/avx-vpcmpestri-2.c: Update test.
* gcc.target/i386/avx-vpcmpestrm-2.c: Update test.
* gcc.target/i386/avx-vpcmpistri-2.c: Update test.
* gcc.target/i386/avx-vpcmpistrm-2.c: Update test.
* gcc.target/i386/avx-vperm2f128-256-2.c: Update test.
* gcc.target/i386/avx-vpermilpd-2.c: Update test.
* gcc.target/i386/avx-vpermilpd-256-2.c: Update test.
* gcc.target/i386/avx-vpermilps-2.c: Update test.
* gcc.target/i386/avx-vpermilps-256-2.c: Update test.
* gcc.target/i386/avx-vpslld-2.c: Update test.
* gcc.target/i386/avx-vpsllq-2.c: Update test.
* gcc.target/i386/avx-vpsllw-2.c: Update test.
* gcc.target/i386/avx-vpsrad-2.c: Update test.
* gcc.target/i386/avx-vpsraw-2.c: Update test.
* gcc.target/i386/avx-vpsrld-2.c: Update test.
* gcc.target/i386/avx-vpsrlq-2.c: Update test.
* gcc.target/i386/avx-vpsrlw-2.c: Update test.
* gcc.target/i386/avx-vptest-2.c: Update test.
* gcc.target/i386/avx-vptest-256-2.c: Update test.
* gcc.target/i386/avx-vroundpd-2.c: Update test.
* gcc.target/i386/avx-vroundpd-256-2.c: Update test.
* gcc.target/i386/avx-vtestpd-2.c: Update test.
* gcc.target/i386/avx-vtestpd-256-2.c: Update test.
* gcc.target/i386/avx-vtestps-2.c: Update test.
* gcc.target/i386/avx-vtestps-256-2.c: Update test.
* gcc.target/i386/avx-vucomisd-2.c: Update test.
* gcc.target/i386/avx-vucomiss-2.c: Update test.
* gcc.target/i386/avx2-i32gatherd-2.c: Update test.
* gcc.target/i386/avx2-i32gatherd256-2.c: Update test.
* gcc.target/i386/avx2-i32gatherpd-2.c: Update test.
* gcc.target/i386/avx2-i32gatherpd256-2.c: Update test.
* gcc.target/i386/avx2-i32gatherps-2.c: Update test.
* gcc.target/i386/avx2-i32gatherps256-2.c: Update test.
* gcc.target/i386/avx2-i32gatherq-2.c: Update test.
* gcc.target/i386/avx2-i32gatherq256-2.c: Update test.
* gcc.target/i386/avx2-i64gatherd-2.c: Update test.
* gcc.target/i386/avx2-i64gatherd256-2.c: Update test

Re: [PATCH, i386] PR63211 broken type-punning in avx* tests.

2015-04-03 Thread Ilya Tocar

On 03 Apr 13:39, Uros Bizjak wrote:
 On Fri, Apr 3, 2015 at 1:02 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  I've looked into avx* tests and many of them (even those that don't fail
  in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type
  punning. Properly fixing them looks like a lot of work, so I propose
  just adding -fno-strict-aliasing to them.
  This patch was obtained by running
  sed -i 's/-O2/-O2 -fno-strict-aliasing/g' ../gcc/testsuite/gcc.target/i386/avx*-2.c
 
  Ok for stage1?
 
 I don't like this approach. If the testcase is broken, then it should
 be fixed, not worked around.

IMHO those tests don't need to be alias-conformant.
There are plenty of tests for aliasing rules,
and the avx tests verify the intrinsics implementation. There are plenty of
real programs breaking alias rules, so why can't we have non-conformant tests?

Re: [PATCH] Warn about unclosed pragma omp declare target.

2015-03-26 Thread Ilya Tocar

On 02 Feb 13:05, Jakub Jelinek wrote:
 On Tue, Jul 29, 2014 at 06:45:01PM +0400, Ilya Tocar wrote:
  Hi,
  
  As discussed here in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
  Gcc should complain about pragma omp declare target without
  corresponding pragma omp end declare target. This patch adds a warning
  for those cases.
  Bootstraps/passes make-check.
  Ok for trunk?
  
  ChangeLog:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * c-decl.c (omp_declare_target_location_stack): New.
  * c-lang.h (omp_declare_target_location_stack): Declare.
  * c-parser.c (warn_unclosed_pragma_omp_target): New.
  (c_parser_translation_unit): Call it.
  (c_parser_omp_declare_target): Remember location.
  (c_parser_omp_end_declare_target): Forget location.
 
 Sorry for the long delay on this.
 Can you check what will happen if you have unclosed #pragma omp declare target
 in some header you precompile?  If you get the warning during the header
 compilation and then not during compilation using that PCH header,
 supposedly it might be fine and the patch might be ok as is.

I had completely forgotten about PCH.
With PCH this patch fails with a segfault.
Moreover, even if I fix the segfault, we will produce strange results for
something like:
#include "a.h"
#pragma omp end declare target
// some code
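
To make the problematic layout concrete, the sketch below shows in a
single file what the header/source split would look like: the region is
opened where the contents of a.h would be and closed afterwards by the
including file. The names scale_factor and scaled are hypothetical;
without -fopenmp the pragmas are simply ignored, so the code still
compiles as plain C.

```c
#include <assert.h>

/* --- contents that would live in a.h: the region is opened here --- */
#pragma omp declare target
static int scale_factor = 3;

static int
scaled (int x)
{
  return x * scale_factor;
}
/* --- end of a.h: no matching "end declare target" in the header --- */

/* --- the including file closes the region instead --- */
#pragma omp end declare target
```

If a.h is precompiled on its own, the open region is only visible while
the header is being compiled, which is why tracking it across a PCH
boundary is awkward.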
 
 If we wanted to warn even on a.c, supposedly the vector would need to be
 marked for GC.

I've tried:

static GTY(()) vec<int,va_gc_atomic>
*omp_declare_target_location_stack;

However it fails with:

vec.h:1118: undefined reference to `gt_pch_nx(int)'

But in ggc.h (included in c-parser.c) I see

gt_pch_nx (unsigned int)
{
}

So I'm not sure how to properly mark the vector for PCH.

[PATCH] Make wider use of v constraint in i386.md

2015-03-19 Thread Ilya Tocar

Hi,

There was some discussion about x constraints being too conservative
for some patterns in i386.md.
Patch below fixes it. This is probably stage1 material.

ChangeLog:

gcc/

2015-03-19  Ilya Tocar  ilya.to...@intel.com

* config/i386/i386.h (EXT_SSE_REG_P): New.
* config/i386/i386.md (*cmpi<FPCMP:unord><MODEF:mode>_mixed): Use v
constraint.
(*cmpi<FPCMP:unord><MODEF:mode>_sse): Ditto.
(*movxi_internal_avx512f): Ditto.
(define_split): Check for xmm16+, when splitting scalar float_extend.
(*extendsfdf2_mixed): Use v constraint.
(*extendsfdf2_sse): Ditto.
(define_split): Check for xmm16+, when splitting scalar float_truncate.
(*truncdfsf_fast_sse): Use v constraint.
(fix_trunc<MODEF:mode><SWI48:mode>_sse): Ditto.
(*float<SWI48:mode><MODEF:mode>2_sse): Ditto.
(define_peephole2): Check for xmm16+, when converting scalar
float_truncate.
(define_peephole2): Check for xmm16+, when converting scalar
float_extend.
(*fop_<mode>_comm_mixed): Use v constraint.
(*fop_<mode>_comm_sse): Ditto.
(*fop_<mode>_1_mixed): Ditto.
(*sqrt<mode>2_sse): Ditto.
(*ieee_s<ieee_maxmin><mode>3): Ditto.


---
 gcc/config/i386/i386.h  |  2 ++
 gcc/config/i386/i386.md | 82 +++--
 2 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1e755d3..0b8c57a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1477,6 +1477,8 @@ enum reg_class
 #define REX_SSE_REGNO_P(N) \
   IN_RANGE ((N), FIRST_REX_SSE_REG, LAST_REX_SSE_REG)
 
+#define EXT_SSE_REG_P(X) (REG_P (X) && EXT_REX_SSE_REGNO_P (REGNO (X)))
+
 #define EXT_REX_SSE_REGNO_P(N) \
   IN_RANGE ((N), FIRST_EXT_REX_SSE_REG, LAST_EXT_REX_SSE_REG)
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1129b93..38eaf95 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1639,8 +1639,8 @@
 (define_insn "*cmpi<FPCMP:unord><MODEF:mode>_mixed"
   [(set (reg:FPCMP FLAGS_REG)
        (compare:FPCMP
-         (match_operand:MODEF 0 "register_operand" "f,x")
-         (match_operand:MODEF 1 "nonimmediate_operand" "f,xm")))]
+         (match_operand:MODEF 0 "register_operand" "f,v")
+         (match_operand:MODEF 1 "nonimmediate_operand" "f,vm")))]
   "TARGET_MIX_SSE_I387
    && SSE_FLOAT_MODE_P (<MODEF:MODE>mode)"
   "* return output_fp_compare (insn, operands, true,
@@ -1666,8 +1666,8 @@
 (define_insn "*cmpi<FPCMP:unord><MODEF:mode>_sse"
   [(set (reg:FPCMP FLAGS_REG)
        (compare:FPCMP
-         (match_operand:MODEF 0 "register_operand" "x")
-         (match_operand:MODEF 1 "nonimmediate_operand" "xm")))]
+         (match_operand:MODEF 0 "register_operand" "v")
+         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
   "TARGET_SSE_MATH
    && SSE_FLOAT_MODE_P (<MODEF:MODE>mode)"
   "* return output_fp_compare (insn, operands, true,
@@ -1959,8 +1959,8 @@
    (set_attr "length_immediate" "1")])
 
 (define_insn "*movxi_internal_avx512f"
-  [(set (match_operand:XI 0 "nonimmediate_operand" "=x,x ,m")
-       (match_operand:XI 1 "vector_move_operand"  "C ,xm,x"))]
+  [(set (match_operand:XI 0 "nonimmediate_operand" "=v,v ,m")
+       (match_operand:XI 1 "vector_move_operand"  "C ,vm,v"))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (which_alternative)
@@ -4003,7 +4003,9 @@
          (match_operand:SF 1 "nonimmediate_operand")))]
   "TARGET_USE_VECTOR_FP_CONVERTS
    && optimize_insn_for_speed_p ()
-   && reload_completed && SSE_REG_P (operands[0])"
+   && reload_completed && SSE_REG_P (operands[0])
+   && (!EXT_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
    [(set (match_dup 2)
         (float_extend:V2DF
           (vec_select:V2SF
@@ -4048,9 +4050,9 @@
   "operands[2] = gen_rtx_REG (SFmode, REGNO (operands[0]));")
 
 (define_insn "*extendsfdf2_mixed"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,m,x")
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,m,v")
         (float_extend:DF
-         (match_operand:SF 1 "nonimmediate_operand" "fm,f,xm")))]
+         (match_operand:SF 1 "nonimmediate_operand" "fm,f,vm")))]
   "TARGET_SSE2 && TARGET_MIX_SSE_I387"
 {
   switch (which_alternative)
@@ -4071,8 +4073,8 @@
    (set_attr "mode" "SF,XF,DF")])
 
 (define_insn "*extendsfdf2_sse"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=x")
-        (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "xm")))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=v")
+        (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "vm")))]
   "TARGET_SSE2 && TARGET_SSE_MATH"
   "%vcvtss2sd\t{%1, %d0|%d0, %1}"
   [(set_attr "type" "ssecvt")
@@ -4155,7 +4157,9 @@
          (match_operand:DF 1 "nonimmediate_operand")))]
   "TARGET_USE_VECTOR_FP_CONVERTS
    && optimize_insn_for_speed_p ()
-   && reload_completed && SSE_REG_P (operands[0])"
+   && reload_completed && SSE_REG_P (operands[0])
+   && (!EXT_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
    [(set (match_dup 2)
         (vec_concat:V4SF

[Patch] Fix android build.

2015-02-18 Thread Ilya Tocar

Hi,

On Android, dlerror returns const char *.
Ok for trunk?
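
A minimal sketch of the pattern the patch moves to, assuming a POSIX
<dlfcn.h>; the helper name load_error_message and the library path used
with it are made up for illustration. Storing the result in a
const char * compiles whether dlerror is declared to return char * (as
POSIX specifies) or const char * (as reported for Android above).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Try to load PATH and return the dlerror message on failure,
   or NULL if the library was loaded successfully.  */
static const char *
load_error_message (const char *path)
{
  const char *err;
  void *handle;

  dlerror ();                      /* Clear any existing error.  */
  handle = dlopen (path, RTLD_LAZY);
  if (handle != NULL)
    {
      dlclose (handle);
      return NULL;
    }
  err = dlerror ();                /* Works for both return types.  */
  return err;
}
```

On older glibc versions this needs to be linked with -ldl.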

libgomp/
* target.c (gomp_load_plugin_for_device): Fix type of dlerror
return value.
(DLSYM_OPT): Ditto.
---
 libgomp/target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 73e757a..50baa4d 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -919,7 +919,7 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 const char *plugin_name)
 {
-  char *err = NULL, *last_missing = NULL;
+  const char *err = NULL, *last_missing = NULL;
   int optional_present, optional_total;
 
   /* Clear any existing error.  */
@@ -947,7 +947,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 #define DLSYM_OPT(f, n)                                                \
   do                                                                   \
     {                                                                  \
-      char *tmp_err;                                                   \
+      const char *tmp_err;                                             \
       device->f##_func = dlsym (plugin_handle, "GOMP_OFFLOAD_" #n);    \
       tmp_err = dlerror ();                                            \
       if (tmp_err == NULL)                                             \
-- 
1.8.3.1

Re: [PATCH] PR64387

2015-02-04 Thread Ilya Tocar

I think that the fix for the AVX2 part should be backported to 4.8/4.9.
What do you think?

On 14 Jan 14:18, Ilya Tocar wrote:
 Hi,
 
 This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64387
 which was caused by a difference in predicates between vec_unpacks_hi
 and vec_extract_hi.
 Ok for trunk?
 ChangeLog:
 
 gcc/
   PR target/64387
   * config/i386/sse.md (vec_unpacks_hi_v8sf): Fix predicate.
   (vec_unpacks_hi_v16sf): Ditto.
 
 testsuite/
   PR target/64387
   * gcc.target/i386/pr64387.c: New test.
 ---
  gcc/config/i386/sse.md                  |  4 ++--
  gcc/testsuite/gcc.target/i386/pr64387.c | 15 +++++++++++++++
  2 files changed, 17 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/i386/pr64387.c
 
 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 3b9108c..cd4af4e 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -5078,7 +5078,7 @@
  (define_expand "vec_unpacks_hi_v8sf"
    [(set (match_dup 2)
         (vec_select:V4SF
 -         (match_operand:V8SF 1 "nonimmediate_operand")
 +         (match_operand:V8SF 1 "register_operand")
           (parallel [(const_int 4) (const_int 5)
                      (const_int 6) (const_int 7)])))
     (set (match_operand:V4DF 0 "register_operand")
 @@ -5090,7 +5090,7 @@
  (define_expand "vec_unpacks_hi_v16sf"
    [(set (match_dup 2)
         (vec_select:V8SF
 -         (match_operand:V16SF 1 "nonimmediate_operand")
 +         (match_operand:V16SF 1 "register_operand")
           (parallel [(const_int 8) (const_int 9)
                      (const_int 10) (const_int 11)
                      (const_int 12) (const_int 13)
 diff --git a/gcc/testsuite/gcc.target/i386/pr64387.c b/gcc/testsuite/gcc.target/i386/pr64387.c
 new file mode 100644
 index 0000000..dd38142
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/pr64387.c
 @@ -0,0 +1,15 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -ftree-vectorize -ffloat-store -mavx512er" } */
 +
 +float x[256];
 +
 +double *
 +foo (void)
 +{
 +  double *z = __builtin_malloc (sizeof (double) * 256);
 +  int i;
 +  for (i = 0; i < 256; ++i)
 +    z[i] = x[i] + 1.0f;
 +  foo ();
 +  return 0;
 +}
 -- 
 1.8.3.1

[PATCH][x86] Update s{r,l}li intrinsics.

2015-01-15 Thread Ilya Tocar

Hi,
Looks like the new ISA doc [1] renamed the srli/slli intrinsics to bsrli/bslli.
This patch adds the b* versions, while keeping the old srli/slli names for
backward compatibility.
OK for trunk?

1:https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
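
As a usage sketch (not part of the patch), the function below applies
both spellings to the same input and checks that they agree. The helper
name byte_shift_spellings_agree is hypothetical, and the code assumes an
x86 target with SSE2 and a compiler whose emmintrin.h already provides
_mm_bslli_si128.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* Shift IN left by 4 bytes with the new and the old spelling, store the
   result of the new spelling in OUT, and return nonzero if both
   spellings produced the same bytes.  */
static int
byte_shift_spellings_agree (const uint8_t in[16], uint8_t out[16])
{
  __m128i v = _mm_loadu_si128 ((const __m128i *) in);
  __m128i a = _mm_bslli_si128 (v, 4);   /* new name */
  __m128i b = _mm_slli_si128 (v, 4);    /* old name, kept for compatibility */
  uint8_t ra[16], rb[16];

  _mm_storeu_si128 ((__m128i *) ra, a);
  _mm_storeu_si128 ((__m128i *) rb, b);
  memcpy (out, ra, 16);
  return memcmp (ra, rb, 16) == 0;
}
```

The shift count is in bytes and has to be a compile-time constant, which
is why the example uses a literal.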

ChangeLog:
gcc/
* config/i386/avx2intrin.h (_mm256_bslli_si256,
_mm256_bsrli_si256): New.
* config/i386/emmintrin.h (_mm_bsrli_si128, _mm_bslli_si128):
Ditto.

testsuite/
* gcc.target/i386/sse-14.c: Test new intrinsic.
* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx2intrin.h           | 18 ++++++++++++++++++
 gcc/config/i386/emmintrin.h            | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/sse-14.c |  2 ++
 gcc/testsuite/gcc.target/i386/sse-22.c |  4 ++++
 4 files changed, 40 insertions(+)

diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
index 669f1dc..8a30c5b 100644
--- a/gcc/config/i386/avx2intrin.h
+++ b/gcc/config/i386/avx2intrin.h
@@ -645,11 +645,20 @@ _mm256_sign_epi32 (__m256i __X, __m256i __Y)
 #ifdef __OPTIMIZE__
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_bslli_si256 (__m256i __A, const int __N)
+{
+  return (__m256i)__builtin_ia32_pslldqi256 (__A, __N * 8);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_slli_si256 (__m256i __A, const int __N)
 {
   return (__m256i)__builtin_ia32_pslldqi256 (__A, __N * 8);
 }
 #else
+#define _mm256_bslli_si256(A, N) \
+  ((__m256i)__builtin_ia32_pslldqi256 ((__m256i)(A), (int)(N) * 8))
 #define _mm256_slli_si256(A, N) \
   ((__m256i)__builtin_ia32_pslldqi256 ((__m256i)(A), (int)(N) * 8))
 #endif
@@ -727,11 +736,20 @@ _mm256_sra_epi32 (__m256i __A, __m128i __B)
 #ifdef __OPTIMIZE__
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_bsrli_si256 (__m256i __A, const int __N)
+{
+  return (__m256i)__builtin_ia32_psrldqi256 (__A, __N * 8);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_srli_si256 (__m256i __A, const int __N)
 {
   return (__m256i)__builtin_ia32_psrldqi256 (__A, __N * 8);
 }
 #else
+#define _mm256_bsrli_si256(A, N) \
+  ((__m256i)__builtin_ia32_psrldqi256 ((__m256i)(A), (int)(N) * 8))
 #define _mm256_srli_si256(A, N) \
   ((__m256i)__builtin_ia32_psrldqi256 ((__m256i)(A), (int)(N) * 8))
 #endif
diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index ad37fac..b19f05a 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -1165,6 +1165,18 @@ _mm_srai_epi32 (__m128i __A, int __B)
 
 #ifdef __OPTIMIZE__
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_bsrli_si128 (__m128i __A, const int __N)
+{
+  return (__m128i)__builtin_ia32_psrldqi128 (__A, __N * 8);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_bslli_si128 (__m128i __A, const int __N)
+{
+  return (__m128i)__builtin_ia32_pslldqi128 (__A, __N * 8);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_srli_si128 (__m128i __A, const int __N)
 {
   return (__m128i)__builtin_ia32_psrldqi128 (__A, __N * 8);
@@ -1176,6 +1188,10 @@ _mm_slli_si128 (__m128i __A, const int __N)
   return (__m128i)__builtin_ia32_pslldqi128 (__A, __N * 8);
 }
 #else
+#define _mm_bsrli_si128(A, N) \
+  ((__m128i)__builtin_ia32_psrldqi128 ((__m128i)(A), (int)(N) * 8))
+#define _mm_bslli_si128(A, N) \
+  ((__m128i)__builtin_ia32_pslldqi128 ((__m128i)(A), (int)(N) * 8))
 #define _mm_srli_si128(A, N) \
   ((__m128i)__builtin_ia32_psrldqi128 ((__m128i)(A), (int)(N) * 8))
 #define _mm_slli_si128(A, N) \
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index f3f6c5c..e8791e3 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -601,6 +601,8 @@ test_2 (_mm_alignr_pi8, __m64, __m64, __m64, 1)
 
 /* emmintrin.h */
 test_2 (_mm_shuffle_pd, __m128d, __m128d, __m128d, 1)
+test_1 (_mm_bsrli_si128, __m128i, __m128i, 1)
+test_1 (_mm_bslli_si128, __m128i, __m128i, 1)
 test_1 (_mm_srli_si128, __m128i, __m128i, 1)
 test_1 (_mm_slli_si128, __m128i, __m128i, 1)
 test_1 (_mm_extract_epi16, int, __m128i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 0d7bd16..5735514 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -138,6 +138,8 @@ test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA)
 #endif
 #include <emmintrin.h>
 test_2 (_mm_shuffle_pd, __m128d, __m128d, __m128d, 1)
+test_1 (_mm_bsrli_si128, __m128i, __m128i, 1)
+test_1 (_mm_bslli_si128, __m128i, __m128i, 1)
 test_1 (_mm_srli_si128, __m128i, __m128i, 1)
 test_1 (_mm_slli_si128, __m128i, __m128i, 1)
 test_1 (_mm_extract_epi16, int, __m128i, 1)
@@ -269,6

[PATCH] PR64387

2015-01-14 Thread Ilya Tocar

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64387
which was caused by a difference in predicates between vec_unpacks_hi
and vec_extract_hi.
Ok for trunk?
ChangeLog:

gcc/
PR target/64387
* config/i386/sse.md (vec_unpacks_hi_v8sf): Fix predicate.
(vec_unpacks_hi_v16sf): Ditto.

testsuite/
PR target/64387
* gcc.target/i386/pr64387.c: New test.
---
 gcc/config/i386/sse.md  |  4 ++--
 gcc/testsuite/gcc.target/i386/pr64387.c | 15 +++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr64387.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 3b9108c..cd4af4e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5078,7 +5078,7 @@
 (define_expand "vec_unpacks_hi_v8sf"
   [(set (match_dup 2)
         (vec_select:V4SF
-          (match_operand:V8SF 1 "nonimmediate_operand")
+          (match_operand:V8SF 1 "register_operand")
           (parallel [(const_int 4) (const_int 5)
                      (const_int 6) (const_int 7)])))
    (set (match_operand:V4DF 0 "register_operand")
@@ -5090,7 +5090,7 @@
 (define_expand "vec_unpacks_hi_v16sf"
   [(set (match_dup 2)
         (vec_select:V8SF
-          (match_operand:V16SF 1 "nonimmediate_operand")
+          (match_operand:V16SF 1 "register_operand")
           (parallel [(const_int 8) (const_int 9)
                      (const_int 10) (const_int 11)
                      (const_int 12) (const_int 13)
diff --git a/gcc/testsuite/gcc.target/i386/pr64387.c 
b/gcc/testsuite/gcc.target/i386/pr64387.c
new file mode 100644
index 000..dd38142
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr64387.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -ffloat-store -mavx512er" } */
+
+float x[256];
+
+double *
+foo (void)
+{
+  double *z = __builtin_malloc (sizeof (double) * 256);
+  int i;
+  for (i = 0; i < 256; ++i)
+    z[i] = x[i] + 1.0f;
+  foo ();
+  return 0;
+}
-- 
1.8.3.1

[PATCH] PR64393

2015-01-14 Thread Ilya Tocar

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64393
It makes -mavx512vbmi enable avx512bw, as it requires 64-bit masks.
OK for trunk?

ChangeLog:

gcc/
PR target/64393
* common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512VBMI_SET):
Enable AVX512BW.
(OPTION_MASK_ISA_AVX512BW_UNSET): Disable AVX512BW.
* config/i386/i386.c (ix86_hard_regno_mode_ok): Don't check
AVX512VBMI, as it implies AVX512BW.

testsuite/
PR target/64393
* gcc.target/i386/pr64393.c: New test.

---
 gcc/common/config/i386/i386-common.c|  5 +++--
 gcc/config/i386/i386.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pr64393.c | 12 
 3 files changed, 16 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr64393.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 77edb47..4e5687a 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -74,7 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512IFMA_SET \
   (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA_AVX512VBMI_SET \
-  (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512F_SET)
+  (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET)
 #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
@@ -171,7 +171,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512PF_UNSET OPTION_MASK_ISA_AVX512PF
 #define OPTION_MASK_ISA_AVX512ER_UNSET OPTION_MASK_ISA_AVX512ER
 #define OPTION_MASK_ISA_AVX512DQ_UNSET OPTION_MASK_ISA_AVX512DQ
-#define OPTION_MASK_ISA_AVX512BW_UNSET OPTION_MASK_ISA_AVX512BW
+#define OPTION_MASK_ISA_AVX512BW_UNSET \
+  (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VBMI_UNSET)
 #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL
 #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA
 #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7a39f80..91eae5a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41669,7 +41669,7 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode)
     return VALID_FP_MODE_P (mode);
   if (MASK_REGNO_P (regno))
     return (VALID_MASK_REG_MODE (mode)
-            || ((TARGET_AVX512BW || TARGET_AVX512VBMI)
+            || (TARGET_AVX512BW
                 && VALID_MASK_AVX512BW_MODE (mode)));
   if (BND_REGNO_P (regno))
     return VALID_BND_REG_MODE (mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr64393.c 
b/gcc/testsuite/gcc.target/i386/pr64393.c
new file mode 100644
index 000..37a0e48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr64393.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mavx512vbmi" } */
+
+int a[1024];
+
+void
+foo (int i)
+{
+  for (;; i++)
+    if (a[i] != (i ^ (i * 3) ^ (i * 7)))
+      return;
+}
-- 
1.8.3.1
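
The SET/UNSET pairing in this patch can be illustrated with a small standalone model. This is only a sketch, not GCC code: the ISA_* names and the isa_enable/isa_disable helpers are hypothetical, and the dependency chain is reduced to AVX512F, AVX512BW and AVX512VBMI. The point it shows is that enabling a feature also sets everything it depends on, while disabling a feature also clears everything that depends on it.

```c
#include <stdint.h>

/* Hypothetical, simplified ISA bits (not the real OPTION_MASK_ISA_* values).  */
enum
{
  ISA_AVX512F = 1 << 0,
  ISA_AVX512BW = 1 << 1,
  ISA_AVX512VBMI = 1 << 2
};

/* Enabling a feature enables what it depends on.  */
#define ISA_AVX512F_SET ISA_AVX512F
#define ISA_AVX512BW_SET (ISA_AVX512BW | ISA_AVX512F_SET)
#define ISA_AVX512VBMI_SET (ISA_AVX512VBMI | ISA_AVX512BW_SET)

/* Disabling a feature disables what depends on it.  */
#define ISA_AVX512VBMI_UNSET ISA_AVX512VBMI
#define ISA_AVX512BW_UNSET (ISA_AVX512BW | ISA_AVX512VBMI_UNSET)

/* Model of handling -mfoo: OR in the SET mask.  */
uint32_t
isa_enable (uint32_t flags, uint32_t set_mask)
{
  return flags | set_mask;
}

/* Model of handling -mno-foo: clear the UNSET mask.  */
uint32_t
isa_disable (uint32_t flags, uint32_t unset_mask)
{
  return flags & ~unset_mask;
}
```

In this model, enabling VBMI leaves BW and F set, and a later disable of BW also clears VBMI, so the VBMI bit is never present without the BW bit.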

Re: [PATCH] PR64387

2015-01-14 Thread Ilya Tocar

On 14 Jan 12:36, Uros Bizjak wrote:
 On Wed, Jan 14, 2015 at 12:18 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64387
  which was caused by a difference in predicates between vec_unpacks_hi
  and vec_extract_hi.
 
 Why are vec_unpacks_hi_{v8sf,v16sf} expanders different than
 vec_unpacks_hi_v4sf? I think that these should all be expanded in the
 same way, similar to vec_unpacks_hi_v4sf.

In the v4sf case we use movhlps, which is not available in the v{8,16}sf case.

[PATCH] PR64386

2015-01-14 Thread Ilya Tocar

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64386
Ok for trunk?

ChangeLog:

gcc/
PR target/64386
* config/i386/i386.c (ix86_expand_sse_cmp): Handle V64QImode,
V32HImode. 

testsuite/
PR target/64386
* gcc.target/i386/pr64386.c: New test.

---
 gcc/config/i386/i386.c  |  8 
 gcc/testsuite/gcc.target/i386/pr64386.c | 14 ++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr64386.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 91eae5a..f358ac2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21318,6 +21318,14 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx 
cmp_op0, rtx cmp_op1,
 
   switch (cmp_ops_mode)
{
+   case V64QImode:
+ gcc_assert (TARGET_AVX512BW);
+ gen = code == GT ? gen_avx512bw_gtv64qi3 : gen_avx512bw_eqv64qi3_1;
+ break;
+   case V32HImode:
+ gcc_assert (TARGET_AVX512BW);
+ gen = code == GT ? gen_avx512bw_gtv32hi3 : gen_avx512bw_eqv32hi3_1;
+ break;
case V16SImode:
  gen = code == GT ? gen_avx512f_gtv16si3 : gen_avx512f_eqv16si3_1;
  break;
diff --git a/gcc/testsuite/gcc.target/i386/pr64386.c 
b/gcc/testsuite/gcc.target/i386/pr64386.c
new file mode 100644
index 000..fc152cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr64386.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize -mavx512bw" } */
+
+char ac[64], bc[64], ec[64];
+
+void fc (void)
+{
+  int i;
+  for (i = 0; i < 64; i++)
+    {
+      char e = ec[i];
+      ac[i] = bc[i] ? : e;
+    }
+}
-- 
1.8.3.1

[PATCH x86] Add march/mtune=knl

2014-12-10 Thread Ilya Tocar

Hi,

The patch below adds march/mtune/attribute=knl.
For now this is just silvermont tuning and avx/avx2/avx512 support.
Ok for trunk?
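
The detection order that the driver-i386.c hunk of this patch introduces for an unknown family 0x6 CPU can be sketched as a plain function. This is an illustration only: struct cpu_features and guess_family6_cpu are hypothetical names, the fields merely mirror the has_* flags visible in the hunk, and the "unknown" result is a placeholder for the remaining fallbacks of the real driver, which are not shown here.

```c
#include <stdbool.h>

/* Hypothetical summary of the CPUID-derived flags used in the hunk.  */
struct cpu_features
{
  bool has_avx512f;
  bool has_adx;
  bool has_avx2;
};

/* Check the newest feature first, so that a CPU with AVX512F is
   classified as knl even though it also reports ADX and AVX2.  */
const char *
guess_family6_cpu (const struct cpu_features *f)
{
  if (f->has_avx512f)
    return "knl";        /* Assume Knl.  */
  if (f->has_adx)
    return "broadwell";  /* Assume Broadwell.  */
  if (f->has_avx2)
    return "haswell";    /* Assume Haswell.  */
  return "unknown";      /* Placeholder: older fallbacks omitted.  */
}
```

The ordering matters because the feature sets are nested: testing has_adx before has_avx512f would classify such a CPU as broadwell.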

gcc/
* config.gcc: Support knl.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect knl.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_KNL.
* config/i386/i386.c (m_KNL): Define.
(processor_target_table): Add knl.
(PTA_KNL): Define.
(ix86_issue_rate): Add PROCESSOR_KNL.
(ix86_adjust_cost): Ditto.
(ia32_multipass_dfa_lookahead): Ditto.
(get_builtin_code_for_version): Handle knl.
(fold_builtin_cpu): Ditto.
* config/i386/i386.h (TARGET_KNL): Define.
(processor_type): Add PROCESSOR_KNL.
* config/i386/i386.md (attr cpu): Add knl.
* config/i386/x86-tune.def: Add m_KNL.

gcc/testsuite/
* gcc.target/i386/funcspec-5.c: Test avx512f and knl.

---
 gcc/config.gcc |  3 +-
 gcc/config/i386/driver-i386.c  |  6 +++-
 gcc/config/i386/i386-c.c   |  7 +
 gcc/config/i386/i386.c | 17 ++-
 gcc/config/i386/i386.h |  2 ++
 gcc/config/i386/i386.md|  2 +-
 gcc/config/i386/x86-tune.def   | 47 +++---
 gcc/testsuite/gcc.target/i386/funcspec-5.c |  3 ++
 8 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fa3e1fc..8541274 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -591,7 +591,8 @@ pentium4 pentium4m pentiumpro prescott
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
 bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
 core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
-sandybridge ivybridge haswell broadwell bonnell silvermont x86-64 native"
+sandybridge ivybridge haswell broadwell bonnell silvermont knl x86-64 \
+native"
 
 # Additional x86 processors supported by --with-cpu=.  Each processor
 # MUST be separated by exactly one space.
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index a2248ce..69ebebd 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -747,7 +747,11 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
           if (arch)
             {
               /* This is unknown family 0x6 CPU.  */
-              if (has_adx)
+              /* Assume Knl.  */
+              if (has_avx512f)
+                cpu = "knl";
+              /* Assume Broadwell.  */
+              else if (has_adx)
                 cpu = "broadwell";
               else if (has_avx2)
                 /* Assume Haswell.  */
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 3ad7d49..1c604fc3 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -171,6 +171,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__silvermont");
       def_or_undef (parse_in, "__silvermont__");
       break;
+    case PROCESSOR_KNL:
+      def_or_undef (parse_in, "__knl");
+      def_or_undef (parse_in, "__knl__");
+      break;
     /* use PROCESSOR_max to not set/unset the arch macro.  */
     case PROCESSOR_max:
       break;
@@ -277,6 +281,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__tune_slm__");
       def_or_undef (parse_in, "__tune_silvermont__");
       break;
+    case PROCESSOR_KNL:
+      def_or_undef (parse_in, "__tune_knl__");
+      break;
 case PROCESSOR_INTEL:
 case PROCESSOR_GENERIC:
   break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1e1716e..f0cbe48 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2040,6 +2040,7 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_CORE_ALL (m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_HASWELL)
 #define m_BONNELL (1<<PROCESSOR_BONNELL)
 #define m_SILVERMONT (1<<PROCESSOR_SILVERMONT)
+#define m_KNL (1<<PROCESSOR_KNL)
 #define m_INTEL (1<<PROCESSOR_INTEL)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -2505,6 +2506,7 @@ static const struct ptt 
processor_target_table[PROCESSOR_max] =
   {"haswell", &core_cost, 16, 10, 16, 10, 16},
   {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
   {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
+  {"knl", &slm_cost, 16, 15, 16, 7, 16},
   {"intel", &intel_cost, 16, 15, 16, 7, 16},
   {"geode", &geode_cost, 0, 0, 0, 0, 0},
   {"k6", &k6_cost, 32, 7, 32, 7, 32},
@@ -3178,6 +3180,8 @@ ix86_option_override_internal (bool main_args_p,
| PTA_FMA | PTA_MOVBE | PTA_HLE)
 #define PTA_BROADWELL \
   (PTA_HASWELL | PTA_ADX | PTA_PRFCHW | PTA_RDSEED)
+#define PTA_KNL \
+  (PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD)
 #define PTA_BONNELL \
   (PTA_CORE2 | PTA_MOVBE)
 #define PTA_SILVERMONT \
@@ -3241,6 +3245,7 @@ ix86_option_override_internal (bool main_args_p,
   {"atom", PROCESSOR_BONNELL,

Re: [PATCH x86] Enable v64qi permutations.

2014-12-05 Thread Ilya Tocar

On 04 Dec 15:16, Uros Bizjak wrote:
 On Thu, Dec 4, 2014 at 2:53 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
Can you add a few testcases?
   
Isn't it already covered by gcc.dg/torture/vshuf* ?
   
  
   I didn't see them fail on my machines today.
  
   Those are executable testcases, those better should not fail.
   The patch just improved code generation and the testcases test
   if the improved code generation works well.
   Did you mean some scan-assembler test that verifies the better code
   generation?  Guess it is possible, though fragile.
  
   I think that existing executable testcases adequately cover the
   functionality of the patch.
  
   The patch is OK.
 
  BTW, the ChangeLog is missing.
 
  * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
  (expand_vec_perm_broadcast_1): Ditto.
  (expand_vec_perm_vpermi2_vpshub2): New.
  (ix86_expand_vec_perm_const_1): Use it.
  (ix86_vectorize_vec_perm_const_ok): Handle v64qi.
  * config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
  (VEC_PERM_CONST): Ditto.
  index ca5d720..6252e7e 100644
  --- a/gcc/config/i386/sse.md
  +++ b/gcc/config/i386/sse.md
  @@ -10678,7 +10678,7 @@
  (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
  (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
  (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
  -   (V32HI "TARGET_AVX512BW")])
  +   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
  I don't think change for VBMI target belongs in this patch.
 
  Those changes enable non-const v64qi permutes
  (via a single vpermi2b insn); should I split them into a separate patch?
 
 If they are not on the same topic, then please yes. Please don't mix
 separate issues together.

OK.
The patch below adds variable v64qi permutations.
OK for trunk?
(I plan to commit both of them simultaneously, if this part is approved)

 * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
 * config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
---
 gcc/config/i386/i386.c | 4 
 gcc/config/i386/sse.md | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ce5dfad..c4dbf78 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx 
mask, rtx op1,
       if (TARGET_AVX512VL && TARGET_AVX512BW)
         gen = gen_avx512vl_vpermi2varv16hi3;
       break;
+    case V64QImode:
+      if (TARGET_AVX512VBMI)
+        gen = gen_avx512bw_vpermi2varv64qi3;
+      break;
     case V32HImode:
       if (TARGET_AVX512BW)
         gen = gen_avx512bw_vpermi2varv32hi3;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 734e6b4..cfbe40c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10691,7 +10691,7 @@
    (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 (define_expand "vec_perm<mode>"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
-- 
1.8.3.1

[PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Ilya Tocar

Hi,

As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html,
this patch enables v64qi permutations.
I've checked the vshuf* tests from dg-torture.exp
with avx512* options on sde, and the generated permutations are correct.

OK for trunk?

---
 gcc/config/i386/i386.c | 85 ++
 gcc/config/i386/sse.md |  4 +--
 2 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..f29f8ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx 
mask, rtx op1,
       if (TARGET_AVX512VL && TARGET_AVX512BW)
         gen = gen_avx512vl_vpermi2varv16hi3;
       break;
+    case V64QImode:
+      if (TARGET_AVX512VBMI)
+        gen = gen_avx512bw_vpermi2varv64qi3;
+      break;
     case V32HImode:
       if (TARGET_AVX512BW)
         gen = gen_avx512bw_vpermi2varv32hi3;
@@ -48872,6 +48876,7 @@ expand_vec_perm_broadcast_1 (struct expand_vec_perm_d 
*d)
       emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
   return true;
 
+case V64QImode:
 case V32QImode:
 case V16HImode:
 case V8SImode:
@@ -48905,6 +48910,78 @@ expand_vec_perm_broadcast (struct expand_vec_perm_d *d)
   return expand_vec_perm_broadcast_1 (d);
 }
 
+/* Implement arbitrary permutations of two V64QImode operands
+   with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
+static bool
+expand_vec_perm_vpermi2_vpshub2 (struct expand_vec_perm_d *d)
+{
+  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  struct expand_vec_perm_d ds[2];
+  rtx rperm[128], vperm, target0, target1;
+  unsigned int i, nelt;
+  machine_mode vmode;
+
+  nelt = d->nelt;
+  vmode = V64QImode;
+
+  for (i = 0; i < 2; i++)
+    {
+      ds[i] = *d;
+      ds[i].vmode = V32HImode;
+      ds[i].nelt = 32;
+      ds[i].target = gen_reg_rtx (V32HImode);
+      ds[i].op0 = gen_lowpart (V32HImode, d->op0);
+      ds[i].op1 = gen_lowpart (V32HImode, d->op1);
+    }
+
+  /* Prepare permutations such that the first one takes care of
+     putting the even bytes into the right positions or one higher
+     positions (ds[0]) and the second one takes care of
+     putting the odd bytes into the right positions or one below
+     (ds[1]).  */
+
+  for (i = 0; i < nelt; i++)
+    {
+      ds[i & 1].perm[i / 2] = d->perm[i] / 2;
+      if (i & 1)
+        {
+          rperm[i] = constm1_rtx;
+          rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+        }
+      else
+        {
+          rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+          rperm[i + 64] = constm1_rtx;
+        }
+    }
+
+  bool ok = expand_vec_perm_1 (&ds[0]);
+  gcc_assert (ok);
+  ds[0].target = gen_lowpart (V64QImode, ds[0].target);
+
+  ok = expand_vec_perm_1 (&ds[1]);
+  gcc_assert (ok);
+  ds[1].target = gen_lowpart (V64QImode, ds[1].target);
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
+  vperm = force_reg (vmode, vperm);
+  target0 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
+  vperm = force_reg (vmode, vperm);
+  target1 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));
+
+  emit_insn (gen_iorv64qi3 (d->target, target0, target1));
+  return true;
+}
+
 /* Implement arbitrary permutation of two V32QImode and V16QImode operands
with 4 vpshufb insns, 2 vpermq and 3 vpor.  We should have already failed
all the shorter instruction sequences.  */
@@ -49079,6 +49156,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d 
*d)
   if (expand_vec_perm_vpshufb2_vpermq_even_odd (d))
 return true;
 
+  if (expand_vec_perm_vpermi2_vpshub2 (d))
+return true;
+
   /* ??? Look for narrow permutations whose element orderings would
  allow the promotion to a wider mode.  */
 
@@ -49223,6 +49303,11 @@ ix86_vectorize_vec_perm_const_ok (machine_mode vmode,
/* All implementable with a single vpermi2 insn.  */
return true;
   break;
+case V64QImode:
+  if (TARGET_AVX512BW)
+   /* Implementable with 2 vpermi2, 2 vpshufb and 1 or insn.  */
+   return true;
+  break;
 case V8SImode:
 case V8SFmode:
 case V4DFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..6252e7e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10678,7 +10678,7 @@
    (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 (define_expand "vec_perm<mode>"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
@@ -10700,7 +10700,7 @@
    (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
    (V16SI "TARGET_AVX512F")
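
The index bookkeeping in expand_vec_perm_vpermi2_vpshub2 can be checked with a scalar model. The sketch below is not GCC code. word_permute2 and lane_byte_shuffle are hypothetical stand-ins, written on the assumption that vpermi2w selects 16-bit words from the concatenation of its two sources and that vpshufb selects bytes within each 16-byte lane and writes zero for a negative index. Under those assumptions, OR-ing the two shuffled temporaries reproduces the requested byte permutation.

```c
#include <stdint.h>

enum { NBYTES = 64, NWORDS = 32, LANE = 16 };

/* Model of a two-source word permute: output word j is word sel[j]
   of the 64-word concatenation op0:op1.  */
static void
word_permute2 (const uint8_t *op0, const uint8_t *op1,
               const unsigned sel[NWORDS], uint8_t out[NBYTES])
{
  for (unsigned j = 0; j < NWORDS; j++)
    {
      unsigned w = sel[j] % (2 * NWORDS);
      const uint8_t *src = w < NWORDS ? op0 : op1;
      unsigned k = w % NWORDS;
      out[2 * j] = src[2 * k];
      out[2 * j + 1] = src[2 * k + 1];
    }
}

/* Model of an in-lane byte shuffle: a negative index yields zero,
   otherwise the low 4 bits pick a byte from the same 16-byte lane.  */
static void
lane_byte_shuffle (const uint8_t in[NBYTES], const int idx[NBYTES],
                   uint8_t out[NBYTES])
{
  for (unsigned i = 0; i < NBYTES; i++)
    out[i] = idx[i] < 0
             ? 0
             : in[(i & ~(unsigned) (LANE - 1))
                  + (unsigned) (idx[i] & (LANE - 1))];
}

/* out[i] = (op0:op1)[perm[i]], built from two word permutes, two
   in-lane byte shuffles and one OR, with the same index arithmetic
   as the patch.  */
void
permute_v64qi_model (const uint8_t op0[NBYTES], const uint8_t op1[NBYTES],
                     const unsigned perm[NBYTES], uint8_t out[NBYTES])
{
  unsigned wsel[2][NWORDS];
  int bidx[2][NBYTES];
  uint8_t tmp[2][NBYTES], shuf[2][NBYTES];

  for (unsigned i = 0; i < NBYTES; i++)
    {
      unsigned p = perm[i] % (2 * NBYTES);
      /* Even result bytes go through permute 0, odd ones through 1.  */
      wsel[i & 1][i / 2] = p / 2;
      bidx[i & 1][i] = (int) ((i & 14u) + (p & 1u));
      bidx[(i & 1) ^ 1][i] = -1;
    }

  for (unsigned k = 0; k < 2; k++)
    {
      word_permute2 (op0, op1, wsel[k], tmp[k]);
      lane_byte_shuffle (tmp[k], bidx[k], shuf[k]);
    }

  for (unsigned i = 0; i < NBYTES; i++)
    out[i] = shuf[0][i] | shuf[1][i];
}
```

The word permute places the word containing the wanted byte at word position i / 2, so the byte shuffle only needs to pick the low or high half of that word, which always lies in the same 16-byte lane as the destination byte.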

Re: [PING^5][PATCH] Warn about unclosed pragma omp declare target.

2014-12-04 Thread Ilya Tocar

Ping.
On 19 Nov 16:34, Ilya Tocar wrote:
 As omp target and offloading support is committed to trunk,
 I think it's reasonable to add some new warnings.
 
 On 06 Nov 15:27, Ilya Tocar wrote:
  Ping.
  On 30 Oct 18:31, Ilya Tocar wrote:
   Ping.
   On 20 Oct 19:26, Ilya Tocar wrote:
Ping.

On 02 Oct 17:38, Ilya Tocar wrote:
 Ping.
 On 15 Aug 16:26, Ilya Tocar wrote:
  Ping.
  
  On 29 Jul 18:45, Ilya Tocar wrote:
   Hi,
   
   As discussed in
   https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html,
   GCC should complain about a pragma omp declare target without a
   corresponding pragma omp end declare target. This patch adds a
   warning for those cases.
   Bootstraps and passes make check.
   Ok for trunk?
   
   ChangeLog:
   
   2014-07-29  Ilya Tocar  ilya.to...@intel.com
   
 * c-decl.c (omp_declare_target_location_stack): New.
 * c-lang.h (omp_declare_target_location_stack): Declare.
 * c-parser.c (warn_unclosed_pragma_omp_target): New.
 (c_parser_translation_unit): Call it.
 (c_parser_omp_declare_target): Remember location.
 (c_parser_omp_end_declare_target): Forget location.
   
   And ChangeLog for testsuite:
   
   2014-07-29  Ilya Tocar  ilya.to...@intel.com
   
 * gcc.dg/gomp/target-3.c: New testcase.
   
   ---
gcc/c/c-decl.c   |  3 +++
gcc/c/c-lang.h   |  3 +++
gcc/c/c-parser.c | 22 +-
gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
   +
4 files changed, 60 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
   
   diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
   index 2a4b439..2dd5b2c 100644
   --- a/gcc/c/c-decl.c
   +++ b/gcc/c/c-decl.c
   @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = 
   VOIDmode;
/* If non-zero, implicit omp declare target attribute is added 
   into the
   attribute lists.  */
int current_omp_declare_target_attribute;
   +
   +/* Holds locations of currently open omp declare target pragmas.  */
   +vec<location_t> omp_declare_target_location_stack;

/* Each c_binding structure describes one binding of an 
   identifier to
   a decl.  All the decls in a scope - irrespective of namespace 
   - are
   diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
   index e974906..cef995c 100644
   --- a/gcc/c/c-lang.h
   +++ b/gcc/c/c-lang.h
   @@ -59,4 +59,7 @@ struct GTY(()) language_function {
   attribute lists.  */
extern GTY(()) int current_omp_declare_target_attribute;

   +/* Holds locations of currently open omp declare target pragmas.  */
   +extern vec<location_t> omp_declare_target_location_stack;
   +
#endif /* ! GCC_C_LANG_H */
   diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
   index e32bf04..0b96fe9 100644
   --- a/gcc/c/c-parser.c
   +++ b/gcc/c/c-parser.c
   @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd 
   (c_parser *, enum pragma_context);
static tree c_parser_array_notation (location_t, c_parser *, 
   tree, tree);
static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, 
   bool);

   +static void warn_unclosed_pragma_omp_target ();
   +
/* Parse a translation unit (C90 6.7, C99 6.9).

   translation-unit:
   @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
 }
  while (c_parser_next_token_is_not (parser, CPP_EOF));
}
   +
   +  warn_unclosed_pragma_omp_target ();
}

/* Parse an external declaration (C90 6.7, C99 6.9).
   @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser 
   *parser, tree fndecl, tree parms,
static void
c_parser_omp_declare_target (c_parser *parser)
{
   +  location_t loc = c_parser_peek_token (parser)->location;
  c_parser_skip_to_pragma_eol (parser);
  current_omp_declare_target_attribute++;
   +  omp_declare_target_location_stack.safe_push (loc);
}

static void
   @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target 
   (c_parser *parser)
 error_at (loc, "%<#pragma omp end declare target%> without corresponding "
           "%<#pragma omp declare target%>");
  else
   -current_omp_declare_target_attribute--;
   +{
   +  current_omp_declare_target_attribute--;
   +  omp_declare_target_location_stack.pop ();
   +}
}


   @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, 
   c_parser *parser
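
The bookkeeping described in the ChangeLog above (remember a location on declare target, forget it on end declare target, report what is left at the end of the translation unit) can be modeled in a few lines. This is a hedged sketch and not the code from the patch: the fixed-size array, the function names and the message text are all hypothetical, and plain line numbers stand in for location_t.

```c
#include <stdio.h>

#define MAX_OPEN 64

/* Line numbers of the currently open "declare target" pragmas.  */
static int open_lines[MAX_OPEN];
static int open_count;

/* Called for "#pragma omp declare target": remember the location.
   Returns 0 on success, -1 if the model's stack is full.  */
int
on_declare_target (int line)
{
  if (open_count == MAX_OPEN)
    return -1;
  open_lines[open_count++] = line;
  return 0;
}

/* Called for "#pragma omp end declare target": forget the most
   recent location.  Returns -1 if there is no matching pragma.  */
int
on_end_declare_target (void)
{
  if (open_count == 0)
    return -1;
  open_count--;
  return 0;
}

/* Called at the end of the translation unit: warn once per pragma
   that is still open, and return how many were reported.  */
int
warn_unclosed (FILE *out)
{
  int reported = open_count;
  for (int i = 0; i < open_count; i++)
    fprintf (out, "line %d: warning: unclosed declare target pragma\n",
             open_lines[i]);
  open_count = 0;
  return reported;
}
```

Keeping a stack of locations rather than only a counter is what lets the diagnostic point at each unclosed pragma.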

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Ilya Tocar

On 04 Dec 13:51, Uros Bizjak wrote:
 On Thu, Dec 4, 2014 at 1:45 PM, Uros Bizjak ubiz...@gmail.com wrote:
  On Thu, Dec 4, 2014 at 1:04 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
   Can you add a few testcases?
  
   Isn't it already covered by gcc.dg/torture/vshuf* ?
  
 
  I didn't see them fail on my machines today.
 
  Those are executable testcases, those better should not fail.
  The patch just improved code generation and the testcases test
  if the improved code generation works well.
  Did you mean some scan-assembler test that verifies the better code
  generation?  Guess it is possible, though fragile.
 
  I think that existing executable testcases adequately cover the
  functionality of the patch.
 
  The patch is OK.
 
 BTW, the ChangeLog is missing.
 
* config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
(expand_vec_perm_broadcast_1): Ditto.
(expand_vec_perm_vpermi2_vpshub2): New.
(ix86_expand_vec_perm_const_1): Use it.
(ix86_vectorize_vec_perm_const_ok): Handle v64qi.
* config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
(VEC_PERM_CONST): Ditto.
 index ca5d720..6252e7e 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -10678,7 +10678,7 @@
 (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
 (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
 (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
 -   (V32HI "TARGET_AVX512BW")])
 +   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 I don't think change for VBMI target belongs in this patch.

Those changes enable non-const v64qi permutes
(via a single vpermi2b insn); should I split them into a separate patch?

Re: [PATCH x86] Update options for avx512 testsuite.

2014-12-01 Thread Ilya Tocar

  I saw
 
  FAIL: gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c
  scan-tree-dump-times vect vectorized 1 loops 1
  FAIL: gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c
  scan-tree-dump-times vect vectorized 1 loops 1
  FAIL: gcc.target/i386/avx256-unaligned-store-2.c scan-assembler
  vmovups.**movv16qi_internal/3
  FAIL: gcc.target/i386/avx512ifma-vpmaddhuq-2.c (test for excess errors)
  UNRESOLVED: gcc.target/i386/avx512ifma-vpmaddhuq-2.c compilation
  failed to produce executable
  FAIL: gcc.target/i386/avx512ifma-vpmaddluq-2.c (test for excess errors)
  UNRESOLVED: gcc.target/i386/avx512ifma-vpmaddluq-2.c compilation
  failed to produce executable
  FAIL: gcc.target/i386/avx512vbmi-vpermb-2.c (test for excess errors)
  UNRESOLVED: gcc.target/i386/avx512vbmi-vpermb-2.c compilation failed
  to produce executable
  FAIL: gcc.target/i386/avx512vbmi-vpermi2b-2.c (test for excess errors)
  UNRESOLVED: gcc.target/i386/avx512vbmi-vpermi2b-2.c compilation failed
  to produce executable
  FAIL: gcc.target/i386/avx512vbmi-vpermt2b-2.c (test for excess errors)
  UNRESOLVED: gcc.target/i386/avx512vbmi-vpermt2b-2.c compilation failed
  to produce executable
  FAIL: gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c (test for excess errors)
  UNRESOLVED: gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c compilation
  failed to produce executable
 
  on x86:
 
  https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg00030.html
 
 
 
 I took a look at one of them:
 
 diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c
 b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c
 index 936d938..861dce2 100644
 --- a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c
 +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c
 @@ -1,5 +1,5 @@
  /* { dg-do run } */
 -/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */
 +/* { dg-options "-O2 -mavx512vbmi" } */
  /* { dg-require-effective-target avx512vbmi } */
 
  #include "avx512f-helper.h"
 
 There is no #define AVX512VBMI added.
 

My bad.

The patch below fixes the avx512* tests.
I have no idea about the cost-model ones.

---
 gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-2.c  | 2 ++
 gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-2.c  | 2 ++
 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c | 2 ++
 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c   | 2 ++
 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c   | 2 ++
 gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c | 2 ++
 6 files changed, 12 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-2.c 
b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-2.c
index 79f3da9..78af9d4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-2.c
@@ -2,6 +2,8 @@
 /* { dg-options "-O2 -mavx512ifma" } */
 /* { dg-require-effective-target avx512ifma } */
 
+#define AVX512IFMA
+
 #include "avx512f-helper.h"
 
 #define SIZE (AVX512F_LEN / 64)
diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-2.c 
b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-2.c
index f6e4db1..ce38beb 100644
--- a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-2.c
@@ -2,6 +2,8 @@
 /* { dg-options "-O2 -mavx512ifma" } */
 /* { dg-require-effective-target avx512ifma } */
 
+#define AVX512IFMA
+
 #include "avx512f-helper.h"
 
 #define SIZE (AVX512F_LEN / 64)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c 
b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c
index 3027cf6..da1a22e 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c
@@ -2,6 +2,8 @@
 /* { dg-options "-O2 -mavx512vbmi" } */
 /* { dg-require-effective-target avx512vbmi } */
 
+#define AVX512VBMI
+
 #include "avx512f-helper.h"
 
 #define SIZE (AVX512F_LEN / 8)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c 
b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c
index cb69fc5..31afc52 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c
@@ -2,6 +2,8 @@
 /* { dg-options "-O2 -mavx512vbmi" } */
 /* { dg-require-effective-target avx512vbmi } */
 
+#define AVX512VBMI
+
 #include "avx512f-helper.h"
 
 #define SIZE (AVX512F_LEN / 8)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c 
b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c
index f6cb5b7..cc03426 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c
@@ -2,6 +2,8 @@
 /* { dg-options "-O2 -mavx512vbmi" } */
 /* { dg-require-effective-target avx512vbmi } */
 
+#define AVX512VBMI
+
 #include "avx512f-helper.h"
 
 #define SIZE (AVX512F_LEN / 8)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c

Re: [PATCH, i386] Add new arg values for __builtin_cpu_supports

2014-11-26 Thread Ilya Tocar

  I think using cpuid for that is just fine.  __builtin_cpu_supports
  is for ISA additions users might actually want to version code for,
  MPX stuff, as the instructions are nops without hw support, are not
  something one would multi-version a function for.
  If anything, AVX512F and AVX512BW+VL might be good candidates for that,
  not
  MPX.
 
  Sorry, I didn't know that __builtin_cpu_supports was really only meant for
  user multi-versioning.  In that case, it won't make any sense to put the MPX
  stuff in there.
 
  Sorry for sending you down a wrong path Ilya.
 
 It's OK, AVX guys will just transform this MPX patch into AVX512 one :)

Hi,

I've added avx512f support to __builtin_cpu_supports.
I'm not sure about bw+vl; I think that for compound values like
avx512bw+dq+vl, arch is better. Also, for such cases the priority is unclear:
which should we choose, bw+vl or, e.g., avx512f+er?
I've left the MPX bits in cpuid.h in case we need them later (e.g.
for enabling runtime MPX tests).

Ok for trunk?
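
For readers who have not used the builtin, here is a minimal usage sketch of what accepting "avx512f" as an argument enables. It assumes a compiler that already includes this change; cpu_has_avx512f and pick_kernel are hypothetical names chosen for the illustration, and on non-x86 targets the sketch simply reports no support.

```c
/* Returns 1 if the running CPU reports AVX512F, 0 otherwise.
   Falls back to 0 when the builtins are not available.  */
int
cpu_has_avx512f (void)
{
#if (defined (__GNUC__) || defined (__clang__)) \
    && (defined (__x86_64__) || defined (__i386__))
  /* Harmless if the CPU model data is already initialized.  */
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx512f") != 0;
#else
  return 0;
#endif
}

/* Hypothetical dispatcher: name the code path a program would take.  */
const char *
pick_kernel (void)
{
  return cpu_has_avx512f () ? "avx512f" : "generic";
}
```

This is the manual form of the dispatch that function multi-versioning with target ("avx512f") performs automatically.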

gcc/

* config/i386/cpuid.h (bit_MPX, bit_BNDREGS, bit_BNDCSR):
Define.
* config/i386/i386.c (get_builtin_code_for_version): Add avx512f.
(fold_builtin_cpu): Ditto.
* doc/extend.texi: Document it.


gcc/testsuite/

* g++.dg/ext/mv2.C: Add test for target (avx512f).
* gcc.target/i386/builtin_target.c: Ditto.


libgcc/

* config/i386/cpuinfo.c (processor_features): Add FEATURE_AVX512F.
* config/i386/cpuinfo.c (get_available_features): Detect it.
 
---
 gcc/config/i386/cpuid.h|  5 +++
 gcc/config/i386/i386.c | 10 +++--
 gcc/doc/extend.texi|  2 +
 gcc/testsuite/g++.dg/ext/mv2.C | 51 +++---
 gcc/testsuite/gcc.target/i386/builtin_target.c |  4 ++
 libgcc/config/i386/cpuinfo.c   |  5 ++-
 6 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 6c6e7f3..e3b7646 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -72,6 +72,7 @@
 #define bit_AVX2       (1 << 5)
 #define bit_BMI2       (1 << 8)
 #define bit_RTM        (1 << 11)
+#define bit_MPX        (1 << 14)
 #define bit_AVX512F    (1 << 16)
 #define bit_AVX512DQ   (1 << 17)
 #define bit_RDSEED     (1 << 18)
@@ -91,6 +92,10 @@
 #define bit_PREFETCHWT1  (1 << 0)
 #define bit_AVX512VBMI (1 << 1)
 
+/* XFEATURE_ENABLED_MASK register bits (%eax == 13, %ecx == 0) */
+#define bit_BNDREGS (1 << 3)
+#define bit_BNDCSR  (1 << 4)
+
 /* Extended State Enumeration Sub-leaf (%eax == 13, %ecx == 1) */
 #define bit_XSAVEOPT   (1 << 0)
 #define bit_XSAVEC (1 << 1)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..2493130 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34235,7 +34235,8 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 P_FMA,
 P_PROC_FMA,
 P_AVX2,
-P_PROC_AVX2
+P_PROC_AVX2,
+P_AVX512F
   };
 
  enum feature_priority priority = P_ZERO;
@@ -34263,7 +34264,8 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
   {"fma4", P_FMA4},
   {"xop", P_XOP},
   {"fma", P_FMA},
-  {"avx2", P_AVX2}
+  {"avx2", P_AVX2},
+  {"avx512f", P_AVX512F}
 };
 
 
@@ -35238,6 +35240,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
 F_FMA4,
 F_XOP,
 F_FMA,
+F_AVX512F,
 F_MAX
   };
 
@@ -35326,7 +35329,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
   {"fma4",   F_FMA4},
   {"xop",    F_XOP},
   {"fma",    F_FMA},
-  {"avx2",   F_AVX2}
+  {"avx2",   F_AVX2},
+  {"avx512f",F_AVX512F}
 };
 
   tree __processor_model_type = build_processor_model_struct ();
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7178c9a..773e14c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11642,6 +11642,8 @@ SSE4.2 instructions.
 AVX instructions.
 @item avx2
 AVX2 instructions.
+@item avx512f
+AVX512F instructions.
 @end table
 
 Here is an example:
diff --git a/gcc/testsuite/g++.dg/ext/mv2.C b/gcc/testsuite/g++.dg/ext/mv2.C
index 869e99b..d4f1f92 100644
--- a/gcc/testsuite/g++.dg/ext/mv2.C
+++ b/gcc/testsuite/g++.dg/ext/mv2.C
@@ -20,31 +20,34 @@ int foo () __attribute__ ((target ("sse4.2")));
 int foo () __attribute__ ((target ("popcnt")));
 int foo () __attribute__ ((target ("avx")));
 int foo () __attribute__ ((target ("avx2")));
+int foo () __attribute__ ((target ("avx512f")));
 
 int main ()
 {
   int val = foo ();
 
-  if (__builtin_cpu_supports ("avx2"))
-    assert (val == 1);
+  if (__builtin_cpu_supports ("avx512f"))
+    assert (val == 11);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == 10);
   else if (__builtin_cpu_supports ("avx"))
-    assert (val == 2);
+    assert (val == 9);
   else if (__builtin_cpu_supports ("popcnt"))
-    assert (val == 3);
+    assert (val == 8);
   else if (__builtin_cpu_supports ("sse4.2"))
-    assert (val == 4);
+assert (val ==

[PATCH] Remove unnecessary calls to strchr.

2014-11-25 Thread Ilya Tocar

Hi,

As proposed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63853
this patch replaces some function calls with pointer arithmetic.
I didn't mention the PR in the ChangeLog, as the two are not actually related.
Ok for trunk?
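
As a rough sketch of the idea (not the code from gcc.c), the following shows how one comma-separated item can be copied out of a bounded buffer when the end pointer is already known. The function name copy_first_item is hypothetical, and the sketch uses memchr bounded by the end pointer rather than the patch's strchr followed by a clamp; plain malloc stands in for GCC's XNEWVEC.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the first comma-separated item of [cur, end) into a freshly
   allocated NUL-terminated string and stores its length in *len.
   Returns NULL on allocation failure.  */
char *
copy_first_item (const char *cur, const char *end, size_t *len)
{
  /* memchr is bounded by END, so no second scan for '\0' is needed.  */
  const char *next = memchr (cur, ',', (size_t) (end - cur));
  if (next == NULL)
    next = end;

  *len = (size_t) (next - cur);
  char *item = malloc (*len + 1);
  if (item == NULL)
    return NULL;

  /* The length is already known, so memcpy can replace strncpy.  */
  memcpy (item, cur, *len);
  item[*len] = '\0';
  return item;
}
```

Because the item length is computed once as a pointer difference, later comparisons can reuse it instead of calling strlen on the copy.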


gcc/
* gcc.c (handle_foffload_option): Remove unnecessary calls to strchr,
strlen, strncpy.
* lto-wrapper.c (append_offload_options): Likewise.

---
 gcc/gcc.c | 24 +---
 gcc/lto-wrapper.c |  2 +-
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 653ca8d..4731eec 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -3384,11 +3384,11 @@ handle_foffload_option (const char *arg)
 {
   next = strchr (cur, ',');
   if (next == NULL)
-   next = strchr (cur, '\0');
+   next = end;
   next = (next  end) ? end : next;
 
   target = XNEWVEC (char, next - cur + 1);
-  strncpy (target, cur, next - cur);
+  memcpy (target, cur, next - cur);
   target[next - cur] = '\0';
 
   /* If 'disable' is passed to the option, stop parsing the option and 
clean
@@ -3408,8 +3408,7 @@ handle_foffload_option (const char *arg)
  if (n == NULL)
n = strchr (c, '\0');
 
- if (strlen (target) == (size_t) (n - c)
-     && strncmp (target, c, n - c) == 0)
+ if (next - cur == n - c && strncmp (target, c, n - c) == 0)
break;
 
  c = *n ? n + 1 : NULL;
@@ -3420,7 +3419,10 @@ handle_foffload_option (const char *arg)
 target);
 
   if (!offload_targets)
-   offload_targets = xstrdup (target);
+   {
+ offload_targets = target;
+ target = NULL;
+   }
   else
{
  /* Check that the target hasn't already presented in the list.  */
@@ -3431,8 +3433,7 @@ handle_foffload_option (const char *arg)
  if (n == NULL)
n = strchr (c, '\0');
 
- if (strlen (target) == (size_t) (n - c)
-     && strncmp (c, target, n - c) == 0)
+ if (next - cur == n - c && strncmp (c, target, n - c) == 0)
break;

Re: [PATCH 1/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.

2014-11-21 Thread Ilya Tocar

On 20 Nov 09:43, Uros Bizjak wrote:
 On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  A new revision of the Intel ISA reference [1] has new instructions:
  Clwb, pcommit and new flavors of AVX512. The patch below adds them.
  I understand that stage 1 is closed; however, those changes shouldn't
  affect anything outside of the i386 backend and are extremely unlikely to
  break existing functionality, and I personally think it's desirable for
  the newest GCC to support the newest spec.
  Bootstrapped/regtested on x86_64-unknown-linux-gnu.
  Ok for trunk?
 
 Please split the patch into patch series, like it was done previously
 for AVX512F patches.
 
 Uros.


This part adds avx512ifma.
Bootstraps/passes make check.

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512IFMA_SET,
OPTION_MASK_ISA_AVX512IFMA_UNSET): New.
(ix86_handle_option): Handle OPT_mavx512ifma.
* config.gcc: Add avx512ifmaintrin.h, avx512ifmavlintrin.h.
* config/i386/avx512ifmaintrin.h: New file.
* config/i386/avx512ifmavlintrin.h: Ditto.
* config/i386/cpuid.h (bit_AVX512IFMA): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
avx512ifma.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512IFMA__.
* config/i386/i386.c (ix86_target_string): Add -mavx512ifma.
(PTA_AVX512IFMA): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx512ifma.
(ix86_builtins): Add IX86_BUILTIN_VPMADD52LUQ512,
IX86_BUILTIN_VPMADD52HUQ512, IX86_BUILTIN_VPMADD52LUQ256,
IX86_BUILTIN_VPMADD52HUQ256, IX86_BUILTIN_VPMADD52LUQ128,
IX86_BUILTIN_VPMADD52HUQ128, IX86_BUILTIN_VPMADD52LUQ512_MASKZ,
IX86_BUILTIN_VPMADD52HUQ512_MASKZ, IX86_BUILTIN_VPMADD52LUQ256_MASKZ,
IX86_BUILTIN_VPMADD52HUQ256_MASKZ, IX86_BUILTIN_VPMADD52LUQ128_MASKZ,
IX86_BUILTIN_VPMADD52HUQ128_MASKZ.
(bdesc_special_args): Add __builtin_ia32_vpmadd52luq512_mask,
__builtin_ia32_vpmadd52luq512_maskz,
__builtin_ia32_vpmadd52huq512_mask,
__builtin_ia32_vpmadd52huq512_maskz,
__builtin_ia32_vpmadd52luq256_mask,
__builtin_ia32_vpmadd52luq256_maskz,
__builtin_ia32_vpmadd52huq256_mask,
__builtin_ia32_vpmadd52huq256_maskz,
__builtin_ia32_vpmadd52luq128_mask,
__builtin_ia32_vpmadd52luq128_maskz,
__builtin_ia32_vpmadd52huq128_mask,
__builtin_ia32_vpmadd52huq128_maskz.
* config/i386/i386.h (TARGET_AVX512IFMA, TARGET_AVX512IFMA_P): Define.
* config/i386/i386.opt: Add mavx512ifma.
* config/i386/immintrin.h: Include avx512ifmaintrin.h,
avx512ifmavlintrin.h.
* config/i386/sse.md (unspec): Add UNSPEC_VPMADD52LUQ,
UNSPEC_VPMADD52HUQ.
(VPMADD52): New iterator.
(vpmadd52type): New attribute.
(vpamdd52huqmode_maskz): New.
(vpamdd52luqmode_maskz): Ditto.
(vpamdd52vpmadd52typemodesd_maskz_name): Ditto.
(vpamdd52vpmadd52typemode_mask): Ditto.


gcc/testsuite/

* g++.dg/other/i386-2.C: Add -mavx512ifma.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/avx512f-helper.h: Add avx512ifma-check.h.
* gcc.target/i386/avx512ifma-check.h: New.
* gcc.target/i386/avx512ifma-vpmaddhuq-1.c: Ditto.
* gcc.target/i386/avx512ifma-vpmaddhuq-2.c: Ditto.
* gcc.target/i386/avx512ifma-vpmaddluq-1.c: Ditto.
* gcc.target/i386/avx512ifma-vpmaddluq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto.
* gcc.target/i386/i386.exp (check_effective_target_avx512ifma): New.
* gcc.target/i386/sse-12.c: Add new options.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/common/config/i386/i386-common.c   |  16 ++
 gcc/config.gcc |   6 +-
 gcc/config/i386/avx512ifmaintrin.h | 104 +
 gcc/config/i386/avx512ifmavlintrin.h   | 164 +
 gcc/config/i386/cpuid.h|   1 +
 gcc/config/i386/driver-i386.c  |   5 +-
 gcc/config/i386/i386-c.c   |   2 +
 gcc/config/i386/i386.c |  35 +
 gcc/config/i386/i386.h |   2 +
 gcc/config/i386/i386.opt   |   4 +
 gcc/config/i386/immintrin.h|   4 +
 gcc/config/i386/sse.md |  69 +
 gcc/testsuite/g++.dg/other/i386-2.C|   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C|   2 +-
 gcc/testsuite/gcc.target/i386/avx512f-helper.h |   5 +
 gcc/testsuite

Re: [PATCH 2/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.

2014-11-21 Thread Ilya Tocar

On 20 Nov 09:43, Uros Bizjak wrote:
 On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  A new revision of the Intel ISA reference [1] has new instructions:
  Clwb, pcommit and new flavors of AVX512. The patch below adds them.
  I understand that stage 1 is closed; however, those changes shouldn't
  affect anything outside of the i386 backend and are extremely unlikely to
  break existing functionality, and I personally think it's desirable for
  the newest GCC to support the newest spec.
  Bootstrapped/regtested on x86_64-unknown-linux-gnu.
  Ok for trunk?
 
 Please split the patch into patch series, like it was done previously
 for AVX512F patches.
 
 Uros.

This part adds avx512vbmi.
I'll send vpermi2b autogen patch together with v64qi const perm later.
Bootstraps/passes make check.
Ok for trunk?


gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512VBMI_SET,
OPTION_MASK_ISA_AVX512VBMI_UNSET): New.
(ix86_handle_option): Handle OPT_mavx512vbmi.
* config.gcc: Add avx512vbmiintrin.h, avx512vbmivlintrin.h.
* config/i386/avx512vbmiintrin.h: New file.
* config/i386/avx512vbmivlintrin.h: Ditto.
* config/i386/cpuid.h (bit_AVX512VBMI): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect avx512vbmi.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512VBMI__.
* config/i386/i386.c (ix86_target_string): Add -mavx512vbmi.
(PTA_AVX512VBMI): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx512vbmi.
(ix86_builtins): Add IX86_BUILTIN_VPMULTISHIFTQB512,
IX86_BUILTIN_VPMULTISHIFTQB256, IX86_BUILTIN_VPMULTISHIFTQB128,
IX86_BUILTIN_VPERMVARQI512_MASK, IX86_BUILTIN_VPERMT2VARQI512,
IX86_BUILTIN_VPERMT2VARQI512_MASKZ, IX86_BUILTIN_VPERMI2VARQI512,
IX86_BUILTIN_VPERMVARQI256_MASK, IX86_BUILTIN_VPERMVARQI128_MASK,
IX86_BUILTIN_VPERMT2VARQI256, IX86_BUILTIN_VPERMT2VARQI256_MASKZ,
IX86_BUILTIN_VPERMT2VARQI128, IX86_BUILTIN_VPERMI2VARQI256,
IX86_BUILTIN_VPERMI2VARQI128.
(bdesc_special_args): Add __builtin_ia32_vpmultishiftqb512_mask,
__builtin_ia32_vpmultishiftqb256_mask,
__builtin_ia32_vpmultishiftqb128_mask,
__builtin_ia32_permvarqi512_mask, __builtin_ia32_vpermt2varqi512_mask,
__builtin_ia32_vpermt2varqi512_maskz,
__builtin_ia32_vpermi2varqi512_mask, __builtin_ia32_permvarqi256_mask,
__builtin_ia32_permvarqi128_mask, __builtin_ia32_vpermt2varqi256_mask,
__builtin_ia32_vpermt2varqi256_maskz,
__builtin_ia32_vpermt2varqi128_mask,
__builtin_ia32_vpermt2varqi128_maskz,
__builtin_ia32_vpermi2varqi256_mask,
__builtin_ia32_vpermi2varqi128_mask.
(ix86_hard_regno_mode_ok): Allow big masks for AVX512VBMI.
* config/i386/i386.h (TARGET_AVX512VBMI, TARGET_AVX512VBMI_P): Define.
* config/i386/i386.opt: Add mavx512vbmi.
* config/i386/immintrin.h: Include avx512vbmiintrin.h,
avx512vbmivlintrin.h.
* config/i386/sse.md (unspec): Add UNSPEC_VPMULTISHIFT.
(VI1_AVX512VL): New iterator.
(avx512_permvarmodemask_name): Use it.
(avx512_vpermi2varmode3_maskz): Ditto.
(avx512_vpermi2varmode3sd_maskz_name): Ditto.
(avx512_vpermi2varmode3_mask): Ditto.
(avx512_vpermt2varmode3_maskz): Ditto.
(avx512_vpermt2varmode3sd_maskz_name): Ditto.
(avx512_vpermt2varmode3_mask): Ditto.
(vpmultishiftqbmodemask_name): Ditto.

gcc/testsuite/

* g++.dg/other/i386-2.C: Add -mavx512vbmi.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/avx512f-helper.h: Add avx512vbmi-check.h.
* gcc.target/i386/avx512vbmi-check.h: Ditto.
* gcc.target/i386/avx512vbmi-vpermb-1.c: Ditto.
* gcc.target/i386/avx512vbmi-vpermb-2.c: Ditto.
* gcc.target/i386/avx512vbmi-vpermi2b-1.c: Ditto.
* gcc.target/i386/avx512vbmi-vpermi2b-2.c: Ditto.
* gcc.target/i386/avx512vbmi-vpermt2b-1.c: Ditto.
* gcc.target/i386/avx512vbmi-vpermt2b-2.c: Ditto.
* gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c: Ditto.
* gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpermb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpermi2b-2.c: Ditto.
* gcc.target/i386/avx512vl-vpermt2b-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmultishiftqb-2.c: Ditto.
* gcc.target/i386/i386.exp (check_effective_target_avx512vbmi): New.
* gcc.target/i386/sse-12.c: Add new options.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/common/config/i386/i386

Re: [PATCH 3/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.

2014-11-21 Thread Ilya Tocar

On 20 Nov 09:43, Uros Bizjak wrote:
 On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  A new revision of the Intel ISA reference [1] has new instructions:
  Clwb, pcommit and new flavors of AVX512. The patch below adds them.
  I understand that stage 1 is closed; however, those changes shouldn't
  affect anything outside of the i386 backend and are extremely unlikely to
  break existing functionality, and I personally think it's desirable for
  the newest GCC to support the newest spec.
  Bootstrapped/regtested on x86_64-unknown-linux-gnu.
  Ok for trunk?
 
 Please split the patch into patch series, like it was done previously
 for AVX512F patches.
 
 Uros.

Done. This part adds clwb.
Bootstrapped/passes make-check.
Ok for trunk?

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_CLWB_UNSET,
OPTION_MASK_ISA_CLWB_SET): New.
(ix86_handle_option): Handle OPT_mclwb.
* config.gcc: Add clwbintrin.h.
* config/i386/clwbintrin.h: New file.
* config/i386/cpuid.h (bit_CLWB): Define.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect clwb. 
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__CLWB__.
* config/i386/i386.c (ix86_target_string): Add -mclwb.
(PTA_CLWB): Define.
(ix86_option_override_internal): Handle new option.
(ix86_valid_target_attribute_inner_p): Add clwb.
(ix86_builtins): Add IX86_BUILTIN_CLWB.
(ix86_init_mmx_sse_builtins): Add __builtin_ia32_clwb.
(ix86_expand_builtin): Handle IX86_BUILTIN_CLWB.
* config/i386/i386.h (TARGET_CLWB, TARGET_CLWB_P): Define.
* config/i386/i386.md (unspecv): Add UNSPECV_CLWB.
(clwb): New instruction.
* config/i386/i386.opt: Add mclwb.
* config/i386/x86intrin.h: Include clwbintrin.h.

gcc/testsuite/

* g++.dg/other/i386-2.C: Add -mclwb.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/clwb-1.c: New test.
* gcc.target/i386/sse-12.c: Add new options.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/common/config/i386/i386-common.c   | 15 +++
 gcc/config.gcc |  4 +--
 gcc/config/i386/clwbintrin.h   | 49 ++
 gcc/config/i386/cpuid.h|  1 +
 gcc/config/i386/driver-i386.c  |  6 +++--
 gcc/config/i386/i386-c.c   |  2 ++
 gcc/config/i386/i386.c | 23 
 gcc/config/i386/i386.h |  2 ++
 gcc/config/i386/i386.md| 12 +
 gcc/config/i386/i386.opt   |  4 +++
 gcc/config/i386/x86intrin.h|  2 ++
 gcc/testsuite/g++.dg/other/i386-2.C|  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C|  2 +-
 gcc/testsuite/gcc.target/i386/clwb-1.c | 11 
 gcc/testsuite/gcc.target/i386/sse-12.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c |  2 +-
 19 files changed, 134 insertions(+), 11 deletions(-)
 create mode 100644 gcc/config/i386/clwbintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/clwb-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 1c4f15e..bad0988 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE)
 #define OPTION_MASK_ISA_XSAVEC_SET \
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE)
+#define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -181,6 +182,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT
 #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC
 #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES
+#define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
as -mno-sse4.1. */
@@ -901,6 +903,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+    case OPT_mclwb:
+      if (value)
+        {
+          opts->x_ix86_isa_flags |= OPTION_MASK_ISA_CLWB_SET;
+          opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLWB_SET;
+        }
+      else
+        {
+          opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_CLWB_UNSET;
+          opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLWB_UNSET;
+        }
+      return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index da2a723..766f13b 100644
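
The OPT_mclwb case in ix86_handle_option above follows a common set/clear bitmask pattern. The sketch below reproduces that pattern in isolation; struct isa_options, ISA_CLWB and handle_clwb_option are hypothetical stand-ins for illustration, not GCC's actual types or mask values.

```c
#include <stdint.h>

/* Hypothetical stand-in for one ISA feature bit.  */
#define ISA_CLWB (UINT64_C (1) << 3)

/* Hypothetical stand-in for the relevant part of the option state.  */
struct isa_options
{
  uint64_t flags;          /* Features currently enabled.  */
  uint64_t flags_explicit; /* Features the user named explicitly.  */
};

/* Applies -mclwb (value != 0) or -mno-clwb (value == 0).  */
void
handle_clwb_option (struct isa_options *opts, int value)
{
  if (value)
    opts->flags |= ISA_CLWB;
  else
    opts->flags &= ~ISA_CLWB;

  /* Either way, record that the user made an explicit choice, so later
     defaults derived from -march do not override it.  */
  opts->flags_explicit |= ISA_CLWB;
}
```

Keeping a separate "explicit" mask is what lets a later default-selection step tell a user's -mno-clwb apart from a feature that was simply never mentioned.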

Re: [PATCH][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.

2014-11-21 Thread Ilya Tocar

On 20 Nov 09:43, Uros Bizjak wrote:
 On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  A new revision of the Intel ISA reference [1] has new instructions:
  Clwb, pcommit and new flavors of AVX512. The patch below adds them.
  I understand that stage 1 is closed; however, those changes shouldn't
  affect anything outside of the i386 backend and are extremely unlikely to
  break existing functionality, and I personally think it's desirable for
  the newest GCC to support the newest spec.
  Bootstrapped/regtested on x86_64-unknown-linux-gnu.
  Ok for trunk?
 
 Please split the patch into patch series, like it was done previously
 for AVX512F patches.
 
 Uros.
 
  [1]:https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
 

This part adds pcommit.
Bootstraps/passes make check.
Ok for trunk?

gcc/


* common/config/i386/i386-common.c (OPTION_MASK_ISA_PCOMMIT_UNSET,
OPTION_MASK_ISA_PCOMMIT_SET): New.
(ix86_handle_option): Handle OPT_mpcommit.
* config.gcc: Add pcommitintrin.h.
* config/i386/pcommitintrin.h: New file.
* config/i386/cpuid.h (bit_PCOMMIT): Define.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect pcommit.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__PCOMMIT__.
* config/i386/i386.c (ix86_target_string): Add -mpcommit.
(PTA_PCOMMIT): Define.
(ix86_option_override_internal): Handle new option.
(ix86_valid_target_attribute_inner_p): Add pcommit.
(ix86_builtins): Add IX86_BUILTIN_PCOMMIT.
(bdesc_special_args): Add __builtin_ia32_pcommit.
* config/i386/i386.h (TARGET_PCOMMIT, TARGET_PCOMMIT_P): Define.
* config/i386/i386.md (unspecv): Add UNSPECV_PCOMMIT.
(pcommit): New instruction.
* config/i386/i386.opt: Add mpcommit.
* config/i386/x86intrin.h: Include pcommitintrin.h.

 
---
 gcc/common/config/i386/i386-common.c  | 15 ++
 gcc/config.gcc|  4 +--
 gcc/config/i386/cpuid.h   |  1 +
 gcc/config/i386/driver-i386.c |  5 +++-
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c| 12 
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 10 +++
 gcc/config/i386/i386.opt  |  4 +++
 gcc/config/i386/pcommitintrin.h   | 49 +++
 gcc/config/i386/x86intrin.h   |  2 ++
 gcc/testsuite/g++.dg/other/i386-2.C   |  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |  2 +-
 gcc/testsuite/gcc.target/i386/pcommit-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/sse-12.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  2 +-
 19 files changed, 121 insertions(+), 10 deletions(-)
 create mode 100644 gcc/config/i386/pcommitintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/pcommit-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index bad0988..2e09d77 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_XSAVEC_SET \
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
+#define OPTION_MASK_ISA_PCOMMIT_SET OPTION_MASK_ISA_PCOMMIT
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -182,6 +183,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT
 #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC
 #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES
+#define OPTION_MASK_ISA_PCOMMIT_UNSET OPTION_MASK_ISA_PCOMMIT
 #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
@@ -903,6 +905,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+    case OPT_mpcommit:
+      if (value)
+        {
+          opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PCOMMIT_SET;
+          opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PCOMMIT_SET;
+        }
+      else
+        {
+          opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PCOMMIT_UNSET;
+          opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PCOMMIT_UNSET;
+        }
+      return true;
+
 case OPT_mclwb:
   if (value)
{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 766f13b..fa3e1fc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -369,7 +369,7 @@ i[34567]86-*-*)
   xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
   avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h

Re: [PING^4][PATCH] Warn about unclosed pragma omp declare target.

2014-11-19 Thread Ilya Tocar

As omp target and offloading support is committed to trunk,
I think it's reasonable to add some new warnings.
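
To make the bookkeeping in the quoted patch easier to follow, here is a small stand-alone sketch of the same idea: push a location when a declare-target pragma opens, pop it when the matching end pragma is seen, and treat whatever is left at end of file as unclosed. The fixed-size struct pragma_stack and the function names are hypothetical; the patch itself uses a GCC vec of location_t.

```c
#include <stddef.h>

#define MAX_OPEN_PRAGMAS 64

/* Hypothetical fixed-size stack of source line numbers.  */
struct pragma_stack
{
  int lines[MAX_OPEN_PRAGMAS];
  size_t depth;
};

/* Called for "#pragma omp declare target": remember where it opened.
   Returns 0 on success, -1 if the stack is full.  */
int
pragma_open (struct pragma_stack *s, int line)
{
  if (s->depth == MAX_OPEN_PRAGMAS)
    return -1;
  s->lines[s->depth++] = line;
  return 0;
}

/* Called for "#pragma omp end declare target": forget the innermost one.
   Returns 0 on success, -1 if there is nothing to close.  */
int
pragma_close (struct pragma_stack *s)
{
  if (s->depth == 0)
    return -1;
  s->depth--;
  return 0;
}

/* Called at end of translation unit: number of pragmas still open.
   Entries lines[0 .. depth-1] are the locations to warn about.  */
size_t
pragma_unclosed (const struct pragma_stack *s)
{
  return s->depth;
}
```

At end of file, a caller would iterate over the remaining entries and emit one warning per stored location.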

On 06 Nov 15:27, Ilya Tocar wrote:
 Ping.
 On 30 Oct 18:31, Ilya Tocar wrote:
  Ping.
  On 20 Oct 19:26, Ilya Tocar wrote:
   Ping.
   
   On 02 Oct 17:38, Ilya Tocar wrote:
Ping.
On 15 Aug 16:26, Ilya Tocar wrote:
 Ping.
 
 On 29 Jul 18:45, Ilya Tocar wrote:
  Hi,
  
  As discussed here in 
  https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
  GCC should complain about pragma omp declare target without
  corresponding pragma omp end declare target. This patch adds a 
  warning
  for those cases.
  Bootstraps/passes make-check.
  Ok for trunk?
  
  ChangeLog:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * c-decl.c (omp_declare_target_location_stack): New.
  * c-lang.h (omp_declare_target_location_stack): Declare.
  * c-parser.c (warn_unclosed_pragma_omp_target): New.
  (c_parser_translation_unit): Call it.
  (c_parser_omp_declare_target): Remember location.
  (c_parser_omp_end_declare_target): Forget location.
  
  And ChangeLog for testsuite:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * gcc.dg/gomp/target-3.c: New testcase.
  
  ---
   gcc/c/c-decl.c   |  3 +++
   gcc/c/c-lang.h   |  3 +++
   gcc/c/c-parser.c | 22 +-
   gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
  +
   4 files changed, 60 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
  
  diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
  index 2a4b439..2dd5b2c 100644
  --- a/gcc/c/c-decl.c
  +++ b/gcc/c/c-decl.c
  @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = 
  VOIDmode;
   /* If non-zero, implicit omp declare target attribute is added 
  into the
  attribute lists.  */
   int current_omp_declare_target_attribute;
  +
  +/* Holds locations of currently open omp declare target pragmas. 
   */
  +vec<location_t> omp_declare_target_location_stack;
   
   /* Each c_binding structure describes one binding of an identifier 
  to
  a decl.  All the decls in a scope - irrespective of namespace - 
  are
  diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
  index e974906..cef995c 100644
  --- a/gcc/c/c-lang.h
  +++ b/gcc/c/c-lang.h
  @@ -59,4 +59,7 @@ struct GTY(()) language_function {
  attribute lists.  */
   extern GTY(()) int current_omp_declare_target_attribute;
   
  +/* Holds locations of currently open omp declare target pragmas. 
   */
  +extern vec<location_t> omp_declare_target_location_stack;
  +
   #endif /* ! GCC_C_LANG_H */
  diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
  index e32bf04..0b96fe9 100644
  --- a/gcc/c/c-parser.c
  +++ b/gcc/c/c-parser.c
  @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd 
  (c_parser *, enum pragma_context);
   static tree c_parser_array_notation (location_t, c_parser *, tree, 
  tree);
   static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, 
  bool);
   
  +static void warn_unclosed_pragma_omp_target ();
  +
   /* Parse a translation unit (C90 6.7, C99 6.9).
   
  translation-unit:
  @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
  }
 while (c_parser_next_token_is_not (parser, CPP_EOF));
   }
  +
  +  warn_unclosed_pragma_omp_target ();
   }
   
   /* Parse an external declaration (C90 6.7, C99 6.9).
  @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser 
  *parser, tree fndecl, tree parms,
   static void
   c_parser_omp_declare_target (c_parser *parser)
   {
  +  location_t loc = c_parser_peek_token (parser)->location;
 c_parser_skip_to_pragma_eol (parser);
 current_omp_declare_target_attribute++;
  +  omp_declare_target_location_stack.safe_push (loc);
   }
   
   static void
  @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser 
  *parser)
    error_at (loc, "%<#pragma omp end declare target%> without corresponding "
   "%<#pragma omp declare target%>");
 else
  -current_omp_declare_target_attribute--;
  +{
  +  current_omp_declare_target_attribute--;
  +  omp_declare_target_location_stack.pop ();
  +}
   }
   
   
  @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, 
  c_parser *parser, tree initial_index,
 return value_tree;
   }
   
  +static void
  +warn_unclosed_pragma_omp_target ()
  +{
  +  int i

[PATCH][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.

2014-11-19 Thread Ilya Tocar

Hi,

A new revision of the Intel ISA reference [1] has new instructions:
Clwb, pcommit and new flavors of AVX512. The patch below adds them.
I understand that stage 1 is closed; however, those changes shouldn't
affect anything outside of the i386 backend and are extremely unlikely to
break existing functionality, and I personally think it's desirable for
the newest GCC to support the newest spec.
Bootstrapped/regtested on x86_64-unknown-linux-gnu.
Ok for trunk?

[1]:https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf


gcc/

2014-11-19  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512IFMA_SET,
OPTION_MASK_ISA_AVX512VBMI_SET, OPTION_MASK_ISA_AVX512IFMA_UNSET,
OPTION_MASK_ISA_AVX512VBMI_UNSET, OPTION_MASK_ISA_PCOMMIT_UNSET,
OPTION_MASK_ISA_CLWB_UNSET, OPTION_MASK_ISA_CLWB_SET,
OPTION_MASK_ISA_PCOMMIT_SET): New.
(ix86_handle_option): Handle OPT_mavx512ifma, OPT_mavx512vbmi,
OPT_mpcommit, OPT_mclwb.
* config.gcc: Add avx512ifmaintrin.h, avx512ifmavlintrin.h,
avx512vbmiintrin.h, avx512vbmivlintrin.h, clwbintrin.h, pcommitintrin.h.
* config/i386/avx512ifmaintrin.h: New file.
* config/i386/avx512ifmavlintrin.h: Ditto.
* config/i386/avx512vbmiintrin.h: Ditto.
* config/i386/avx512vbmivlintrin.h: Ditto.
* config/i386/clwbintrin.h: Ditto.
* config/i386/pcommitintrin.h: Ditto.
* config/i386/cpuid.h (bit_AVX512IFMA, bit_PCOMMIT, bit_CLWB,
bit_AVX512VBMI): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect pcommit,
clwb, avx512ifma, avx512vbmi.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512VBMI__, __AVX512IFMA__, __PCOMMIT__, __CLWB__.
* config/i386/i386.c (ix86_target_string): Add -mavx512ifma,
-mavx512vbmi, -mclwb, -mpcommit.
(PTA_AVX512VBMI, PTA_AVX512IFMA, PTA_CLWB, PTA_PCOMMIT): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx512vbmi, avx512ifma,
clwb, pcommit.
(ix86_builtins): Add IX86_BUILTIN_VPMADD52LUQ512,
IX86_BUILTIN_VPMADD52HUQ512, IX86_BUILTIN_VPMADD52LUQ256,
IX86_BUILTIN_VPMADD52HUQ256, IX86_BUILTIN_VPMADD52LUQ128,
IX86_BUILTIN_VPMADD52HUQ128, IX86_BUILTIN_VPMADD52LUQ512_MASKZ,
IX86_BUILTIN_VPMADD52HUQ512_MASKZ, IX86_BUILTIN_VPMADD52LUQ256_MASKZ,
IX86_BUILTIN_VPMADD52HUQ256_MASKZ, IX86_BUILTIN_VPMADD52LUQ128_MASKZ,
IX86_BUILTIN_VPMADD52HUQ128_MASKZ, IX86_BUILTIN_VPMULTISHIFTQB512,
IX86_BUILTIN_VPMULTISHIFTQB256, IX86_BUILTIN_VPMULTISHIFTQB128,
IX86_BUILTIN_VPERMVARQI512_MASK, IX86_BUILTIN_VPERMT2VARQI512,
IX86_BUILTIN_VPERMT2VARQI512_MASKZ, IX86_BUILTIN_VPERMI2VARQI512,
IX86_BUILTIN_VPERMVARQI256_MASK, IX86_BUILTIN_VPERMVARQI128_MASK,
IX86_BUILTIN_VPERMT2VARQI256, IX86_BUILTIN_VPERMT2VARQI256_MASKZ,
IX86_BUILTIN_VPERMT2VARQI128, IX86_BUILTIN_VPERMI2VARQI256,
IX86_BUILTIN_VPERMI2VARQI128, IX86_BUILTIN_CLWB, IX86_BUILTIN_PCOMMIT.
(bdesc_special_args): Add __builtin_ia32_pcommit,
__builtin_ia32_vpmadd52luq512_mask,
__builtin_ia32_vpmadd52luq512_maskz,
__builtin_ia32_vpmadd52huq512_mask,
__builtin_ia32_vpmadd52huq512_maskz,
__builtin_ia32_vpmadd52luq256_mask,
__builtin_ia32_vpmadd52luq256_maskz,
__builtin_ia32_vpmadd52huq256_mask,
__builtin_ia32_vpmadd52huq256_maskz,
__builtin_ia32_vpmadd52luq128_mask,
__builtin_ia32_vpmadd52luq128_maskz,
__builtin_ia32_vpmadd52huq128_mask,
__builtin_ia32_vpmadd52huq128_maskz,
__builtin_ia32_vpmultishiftqb512_mask,
__builtin_ia32_vpmultishiftqb256_mask,
__builtin_ia32_vpmultishiftqb128_mask,
__builtin_ia32_permvarqi512_mask, __builtin_ia32_vpermt2varqi512_mask,
__builtin_ia32_vpermt2varqi512_maskz,
__builtin_ia32_vpermi2varqi512_mask, __builtin_ia32_permvarqi256_mask,
__builtin_ia32_permvarqi128_mask, __builtin_ia32_vpermt2varqi256_mask,
__builtin_ia32_vpermt2varqi256_maskz,
__builtin_ia32_vpermt2varqi128_mask,
__builtin_ia32_vpermt2varqi128_maskz,
__builtin_ia32_vpermi2varqi256_mask,
__builtin_ia32_vpermi2varqi128_mask.
(ix86_init_mmx_sse_builtins): Add __builtin_ia32_clwb.
(ix86_expand_builtin): Handle IX86_BUILTIN_CLWB.
(ix86_hard_regno_mode_ok): Allow big masks for AVX512VBMI.
* config/i386/i386.h (TARGET_AVX512VBMI, TARGET_AVX512VBMI_P,
TARGET_AVX512IFMA, TARGET_AVX512IFMA_P, TARGET_PCOMMIT,
TARGET_PCOMMIT_P, TARGET_CLWB, TARGET_CLWB_P): Define.
* config/i386/i386.md (unspecv): Add UNSPECV_CLWB, UNSPECV_PCOMMIT.
(pcommit): New instruction.
(clwb): Ditto.
* config/i386/i386.opt: Add mavx512ifma, mavx512vbmi, mclwb, mpcommit

Re: [PATCH] Correctly check dg-require-effective-target in avx512 tests.

2014-11-06 Thread Ilya Tocar

On 05 Nov 17:17, Uros Bizjak wrote:
 On Wed, Nov 5, 2014 at 5:14 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  Currently we only check for dg-require-effective-target avx512vl in
  avx512vl tests. We should also check for avx512dq/avx512bw.
  The patch below does this.
  Ok for trunk?
 
 
 OK, if tested on some simulator.


I've tested it on sde [1], but those changes are compile-time only.
Committed as rev 217188.

[1]https://software.intel.com/en-us/articles/intel-software-development-emulator
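
For readers unfamiliar with the directives being discussed, the header of an avx512vl test that also depends on avx512bw might look roughly like the fragment below. This is an illustrative sketch only; the option string is an assumption and the fragment is not copied from the committed change.

```c
/* { dg-do run } */
/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
/* { dg-require-effective-target avx512vl } */
/* { dg-require-effective-target avx512bw } */
```

With both dg-require-effective-target lines present, the test is skipped unless the target supports each extension, instead of only checking avx512vl.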

Re: [PING^3][PATCH] Warn about unclosed pragma omp declare target.

2014-11-06 Thread Ilya Tocar

Ping.
On 30 Oct 18:31, Ilya Tocar wrote:
 Ping.
 On 20 Oct 19:26, Ilya Tocar wrote:
  Ping.
  
  On 02 Oct 17:38, Ilya Tocar wrote:
   Ping.
   On 15 Aug 16:26, Ilya Tocar wrote:
Ping.

On 29 Jul 18:45, Ilya Tocar wrote:
 Hi,
 
 As discussed here in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
 Gcc should complain about pragma omp declare target without
 corresponding pragma omp end declare target. This patch adds a warning
 for those cases.
 Bootstraps/passes make-check.
 Ok for trunk?
 
 ChangeLog:
 
 2014-07-29  Ilya Tocar  ilya.to...@intel.com
 
   * c-decl.c (omp_declare_target_location_stack): New.
   * c-lang.h (omp_declare_target_location_stack): Declare.
   * c-parser.c (warn_unclosed_pragma_omp_target): New.
   (c_parser_translation_unit): Call it.
   (c_parser_omp_declare_target): Remember location.
   (c_parser_omp_end_declare_target): Forget location.
 
 And ChangeLog for testsuite:
 
 2014-07-29  Ilya Tocar  ilya.to...@intel.com
 
   * gcc.dg/gomp//target-3.c: New testcase.
 
 ---
  gcc/c/c-decl.c   |  3 +++
  gcc/c/c-lang.h   |  3 +++
  gcc/c/c-parser.c | 22 +-
  gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
 +
  4 files changed, 60 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
 
 diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
 index 2a4b439..2dd5b2c 100644
 --- a/gcc/c/c-decl.c
 +++ b/gcc/c/c-decl.c
 @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = 
 VOIDmode;
  /* If non-zero, implicit omp declare target attribute is added 
 into the
 attribute lists.  */
  int current_omp_declare_target_attribute;
 +
 +/* Holds locations of currently open omp declare target pragmas.  
 */
 +vec<location_t> omp_declare_target_location_stack;
  
  /* Each c_binding structure describes one binding of an identifier to
 a decl.  All the decls in a scope - irrespective of namespace - 
 are
 diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
 index e974906..cef995c 100644
 --- a/gcc/c/c-lang.h
 +++ b/gcc/c/c-lang.h
 @@ -59,4 +59,7 @@ struct GTY(()) language_function {
 attribute lists.  */
  extern GTY(()) int current_omp_declare_target_attribute;
  
 +/* Holds locations of currently open omp declare target pragmas.  
 */
 +extern vec<location_t> omp_declare_target_location_stack;
 +
  #endif /* ! GCC_C_LANG_H */
 diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
 index e32bf04..0b96fe9 100644
 --- a/gcc/c/c-parser.c
 +++ b/gcc/c/c-parser.c
 @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd (c_parser 
 *, enum pragma_context);
  static tree c_parser_array_notation (location_t, c_parser *, tree, 
 tree);
  static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, 
 bool);
  
 +static void warn_unclosed_pragma_omp_target ();
 +
  /* Parse a translation unit (C90 6.7, C99 6.9).
  
 translation-unit:
 @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
   }
while (c_parser_next_token_is_not (parser, CPP_EOF));
  }
 +
 +  warn_unclosed_pragma_omp_target ();
  }
  
  /* Parse an external declaration (C90 6.7, C99 6.9).
 @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser *parser, 
 tree fndecl, tree parms,
  static void
  c_parser_omp_declare_target (c_parser *parser)
  {
 +  location_t loc = c_parser_peek_token (parser)->location;
c_parser_skip_to_pragma_eol (parser);
current_omp_declare_target_attribute++;
 +  omp_declare_target_location_stack.safe_push (loc);
  }
  
  static void
 @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser 
 *parser)
  error_at (loc, "%<#pragma omp end declare target%> without 
 corresponding "
  "%<#pragma omp declare target%>");
else
 -current_omp_declare_target_attribute--;
 +{
 +  current_omp_declare_target_attribute--;
 +  omp_declare_target_location_stack.pop ();
 +}
  }
  
  
 @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, 
 c_parser *parser, tree initial_index,
return value_tree;
  }
  
 +static void
 +warn_unclosed_pragma_omp_target ()
 +{
 +  int i;
 +  for (i = 0; i < current_omp_declare_target_attribute; i++)
 +warning_at (omp_declare_target_location_stack[i], 0,
 + "%<#pragma omp declare target%> without corresponding "
 + "%<#pragma omp end declare target

Re: [PATCH AVX512] Fix dg.torture tests with avx512

2014-11-05 Thread Ilya Tocar

On 03 Nov 11:21, Jakub Jelinek wrote:
 On Fri, Oct 31, 2014 at 11:17:07AM +0100, Uros Bizjak wrote:
  I'd like to ask Jakub for a review of the above two parts, other parts
  are OK with a rename (as mentioned above).
 
 Looks ok to me.  Were the ICEs discovered just by normal make check or only
 with GCC_TEST_RUN_EXPENSIVE ?  If the latter, can you promote one of the
 permutations that caused the ICEs to normal tests?  If not and
 GCC_TEST_RUN_EXPENSIVE has not been tested, can you try that?


This was discovered without GCC_TEST_RUN_EXPENSIVE, but I've tested it
with it enabled, and didn't see any fails.

I've committed version below.

---
 gcc/config/i386/i386.c | 59 --
 gcc/config/i386/sse.md | 54 ++---
 2 files changed, 98 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c528599..aaffe9d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -45943,6 +45943,42 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
{
  if (!TARGET_AVX512BW)
return false;
+
+ /* If vpermq didn't work, vpshufb won't work either.  */
+ if (d->vmode == V8DFmode || d->vmode == V8DImode)
+   return false;
+
+ vmode = V64QImode;
+ if (d->vmode == V16SImode
+ || d->vmode == V32HImode
+ || d->vmode == V64QImode)
+   {
+ /* First see if vpermq can be used for
+V16SImode/V32HImode/V64QImode.  */
+ if (valid_perm_using_mode_p (V8DImode, d))
+   {
+ for (i = 0; i < 8; i++)
+   perm[i] = (d->perm[i * nelt / 8] * 8 / nelt) & 7;
+ if (d->testing_p)
+   return true;
+ target = gen_reg_rtx (V8DImode);
+ if (expand_vselect (target, gen_lowpart (V8DImode, d->op0),
+ perm, 8, false))
+   {
+ emit_move_insn (d->target,
+ gen_lowpart (d->vmode, target));
+ return true;
+   }
+ return false;
+   }
+
+ /* Next see if vpermd can be used.  */
+ if (valid_perm_using_mode_p (V16SImode, d))
+   vmode = V16SImode;
+   }
+ /* Or if vpermps can be used.  */
+ else if (d->vmode == V16SFmode)
+   vmode = V16SImode;
  if (vmode == V64QImode)
{
  /* vpshufb only works intra lanes, it is not
@@ -45962,6 +45998,9 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   if (vmode == V8SImode)
 for (i = 0; i < 8; ++i)
   rperm[i] = GEN_INT ((d->perm[i * nelt / 8] * 8 / nelt) & 7);
+  else if (vmode == V16SImode)
+for (i = 0; i < 16; ++i)
+  rperm[i] = GEN_INT ((d->perm[i * nelt / 16] * 16 / nelt) & 15);
   else
 {
   eltsz = GET_MODE_SIZE (GET_MODE_INNER (d->vmode));
@@ -46000,8 +46039,14 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
emit_insn (gen_avx512bw_pshufbv64qi3 (target, op0, vperm));
   else if (vmode == V8SFmode)
emit_insn (gen_avx2_permvarv8sf (target, op0, vperm));
-  else
+  else if (vmode == V8SImode)
emit_insn (gen_avx2_permvarv8si (target, op0, vperm));
+  else if (vmode == V16SFmode)
+   emit_insn (gen_avx512f_permvarv16sf (target, op0, vperm));
+  else if (vmode == V16SImode)
+   emit_insn (gen_avx512f_permvarv16si (target, op0, vperm));
+  else
+   gcc_unreachable ();
 }
   else
 {
@@ -46055,21 +46100,21 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
{
case V64QImode:
  if (TARGET_AVX512BW)
-   gen = gen_avx512bw_vec_dupv64qi;
+   gen = gen_avx512bw_vec_dupv64qi_1;
  break;
case V32QImode:
  gen = gen_avx2_pbroadcastv32qi_1;
  break;
case V32HImode:
  if (TARGET_AVX512BW)
-   gen = gen_avx512bw_vec_dupv32hi;
+   gen = gen_avx512bw_vec_dupv32hi_1;
  break;
case V16HImode:
  gen = gen_avx2_pbroadcastv16hi_1;
  break;
case V16SImode:
  if (TARGET_AVX512F)
-   gen = gen_avx512f_vec_dupv16si;
+   gen = gen_avx512f_vec_dupv16si_1;
  break;
case V8SImode:
  gen = gen_avx2_pbroadcastv8si_1;
@@ -46082,18 +46127,18 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
  break;
case V16SFmode:
  if (TARGET_AVX512F)
-   gen = gen_avx512f_vec_dupv16sf;
+   gen = gen_avx512f_vec_dupv16sf_1;
  break;
case V8SFmode:
  gen = gen_avx2_vec_dupv8sf_1;
  break;
case V8DFmode:
  if (TARGET_AVX512F)
-

Re: [PING][PATCH] Don't call fatal_error before error reporting has been initialized.

2014-11-05 Thread Ilya Tocar

Ping.

On 20 Oct 19:25, Ilya Tocar wrote:
 Same in collect2.
 
 On 09 Oct 15:40, Ilya Tocar wrote:
  Ping.
  
  On 29 Sep 18:02, Ilya Tocar wrote:
   Hi,
   
   Currently, if the call to atexit (lto_wrapper_cleanup) fails, we
   won't report the error, because the error-reporting infrastructure
   hasn't been initialized yet. This patch moves the call after
   diagnostic_initialize. I hope that we can't exit inside
   diagnostic_initialize; otherwise we won't clean up after it.
   Ok for trunk?
  
 
 ---
  gcc/collect2.c| 6 +++---
  gcc/lto-wrapper.c | 6 +++---
  2 files changed, 6 insertions(+), 6 deletions(-)
 
 diff --git a/gcc/collect2.c b/gcc/collect2.c
 index c54e6fb..b0784e8 100644
 --- a/gcc/collect2.c
 +++ b/gcc/collect2.c
 @@ -955,9 +955,6 @@ main (int argc, char **argv)
signal (SIGCHLD, SIG_DFL);
  #endif
  
 -  if (atexit (collect_atexit) != 0)
 -fatal_error ("atexit failed");
 -
/* Unlock the stdio streams.  */
unlock_std_streams ();
  
 @@ -965,6 +962,9 @@ main (int argc, char **argv)
  
diagnostic_initialize (global_dc, 0);
  
 +  if (atexit (collect_atexit) != 0)
 +fatal_error ("atexit failed");
 +
/* Do not invoke xcalloc before this point, since locale needs to be
   set first, in case a diagnostic is issued.  */
  
 diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
 index 8033b15..d97f617 100644
 --- a/gcc/lto-wrapper.c
 +++ b/gcc/lto-wrapper.c
 @@ -879,13 +879,13 @@ main (int argc, char *argv[])
  
xmalloc_set_program_name (progname);
  
 -  if (atexit (lto_wrapper_cleanup) != 0)
 -fatal_error ("atexit failed");
 -
gcc_init_libintl ();
  
diagnostic_initialize (global_dc, 0);
  
 +  if (atexit (lto_wrapper_cleanup) != 0)
 +fatal_error ("atexit failed");
 +
if (signal (SIGINT, SIG_IGN) != SIG_IGN)
  signal (SIGINT, fatal_signal);
  #ifdef SIGHUP
 -- 
 1.8.3.1

[PATCH] Correctly check dg-require-effective-target in avx512 tests.

2014-11-05 Thread Ilya Tocar

Hi,

Currently we only check for dg-require-effective-target avx512vl in
avx512vl tests. We should also check for avx512dq/avx512bw.
Patch below does this.
Ok for trunk?

2014-11-05  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx512vl-vandnpd-2.c: Fix
dg-require-effective-target check.
* gcc.target/i386/avx512vl-vandnps-2.c: Ditto.
* gcc.target/i386/avx512vl-vandpd-2.c: Ditto.
* gcc.target/i386/avx512vl-vandps-2.c: Ditto.
* gcc.target/i386/avx512vl-vbroadcastf32x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vbroadcastf32x4-2.c: Ditto.
* gcc.target/i386/avx512vl-vbroadcastf64x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vbroadcasti32x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vbroadcasti32x4-2.c: Ditto.
* gcc.target/i386/avx512vl-vbroadcasti64x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtpd2qq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtpd2uqq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2qq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2uqq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtqq2pd-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtqq2ps-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvttpd2qq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvttpd2uqq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvttps2qq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvttps2uqq-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtuqq2pd-2.c: Ditto.
* gcc.target/i386/avx512vl-vcvtuqq2ps-2.c: Ditto.
* gcc.target/i386/avx512vl-vdbpsadbw-2.c: Ditto.
* gcc.target/i386/avx512vl-vextractf64x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vextracti64x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vfpclasspd-2.c: Ditto.
* gcc.target/i386/avx512vl-vfpclassps-2.c: Ditto.
* gcc.target/i386/avx512vl-vinsertf64x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vinserti64x2-2.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqu16-2.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqu8-2.c: Ditto.
* gcc.target/i386/avx512vl-vorpd-2.c: Ditto.
* gcc.target/i386/avx512vl-vorps-2.c: Ditto.
* gcc.target/i386/avx512vl-vpabsb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpabsw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpackssdw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpacksswb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpackusdw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpackuswb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpaddb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpaddsb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpaddsw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpaddusb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpaddusw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpaddw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpalignr-2.c: Ditto.
* gcc.target/i386/avx512vl-vpavgb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpavgw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpblendmb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpblendmw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpbroadcastb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpbroadcastw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpeqb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpequb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpequw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpeqw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpgtb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpgtub-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpgtuw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpgtw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpub-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpuw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpermi2w-2.c: Ditto.
* gcc.target/i386/avx512vl-vpermt2w-2.c: Ditto.
* gcc.target/i386/avx512vl-vpermw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaddubsw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaddwd-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaxsb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaxsw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaxub-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmaxuw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpminsb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpminsw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpminub-2.c: Ditto.
* gcc.target/i386/avx512vl-vpminuw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmovb2m-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmovd2m-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmovm2b-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmovm2d-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmovm2q-2.c: Ditto.
* gcc.target/i386/avx512vl-vpmovm2w-2.c: Ditto.
* gcc.target/i386

[PATCH AVX512] Fix dg.torture tests with avx512

2014-10-30 Thread Ilya Tocar

Hi,

I've run gcc.dg/torture/* tests with -mavx512bw -mavx512vl -mavx512dq
flags, and got a bunch of fails (mostly in permutes autogen).
Patch below fixes them.
Ok for trunk?

2014-10-30  Ilya Tocar  ilya.to...@intel.com

* config/i386/i386.c (expand_vec_perm_pshufb): Try vpermq/vpermd
for 512-bit wide modes.
(expand_vec_perm_1): Use correct versions of patterns.
* config/i386/sse.md (avx512f_vec_dup_<mode>_1): New.
(vashr<mode>3<mask_name>): Split V8HImode and V16QImode.

---
 gcc/config/i386/i386.c | 59 --
 gcc/config/i386/sse.md | 54 ++---
 2 files changed, 98 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 71a4f6a..74ff894 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -45889,6 +45889,42 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
{
  if (!TARGET_AVX512BW)
return false;
+
+ /* If vpermq didn't work, vpshufb won't work either.  */
+ if (d->vmode == V8DFmode || d->vmode == V8DImode)
+   return false;
+
+ vmode = V64QImode;
+ if (d->vmode == V16SImode
+ || d->vmode == V32HImode
+ || d->vmode == V64QImode)
+   {
+ /* First see if vpermq can be used for
+V16SImode/V32HImode/V64QImode.  */
+ if (valid_perm_using_mode_p (V8DImode, d))
+   {
+ for (i = 0; i < 8; i++)
+   perm[i] = (d->perm[i * nelt / 8] * 8 / nelt) & 7;
+ if (d->testing_p)
+   return true;
+ target = gen_reg_rtx (V8DImode);
+ if (expand_vselect (target, gen_lowpart (V8DImode, d->op0),
+ perm, 8, false))
+   {
+ emit_move_insn (d->target,
+ gen_lowpart (d->vmode, target));
+ return true;
+   }
+ return false;
+   }
+
+ /* Next see if vpermd can be used.  */
+ if (valid_perm_using_mode_p (V16SImode, d))
+   vmode = V16SImode;
+   }
+ /* Or if vpermps can be used.  */
+ else if (d->vmode == V16SFmode)
+   vmode = V16SImode;
  if (vmode == V64QImode)
{
  /* vpshufb only works intra lanes, it is not
@@ -45908,6 +45944,9 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   if (vmode == V8SImode)
 for (i = 0; i < 8; ++i)
   rperm[i] = GEN_INT ((d->perm[i * nelt / 8] * 8 / nelt) & 7);
+  else if (vmode == V16SImode)
+for (i = 0; i < 16; ++i)
+  rperm[i] = GEN_INT ((d->perm[i * nelt / 16] * 16 / nelt) & 15);
   else
 {
   eltsz = GET_MODE_SIZE (GET_MODE_INNER (d->vmode));
@@ -45946,8 +45985,14 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
emit_insn (gen_avx512bw_pshufbv64qi3 (target, op0, vperm));
   else if (vmode == V8SFmode)
emit_insn (gen_avx2_permvarv8sf (target, op0, vperm));
-  else
+  else if (vmode == V8SImode)
emit_insn (gen_avx2_permvarv8si (target, op0, vperm));
+  else if (vmode == V16SFmode)
+   emit_insn (gen_avx512f_permvarv16sf (target, op0, vperm));
+  else if (vmode == V16SImode)
+   emit_insn (gen_avx512f_permvarv16si (target, op0, vperm));
+  else
+   gcc_unreachable ();
 }
   else
 {
@@ -46001,21 +46046,21 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
{
case V64QImode:
  if (TARGET_AVX512BW)
-   gen = gen_avx512bw_vec_dupv64qi;
+   gen = gen_avx512bw_vec_dup_v64qi_1;
  break;
case V32QImode:
  gen = gen_avx2_pbroadcastv32qi_1;
  break;
case V32HImode:
  if (TARGET_AVX512BW)
-   gen = gen_avx512bw_vec_dupv32hi;
+   gen = gen_avx512bw_vec_dup_v32hi_1;
  break;
case V16HImode:
  gen = gen_avx2_pbroadcastv16hi_1;
  break;
case V16SImode:
  if (TARGET_AVX512F)
-   gen = gen_avx512f_vec_dupv16si;
+   gen = gen_avx512f_vec_dup_v16si_1;
  break;
case V8SImode:
  gen = gen_avx2_pbroadcastv8si_1;
@@ -46028,18 +46073,18 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
  break;
case V16SFmode:
  if (TARGET_AVX512F)
-   gen = gen_avx512f_vec_dupv16sf;
+   gen = gen_avx512f_vec_dup_v16sf_1;
  break;
case V8SFmode:
  gen = gen_avx2_vec_dupv8sf_1;
  break;
case V8DFmode:
  if (TARGET_AVX512F)
-   gen = gen_avx512f_vec_dupv8df;
+   gen = gen_avx512f_vec_dup_v8df_1;
  break;
case
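
The index arithmetic in the vpermq path above can be tried in isolation.
The following is a minimal sketch, not GCC code: perm_is_chunked is a
hypothetical, simplified stand-in for the kind of check
valid_perm_using_mode_p performs, and rescale_to_8 repeats the expression
perm[i] = (d->perm[i * nelt / 8] * 8 / nelt) & 7 from the patch on plain
arrays.

```c
#include <assert.h>

/* Return 1 if the NELT-element permutation PERM moves whole groups of
   CHUNK adjacent elements, i.e. it can be expressed as a permutation
   of NELT / CHUNK wider elements.  Hypothetical helper.  */
int
perm_is_chunked (const unsigned char *perm, unsigned nelt, unsigned chunk)
{
  unsigned i, j;

  assert (chunk != 0 && nelt % chunk == 0);
  for (i = 0; i < nelt; i += chunk)
    {
      if (perm[i] % chunk != 0)
        return 0;               /* Group does not start on a boundary.  */
      for (j = 1; j < chunk; j++)
        if (perm[i + j] != perm[i] + j)
          return 0;             /* Group is not contiguous.  */
    }
  return 1;
}

/* Rescale PERM (NELT narrow elements) to a permutation of 8 wide
   elements, using the same arithmetic as the patch.  */
void
rescale_to_8 (const unsigned char *perm, unsigned nelt, unsigned char *wide)
{
  unsigned i;

  for (i = 0; i < 8; i++)
    wide[i] = (perm[i * nelt / 8] * 8 / nelt) & 7;
}
```

For a 16-element permutation that swaps adjacent pairs of elements
({2, 3, 0, 1, 6, 7, ...}), the check with chunk 2 succeeds and the rescaled
8-element selector swaps adjacent wide elements ({1, 0, 3, 2, ...}).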

Re: [PING^2][PATCH] Warn about unclosed pragma omp declare target.

2014-10-30 Thread Ilya Tocar

Ping.
On 20 Oct 19:26, Ilya Tocar wrote:
 Ping.
 
 On 02 Oct 17:38, Ilya Tocar wrote:
  Ping.
  On 15 Aug 16:26, Ilya Tocar wrote:
   Ping.
   
   On 29 Jul 18:45, Ilya Tocar wrote:
Hi,

As discussed here in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
Gcc should complain about pragma omp declare target without
corresponding pragma omp end declare target. This patch adds a warning
for those cases.
Bootstraps/passes make-check.
Ok for trunk?

ChangeLog:

2014-07-29  Ilya Tocar  ilya.to...@intel.com

* c-decl.c (omp_declare_target_location_stack): New.
* c-lang.h (omp_declare_target_location_stack): Declare.
* c-parser.c (warn_unclosed_pragma_omp_target): New.
(c_parser_translation_unit): Call it.
(c_parser_omp_declare_target): Remember location.
(c_parser_omp_end_declare_target): Forget location.

And ChangeLog for testsuite:

2014-07-29  Ilya Tocar  ilya.to...@intel.com

* gcc.dg/gomp//target-3.c: New testcase.

---
 gcc/c/c-decl.c   |  3 +++
 gcc/c/c-lang.h   |  3 +++
 gcc/c/c-parser.c | 22 +-
 gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
+
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 2a4b439..2dd5b2c 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = VOIDmode;
 /* If non-zero, implicit omp declare target attribute is added into 
the
attribute lists.  */
 int current_omp_declare_target_attribute;
+
+/* Holds locations of currently open omp declare target pragmas.  */
+vec<location_t> omp_declare_target_location_stack;
 
 /* Each c_binding structure describes one binding of an identifier to
a decl.  All the decls in a scope - irrespective of namespace - are
diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
index e974906..cef995c 100644
--- a/gcc/c/c-lang.h
+++ b/gcc/c/c-lang.h
@@ -59,4 +59,7 @@ struct GTY(()) language_function {
attribute lists.  */
 extern GTY(()) int current_omp_declare_target_attribute;
 
+/* Holds locations of currently open omp declare target pragmas.  */
+extern vec<location_t> omp_declare_target_location_stack;
+
 #endif /* ! GCC_C_LANG_H */
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index e32bf04..0b96fe9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd (c_parser 
*, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, 
tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
 
+static void warn_unclosed_pragma_omp_target ();
+
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
translation-unit:
@@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
}
   while (c_parser_next_token_is_not (parser, CPP_EOF));
 }
+
+  warn_unclosed_pragma_omp_target ();
 }
 
 /* Parse an external declaration (C90 6.7, C99 6.9).
@@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser *parser, 
tree fndecl, tree parms,
 static void
 c_parser_omp_declare_target (c_parser *parser)
 {
+  location_t loc = c_parser_peek_token (parser)->location;
   c_parser_skip_to_pragma_eol (parser);
   current_omp_declare_target_attribute++;
+  omp_declare_target_location_stack.safe_push (loc);
 }
 
 static void
@@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser 
*parser)
 error_at (loc, "%<#pragma omp end declare target%> without 
corresponding "
   "%<#pragma omp declare target%>");
   else
-current_omp_declare_target_attribute--;
+{
+  current_omp_declare_target_attribute--;
+  omp_declare_target_location_stack.pop ();
+}
 }
 
 
@@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, 
c_parser *parser, tree initial_index,
   return value_tree;
 }
 
+static void
+warn_unclosed_pragma_omp_target ()
+{
+  int i;
+  for (i = 0; i < current_omp_declare_target_attribute; i++)
+warning_at (omp_declare_target_location_stack[i], 0,
+   "%<#pragma omp declare target%> without corresponding "
+   "%<#pragma omp end declare target%>");
+  omp_declare_target_location_stack.release ();
+}
+
 #include "gt-c-c-parser.h"
diff --git a/gcc/testsuite/gcc.dg/gomp/target-3.c 
b/gcc/testsuite/gcc.dg/gomp/target-3.c
new file mode 100644
index
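
The bookkeeping in this patch (push a location when the pragma opens, pop
it when the pragma closes, warn about whatever is left at the end of the
translation unit) can be sketched outside the compiler. Everything below is
a simplified assumption for illustration: loc_t stands in for location_t, a
fixed array stands in for vec<location_t>, and fprintf stands in for
warning_at.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_OPEN 64

/* Hypothetical stand-in for GCC's location_t: just a line number.  */
typedef int loc_t;

static loc_t open_stack[MAX_OPEN];
static int open_depth;

/* "#pragma omp declare target" seen at LINE: remember where.  */
void
declare_target (loc_t line)
{
  assert (open_depth < MAX_OPEN);
  open_stack[open_depth++] = line;
}

/* "#pragma omp end declare target": forget the innermost open pragma.
   Returns 0 if there was nothing to close.  */
int
end_declare_target (void)
{
  if (open_depth == 0)
    return 0;
  open_depth--;
  return 1;
}

/* End of translation unit: warn once per pragma still open, clear the
   stack, and return the number of warnings issued.  */
int
warn_unclosed (void)
{
  int i, n = open_depth;

  for (i = 0; i < n; i++)
    fprintf (stderr,
             "line %d: declare target without end declare target\n",
             open_stack[i]);
  open_depth = 0;
  return n;
}
```

Opening two regions and closing only one leaves a single entry on the
stack, so warn_unclosed reports exactly one location: that of the outer
pragma.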

Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-20 Thread Ilya Tocar

  
  The patch is OK with the above improvement.
  
 
 
 Will commit version below, if no objections in 24 hours.
 

Sorry,
I missed palignr, which should also have a v64qi version,
and dropped the return in the expand_vec_perm_palignr case
(this caused the avx512f-vec-unpack test failures).
Patch below fixes it. Ok for trunk?
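
As far as can be read from the diff below, the lost return left an if with
no statement of its own, so the next if became its body. A minimal sketch
of that effect, with hypothetical function names and plain ints in place of
the expander calls:

```c
#include <assert.h>

/* With "return 1;" missing, the second if is the body of the first,
   so B_OK is consulted only when A_OK is already true.  */
int
try_both_buggy (int a_ok, int b_ok)
{
  if (a_ok)

  if (b_ok)
    return 1;

  return 0;
}

/* With the return restored, each strategy is tried independently.  */
int
try_both_fixed (int a_ok, int b_ok)
{
  if (a_ok)
    return 1;

  if (b_ok)
    return 1;

  return 0;
}
```

In the buggy form a success of the first strategy alone is reported as
failure, and the second strategy is never tried when the first one fails.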

2014-10-20  Ilya Tocar  ilya.to...@intel.com

* config/i386/i386.c (expand_vec_perm_1): Fix
expand_vec_perm_palignr case.
* config/i386/sse.md (<ssse3_avx2>_palignr<mode>_mask): Use
VI1_AVX512.

---
 gcc/config/i386/i386.c |  1 +
 gcc/config/i386/sse.md | 12 ++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 33b21f4..34273ca 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43552,6 +43552,7 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 
   /* Try the AVX2 vpalignr instruction.  */
   if (expand_vec_perm_palignr (d, true))
+return true;
 
   /* Try the AVX512F vpermi2 instructions.  */
   if (ix86_expand_vec_perm_vpermi2 (NULL_RTX, NULL_RTX, NULL_RTX, NULL_RTX, d))
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8157045..a3f336f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13716,14 +13716,14 @@
(set_attr "mode" "DI")])
 
 (define_insn "<ssse3_avx2>_palignr<mode>_mask"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=v")
-(vec_merge:VI1_AVX2
- (unspec:VI1_AVX2
-   [(match_operand:VI1_AVX2 1 "register_operand" "v")
-(match_operand:VI1_AVX2 2 "nonimmediate_operand" "vm")
+  [(set (match_operand:VI1_AVX512 0 "register_operand" "=v")
+(vec_merge:VI1_AVX512
+ (unspec:VI1_AVX512
+   [(match_operand:VI1_AVX512 1 "register_operand" "v")
+(match_operand:VI1_AVX512 2 "nonimmediate_operand" "vm")
 (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")]
UNSPEC_PALIGNR)
-   (match_operand:VI1_AVX2 4 "vector_move_operand" "0C")
+   (match_operand:VI1_AVX512 4 "vector_move_operand" "0C")
(match_operand:<avx512fmaskmode> 5 "register_operand" "Yk")))]
   "TARGET_AVX512BW && (<MODE_SIZE> == 64 || TARGET_AVX512VL)"
 {
-- 
1.8.3.1

Re: [PATCH] Don't call fatal_error before error reporting has been initialized.

2014-10-20 Thread Ilya Tocar

Same in collect2.

On 09 Oct 15:40, Ilya Tocar wrote:
 Ping.
 
 On 29 Sep 18:02, Ilya Tocar wrote:
  Hi,
  
  Currently, if the call to atexit (lto_wrapper_cleanup) fails, we
  won't report the error, because the error-reporting infrastructure
  hasn't been initialized yet. This patch moves the call after
  diagnostic_initialize. I hope that we can't exit inside
  diagnostic_initialize; otherwise we won't clean up after it.
  Ok for trunk?
 

---
 gcc/collect2.c| 6 +++---
 gcc/lto-wrapper.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index c54e6fb..b0784e8 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -955,9 +955,6 @@ main (int argc, char **argv)
   signal (SIGCHLD, SIG_DFL);
 #endif
 
-  if (atexit (collect_atexit) != 0)
-fatal_error ("atexit failed");
-
   /* Unlock the stdio streams.  */
   unlock_std_streams ();
 
@@ -965,6 +962,9 @@ main (int argc, char **argv)
 
   diagnostic_initialize (global_dc, 0);
 
+  if (atexit (collect_atexit) != 0)
+fatal_error ("atexit failed");
+
   /* Do not invoke xcalloc before this point, since locale needs to be
  set first, in case a diagnostic is issued.  */
 
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 8033b15..d97f617 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -879,13 +879,13 @@ main (int argc, char *argv[])
 
   xmalloc_set_program_name (progname);
 
-  if (atexit (lto_wrapper_cleanup) != 0)
-fatal_error ("atexit failed");
-
   gcc_init_libintl ();
 
   diagnostic_initialize (global_dc, 0);
 
+  if (atexit (lto_wrapper_cleanup) != 0)
+fatal_error ("atexit failed");
+
   if (signal (SIGINT, SIG_IGN) != SIG_IGN)
 signal (SIGINT, fatal_signal);
 #ifdef SIGHUP
-- 
1.8.3.1
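
The ordering issue can be shown with a small stand-alone sketch. None of
these functions are GCC's: init_diagnostics and report_fatal are
hypothetical stand-ins for diagnostic_initialize and fatal_error, and unlike
the real fatal_error the stand-in returns instead of exiting so that its
effect can be observed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int diagnostics_ready;

/* Hypothetical stand-in for diagnostic_initialize.  */
void
init_diagnostics (void)
{
  diagnostics_ready = 1;
}

/* Hypothetical stand-in for fatal_error.  Returns 1 if the message
   was actually reported, 0 if it was lost because the diagnostics
   were not set up yet.  */
int
report_fatal (const char *msg)
{
  if (!diagnostics_ready)
    return 0;
  fprintf (stderr, "fatal error: %s\n", msg);
  return 1;
}

static void
cleanup (void)
{
  /* Remove temporary files and similar.  */
}

/* Register the cleanup handler.  Meant to be called after
   init_diagnostics so that a failure can be reported.
   Returns 0 on success, -1 on failure.  */
int
register_cleanup (void)
{
  if (atexit (cleanup) != 0)
    {
      report_fatal ("atexit failed");
      return -1;
    }
  return 0;
}
```

Calling register_cleanup before init_diagnostics would still register the
handler, but a failure at that point could not be reported, which is the
situation the patch avoids.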

Re: [PING][PATCH] Warn about unclosed pragma omp declare target.

2014-10-20 Thread Ilya Tocar

Ping.

On 02 Oct 17:38, Ilya Tocar wrote:
 Ping.
 On 15 Aug 16:26, Ilya Tocar wrote:
  Ping.
  
  On 29 Jul 18:45, Ilya Tocar wrote:
   Hi,
   
   As discussed here in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
   Gcc should complain about pragma omp declare target without
   corresponding pragma omp end declare target. This patch adds a warning
   for those cases.
   Bootstraps/passes make-check.
   Ok for trunk?
   
   ChangeLog:
   
   2014-07-29  Ilya Tocar  ilya.to...@intel.com
   
 * c-decl.c (omp_declare_target_location_stack): New.
 * c-lang.h (omp_declare_target_location_stack): Declare.
 * c-parser.c (warn_unclosed_pragma_omp_target): New.
 (c_parser_translation_unit): Call it.
 (c_parser_omp_declare_target): Remember location.
 (c_parser_omp_end_declare_target): Forget location.
   
   And ChangeLog for testsuite:
   
   2014-07-29  Ilya Tocar  ilya.to...@intel.com
   
 * gcc.dg/gomp//target-3.c: New testcase.
   
   ---
gcc/c/c-decl.c   |  3 +++
gcc/c/c-lang.h   |  3 +++
gcc/c/c-parser.c | 22 +-
gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
   +
4 files changed, 60 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
   
   diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
   index 2a4b439..2dd5b2c 100644
   --- a/gcc/c/c-decl.c
   +++ b/gcc/c/c-decl.c
   @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = VOIDmode;
/* If non-zero, implicit omp declare target attribute is added into the
   attribute lists.  */
int current_omp_declare_target_attribute;
   +
   +/* Holds locations of currently open omp declare target pragmas.  */
   +vec<location_t> omp_declare_target_location_stack;

/* Each c_binding structure describes one binding of an identifier to
   a decl.  All the decls in a scope - irrespective of namespace - are
   diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
   index e974906..cef995c 100644
   --- a/gcc/c/c-lang.h
   +++ b/gcc/c/c-lang.h
   @@ -59,4 +59,7 @@ struct GTY(()) language_function {
   attribute lists.  */
extern GTY(()) int current_omp_declare_target_attribute;

   +/* Holds locations of currently open omp declare target pragmas.  */
   +extern vec<location_t> omp_declare_target_location_stack;
   +
#endif /* ! GCC_C_LANG_H */
   diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
   index e32bf04..0b96fe9 100644
   --- a/gcc/c/c-parser.c
   +++ b/gcc/c/c-parser.c
   @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd (c_parser *, 
   enum pragma_context);
static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);

   +static void warn_unclosed_pragma_omp_target ();
   +
/* Parse a translation unit (C90 6.7, C99 6.9).

   translation-unit:
   @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
 }
  while (c_parser_next_token_is_not (parser, CPP_EOF));
}
   +
   +  warn_unclosed_pragma_omp_target ();
}

/* Parse an external declaration (C90 6.7, C99 6.9).
   @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser *parser, 
   tree fndecl, tree parms,
static void
c_parser_omp_declare_target (c_parser *parser)
{
   +  location_t loc = c_parser_peek_token (parser)->location;
  c_parser_skip_to_pragma_eol (parser);
  current_omp_declare_target_attribute++;
   +  omp_declare_target_location_stack.safe_push (loc);
}

static void
   @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser 
   *parser)
error_at (loc, "%<#pragma omp end declare target%> without 
   corresponding "
"%<#pragma omp declare target%>");
  else
   -current_omp_declare_target_attribute--;
   +{
   +  current_omp_declare_target_attribute--;
   +  omp_declare_target_location_stack.pop ();
   +}
}


   @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, 
   c_parser *parser, tree initial_index,
  return value_tree;
}

   +static void
   +warn_unclosed_pragma_omp_target ()
   +{
   +  int i;
   +  for (i = 0; i < current_omp_declare_target_attribute; i++)
   +warning_at (omp_declare_target_location_stack[i], 0,
   + "%<#pragma omp declare target%> without corresponding "
   + "%<#pragma omp end declare target%>");
   +  omp_declare_target_location_stack.release ();
   +}
   +
#include "gt-c-c-parser.h"
   diff --git a/gcc/testsuite/gcc.dg/gomp/target-3.c 
   b/gcc/testsuite/gcc.dg/gomp/target-3.c
   new file mode 100644
   index 000..d50604f
   --- /dev/null
   +++ b/gcc/testsuite/gcc.dg/gomp/target-3.c
   @@ -0,0 +1,33 @@
   +/* { dg-do compile } */
   +/* { dg-options "-fopenmp" } */
   +
   +#pragma omp declare target
   +int tgtv = 6;
   +
   +int
   +tgt (void

Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-16 Thread Ilya Tocar

On 10 Oct 18:37, Uros Bizjak wrote:
 On Fri, Oct 10, 2014 at 5:47 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
 
 Please recode that horrible first switch statement to:
 
 --cut here--
   rtx (*gen) (rtx, rtx, rtx, rtx) = NULL;
 
   switch (mode)
 {
 case V8HImode:
   if (TARGET_AVX512VL && TARGET_AVX512BW)
 gen = gen_avx512vl_vpermi2varv8hi3;
   break;
 
 ...
 
 case V2DFmode:
   if (TARGET_AVX512VL)
 {
   gen = gen_avx512vl_vpermi2varv2df3;
   maskmode = V2DImode;
 
 The patch is OK with the above improvement.
 
 Thanks,
 Uros.


Will commit version below, if no objections in 24 hours.
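
The shape Uros asks for (pick a generator function pointer in the switch,
leave it NULL when the target lacks support, then call it in one place) can
be sketched on its own. All names here are hypothetical: the enum, the
feature struct and the toy generators only stand in for machine modes, the
TARGET_AVX512* macros and the gen_* pattern functions.

```c
#include <assert.h>
#include <stddef.h>

enum vmode { MODE_V8HI, MODE_V16SI, MODE_V2DF };

/* Hypothetical feature flags standing in for TARGET_AVX512* macros.  */
struct features { int avx512f, avx512vl, avx512bw; };

typedef int (*gen_fn) (int, int, int);

/* Toy generators; the real ones emit RTL patterns.  */
int gen_v8hi (int a, int b, int c) { return a + b + c + 8; }
int gen_v16si (int a, int b, int c) { return a + b + c + 16; }
int gen_v2df (int a, int b, int c) { return a + b + c + 2; }

/* Pick a generator for MODE, or NULL when the target lacks support.  */
gen_fn
pick_gen (enum vmode mode, const struct features *f)
{
  gen_fn gen = NULL;

  switch (mode)
    {
    case MODE_V8HI:
      if (f->avx512vl && f->avx512bw)
        gen = gen_v8hi;
      break;
    case MODE_V16SI:
      if (f->avx512f)
        gen = gen_v16si;
      break;
    case MODE_V2DF:
      if (f->avx512vl)
        gen = gen_v2df;
      break;
    }
  return gen;
}

/* Single call site: returns -1 if no generator is available.  */
int
expand (enum vmode mode, const struct features *f, int a, int b, int c)
{
  gen_fn gen = pick_gen (mode, f);

  if (gen == NULL)
    return -1;
  return gen (a, b, c);
}
```

Keeping the feature test next to each case and the call in one place means
a new mode only needs one more case, not another copy of the call and its
arguments.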

---
 gcc/config/i386/i386.c | 292 ++---
 gcc/config/i386/sse.md |  45 
 2 files changed, 255 insertions(+), 82 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aedac19..e1228e3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21411,35 +21411,132 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+/* AVX512F does support 64-byte integer vector operations,
+   thus the longest vector we are faced with is V64QImode.  */
+#define MAX_VECT_LEN   64
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  unsigned char perm[MAX_VECT_LEN];
+  enum machine_mode vmode;
+  unsigned char nelt;
+  bool one_operand_p;
+  bool testing_p;
+};
+
 static bool
-ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1)
+ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
+ struct expand_vec_perm_d *d)
 {
-  enum machine_mode mode = GET_MODE (op0);
+  /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const
+ expander, so args are either in d, or in op0, op1 etc.  */
+  enum machine_mode mode = GET_MODE (d ? d->op0 : op0);
+  enum machine_mode maskmode = mode;
+  rtx (*gen) (rtx, rtx, rtx, rtx) = NULL;
+
   switch (mode)
 {
+case V8HImode:
+  if (TARGET_AVX512VL && TARGET_AVX512BW)
+   gen = gen_avx512vl_vpermi2varv8hi3;
+  break;
+case V16HImode:
+  if (TARGET_AVX512VL && TARGET_AVX512BW)
+   gen = gen_avx512vl_vpermi2varv16hi3;
+  break;
+case V32HImode:
+  if (TARGET_AVX512BW)
+   gen = gen_avx512bw_vpermi2varv32hi3;
+  break;
+case V4SImode:
+  if (TARGET_AVX512VL)
+   gen = gen_avx512vl_vpermi2varv4si3;
+  break;
+case V8SImode:
+  if (TARGET_AVX512VL)
+   gen = gen_avx512vl_vpermi2varv8si3;
+  break;
 case V16SImode:
-  emit_insn (gen_avx512f_vpermi2varv16si3 (target, op0,
-  force_reg (V16SImode, mask),
-  op1));
-  return true;
+  if (TARGET_AVX512F)
+   gen = gen_avx512f_vpermi2varv16si3;
+  break;
+case V4SFmode:
+  if (TARGET_AVX512VL)
+   {
+ gen = gen_avx512vl_vpermi2varv4sf3;
+ maskmode = V4SImode;
+   }
+  break;
+case V8SFmode:
+  if (TARGET_AVX512VL)
+   {
+ gen = gen_avx512vl_vpermi2varv8sf3;
+ maskmode = V8SImode;
+   }
+  break;
 case V16SFmode:
-  emit_insn (gen_avx512f_vpermi2varv16sf3 (target, op0,
-  force_reg (V16SImode, mask),
-  op1));
-  return true;
+  if (TARGET_AVX512F)
+   {
+ gen = gen_avx512f_vpermi2varv16sf3;
+ maskmode = V16SImode;
+   }
+  break;
+case V2DImode:
+  if (TARGET_AVX512VL)
+   gen = gen_avx512vl_vpermi2varv2di3;
+  break;
+case V4DImode:
+  if (TARGET_AVX512VL)
+   gen = gen_avx512vl_vpermi2varv4di3;
+  break;
 case V8DImode:
-  emit_insn (gen_avx512f_vpermi2varv8di3 (target, op0,
- force_reg (V8DImode, mask),
- op1));
-  return true;
+  if (TARGET_AVX512F)
+   gen = gen_avx512f_vpermi2varv8di3;
+  break;
+case V2DFmode:
+  if (TARGET_AVX512VL)
+   {
+ gen = gen_avx512vl_vpermi2varv2df3;
+ maskmode = V2DImode;
+   }
+  break;
+case V4DFmode:
+  if (TARGET_AVX512VL)
+   {
+ gen = gen_avx512vl_vpermi2varv4df3;
+ maskmode = V4DImode;
+   }
+  break;
 case V8DFmode:
-  emit_insn (gen_avx512f_vpermi2varv8df3 (target, op0,
- force_reg (V8DImode, mask),
- op1));
-  return true;
+  if (TARGET_AVX512F)
+   {
+ gen = gen_avx512f_vpermi2varv8df3;
+ maskmode = V8DImode;
+   }
+  break;
 default:
-  return false;
+  break;
 }
+
+  if (gen == NULL)
+return false;
+
+  /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const
+ expander, so args are either in d, or in op0, op1 etc.  */
+  if (d

Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-10 Thread Ilya Tocar

On 09 Oct 20:51, Jakub Jelinek wrote:
 On Thu, Oct 09, 2014 at 04:15:23PM +0400, Ilya Tocar wrote:
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -21358,32 +21358,169 @@ ix86_expand_int_vcond (rtx operands[])
 return true;
   }
   
  -ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1)
  +ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1, 
  struct expand_vec_perm_d *d)
 
 Too long line, please wrap it.

Fixed.

   {
  -  enum machine_mode mode = GET_MODE (op0);
  +  enum machine_mode mode = GET_MODE (d ? d->op0 : op0);
  +
 switch (mode)
   {
  +case V8HImode:
  +  if (!TARGET_AVX512VL || !TARGET_AVX512BW)
  +   return false;
 
 My strong preference would be:
   enum machine_mode maskmode = mode;
   rtx (*gen) (rtx, rtx, rtx, rtx);
 right below the enum machine_mode mode = GET_MODE (d ? d->op0 : op0);
 line and then inside of the first switch just do:
 ...
 case V16SImode:
   if (!TARGET_AVX512F)
   return false;
   gen = gen_avx512f_vpermi2varv16si3;
   break;
 case V4SFmode:
   if (!TARGET_AVX512VL)
   return false;
   gen = gen_avx512vl_vpermi2varv4sf3;
   maskmode = V4SImode;
   break;
 ...
 etc., then in the mask = line use:
 mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d->nelt, vec));
 and finally instead of the second switch do:
   emit_insn (gen (target, op0, force_reg (maskmode, mask), op1));
   return true;
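
The restructuring suggested above (choose a generator and a mask mode in one switch, then emit once at the end) can be sketched in plain C. This is only an illustration under stated assumptions: the enum, the target struct, and the two gen_* functions are hypothetical stand-ins, not GCC's rtx machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for machine modes and target flags.  */
enum mode_sketch { M_V16SI, M_V4SI, M_V4SF, M_V64QI };

struct target_sketch { bool avx512f, avx512vl; };

/* Hypothetical generator signature: (op0, mask, op1) -> result.  */
typedef int (*gen_fn) (int op0, int mask, int op1);

static int gen_v16si (int op0, int mask, int op1) { return op0 + mask + op1; }
static int gen_v4sf (int op0, int mask, int op1) { return op0 * mask + op1; }

/* Pick a generator and a mask mode first; emit in exactly one place.
   Returns false when the target cannot handle MODE.  */
static bool
expand_sketch (const struct target_sketch *t, enum mode_sketch mode,
               int op0, int mask, int op1,
               int *out, enum mode_sketch *maskmode)
{
  gen_fn gen = NULL;

  assert (out != NULL && maskmode != NULL);
  *maskmode = mode;   /* Integer modes use themselves as the mask mode.  */

  switch (mode)
    {
    case M_V16SI:
      if (t->avx512f)
        gen = gen_v16si;
      break;
    case M_V4SF:
      if (t->avx512vl)
        {
          gen = gen_v4sf;
          *maskmode = M_V4SI;   /* Float modes need an integer mask mode.  */
        }
      break;
    default:
      break;
    }

  if (gen == NULL)
    return false;

  *out = gen (op0, mask, op1);   /* The single emit site.  */
  return true;
}
```

Initializing gen to NULL and testing it after the switch avoids both the early returns in every case and the second switch.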

Updated patch below.

---
 gcc/config/i386/i386.c | 281 +++--
 gcc/config/i386/sse.md |  45 
 2 files changed, 253 insertions(+), 73 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 352ab81..2247da8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21358,33 +21358,132 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+/* AVX512F does support 64-byte integer vector operations,
+   thus the longest vector we are faced with is V64QImode.  */
+#define MAX_VECT_LEN   64
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  unsigned char perm[MAX_VECT_LEN];
+  enum machine_mode vmode;
+  unsigned char nelt;
+  bool one_operand_p;
+  bool testing_p;
+};
+
 static bool
-ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1)
+ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
+ struct expand_vec_perm_d *d)
 {
-  enum machine_mode mode = GET_MODE (op0);
+  /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const
+ expander, so args are either in d, or in op0, op1 etc.  */
+  enum machine_mode mode = GET_MODE (d ? d->op0 : op0);
+  enum machine_mode maskmode = mode;
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
   switch (mode)
 {
+case V8HImode:
+  if (!TARGET_AVX512VL || !TARGET_AVX512BW)
+   return false;
+  gen = gen_avx512vl_vpermi2varv8hi3; 
+  break;
+case V16HImode:
+  if (!TARGET_AVX512VL || !TARGET_AVX512BW)
+   return false;
+  gen = gen_avx512vl_vpermi2varv16hi3;
+  break;
+case V32HImode:
+  if (!TARGET_AVX512BW)
+   return false;
+  gen = gen_avx512bw_vpermi2varv32hi3; 
+  break;
+case V4SImode:
+  if (!TARGET_AVX512VL)
+   return false;
+  gen = gen_avx512vl_vpermi2varv4si3;
+  break;
+case V8SImode:
+  if (!TARGET_AVX512VL)
+   return false;
+  gen = gen_avx512vl_vpermi2varv8si3;
+  break;
 case V16SImode:
-  emit_insn (gen_avx512f_vpermi2varv16si3 (target, op0,
- force_reg (V16SImode, mask),
- op1));
-  return true;
+  if (!TARGET_AVX512F)
+   return false;
+  gen = gen_avx512f_vpermi2varv16si3;
+  break;
+case V4SFmode:
+  if (!TARGET_AVX512VL)
+   return false;
+  gen = gen_avx512vl_vpermi2varv4sf3;
+  maskmode = V4SImode;
+  break;
+case V8SFmode:
+  if (!TARGET_AVX512VL)
+   return false;
+  gen = gen_avx512vl_vpermi2varv8sf3;
+  maskmode = V8SImode;
+  break;
 case V16SFmode:
-  emit_insn (gen_avx512f_vpermi2varv16sf3 (target, op0,
- force_reg (V16SImode, mask),
- op1));
-  return true;
+  if (!TARGET_AVX512F)
+   return false;
+  gen = gen_avx512f_vpermi2varv16sf3;
+  maskmode = V16SImode;
+  break;
+case V2DImode:
+  if (!TARGET_AVX512VL)
+   return false;
+  gen = gen_avx512vl_vpermi2varv2di3;
+  break;
+case V4DImode:
+  if (!TARGET_AVX512VL)
+   return false;
+  gen = gen_avx512vl_vpermi2varv4di3;
+  break;
 case V8DImode:
-  emit_insn (gen_avx512f_vpermi2varv8di3 (target, op0,
-force_reg (V8DImode, mask), op1));
-  return true;
+  if (!TARGET_AVX512F)
+   return false

Re: [PATCH] Don't call fatal_error before error reporting has been initialized.

2014-10-09 Thread Ilya Tocar

Ping.

On 29 Sep 18:02, Ilya Tocar wrote:
 Hi,
 
 Currently, if the call to atexit (lto_wrapper_cleanup) fails, we
 won't report the error, because the error-reporting infrastructure
 hasn't been initialized yet. This patch moves the call after
 diagnostic_initialize. I hope that we can't exit inside
 diagnostic_initialize; otherwise we won't clean up after it.
 Ok for trunk?
 
 2014-09-29  Ilya Tocar  ilya.to...@intel.com
 
   * lto-wrapper.c (main): Don't call fatal_error before
   diagnostic_initialize.
 ---
  gcc/lto-wrapper.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
 index 08fd090..39e13b8 100644
 --- a/gcc/lto-wrapper.c
 +++ b/gcc/lto-wrapper.c
 @@ -870,13 +870,13 @@ main (int argc, char *argv[])
  
xmalloc_set_program_name (progname);
  
 -  if (atexit (lto_wrapper_cleanup) != 0)
 -    fatal_error ("atexit failed");
 -
gcc_init_libintl ();
  
diagnostic_initialize (global_dc, 0);
  
 +  if (atexit (lto_wrapper_cleanup) != 0)
 +    fatal_error ("atexit failed");
 +
if (signal (SIGINT, SIG_IGN) != SIG_IGN)
  signal (SIGINT, fatal_signal);
  #ifdef SIGHUP
 -- 
 1.8.3.1

Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-09 Thread Ilya Tocar

Hi,

I think this patch should be split into two parts:
V64QI-related and non-V64QI-related.
This part contains the non-V64QI-related changes.
I've also noticed that not all patterns using VI1_AVX2
actually have AVX512 versions, so I fixed the bogus patterns.

On 06 Oct 16:10, Jakub Jelinek wrote:
 On Mon, Oct 06, 2014 at 04:55:28PM +0400, Kirill Yukhin wrote:
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -21364,20 +21364,113 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx 
  op0, rtx mask, rtx op1)
 enum machine_mode mode = GET_MODE (op0);
 switch (mode)
   {
  +  /* There is no byte version of vpermi2.  So we use vpermi2w.  */
  +case V64QImode:
...
 
 I believe this case doesn't belong to this function, other than this
 case ix86_expand_vec_perm_vpermi2 emits always just a single insn, and
 so it should always do that, and there should be a separate function
 that expands the worst case of V64QImode full 2 operand permutation.
 See my previous mail, IMHO it is doable with 5 instructions rather than 7.
 And IMHO we should have a separate function which emits that, supposedly
 one for the constant permutations, one for the variable case (perhaps
 then your 7 insn sequence is best?).
This will be done in a following patch.
 
 Also, IMHO rather than building a CONST_VECTOR ahead in each of the callers,
 supposedly ix86_expand_vec_perm_vpermi2 could take the arguments it takes
 right now plus D, either D would be NULL (then it would behave as now), or
 SEL would be NULL, then it would create a CONST_VECTOR on the fly if needed.
 I.e. the function would start with a switch that would just contain the
 if (...)
 return false; 
 hunks plus break; for the success case, then code to generate CONST_VECTOR
 if sel is NULL_RTX from d, and finally another switch with just the emit
 cases.
Done.

 
  +case V8HImode:
  +  if (!TARGET_AVX512VL)
  +   return false;
  +  emit_insn (gen_avx512vl_vpermi2varv8hi3 (target, op0,
  +  force_reg (V8HImode, mask), 
  op1));
  +  return true;
  +case V16HImode:
  +  if (!TARGET_AVX512VL)
  +   return false;
  +  emit_insn (gen_avx512vl_vpermi2varv16hi3 (target, op0,
  +force_reg (V16HImode, mask), op1));
  +  return true;
 
 Aren't these two insns there only if both TARGET_AVX512VL && TARGET_AVX512BW?
 I mean, the ISA pdf mentions both of the CPUID flags simultaneously, and I
 think neither of these depends on the other one in GCC.  That's unlike insns
 where CPUID AVX512VL and AVX512F are mentioned together, because in GCC
 AVX512VL depends on AVX512F.

Good catch!

  @@ -42662,7 +42764,12 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d)
   
 if (d-one_operand_p)
   return false;
  -  if (TARGET_AVX2 && GET_MODE_SIZE (vmode) == 32)
  +  if (TARGET_AVX512F && GET_MODE_SIZE (vmode) == 64 &&
  +  GET_MODE_SIZE (GET_MODE_INNER (vmode)) >= 4)
 
 Formatting, && belongs on the second line.
 
Fixed.

  +;
  +  else if (TARGET_AVX512VL)
 
 I'd add  GET_MODE_SIZE (GET_MODE_INNER (vmode) == 64 here.
 AVX512VL is not going to handle 64-bit vectors, or 1024-bit ones,
 and the == 32 and == 16 cases are handled because AVX512VL implies
 TARGET_AVX2 and TARGET_SSE4_1, doesn't it?
 
As TARGET_AVX512VL always implies TARGET_AVX2 and TARGET_SSE4_1 and
works only on 32/16-byte modes, this case is redundant, so I've removed
it.

  @@ -43012,6 +43125,17 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d 
  *d)
return false;
  }
  }
  +  else if (GET_MODE_SIZE (d->vmode) == 64)
  +   {
  + if (!TARGET_AVX512BW)
  +   return false;
  + if (vmode == V64QImode)
  +   {
  + for (i = 0; i < nelt; ++i)
  +   if ((d->perm[i] ^ i) & (nelt / 4))
  + return false;
 
 Missing comment, I'd duplicate the
   /* vpshufb only works intra lanes, it is not
  possible to shuffle bytes in between the lanes.  */
 comment there.
 
Done.

  @@ -43109,12 +43237,24 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
rtx (*gen) (rtx, rtx) = NULL;
switch (d-vmode)
  {
  +   case V64QImode:
  + if (TARGET_AVX512VL)
 
 VL?  Isn't that BW?
 
  +   gen = gen_avx512bw_vec_dupv64qi;
  + break;
  case V32QImode:
gen = gen_avx2_pbroadcastv32qi_1;
break;
  +   case V32HImode:
  + if (TARGET_AVX512VL)
 
 Ditto.

Fixed.

  @@ -43216,6 +43368,14 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   mode = V8DImode;
 else if (mode == V16SFmode)
   mode = V16SImode;
  +  else if (mode == V4DFmode)
  +mode = V4DImode;
  +  else if (mode == V2DFmode)
  +mode = V2DImode;
  +  else if (mode == V8SFmode)
  +mode = V8SImode;
  +  else if (mode == V4SFmode)
  +mode = V4SImode;
 for (i = 0; i < nelt; ++i)
   vec[i] = GEN_INT (d->perm[i]);
 rtx mask = gen_rtx_CONST_VECTOR

Re: [PATCH,i386] Fix adxintrin on mingw.

2014-10-06 Thread Ilya Tocar

On 03 Oct 07:53, H.J. Lu wrote:
 On Fri, Oct 3, 2014 at 6:46 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
  On 02 Oct 07:41, H.J. Lu wrote:
  On Thu, Oct 2, 2014 at 7:29 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  The same is true for x32.  Can you add a testcase to show it
  fails on x32 without the fix?
 
 
  This could only be done with runtime test.
  I've had troubles running sde (emulator) on x32 enabled system,
  but replacing long long with int in intrinsic signature will cause
  adx-addcarryx64-2.c to fail under sde on 64 bits. I believe it will
  also fail on sde+{win,x32} or real hardware, when it's available.
 
 
 Can we scan the assembly output for the wrong instruction?

I don't think so.
The incorrect instruction is movl (instead of movq) for the value load.
However, with x32 we also generate movl for pointer moves,
so scanning for movl doesn't make sense, and hardcoding a particular
movl would make the test quite fragile.

Re: [RFC PATCH] Enable V32HI/V64QI const permutations

2014-10-06 Thread Ilya Tocar

On 06 Oct 09:08, Jakub Jelinek wrote:
 On Fri, Oct 03, 2014 at 04:39:08PM +0200, Jakub Jelinek wrote:
  Just to stress the new testcases some more, I've enabled the
  vec_perm_const{32hi,64qi} patterns.
  Got several ICEs in expand_vec_perm_broadcast_1,
  on the final gcc_unreachable () in the function.  That function
  is only called if it couldn't be broadcasted in a single insn,
  which I believe for TARGET_AVX512BW must be always possible.
  Shall I look at this, or do you plan to address this in the near future?
 
 Speaking of -mavx512{bw,vl,f}, there apparently is a full 2 operand shuffle
 for V32HI, V16S[IF], V8D[IF], so the only one instruction full
 2 operand shuffle we are missing is V64QI, right?
 
 What would be best worst case sequence for that?
 
 I'd think 2x vpermi2w, 2x vpshufb and one vpor could achieve that,
 (first vpermi2w would put the even bytes into the right word positions
 (i.e. at the right position or one above it), second vpermi2w would put
 the odd bytes into the right word positions (i.e. at the right position
 or one below it),
I think we will also need to spend insns converting the byte-sized mask into
a word-sized mask.
 each vpshufb would swap the byte pairs where necessary
 and zero out the other (odd or even) byte,
This will probably also require vpshufb mask preparation (setting the high
bit for zeroing).
 and vpor merge the results), do you have something better?
Currently (in the branch) it's implemented as 2x vpermi2w + 4x shift +
blend: 3 shifts to prepare the masks for vpermi2w,
2 vpermi2w to put the odd/even bytes in the low part of the right position,
a shift to move the low part into the high part, and finally a blend with
a 101010.. mask to get the result.
 What about arbitrary one operand V64QI const permutation?

Currently it loads the const vector into a register and uses the same codepath
as the non-const version (this can probably be improved).

Re: [PATCH,i386] Fix adxintrin on mingw.

2014-10-03 Thread Ilya Tocar

On 02 Oct 07:41, H.J. Lu wrote:
 On Thu, Oct 2, 2014 at 7:29 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
  Hi,
 
  sizeof (long) == 4 on windows, so we should use long long as param type.
  Patch below does it.
 
 The same is true for x32.  Can you add a testcase to show it
 fails on x32 without the fix?


This could only be done with a runtime test.
I've had trouble running sde (the emulator) on an x32-enabled system,
but replacing long long with int in the intrinsic signature causes
adx-addcarryx64-2.c to fail under sde on 64 bits. I believe it will
also fail on sde+{win,x32}, or on real hardware when it's available.

  Ok for trunk?
 
  2014-10-02  Ilya Tocar  ilya.to...@intel.com
 
  * config/i386/adxintrin.h (_subborrow_u64): Use long long for param
  type.
  (_addcarry_u64): Ditto.
  (_addcarryx_u64): Ditto.
 
  ---
   gcc/config/i386/adxintrin.h | 12 ++--
   1 file changed, 6 insertions(+), 6 deletions(-)
 
  diff --git a/gcc/config/i386/adxintrin.h b/gcc/config/i386/adxintrin.h
  index 8f2c01a..00a9b86 100644
  --- a/gcc/config/i386/adxintrin.h
  +++ b/gcc/config/i386/adxintrin.h
  @@ -55,24 +55,24 @@ _addcarryx_u32 (unsigned char __CF, unsigned int __X,
   #ifdef __x86_64__
   extern __inline unsigned char
   __attribute__((__gnu_inline__, __always_inline__, __artificial__))
  -_subborrow_u64 (unsigned char __CF, unsigned long __X,
  -   unsigned long __Y, unsigned long long *__P)
  +_subborrow_u64 (unsigned char __CF, unsigned long long __X,
  +   unsigned long long __Y, unsigned long long *__P)
   {
   return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
   }
 
   extern __inline unsigned char
   __attribute__((__gnu_inline__, __always_inline__, __artificial__))
  -_addcarry_u64 (unsigned char __CF, unsigned long __X,
  -  unsigned long __Y, unsigned long long *__P)
  +_addcarry_u64 (unsigned char __CF, unsigned long long __X,
  +  unsigned long long __Y, unsigned long long *__P)
   {
   return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
   }
 
   extern __inline unsigned char
   __attribute__((__gnu_inline__, __always_inline__, __artificial__))
  -_addcarryx_u64 (unsigned char __CF, unsigned long __X,
  -   unsigned long __Y, unsigned long long *__P)
  +_addcarryx_u64 (unsigned char __CF, unsigned long long __X,
  +   unsigned long long __Y, unsigned long long *__P)
   {
   return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
   }
  --
  1.8.3.1
 
 
 
 
 -- 
 H.J.

[PING][PATCH] Warn about unclosed pragma omp declare target.

2014-10-02 Thread Ilya Tocar

Ping.
On 15 Aug 16:26, Ilya Tocar wrote:
 Ping.
 
 On 29 Jul 18:45, Ilya Tocar wrote:
  Hi,
  
  As discussed in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html,
  GCC should complain about a pragma omp declare target without a
  corresponding pragma omp end declare target. This patch adds a warning
  for those cases.
  Bootstraps and passes make check.
  Ok for trunk?
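
The bookkeeping described above (remember a location on each declare target, forget it on the matching end, and warn about whatever is left at end of file) can be sketched in plain C. This is a minimal model under stated assumptions: line numbers stand in for location_t, the fixed-size array stands in for GCC's vec, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPEN 64   /* Hypothetical fixed capacity for this sketch.  */

struct open_pragmas
{
  int loc[MAX_OPEN];   /* Line numbers standing in for location_t.  */
  size_t depth;        /* Number of currently open declare target pragmas.  */
};

/* Called for "#pragma omp declare target": remember where it was seen.  */
static void
declare_target (struct open_pragmas *s, int line)
{
  assert (s->depth < MAX_OPEN);
  s->loc[s->depth++] = line;
}

/* Called for "#pragma omp end declare target".
   Returns 0 on success, -1 if there is no open pragma to close.  */
static int
end_declare_target (struct open_pragmas *s)
{
  if (s->depth == 0)
    return -1;
  s->depth--;
  return 0;
}

/* Called at end of file: copy the lines of all unclosed pragmas to OUT,
   reset the stack, and return how many there were.  */
static size_t
unclosed_at_eof (struct open_pragmas *s, int *out)
{
  size_t n = s->depth;
  for (size_t i = 0; i < n; i++)
    out[i] = s->loc[i];
  s->depth = 0;
  return n;
}
```

A real front end would issue one warning per returned location instead of copying them out.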
  
  ChangeLog:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * c-decl.c (omp_declare_target_location_stack): New.
  * c-lang.h (omp_declare_target_location_stack): Declare.
  * c-parser.c (warn_unclosed_pragma_omp_target): New.
  (c_parser_translation_unit): Call it.
  (c_parser_omp_declare_target): Remember location.
  (c_parser_omp_end_declare_target): Forget location.
  
  And ChangeLog for testsuite:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * gcc.dg/gomp/target-3.c: New testcase.
  
  ---
   gcc/c/c-decl.c   |  3 +++
   gcc/c/c-lang.h   |  3 +++
   gcc/c/c-parser.c | 22 +-
   gcc/testsuite/gcc.dg/gomp/target-3.c | 33 +
   4 files changed, 60 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
  
  diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
  index 2a4b439..2dd5b2c 100644
  --- a/gcc/c/c-decl.c
  +++ b/gcc/c/c-decl.c
  @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = VOIDmode;
   /* If non-zero, implicit omp declare target attribute is added into the
  attribute lists.  */
   int current_omp_declare_target_attribute;
  +
  +/* Holds locations of currently open omp declare target pragmas.  */
  +vec<location_t> omp_declare_target_location_stack;
   
   /* Each c_binding structure describes one binding of an identifier to
  a decl.  All the decls in a scope - irrespective of namespace - are
  diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
  index e974906..cef995c 100644
  --- a/gcc/c/c-lang.h
  +++ b/gcc/c/c-lang.h
  @@ -59,4 +59,7 @@ struct GTY(()) language_function {
  attribute lists.  */
   extern GTY(()) int current_omp_declare_target_attribute;
   
  +/* Holds locations of currently open omp declare target pragmas.  */
  +extern vec<location_t> omp_declare_target_location_stack;
  +
   #endif /* ! GCC_C_LANG_H */
  diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
  index e32bf04..0b96fe9 100644
  --- a/gcc/c/c-parser.c
  +++ b/gcc/c/c-parser.c
  @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd (c_parser *, 
  enum pragma_context);
   static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
   static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
   
  +static void warn_unclosed_pragma_omp_target ();
  +
   /* Parse a translation unit (C90 6.7, C99 6.9).
   
  translation-unit:
  @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
  }
 while (c_parser_next_token_is_not (parser, CPP_EOF));
   }
  +
  +  warn_unclosed_pragma_omp_target ();
   }
   
   /* Parse an external declaration (C90 6.7, C99 6.9).
  @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser *parser, tree 
  fndecl, tree parms,
   static void
   c_parser_omp_declare_target (c_parser *parser)
   {
  +  location_t loc = c_parser_peek_token (parser)->location;
 c_parser_skip_to_pragma_eol (parser);
 current_omp_declare_target_attribute++;
  +  omp_declare_target_location_stack.safe_push (loc);
   }
   
   static void
  @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser *parser)
   error_at (loc, "%<#pragma omp end declare target%> without corresponding "
                  "%<#pragma omp declare target%>");
 else
  -current_omp_declare_target_attribute--;
  +{
  +  current_omp_declare_target_attribute--;
  +  omp_declare_target_location_stack.pop ();
  +}
   }
   
   
  @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, c_parser 
  *parser, tree initial_index,
 return value_tree;
   }
   
  +static void
  +warn_unclosed_pragma_omp_target ()
  +{
  +  int i;
  +  for (i = 0; i < current_omp_declare_target_attribute; i++)
  +    warning_at (omp_declare_target_location_stack[i], 0,
  +                "%<#pragma omp declare target%> without corresponding "
  +                "%<#pragma omp end declare target%>");
  +  omp_declare_target_location_stack.release ();
  +}
  +
   #include "gt-c-c-parser.h"
  diff --git a/gcc/testsuite/gcc.dg/gomp/target-3.c 
  b/gcc/testsuite/gcc.dg/gomp/target-3.c
  new file mode 100644
  index 000..d50604f
  --- /dev/null
  +++ b/gcc/testsuite/gcc.dg/gomp/target-3.c
  @@ -0,0 +1,33 @@
  +/* { dg-do compile } */
  +/* { dg-options "-fopenmp" } */
  +
  +#pragma omp declare target
  +int tgtv = 6;
  +
  +int
  +tgt (void)
  +{
  +  tgtv++;
  +  return 0;
  +}
  +#pragma omp end declare target
  +
  +#pragma omp declare target/* { dg-warning '#pragma omp declare 
  target' without corresponding

[PATCH,i386] Fix adxintrin on mingw.

2014-10-02 Thread Ilya Tocar

Hi,

sizeof (long) == 4 on Windows, so we should use long long as the param type.
The patch below does it.
Ok for trunk?
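
A minimal sketch of the problem, assuming a target where long is 32 bits wide: uint32_t stands in for unsigned long there, and both helper functions are hypothetical, not part of adxintrin.h.

```c
#include <stdint.h>

/* Stand-in for "unsigned long" on a target where long is 32 bits wide,
   such as 64-bit Windows.  */
typedef uint32_t narrow_ulong;

/* A 64-bit operand passed through the narrow parameter type loses its
   upper 32 bits before the function body ever sees it.  */
static uint64_t
pass_as_long (narrow_ulong x)
{
  return x;
}

/* unsigned long long is at least 64 bits wide, so nothing is lost.  */
static uint64_t
pass_as_long_long (unsigned long long x)
{
  return x;
}
```

With the value 0x100000001, the first function sees only 1, while the second sees the full value.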

2014-10-02  Ilya Tocar  ilya.to...@intel.com

* config/i386/adxintrin.h (_subborrow_u64): Use long long for param
type.
(_addcarry_u64): Ditto.
(_addcarryx_u64): Ditto.

---
 gcc/config/i386/adxintrin.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/adxintrin.h b/gcc/config/i386/adxintrin.h
index 8f2c01a..00a9b86 100644
--- a/gcc/config/i386/adxintrin.h
+++ b/gcc/config/i386/adxintrin.h
@@ -55,24 +55,24 @@ _addcarryx_u32 (unsigned char __CF, unsigned int __X,
 #ifdef __x86_64__
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_subborrow_u64 (unsigned char __CF, unsigned long __X,
-   unsigned long __Y, unsigned long long *__P)
+_subborrow_u64 (unsigned char __CF, unsigned long long __X,
+   unsigned long long __Y, unsigned long long *__P)
 {
 return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
 }
 
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_addcarry_u64 (unsigned char __CF, unsigned long __X,
-  unsigned long __Y, unsigned long long *__P)
+_addcarry_u64 (unsigned char __CF, unsigned long long __X,
+  unsigned long long __Y, unsigned long long *__P)
 {
 return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_addcarryx_u64 (unsigned char __CF, unsigned long __X,
-   unsigned long __Y, unsigned long long *__P)
+_addcarryx_u64 (unsigned char __CF, unsigned long long __X,
+   unsigned long long __Y, unsigned long long *__P)
 {
 return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
 }
-- 
1.8.3.1

Re: [PATCH][PING] PR62120

2014-09-30 Thread Ilya Tocar

Ping.

On 15 Sep 18:43, Ilya Tocar wrote:
 On 01 Sep 18:38, Ilya Tocar wrote:
   Please mention the PR in the ChangeLog entry and add some testcases
   (can be gcc.target/i386/, but we should have it tested).
   Does this change anything on say register short sil __asm ("sil"); in
   32-bit mode (when it IMHO should be rejected too?)?
  
  Do we support sil at all? In i386.h I see:
  
  /* Note we are omitting these since currently I don't know how
  to get gcc to use these, since they want the same but different
  number as al, and ax.
  */
  #define QI_REGISTER_NAMES \
  {"al", "dl", "cl", "bl", "sil", "dil", "bpl", "spl",}
  
  And gcc doesn't recognize sil.
  
  Added testcase, and fixed avx512f-additional-reg-names.c to be valid on
  32 bits. Ok for trunk?
 
 
 Slightly updated tests.
 Ok for trunk?
 
 gcc/
 
 2014-09-15  Ilya Tocar  ilya.to...@intel.com
 
PR middle-end/62120
* varasm.c (decode_reg_name_and_count): Check availability for
registers from ADDITIONAL_REGISTER_NAMES.
 
 Testsuite/
 
 2014-09-15  Ilya Tocar  ilya.to...@intel.com
 
PR middle-end/62120
   * gcc.target/i386/avx512f-additional-reg-names.c: Use register valid
   in 32-bit mode.
* gcc.target/i386/pr62120.c: New.
 
 ---
  gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c | 2 +-
  gcc/testsuite/gcc.target/i386/pr62120.c  | 8 
  gcc/varasm.c | 5 +++--
  3 files changed, 12 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/i386/pr62120.c
 
 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c 
 b/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
 index 164a1de..98a9052 100644
 --- a/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
 +++ b/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
 @@ -3,7 +3,7 @@
  
  void foo ()
  {
 -  register int zmm_var asm ("zmm9") __attribute__((unused));
 +  register int zmm_var asm ("zmm7") __attribute__((unused));
  
   __asm__ __volatile__("vxorpd %%zmm0, %%zmm0, %%zmm7\n" : : : "zmm7" );
  }
 diff --git a/gcc/testsuite/gcc.target/i386/pr62120.c 
 b/gcc/testsuite/gcc.target/i386/pr62120.c
 new file mode 100644
 index 000..bfb8c47
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/pr62120.c
 @@ -0,0 +1,8 @@
 +/* { dg-do compile } */
 +/* { dg-options "-mno-sse" } */
 +
 +void foo ()
 +{
 +  register int zmm_var asm ("ymm9");/* { dg-error "invalid register name" } */
 +  register int zmm_var2 asm ("23");/* { dg-error "invalid register name" } */
 +}
 diff --git a/gcc/varasm.c b/gcc/varasm.c
 index cd4a230..9c12b81 100644
 --- a/gcc/varasm.c
 +++ b/gcc/varasm.c
 @@ -888,7 +888,7 @@ decode_reg_name_and_count (const char *asmspec, int 
 *pnregs)
   if (asmspec[0] != 0 && i < 0)
   {
 i = atoi (asmspec);
 -   if (i < FIRST_PSEUDO_REGISTER && i >= 0)
 +   if (i < FIRST_PSEUDO_REGISTER && i >= 0 && reg_names[i][0])
   return i;
 else
   return -2;
 @@ -925,7 +925,8 @@ decode_reg_name_and_count (const char *asmspec, int 
 *pnregs)
  
   for (i = 0; i < (int) ARRAY_SIZE (table); i++)
 if (table[i].name[0]
 -    && ! strcmp (asmspec, table[i].name))
 +    && ! strcmp (asmspec, table[i].name)
 +    && reg_names[table[i].number][0])
   return table[i].number;
}
  #endif /* ADDITIONAL_REGISTER_NAMES */
 -- 
 1.8.3.1

[PATCH] Don't call fatal_error before error reporting has been initialized.

2014-09-29 Thread Ilya Tocar

Hi,

Currently, if the call to atexit (lto_wrapper_cleanup) fails, we
won't report the error, because the error-reporting infrastructure
hasn't been initialized yet. This patch moves the call after
diagnostic_initialize. I hope that we can't exit inside
diagnostic_initialize; otherwise we won't clean up after it.
Ok for trunk?
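
The ordering issue can be modelled outside of GCC. The sketch below is only an illustration: init_diagnostics, report_error and register_cleanup are hypothetical names, and the registrar is passed in as a parameter only so that a failing atexit can be simulated.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag: set once the error-reporting machinery is usable.  */
static bool diagnostics_ready;

static void
init_diagnostics (void)
{
  diagnostics_ready = true;
}

/* Returns false when the message is lost because diagnostics are not
   initialized yet.  */
static bool
report_error (const char *msg)
{
  if (!diagnostics_ready)
    return false;
  fprintf (stderr, "error: %s\n", msg);
  return true;
}

static void
cleanup (void)
{
  /* Temporary files would be removed here.  */
}

/* A registrar that always fails, used to simulate atexit failing.  */
static int
failing_atexit (void (*fn) (void))
{
  (void) fn;
  return -1;
}

/* Returns 0 if cleanup was registered, 1 if registration failed and the
   failure was reported, 2 if it failed and could not be reported.  */
static int
register_cleanup (int (*reg) (void (*) (void)))
{
  if (reg (cleanup) == 0)
    return 0;
  return report_error ("atexit failed") ? 1 : 2;
}
```

Calling register_cleanup before init_diagnostics corresponds to the old order, where a failure goes unreported; calling it afterwards corresponds to the order the patch introduces.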

2014-09-29  Ilya Tocar  ilya.to...@intel.com

* lto-wrapper.c (main): Don't call fatal_error before
diagnostic_initialize.
---
 gcc/lto-wrapper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 08fd090..39e13b8 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -870,13 +870,13 @@ main (int argc, char *argv[])
 
   xmalloc_set_program_name (progname);
 
-  if (atexit (lto_wrapper_cleanup) != 0)
-    fatal_error ("atexit failed");
-
   gcc_init_libintl ();
 
   diagnostic_initialize (global_dc, 0);
 
+  if (atexit (lto_wrapper_cleanup) != 0)
+    fatal_error ("atexit failed");
+
   if (signal (SIGINT, SIG_IGN) != SIG_IGN)
 signal (SIGINT, fatal_signal);
 #ifdef SIGHUP
-- 
1.8.3.1

Re: [PATCH] fix hardreg_cprop to honor HARD_REGNO_MODE_OK.

2014-09-26 Thread Ilya Tocar

On 25 Sep 13:14, Jeff Law wrote:
 On 09/01/14 04:29, Ilya Tocar wrote:
 
 AVX512 added new 16 xmm registers (xmm16-xmm31).
 Those registers require evex encoding.
 Only 512-bit wide versions of instructions have evex encoding with
 avx512f, but all versions have it with avx512vl.
 Most instructions have same macroized pattern for 128/256/512 vector
 length. They all use constraint 'v', which corresponds to
 class ALL_SSE_REGS (xmm0 - xmm31). To disallow e. g. xmm20 in
 256-bit case (avx512f) and allow it only in avx512vl case we have
 HARD_REGNO_MODE_OK checking for regno being evex-only and
 disallowing it if mode is not 512-bit.
 Generally this kind of thing has been handled by splitting the register
 class into two classes.  I strongly suspect there are numerous places where
 we assume that two regs in the same class are interchangeable.
 I'm not sure that there are many places where we replace hard regs
 without checks. E. g. in regrename we have HARD_REGNO_RENAME_OK.
 As far as I understand, idea behind HARD_REGNO_RENAME_OK is that we
 should always check when substituting hard reg. Why is regcprop
 different, and what's the point of HARD_REGNO_MODE_OK if it is ignored
 by some passes?
 
 
 I realize that's going to require some work in the x86 machine description,
 but I think that's going to be a much better approach and save you work in
 the long run.
 
 
 This will approximately double sse.md, as we will need to split all
 patterns with 512-bit versions in 2 (512 and 128/256 cases) and play
 games with enabling/disabling alternatives depending on flags.
 Are you sure that this better than honoring HARD_REGNO_MODE_OK?
 As far as I understand, honoring  HARD_REGNO_MODE_OK shouldn't produce
 worse code.
 I don't see how it doubles the size.  You split the class into two classes.
 Whatever letter your second class has, you use it in conjunction with 'v'
 that you're already using.  Note you do not need different alternatives, you
 use them in the same alternative.
I'm not sure how this will help. Consider the add patterns for
V2DF, V4DF and V8DF: right now they are described in one pattern.
In the AVX512F (without AVX512VL) case we can use xmm16 for V8DF, but not
for V2DF or V4DF. If we keep them in one pattern, they will have the same
alternatives for all modes. So we will need to either split V2DF and V4DF
into a separate pattern (doubling the number of patterns), or disallow
particular modes depending on flags (which is what we do now).

 
 It's not a question of performance, but of design.
Obviously, but I still fail to see why honoring HARD_REGNO_MODE_OK is
bad design. I suspect that even without the avx512 changes, not honoring
it will bite us sooner or later.
 I suspect you're really
 just at the tip of the iceberg with this stuff if you continue to go down
 the path of having registers in the same class, some of which are
 allocatable and some of which are not.
Having a class where some registers are not available is an old approach:
consider the SSE_REGS class, where half of the registers are not available
in the 32-bit case. The problem here is that different modes are valid in
those registers, depending on flags. And this has worked fine for the
past ~year in gcc 4.9. In my opinion, if we check in the original patch we
will harm no one, and fix a correctness problem. If we later discover some
new problem that is not fixable by a simple patch, we may rework all of the
avx512 implementation. As bugs of this kind will never generate incorrect
code (all errors will be caught by the assembler), I see no reason not to
check it in.
 
 The other approach that I believe has been taken has been to mark the new
 registers as fixed when compiling for hardware where they're not available.
 But I'm not sure offhand if that would be sufficient to fix this problem.
It will not help. The registers are available; it is just that some modes
are not supported in them.

Re: [PATCH] PR62120

2014-09-15 Thread Ilya Tocar

On 01 Sep 18:38, Ilya Tocar wrote:
  Please mention the PR in the ChangeLog entry and add some testcases
  (can be gcc.target/i386/, but we should have it tested).
  Does this change anything on say register short sil __asm (sil); in 32-bit
  mode (when it IMHO should be rejected too?)?
 
 Do we support sil at all? In i386.h I see:
 
 /* Note we are omitting these since currently I don't know how
 to get gcc to use these, since they want the same but different
 number as al, and ax.
 */
 #define QI_REGISTER_NAMES \
 {"al", "dl", "cl", "bl", "sil", "dil", "bpl", "spl",}
 
 And gcc doesn't recognize sil.
 
 Added testcase, and fixed avx512f-additional-reg-names.c to be valid on
 32 bits. Ok for trunk?


Slightly updated tests.
Ok for trunk?

gcc/

2014-09-15  Ilya Tocar  ilya.to...@intel.com

   PR middle-end/62120
   * varasm.c (decode_reg_name_and_count): Check availability for
   registers from ADDITIONAL_REGISTER_NAMES.

Testsuite/

2014-09-15  Ilya Tocar  ilya.to...@intel.com

   PR middle-end/62120
   * gcc.target/i386/avx512f-additional-reg-names.c: Use register valid
   in 32-bit mode.
   * gcc.target/i386/pr62120.c: New.

---
 gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr62120.c  | 8 
 gcc/varasm.c | 5 +++--
 3 files changed, 12 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr62120.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c 
b/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
index 164a1de..98a9052 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
@@ -3,7 +3,7 @@
 
 void foo ()
 {
-  register int zmm_var asm ("zmm9") __attribute__((unused));
+  register int zmm_var asm ("zmm7") __attribute__((unused));
 
   __asm__ __volatile__("vxorpd %%zmm0, %%zmm0, %%zmm7\n" : : : "zmm7" );
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr62120.c 
b/gcc/testsuite/gcc.target/i386/pr62120.c
new file mode 100644
index 000..bfb8c47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr62120.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-sse" } */
+
+void foo ()
+{
+  register int zmm_var asm ("ymm9");/* { dg-error "invalid register name" } */
+  register int zmm_var2 asm ("23");/* { dg-error "invalid register name" } */
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index cd4a230..9c12b81 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -888,7 +888,7 @@ decode_reg_name_and_count (const char *asmspec, int *pnregs)
   if (asmspec[0] != 0 && i < 0)
{
  i = atoi (asmspec);
- if (i < FIRST_PSEUDO_REGISTER && i >= 0)
+ if (i < FIRST_PSEUDO_REGISTER && i >= 0 && reg_names[i][0])
return i;
  else
return -2;
@@ -925,7 +925,8 @@ decode_reg_name_and_count (const char *asmspec, int *pnregs)
 
for (i = 0; i < (int) ARRAY_SIZE (table); i++)
  if (table[i].name[0]
-  && ! strcmp (asmspec, table[i].name))
+  && ! strcmp (asmspec, table[i].name)
+  && reg_names[table[i].number][0])
return table[i].number;
   }
 #endif /* ADDITIONAL_REGISTER_NAMES */
-- 
1.8.3.1

[PATCH,i386] Properly check xgetbv for zmm support.

2014-09-15 Thread Ilya Tocar

Hi,

Currently we don't check the zmm/mask-register bits in the xgetbv
output when detecting the native CPU. The patch below fixes it.
Bootstraps/passes make check.
Ok for trunk?
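
The test this patch adds can be looked at in isolation. The following is
a minimal sketch, not code from the patch: xcr0_allows_avx512 is a
hypothetical helper name, and the caller is assumed to have already read
the low 32 bits of XCR0 with xgetbv. The XSTATE_* values are the ones the
patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits, with the names and values used in the patch.  */
#define XSTATE_SSE    0x2
#define XSTATE_YMM    0x4
#define XSTATE_OPMASK 0x20
#define XSTATE_ZMM    0x40
#define XSTATE_HI_ZMM 0x80

/* Hypothetical helper: return true only when every state component
   that AVX-512 code relies on is enabled in XCR0.  */
static bool
xcr0_allows_avx512 (uint32_t xcr0_low)
{
  const uint32_t needed = XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK
                          | XSTATE_ZMM | XSTATE_HI_ZMM;

  /* All needed bits must be set; a partial match is not enough.  */
  return (xcr0_low & needed) == needed;
}
```

The combined mask is 0xe6, i.e. decimal 230, which is the magic number
the testsuite header used before this patch.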

ChangeLog:

gcc/
2014-09-15  Ilya Tocar  ilya.to...@intel.com

* config/i386/driver-i386.c (host_detect_local_cpu): Detect lack of 
zmm/k regs support.

testsuite/
2014-09-15  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx512f-os-support.h: Remove magic number.

---
 gcc/config/i386/driver-i386.c  | 17 +
 gcc/testsuite/gcc.target/i386/avx512f-os-support.h | 17 +++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index af3088e..4d6bf83 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -533,6 +533,9 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
 #define XSTATE_FP  0x1
 #define XSTATE_SSE 0x2
 #define XSTATE_YMM 0x4
+#define XSTATE_OPMASK  0x20
+#define XSTATE_ZMM 0x40
+#define XSTATE_HI_ZMM  0x80
   if (has_osxsave)
 asm (".byte 0x0f; .byte 0x01; .byte 0xd0"
 : "=a" (eax), "=d" (edx)
@@ -554,6 +557,20 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
   has_xsavec = 0;
 }
 
+  if (!has_osxsave
+  || (eax &
+ (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM))
+ != (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM))
+{
+  has_avx512f = 0;
+  has_avx512er = 0;
+  has_avx512pf = 0;
+  has_avx512cd = 0;
+  has_avx512dq = 0;
+  has_avx512bw = 0;
+  has_avx512vl = 0;
+}
+
   if (!arch)
 {
   if (vendor == signature_AMD_ebx
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-os-support.h 
b/gcc/testsuite/gcc.target/i386/avx512f-os-support.h
index deefa5e..2f1ed03 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-os-support.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-os-support.h
@@ -1,10 +1,23 @@
 /* Check if the OS supports executing AVX512F instructions.  */
 
+#define XCR_XFEATURE_ENABLED_MASK  0x0
+
+#define XSTATE_FP  0x1
+#define XSTATE_SSE 0x2
+#define XSTATE_YMM 0x4
+#define XSTATE_OPMASK  0x20
+#define XSTATE_ZMM 0x40
+#define XSTATE_HI_ZMM  0x80
+
 static int
 avx512f_os_support (void)
 {
   unsigned int eax, edx;
+  unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
+  unsigned int mask = XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK
+ | XSTATE_ZMM | XSTATE_HI_ZMM;
+
+  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
 
-  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
-  return (eax & 230) == 230;
+  return ((eax & mask) == mask);
 }
-- 
1.8.3.1

Add missing Broadwell intrinsics.

2014-09-02 Thread Ilya Tocar

Hi,

Along with intrinsics for adcx/adox (supported since 4.8) ICC also
added intrinsics for adc/sbb [1]. This patch adds them.
Bootstraps/passes make-check. Ok for trunk?

[1] 
http://www.xlsoft.com/jp/products/intel/compilers/ccm/2013/Release_Notes_u3.pdf
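
For readers unfamiliar with carry-chained arithmetic, here is a portable
sketch of what add-with-carry and subtract-with-borrow compute. It is a
reference model only, not the GCC implementation: the ref_* and add_2x32
names are hypothetical, and the operand order of ref_subborrow_u32
(x - y - borrow) is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of add-with-carry:
   *out = x + y + carry_in, returns the carry out (0 or 1).  */
static unsigned char
ref_addcarry_u32 (unsigned char cf, uint32_t x, uint32_t y, uint32_t *out)
{
  uint64_t sum = (uint64_t) x + y + (cf != 0);

  *out = (uint32_t) sum;
  return (unsigned char) (sum >> 32);
}

/* Hypothetical reference model of subtract-with-borrow (assumed operand
   order): *out = x - y - borrow_in, returns the borrow out (0 or 1).  */
static unsigned char
ref_subborrow_u32 (unsigned char bf, uint32_t x, uint32_t y, uint32_t *out)
{
  uint64_t diff = (uint64_t) x - y - (bf != 0);

  *out = (uint32_t) diff;
  /* On underflow the 64-bit result wraps and bit 32 is set.  */
  return (unsigned char) ((diff >> 32) & 1);
}

/* Add two 64-bit values stored as {low, high} 32-bit limbs by chaining
   the carry from the low limb into the high limb.  */
static unsigned char
add_2x32 (const uint32_t a[2], const uint32_t b[2], uint32_t r[2])
{
  unsigned char c = ref_addcarry_u32 (0, a[0], b[0], &r[0]);

  return ref_addcarry_u32 (c, a[1], b[1], &r[1]);
}
```

Chaining the returned carry into the next call, as add_2x32 does, is the
typical use of such intrinsics in multi-word arithmetic.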

ChangeLog below:

gcc/

2014-09-02  Ilya Tocar  ilya.to...@intel.com

* config/i386/adxintrin.h (_subborrow_u32): New.
(_addcarry_u32): Ditto.
(_subborrow_u64): Ditto.
(_addcarry_u64): Ditto.
* config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SBB32,
IX86_BUILTIN_SBB64.
(ix86_init_mmx_sse_builtins): Handle __builtin_ia32_sbb_u32,
__builtin_ia32_sbb_u64


testsuite/

2014-09-02  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/adx-addcarryx32-1.c: Test addcarry, subborrow.
* gcc.target/i386/adx-addcarryx32-2.c: Ditto.
* gcc.target/i386/adx-addcarryx32-3.c: Ditto.
* gcc.target/i386/adx-addcarryx64-1.c: Ditto.
* gcc.target/i386/adx-addcarryx64-2.c: Ditto.
* gcc.target/i386/adx-addcarryx64-3.c: Ditto.

---
 gcc/config/i386/adxintrin.h   | 32 +++
 gcc/config/i386/i386.c| 22 
 gcc/testsuite/gcc.target/i386/adx-addcarryx32-1.c |  5 +++-
 gcc/testsuite/gcc.target/i386/adx-addcarryx32-2.c | 27 +++
 gcc/testsuite/gcc.target/i386/adx-addcarryx32-3.c |  5 +++-
 gcc/testsuite/gcc.target/i386/adx-addcarryx64-1.c |  5 +++-
 gcc/testsuite/gcc.target/i386/adx-addcarryx64-2.c | 27 +++
 gcc/testsuite/gcc.target/i386/adx-addcarryx64-3.c |  5 +++-
 8 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/adxintrin.h b/gcc/config/i386/adxintrin.h
index 6118900..8f2c01a 100644
--- a/gcc/config/i386/adxintrin.h
+++ b/gcc/config/i386/adxintrin.h
@@ -30,6 +30,22 @@
 
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_subborrow_u32 (unsigned char __CF, unsigned int __X,
+   unsigned int __Y, unsigned int *__P)
+{
+return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
+}
+
+extern __inline unsigned char
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_addcarry_u32 (unsigned char __CF, unsigned int __X,
+  unsigned int __Y, unsigned int *__P)
+{
+return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
+}
+
+extern __inline unsigned char
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _addcarryx_u32 (unsigned char __CF, unsigned int __X,
unsigned int __Y, unsigned int *__P)
 {
@@ -39,6 +55,22 @@ _addcarryx_u32 (unsigned char __CF, unsigned int __X,
 #ifdef __x86_64__
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_subborrow_u64 (unsigned char __CF, unsigned long __X,
+   unsigned long __Y, unsigned long long *__P)
+{
+return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
+}
+
+extern __inline unsigned char
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_addcarry_u64 (unsigned char __CF, unsigned long __X,
+  unsigned long __Y, unsigned long long *__P)
+{
+return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
+}
+
+extern __inline unsigned char
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _addcarryx_u64 (unsigned char __CF, unsigned long __X,
unsigned long __Y, unsigned long long *__P)
 {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3e4c93e..91b5d06 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28778,6 +28778,10 @@ enum ix86_builtins
   IX86_BUILTIN_ADDCARRYX32,
   IX86_BUILTIN_ADDCARRYX64,
 
+  /* ADC/SBB instructions.  */
+  IX86_BUILTIN_SBB32,
+  IX86_BUILTIN_SBB64,
+
   /* FSGSBASE instructions.  */
   IX86_BUILTIN_RDFSBASE32,
   IX86_BUILTIN_RDFSBASE64,
@@ -31213,6 +31217,14 @@ ix86_init_mmx_sse_builtins (void)
   UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PULONGLONG,
   IX86_BUILTIN_ADDCARRYX64);
 
+  /* ADX/SBB */
+  def_builtin (0, "__builtin_ia32_sbb_u32",
+  UCHAR_FTYPE_UCHAR_UINT_UINT_PUNSIGNED, IX86_BUILTIN_SBB32);
+  def_builtin (OPTION_MASK_ISA_64BIT,
+  "__builtin_ia32_sbb_u64",
+  UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PULONGLONG,
+  IX86_BUILTIN_SBB64);
+
   /* Read/write FLAGS.  */
   def_builtin (~OPTION_MASK_ISA_64BIT, "__builtin_ia32_readeflags_u32",
UNSIGNED_FTYPE_VOID, IX86_BUILTIN_READ_FLAGS);
@@ -35617,6 +35629,16 @@ rdseed_step:
   emit_insn (gen_zero_extendqisi2 (target, op2));
   return target;
 
+case IX86_BUILTIN_SBB32:
+  icode = CODE_FOR_subsi3_carry;
+  mode0 = SImode;
+  goto addcarryx;
+
+case IX86_BUILTIN_SBB64:
+  icode = CODE_FOR_subdi3_carry;
+  mode0 = DImode;
+  goto addcarryx;
+
 case IX86_BUILTIN_ADDCARRYX32:
   icode = TARGET_ADX

Re: [PATCH] fix hardreg_cprop to honor HARD_REGNO_MODE_OK.

2014-09-01 Thread Ilya Tocar

 
 AVX512 added new 16 xmm registers (xmm16-xmm31).
 Those registers require evex encoding.
 Only 512-bit wide versions of instructions have evex encoding with
 avx512f, but all versions have it with avx512vl.
 Most instructions have same macroized pattern for 128/256/512 vector
 length. They all use constraint 'v', which corresponds to
 class ALL_SSE_REGS (xmm0 - xmm31). To disallow e. g. xmm20 in
 256-bit case (avx512f) and allow it only in avx512vl case we have
 HARD_REGNO_MODE_OK checking for regno being evex-only and
 disallowing it if mode is not 512-bit.
 Generally this kind of thing has been handled by splitting the register
 class into two classes.  I strongly suspect there are numerous places where
 we assume that two regs in the same class are interchangeable.
I'm not sure that there are many places where we replace hard regs
without checks. E. g. in regrename we have HARD_REGNO_RENAME_OK.
As far as I understand, idea behind HARD_REGNO_RENAME_OK is that we
should always check when substituting hard reg. Why is regcprop
different, and what's the point of HARD_REGNO_MODE_OK if it is ignored
by some passes?

 
 I realize that's going to require some work in the x86 machine description,
 but I think that's going to be a much better approach and save you work in
 the long run.


This will approximately double sse.md, as we will need to split all
patterns with 512-bit versions in 2 (512 and 128/256 cases) and play
games with enabling/disabling alternatives depending on flags.
Are you sure that this better than honoring HARD_REGNO_MODE_OK?
As far as I understand, honoring  HARD_REGNO_MODE_OK shouldn't produce
worse code.

[PATCH] PR62120

2014-09-01 Thread Ilya Tocar

Hi, this patch adds a check for register availability when an
alternative or numeric register name is used.
Bootstraps/passes make-check on x86-64.
Ok for trunk?
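
The idea can be sketched with a toy model. Everything below is
hypothetical and only mimics the shape of decode_reg_name_and_count: the
toy_* names, the four-entry register table and the alias table are
invented for illustration. As in GCC, an empty string in the name table
is taken to mean that the register is not available on the current
target, and -2 stands for an unrecognized name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NUM_HARD_REGS 4	/* stand-in for FIRST_PSEUDO_REGISTER */

/* Stand-in for reg_names: an empty name marks an unavailable register.  */
static const char *toy_reg_names[TOY_NUM_HARD_REGS]
  = { "ax", "dx", "", "xmm0" };

/* Stand-in for ADDITIONAL_REGISTER_NAMES.  */
struct toy_alias { const char *name; int number; };
static const struct toy_alias toy_aliases[]
  = { { "eax", 0 }, { "ymm9", 2 }, { "ymm0", 3 } };

/* Return the register number for SPEC, or -2 if the name is unknown
   or names a register that is not available.  */
static int
toy_decode_reg_name (const char *spec)
{
  size_t len = strlen (spec);

  /* A purely decimal string is taken as a register number.  */
  if (len > 0 && strspn (spec, "0123456789") == len)
    {
      long n = strtol (spec, NULL, 10);
      if (n >= 0 && n < TOY_NUM_HARD_REGS && toy_reg_names[n][0])
	return (int) n;
      return -2;
    }

  /* Primary names; empty entries never match.  */
  for (int i = 0; i < TOY_NUM_HARD_REGS; i++)
    if (toy_reg_names[i][0] && strcmp (spec, toy_reg_names[i]) == 0)
      return i;

  /* Alternative names: accept only if the target register exists.  */
  for (size_t i = 0; i < sizeof toy_aliases / sizeof toy_aliases[0]; i++)
    if (strcmp (spec, toy_aliases[i].name) == 0
	&& toy_reg_names[toy_aliases[i].number][0])
      return toy_aliases[i].number;

  return -2;
}
```

In this model "ymm9" and "2" both refer to the disabled register 2 and
are rejected, while "ymm0" and "3" resolve to register 3.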

ChangeLog:

gcc/

2014-09-01  Ilya Tocar  ilya.to...@intel.com

* varasm.c (decode_reg_name_and_count): Check availability for
registers from ADDITIONAL_REGISTER_NAMES.


---
 gcc/varasm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 9d8602b..1d6f79f 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -888,7 +888,7 @@ decode_reg_name_and_count (const char *asmspec, int *pnregs)
   if (asmspec[0] != 0 && i < 0)
{
  i = atoi (asmspec);
- if (i < FIRST_PSEUDO_REGISTER && i >= 0)
+ if (i < FIRST_PSEUDO_REGISTER && i >= 0 && reg_names[i][0])
return i;
  else
return -2;
@@ -925,7 +925,8 @@ decode_reg_name_and_count (const char *asmspec, int *pnregs)
 
for (i = 0; i < (int) ARRAY_SIZE (table); i++)
  if (table[i].name[0]
-  && ! strcmp (asmspec, table[i].name))
+  && ! strcmp (asmspec, table[i].name)
+  && reg_names[table[i].number][0])
return table[i].number;
   }
 #endif /* ADDITIONAL_REGISTER_NAMES */
-- 
1.8.3.1

Re: [PATCH] PR62120

2014-09-01 Thread Ilya Tocar

 Please mention the PR in the ChangeLog entry and add some testcases
 (can be gcc.target/i386/, but we should have it tested).
 Does this change anything on say register short sil __asm (sil); in 32-bit
 mode (when it IMHO should be rejected too?)?

Do we support sil at all? In i386.h I see:

/* Note we are omitting these since currently I don't know how
to get gcc to use these, since they want the same but different
number as al, and ax.
*/
#define QI_REGISTER_NAMES \
{"al", "dl", "cl", "bl", "sil", "dil", "bpl", "spl",}

And gcc doesn't recognize sil.

Added testcase, and fixed avx512f-additional-reg-names.c to be valid on
32 bits. Ok for trunk?

gcc/

2014-09-01  Ilya Tocar  ilya.to...@intel.com

   PR middle-end/62120
   * varasm.c (decode_reg_name_and_count): Check availability for
   registers from ADDITIONAL_REGISTER_NAMES.

Testsuite/

2014-09-01  Ilya Tocar  ilya.to...@intel.com

   PR middle-end/62120
   * gcc.target/i386/avx512f-additional-reg-names.c: Use register valid
   in 32-bit mode.
   * gcc.target/i386/pr62120.c: New.

---
 gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr62120.c  | 7 +++
 gcc/varasm.c | 5 +++--
 3 files changed, 11 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr62120.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c 
b/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
index 164a1de..98a9052 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
@@ -3,7 +3,7 @@
 
 void foo ()
 {
-  register int zmm_var asm ("zmm9") __attribute__((unused));
+  register int zmm_var asm ("zmm7") __attribute__((unused));
 
   __asm__ __volatile__("vxorpd %%zmm0, %%zmm0, %%zmm7\n" : : : "zmm7" );
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr62120.c 
b/gcc/testsuite/gcc.target/i386/pr62120.c
new file mode 100644
index 000..8870d48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr62120.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-sse" } */
+
+void foo ()
+{
+  register int zmm_var asm ("ymm9");/* { dg-error "invalid register name" } */
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index de4479c..9638665 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -888,7 +888,7 @@ decode_reg_name_and_count (const char *asmspec, int *pnregs)
   if (asmspec[0] != 0 && i < 0)
{
  i = atoi (asmspec);
- if (i < FIRST_PSEUDO_REGISTER && i >= 0)
+ if (i < FIRST_PSEUDO_REGISTER && i >= 0 && reg_names[i][0])
return i;
  else
return -2;
@@ -925,7 +925,8 @@ decode_reg_name_and_count (const char *asmspec, int *pnregs)
 
for (i = 0; i < (int) ARRAY_SIZE (table); i++)
  if (table[i].name[0]
-  && ! strcmp (asmspec, table[i].name))
+  && ! strcmp (asmspec, table[i].name)
+  && reg_names[table[i].number][0])
return table[i].number;
   }
 #endif /* ADDITIONAL_REGISTER_NAMES */
-- 
1.8.3.1

[COMMITTED] Add myself to MAINTAINERS file (Write After Approval)

2014-08-15 Thread Ilya Tocar

Hi,

This patch adds myself to the MAINTAINERS file.  Committed as 214012.

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 87fb9dd..a40a537 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -555,6 +555,7 @@ Dinar Temirbulatov  
dtemirbula...@gmail.com
 Kresten Krab Thorupk...@gcc.gnu.org
 Caroline Tice  cmt...@google.com
 Kyrylo Tkachov kyrylo.tkac...@arm.com
+Tocar Ilya toca...@gmail.com
 Konrad Trifunovic  konrad.trifuno...@inria.fr
 Markus Trippelsdorfmar...@trippelsdorf.de
 David Ung  dav...@mips.com
-- 
1.8.3.1

Re: [COMMITTED] Add myself to MAINTAINERS file (Write After Approval)

2014-08-15 Thread Ilya Tocar

  This patch adds myself to the MAINTAINERS file.  Committed as 214012.
 
 Please keep this list sorted alphabetically.

Sorry, I attached the wrong version of the patch.
The version actually committed (rev 214012) is in alphabetical order:

Index: ChangeLog
===
--- ChangeLog   (revision 214011)
+++ ChangeLog   (revision 214012)
@@ -1,3 +1,7 @@
+2014-08-15  Ilya Tocar  toca...@gmail.com 
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2014-08-01  Jiong Wang  jiong.w...@arm.com
 
* MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 214011)
+++ MAINTAINERS (revision 214012)
@@ -555,6 +555,7 @@
 Kresten Krab Thorupk...@gcc.gnu.org
 Caroline Tice  cmt...@google.com
 Kyrylo Tkachov kyrylo.tkac...@arm.com
+Ilya Tocar toca...@gmail.com
 Konrad Trifunovic  konrad.trifuno...@inria.fr
 Markus Trippelsdorfmar...@trippelsdorf.de
 David Ung  dav...@mips.com

Re: [PATCH] Warn about unclosed pragma omp declare target.

2014-08-15 Thread Ilya Tocar

Ping.

On 29 Jul 18:45, Ilya Tocar wrote:
 Hi,
 
 As discussed here in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
 Gcc should complain about pragma omp declare target without
 corresponding pragma omp end declare target. This patch adds a warning
 for those cases.
 Bootstraps/passes make-check.
 Ok for trunk?
 
 ChangeLog:
 
 2014-07-29  Ilya Tocar  ilya.to...@intel.com
 
   * c-decl.c (omp_declare_target_location_stack): New.
   * c-lang.h (omp_declare_target_location_stack): Declare.
   * c-parser.c (warn_unclosed_pragma_omp_target): New.
   (c_parser_translation_unit): Call it.
   (c_parser_omp_declare_target): Remember location.
   (c_parser_omp_end_declare_target): Forget location.
 
 And ChangeLog for testsuite:
 
 2014-07-29  Ilya Tocar  ilya.to...@intel.com
 
   * gcc.dg/gomp/target-3.c: New testcase.
 
 ---
  gcc/c/c-decl.c   |  3 +++
  gcc/c/c-lang.h   |  3 +++
  gcc/c/c-parser.c | 22 +-
  gcc/testsuite/gcc.dg/gomp/target-3.c | 33 +
  4 files changed, 60 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
 
 diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
 index 2a4b439..2dd5b2c 100644
 --- a/gcc/c/c-decl.c
 +++ b/gcc/c/c-decl.c
 @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = VOIDmode;
  /* If non-zero, implicit omp declare target attribute is added into the
 attribute lists.  */
  int current_omp_declare_target_attribute;
 +
 +/* Holds locations of currently open omp declare target pragmas.  */
 +vec<location_t> omp_declare_target_location_stack;
  
  /* Each c_binding structure describes one binding of an identifier to
 a decl.  All the decls in a scope - irrespective of namespace - are
 diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
 index e974906..cef995c 100644
 --- a/gcc/c/c-lang.h
 +++ b/gcc/c/c-lang.h
 @@ -59,4 +59,7 @@ struct GTY(()) language_function {
 attribute lists.  */
  extern GTY(()) int current_omp_declare_target_attribute;
  
 +/* Holds locations of currently open omp declare target pragmas.  */
 +extern vec<location_t> omp_declare_target_location_stack;
 +
  #endif /* ! GCC_C_LANG_H */
 diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
 index e32bf04..0b96fe9 100644
 --- a/gcc/c/c-parser.c
 +++ b/gcc/c/c-parser.c
 @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd (c_parser *, enum 
 pragma_context);
  static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
  static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
  
 +static void warn_unclosed_pragma_omp_target ();
 +
  /* Parse a translation unit (C90 6.7, C99 6.9).
  
 translation-unit:
 @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
   }
while (c_parser_next_token_is_not (parser, CPP_EOF));
  }
 +
 +  warn_unclosed_pragma_omp_target ();
  }
  
  /* Parse an external declaration (C90 6.7, C99 6.9).
 @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser *parser, tree 
 fndecl, tree parms,
  static void
  c_parser_omp_declare_target (c_parser *parser)
  {
 +  location_t loc = c_parser_peek_token (parser)->location;
c_parser_skip_to_pragma_eol (parser);
current_omp_declare_target_attribute++;
 +  omp_declare_target_location_stack.safe_push (loc);
  }
  
  static void
 @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser *parser)
  error_at (loc, %#pragma omp end declare target% without corresponding 
 
  %#pragma omp declare target%);
else
 -current_omp_declare_target_attribute--;
 +{
 +  current_omp_declare_target_attribute--;
 +  omp_declare_target_location_stack.pop ();
 +}
  }
  
  
 @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, c_parser 
 *parser, tree initial_index,
return value_tree;
  }
  
 +static void
 +warn_unclosed_pragma_omp_target ()
 +{
 +  int i;
 +  for (i = 0; i < current_omp_declare_target_attribute; i++)
 +warning_at (omp_declare_target_location_stack[i], 0,
 + "%<#pragma omp declare target%> without corresponding "
 + "%<#pragma omp end declare target%>");
 +  omp_declare_target_location_stack.release ();
 +}
 +
  #include gt-c-c-parser.h
 diff --git a/gcc/testsuite/gcc.dg/gomp/target-3.c 
 b/gcc/testsuite/gcc.dg/gomp/target-3.c
 new file mode 100644
 index 000..d50604f
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/gomp/target-3.c
 @@ -0,0 +1,33 @@
 +/* { dg-do compile } */
 +/* { dg-options -fopenmp } */
 +
 +#pragma omp declare target
 +int tgtv = 6;
 +
 +int
 +tgt (void)
 +{
 +  tgtv++;
 +  return 0;
 +}
 +#pragma omp end declare target
 +
 +#pragma omp declare target/* { dg-warning '#pragma omp declare 
 target' without corresponding '#pragma omp end declare target' } */
 +int tgtv1 = 6;
 +#pragma omp declare target /* { dg-warning '#pragma omp declare target' 
 without corresponding '#pragma omp

Re: [PATCH] fix hardreg_cprop to honor HARD_REGNO_MODE_OK.

2014-08-14 Thread Ilya Tocar

 I've observed a SPEC2006 failure on the avx512-vlbwdq branch.
 It was caused by hardreg_cprop. maybe_mode_change assumed that all
 registers of the same register class are OK for the same mode.
 This is not the case for i386/avx512. We need to honor
 HARD_REGNO_MODE_OK.

 One could argue that having a class where some members are OK for being used
 in a particular mode, but other members are not is the core issue here.
 
 Can you describe a bit about why you've got a class of that nature?
 Background on that would be useful.
 

AVX512 added new 16 xmm registers (xmm16-xmm31).
Those registers require evex encoding.
Only 512-bit wide versions of instructions have evex encoding with
avx512f, but all versions have it with avx512vl.
Most instructions have same macroized pattern for 128/256/512 vector
length. They all use constraint 'v', which corresponds to 
class ALL_SSE_REGS (xmm0 - xmm31). To disallow e. g. xmm20 in
256-bit case (avx512f) and allow it only in avx512vl case we have
HARD_REGNO_MODE_OK checking for regno being evex-only and
disallowing it if mode is not 512-bit.
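
The rule described above can be written down as a small predicate. This
is a simplified sketch, not the i386 back end's actual
HARD_REGNO_MODE_OK: the toy_* names are hypothetical, registers are
numbered 0..31 for xmm0..xmm31, and only vector modes of 128, 256 and
512 bits are modelled (scalar modes are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* xmm16..xmm31 can only be encoded with EVEX.  */
static bool
toy_evex_only_reg_p (int xmm)
{
  return xmm >= 16 && xmm <= 31;
}

/* Hypothetical stand-in for a HARD_REGNO_MODE_OK-style check.
   XMM is the register index, MODE_BITS the vector width.  */
static bool
toy_vector_mode_ok (int xmm, int mode_bits, bool avx512f, bool avx512vl)
{
  if (xmm < 0 || xmm > 31)
    return false;
  if (mode_bits != 128 && mode_bits != 256 && mode_bits != 512)
    return false;

  /* 512-bit vectors exist only with AVX512F.  */
  if (mode_bits == 512 && !avx512f)
    return false;

  /* xmm0..xmm15 accept every remaining vector mode in this model.  */
  if (!toy_evex_only_reg_p (xmm))
    return true;

  /* EVEX-only registers need AVX512F, and for 128/256-bit vectors
     they additionally need AVX512VL.  */
  return avx512f && (mode_bits == 512 || avx512vl);
}
```

A pass that replaces one hard register with another of the same class
would have to ask this question for the new register and mode, because
being in class ALL_SSE_REGS alone does not answer it.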

[PATCH] PR61878

2014-08-13 Thread Ilya Tocar

Hi,

This patch adds missing intrinsics and tests for them.
Ok for trunk?

gcc/ChangeLog:

2014-08-13  Ilya Tocar  ilya.to...@intel.com

* config/i386/avx512fintrin.h (_mm512_mask_cmpge_epi32_mask): New.
(_mm512_mask_cmpge_epu32_mask): Ditto.
(_mm512_cmpge_epu32_mask): Ditto.
(_mm512_mask_cmpge_epi64_mask): Ditto.
(_mm512_cmpge_epi64_mask): Ditto.
(_mm512_mask_cmpge_epu64_mask): Ditto.
(_mm512_cmpge_epu64_mask): Ditto.
(_mm512_mask_cmple_epi32_mask): Ditto.
(_mm512_cmple_epi32_mask): Ditto.
(_mm512_mask_cmple_epu32_mask): Ditto.
(_mm512_cmple_epu32_mask): Ditto.
(_mm512_mask_cmple_epi64_mask): Ditto.
(_mm512_cmple_epi64_mask): Ditto.
(_mm512_mask_cmple_epu64_mask): Ditto.
(_mm512_cmple_epu64_mask): Ditto.
(_mm512_mask_cmplt_epi32_mask): Ditto.
(_mm512_cmplt_epi32_mask): Ditto.
(_mm512_mask_cmplt_epu32_mask): Ditto.
(_mm512_cmplt_epu32_mask): Ditto.
(_mm512_mask_cmplt_epi64_mask): Ditto.
(_mm512_cmplt_epi64_mask): Ditto.
(_mm512_mask_cmplt_epu64_mask): Ditto.
(_mm512_cmplt_epu64_mask): Ditto.
(_mm512_mask_cmpneq_epi32_mask): Ditto.
(_mm512_mask_cmpneq_epu32_mask): Ditto.
(_mm512_cmpneq_epu32_mask): Ditto.
(_mm512_mask_cmpneq_epi64_mask): Ditto.
(_mm512_cmpneq_epi64_mask): Ditto.
(_mm512_mask_cmpneq_epu64_mask): Ditto.
(_mm512_cmpneq_epu64_mask): Ditto.
(_mm512_castpd_ps): Ditto.
(_mm512_castpd_si512): Ditto.
(_mm512_castps_pd): Ditto.
(_mm512_castps_si512): Ditto.
(_mm512_castsi512_ps): Ditto.
(_mm512_castsi512_pd): Ditto.
(_mm512_castpd512_pd128): Ditto.
(_mm512_castps512_ps128): Ditto.
(_mm512_castsi512_si128): Ditto.
(_mm512_castpd512_pd256): Ditto.
(_mm512_castps512_ps256): Ditto.
(_mm512_castsi512_si256): Ditto.
(_mm512_castpd128_pd512): Ditto.
(_mm512_castps128_ps512): Ditto.
(_mm512_castsi128_si512): Ditto.
(_mm512_castpd256_pd512): Ditto.
(_mm512_castps256_ps512): Ditto.
(_mm512_castsi256_si512): Ditto.
(_mm512_cmpeq_epu32_mask): Ditto.
(_mm512_mask_cmpeq_epu32_mask): Ditto.
(_mm512_mask_cmpeq_epu64_mask): Ditto.
(_mm512_cmpeq_epu64_mask): Ditto.
(_mm512_cmpgt_epu32_mask): Ditto.
(_mm512_mask_cmpgt_epu32_mask): Ditto.
(_mm512_mask_cmpgt_epu64_mask): Ditto.
(_mm512_cmpgt_epu64_mask): Ditto.
* config/i386/i386-builtin-types.def: Add V16SF_FTYPE_V8SF,
V16SI_FTYPE_V8SI, V16SI_FTYPE_V4SI, V8DF_FTYPE_V2DF.
* config/i386/i386.c (enum ix86_builtins): Add
IX86_BUILTIN_SI512_SI256, IX86_BUILTIN_PD512_PD256,
IX86_BUILTIN_PS512_PS256, IX86_BUILTIN_SI512_SI,
IX86_BUILTIN_PD512_PD, IX86_BUILTIN_PS512_PS.
(bdesc_args): Add __builtin_ia32_si512_256si,
__builtin_ia32_ps512_256ps, __builtin_ia32_pd512_256pd,
__builtin_ia32_si512_si, __builtin_ia32_ps512_ps,
__builtin_ia32_pd512_pd.
(ix86_expand_args_builtin): Handle new FTYPEs.
* config/i386/sse.md (castmode): Add 512-bit modes.
(AVX512MODE2P): New.
(avx512f_castmodeavxsizesuffix_castmode): New.
(avx512f_castmodeavxsizesuffix_256castmode): Ditto.

gcc/testsuite/ChangeLog:

2014-08-13  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx512f-typecast-1.c: New test.
* gcc.target/i386/avx512f-vpcmpequd-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpequd-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpequq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpequq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpged-1.c: Add new intrinsic.
* gcc.target/i386/avx512f-vpcmpged-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgeq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgeq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgeud-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgeud-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgeuq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgeuq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgtud-1.c: New test.
* gcc.target/i386/avx512f-vpcmpgtud-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgtuq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpgtuq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpled-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpled-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpleq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpleq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpleud-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpleud-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpleuq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpleuq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpltd-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpltd-2.c: Ditto

[PATCH] fix hardreg_cprop to honor HARD_REGNO_MODE_OK.

2014-08-11 Thread Ilya Tocar

Hi,

I've observed a SPEC2006 failure on the avx512-vlbwdq branch.
It was caused by hardreg_cprop. maybe_mode_change assumed that all
registers of the same register class are OK for the same mode.
This is not the case for i386/avx512. We need to honor
HARD_REGNO_MODE_OK.
The patch below does so.
Ok for trunk?

2014-08-11  Ilya Tocar  ilya.to...@intel.com

* regcprop.c (maybe_mode_change): Honor HARD_REGNO_MODE_OK.


---
 gcc/regcprop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 932037d..694deb2 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -410,7 +410,7 @@ maybe_mode_change (enum machine_mode orig_mode, enum 
machine_mode copy_mode,
       && GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (new_mode))
 return NULL_RTX;
 
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && HARD_REGNO_MODE_OK (regno, new_mode))
 return gen_rtx_raw_REG (new_mode, regno);
   else if (mode_change_ok (orig_mode, new_mode, regno))
 {
-- 
1.8.3.1

Fix vec_extract_lo constraint.

2014-08-05 Thread Ilya Tocar

Hi,
I've noticed that the vec_extract_lo_<mode><mask_name> pattern has a
vm/vm alternative when no mask is applied. This can lead to an insn
with two memory operands. The patch below fixes it.
Ok for trunk?

2014-08-05  Ilya Tocar  ilya.to...@intel.com

* config/i386/sse.md (vec_extract_lo_<mode><mask_name>): Fix
constraint.

---
 gcc/config/i386/sse.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0f7ca27..85f48ab 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5999,9 +5999,9 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "vec_extract_lo_<mode><mask_name>"
-  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=v,<store_mask_constraint>")
        (vec_select:<ssehalfvecmode>
-         (match_operand:V8FI 1 "nonimmediate_operand" "vm")
+         (match_operand:V8FI 1 "nonimmediate_operand" "vm,v")
          (parallel [(const_int 0) (const_int 1)
             (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-- 
1.8.3.1

Re: Fix vec_extract_lo constraint.

2014-08-05 Thread Ilya Tocar

 I'd suggest op0: =store_mask_constraint,v and op1: v,m. This
 would result in op0:=vm,v op1:v,m and op0:=v,v op1:v,m.
 
 Uros.

Done.

2014-08-05  Ilya Tocar  ilya.to...@intel.com

* config/i386/sse.md (vec_extract_lo_<mode><mask_name>): Fix
constraint.

---
 gcc/config/i386/sse.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0f7ca27..3337104 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5999,9 +5999,9 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "vec_extract_lo_<mode><mask_name>"
-  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>,v")
        (vec_select:<ssehalfvecmode>
-         (match_operand:V8FI 1 "nonimmediate_operand" "vm")
+         (match_operand:V8FI 1 "nonimmediate_operand" "v,m")
          (parallel [(const_int 0) (const_int 1)
             (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-- 
1.8.3.1

[PATCH] Warn about unclosed pragma omp declare target.

2014-07-29 Thread Ilya Tocar

Hi,

As discussed in https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html,
GCC should complain about #pragma omp declare target without a
corresponding #pragma omp end declare target. This patch adds a warning
for those cases.
Bootstraps and passes make check.
Ok for trunk?
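
The bookkeeping can be sketched as a small stack of locations: push on "declare target", pop on "end declare target", and report whatever is left at the end of the translation unit. The sketch below is a standalone toy in C, not the parser code. The sketch_* names, the fixed-size array and the use of plain line numbers as locations are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_MAX_OPEN = 16 };

/* Hypothetical stand-in for the location stack: line numbers only.  */
struct sketch_open_pragmas
{
  int lines[SKETCH_MAX_OPEN];
  size_t depth;
};

/* Seen "#pragma omp declare target" at LINE: remember the location.  */
static void
sketch_pragma_open (struct sketch_open_pragmas *s, int line)
{
  assert (s->depth < SKETCH_MAX_OPEN);
  s->lines[s->depth++] = line;
}

/* Seen "#pragma omp end declare target": forget the innermost location.
   Returns -1 if nothing is open (the case that is already an error).  */
static int
sketch_pragma_close (struct sketch_open_pragmas *s)
{
  if (s->depth == 0)
    return -1;
  s->depth--;
  return 0;
}

/* End of translation unit: copy the lines of still-open pragmas to OUT,
   reset the stack, and return how many there were.  */
static size_t
sketch_report_unclosed (struct sketch_open_pragmas *s, int *out)
{
  size_t n = s->depth;
  for (size_t i = 0; i < n; i++)
    out[i] = s->lines[i];
  s->depth = 0;
  return n;
}
```

Because a close always pops the innermost entry, three opens followed by one close leave the two outermost pragmas to be reported.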

ChangeLog:

2014-07-29  Ilya Tocar  ilya.to...@intel.com

* c-decl.c (omp_declare_target_location_stack): New.
* c-lang.h (omp_declare_target_location_stack): Declare.
* c-parser.c (warn_unclosed_pragma_omp_target): New.
(c_parser_translation_unit): Call it.
(c_parser_omp_declare_target): Remember location.
(c_parser_omp_end_declare_target): Forget location.

And ChangeLog for testsuite:

2014-07-29  Ilya Tocar  ilya.to...@intel.com

* gcc.dg/gomp/target-3.c: New testcase.

---
 gcc/c/c-decl.c   |  3 +++
 gcc/c/c-lang.h   |  3 +++
 gcc/c/c-parser.c | 22 +-
 gcc/testsuite/gcc.dg/gomp/target-3.c | 33 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 2a4b439..2dd5b2c 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = VOIDmode;
 /* If non-zero, implicit omp declare target attribute is added into the
attribute lists.  */
 int current_omp_declare_target_attribute;
+
+/* Holds locations of currently open omp declare target pragmas.  */
+vec<location_t> omp_declare_target_location_stack;
 
 /* Each c_binding structure describes one binding of an identifier to
a decl.  All the decls in a scope - irrespective of namespace - are
diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
index e974906..cef995c 100644
--- a/gcc/c/c-lang.h
+++ b/gcc/c/c-lang.h
@@ -59,4 +59,7 @@ struct GTY(()) language_function {
attribute lists.  */
 extern GTY(()) int current_omp_declare_target_attribute;
 
+/* Holds locations of currently open omp declare target pragmas.  */
+extern vec<location_t> omp_declare_target_location_stack;
+
 #endif /* ! GCC_C_LANG_H */
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index e32bf04..0b96fe9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd (c_parser *, enum 
pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
 
+static void warn_unclosed_pragma_omp_target ();
+
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
translation-unit:
@@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
}
   while (c_parser_next_token_is_not (parser, CPP_EOF));
 }
+
+  warn_unclosed_pragma_omp_target ();
 }
 
 /* Parse an external declaration (C90 6.7, C99 6.9).
@@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser *parser, tree 
fndecl, tree parms,
 static void
 c_parser_omp_declare_target (c_parser *parser)
 {
+  location_t loc = c_parser_peek_token (parser)->location;
   c_parser_skip_to_pragma_eol (parser);
   current_omp_declare_target_attribute++;
+  omp_declare_target_location_stack.safe_push (loc);
 }
 
 static void
@@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser *parser)
     error_at (loc, "%<#pragma omp end declare target%> without corresponding "
                   "%<#pragma omp declare target%>");
   else
-current_omp_declare_target_attribute--;
+{
+  current_omp_declare_target_attribute--;
+  omp_declare_target_location_stack.pop ();
+}
 }
 
 
@@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, c_parser 
*parser, tree initial_index,
   return value_tree;
 }
 
+static void
+warn_unclosed_pragma_omp_target ()
+{
+  int i;
+  for (i = 0; i < current_omp_declare_target_attribute; i++)
+    warning_at (omp_declare_target_location_stack[i], 0,
+               "%<#pragma omp declare target%> without corresponding "
+               "%<#pragma omp end declare target%>");
+  omp_declare_target_location_stack.release ();
+}
+
 #include "gt-c-c-parser.h"
diff --git a/gcc/testsuite/gcc.dg/gomp/target-3.c 
b/gcc/testsuite/gcc.dg/gomp/target-3.c
new file mode 100644
index 000..d50604f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-3.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare target
+int tgtv = 6;
+
+int
+tgt (void)
+{
+  tgtv++;
+  return 0;
+}
+#pragma omp end declare target
+
+#pragma omp declare target    /* { dg-warning "'#pragma omp declare target' without corresponding '#pragma omp end declare target'" } */
+int tgtv1 = 6;
+#pragma omp declare target    /* { dg-warning "'#pragma omp declare target' without corresponding '#pragma omp end declare target'" } */
+
+int
+tgt2 (void)
+{
+  tgtv1++;
+  return 0;
+}
+
+#pragma omp declare target
+int
+tgt3 (void)
+{
+  tgtv1++;
+  return 0;
+}
+#pragma omp end

Re: [PATCH][x86] Support clflushopt, xsaves, xsavec.

2014-05-13 Thread Ilya Tocar

On 12 May 15:42, Uros Bizjak wrote:
 On Mon, May 12, 2014 at 3:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  This patch adds support for xsavec, xsaves ISA extensions, introduced in
  [1], and clflushopt introduced in [2].
 
  [1]http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
  [2]http://software.intel.com/en-us/file/319433-018pdf
 
  Bootstraps, passes make-check.
 
 Please also add new options to g++.dg/other/i386-{2,3}.C and
 gcc.target/i386/sse-{14,15,22,23}.c.
 
 Uros.

Done.
Looks like sse-15 doesn't need new options, I've assumed sse-12/13.

Changelog:

2014-05-12  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/i386-common.c
(OPTION_MASK_ISA_CLFLUSHOPT_SET): Define.
(OPTION_MASK_ISA_XSAVES_SET): Ditto.
(OPTION_MASK_ISA_XSAVEC_SET): Ditto.
(OPTION_MASK_ISA_CLFLUSHOPT_UNSET): Ditto.
(OPTION_MASK_ISA_XSAVES_UNSET): Ditto.
(OPTION_MASK_ISA_XSAVEC_UNSET): Ditto.
(ix86_handle_option): Handle OPT_mxsavec, OPT_mxsaves,
OPT_mclflushopt.
* config.gcc (i[34567]86-*-*): Add clflushoptintrin.h,
xsavecintrin.h, xsavesintrin.h.
(x86_64-*-*): Ditto.
* config/i386/clflushoptintrin.h: New.
* config/i386/xsavecintrin.h: Ditto.
* config/i386/xsavesintrin.h: Ditto.
* config/i386/cpuid.h (bit_CLFLUSHOPT): Define.
(bit_XSAVES): Ditto.
(bit_XSAVEC): Ditto.
* config/i386/driver-i386.c (host_detect_local_cpu): Handle
-mclflushopt, -mxsavec, -mxsaves, -mno-xsaves, -mno-xsavec,
-mno-clflushopt.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_CLFLUSHOPT, OPTION_MASK_ISA_XSAVEC,
OPTION_MASK_ISA_XSAVES.
* config/i386/i386.c (ix86_target_string): Handle -mclflushopt,
-mxsavec, -mxsaves.
(PTA_CLFLUSHOPT): Define.
(PTA_XSAVEC): Ditto.
(PTA_XSAVES): Ditto.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_builtins): Add IX86_BUILTIN_XSAVEC, IX86_BUILTIN_XSAVEC64,
IX86_BUILTIN_XSAVES, IX86_BUILTIN_XRSTORS, IX86_BUILTIN_XSAVES64,
IX86_BUILTIN_XRSTORS64, IX86_BUILTIN_CLFLUSHOPT.
(bdesc_special_args): Add __builtin_ia32_xsaves, __builtin_ia32_xrstors,
__builtin_ia32_xsavec, __builtin_ia32_xsaves64, 
__builtin_ia32_xrstors64,
__builtin_ia32_xsavec64.
(ix86_init_mmx_sse_builtins): Add __builtin_ia32_clflushopt.
(ix86_expand_builtin): Handle new builtins.
* config/i386/i386.h (TARGET_CLFLUSHOPT): Define.
(TARGET_CLFLUSHOPT_P): Ditto.
(TARGET_XSAVEC): Ditto.
(TARGET_XSAVEC_P): Ditto.
(TARGET_XSAVES): Ditto.
(TARGET_XSAVES_P): Ditto.
* config/i386/i386.md (ANY_XSAVE): Add UNSPECV_XSAVEC, UNSPECV_XSAVES.
(ANY_XSAVE64): Add UNSPECV_XSAVEC64, UNSPECV_XSAVES64.
(attr xsave): Add xsavec, xsavec64, xsaves, xsaves64.
(ANY_XRSTOR): New.
(ANY_XRSTOR64): Ditto.
(xrstor): Ditto.
(xrstor): Change into xrstor.
(xrstor_rex64): Change into xrstor_rex64.
(xrstor64): Change into xrstor64
(clflushopt): New.
* config/i386/i386.opt (mclflushopt): New.
(mxsavec): Ditto.
(mxsaves): Ditto.
* config/i386/x86intrin.h: Add clflushoptintrin.h, xsavesintrin.h,
xsavecintrin.h.
* doc/invoke.texi: Document new options.

And for tests:

2014-05-12  Ilya Tocar  ilya.to...@intel.com
* gcc.target/i386/clflushopt-1.c: New.
* gcc.target/i386/xsavec-1.c: Ditto.
* gcc.target/i386/xsavec64-1.c: Ditto.
* gcc.target/i386/xsaves-1.c: Ditto.
* gcc.target/i386/xsaves64-1.c: Ditto.
* gcc.target/i386/sse-12.c: Test new options.
* gcc.target/i386/sse-13.c: Ditto. 
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto. 
* gcc.target/i386/sse-23.c: Ditto.
* g++.dg/other/i386-2.C: Ditto. 
* g++.dg/other/i386-3.C: Ditto. 

Updated patch below:

---
 gcc/common/config/i386/i386-common.c | 47 
 gcc/config.gcc   |  6 +-
 gcc/config/i386/clflushoptintrin.h   | 49 
 gcc/config/i386/cpuid.h  |  3 +
 gcc/config/i386/driver-i386.c| 12 +++-
 gcc/config/i386/i386-c.c |  6 ++
 gcc/config/i386/i386.c   | 83 +++-
 gcc/config/i386/i386.h   |  6 ++
 gcc/config/i386/i386.md  | 64 +
 gcc/config/i386/i386.opt | 12 
 gcc/config/i386/x86intrin.h  |  6 ++
 gcc/config/i386/xsavecintrin.h   | 58 +++
 gcc/config/i386/xsavesintrin.h   | 72 
 gcc/doc

[PATCH][x86] Support clflushopt, xsaves, xsavec.

2014-05-12 Thread Ilya Tocar

Hi,

This patch adds support for xsavec, xsaves ISA extensions, introduced in
[1], and clflushopt introduced in [2].

[1]http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
[2]http://software.intel.com/en-us/file/319433-018pdf

Bootstraps, passes make-check.
Ok for trunk?
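
For orientation, the feature detection side can be sketched as plain bit tests on raw CPUID register values. The bit positions below are taken from my reading of the Intel manuals (CLFLUSHOPT in leaf 7, subleaf 0, EBX bit 23; XSAVEC and XSAVES in leaf 0xD, subleaf 1, EAX bits 1 and 3). The SKETCH_* names and the helper are hypothetical and are not the definitions added to cpuid.h by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions as described in the lead-in.  */
#define SKETCH_BIT_CLFLUSHOPT (UINT32_C (1) << 23) /* leaf 7, subleaf 0, EBX */
#define SKETCH_BIT_XSAVEC     (UINT32_C (1) << 1)  /* leaf 0xD, subleaf 1, EAX */
#define SKETCH_BIT_XSAVES     (UINT32_C (1) << 3)  /* leaf 0xD, subleaf 1, EAX */

struct sketch_features
{
  bool clflushopt;
  bool xsavec;
  bool xsaves;
};

/* Decode the three features from register values the caller has already
   obtained from CPUID; no CPUID instruction is executed here.  */
static struct sketch_features
sketch_decode (uint32_t leaf7_ebx, uint32_t leafd_sub1_eax)
{
  struct sketch_features f;
  f.clflushopt = (leaf7_ebx & SKETCH_BIT_CLFLUSHOPT) != 0;
  f.xsavec = (leafd_sub1_eax & SKETCH_BIT_XSAVEC) != 0;
  f.xsaves = (leafd_sub1_eax & SKETCH_BIT_XSAVES) != 0;
  return f;
}

/* Small self-check with synthetic register values.  */
static void
sketch_decode_demo (void)
{
  struct sketch_features f = sketch_decode (SKETCH_BIT_CLFLUSHOPT, 0);
  assert (f.clflushopt && !f.xsavec && !f.xsaves);
}
```

Keeping the decoding separate from the CPUID query makes the logic easy to exercise with synthetic values on any host.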

Changelog:

2014-05-12  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/i386-common.c
(OPTION_MASK_ISA_CLFLUSHOPT_SET): Define.
(OPTION_MASK_ISA_XSAVES_SET): Ditto.
(OPTION_MASK_ISA_XSAVEC_SET): Ditto.
(OPTION_MASK_ISA_CLFLUSHOPT_UNSET): Ditto.
(OPTION_MASK_ISA_XSAVES_UNSET): Ditto.
(OPTION_MASK_ISA_XSAVEC_UNSET): Ditto.
(ix86_handle_option): Handle OPT_mxsavec, OPT_mxsaves,
OPT_mclflushopt.
* config.gcc (i[34567]86-*-*): Add clflushoptintrin.h,
xsavecintrin.h, xsavesintrin.h.
(x86_64-*-*): Ditto.
* config/i386/clflushoptintrin.h: New.
* config/i386/xsavecintrin.h: Ditto.
* config/i386/xsavesintrin.h: Ditto.
* config/i386/cpuid.h (bit_CLFLUSHOPT): Define.
(bit_XSAVES): Ditto.
(bit_XSAVEC): Ditto.
* config/i386/driver-i386.c (host_detect_local_cpu): Handle
-mclflushopt, -mxsavec, -mxsaves, -mno-xsaves, -mno-xsavec,
-mno-clflushopt.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_CLFLUSHOPT, OPTION_MASK_ISA_XSAVEC,
OPTION_MASK_ISA_XSAVES.
* config/i386/i386.c (ix86_target_string): Handle -mclflushopt,
-mxsavec, -mxsaves.
(PTA_CLFLUSHOPT): Define.
(PTA_XSAVEC): Ditto.
(PTA_XSAVES): Ditto.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_builtins): Add IX86_BUILTIN_XSAVEC, IX86_BUILTIN_XSAVEC64,
IX86_BUILTIN_XSAVES, IX86_BUILTIN_XRSTORS, IX86_BUILTIN_XSAVES64,
IX86_BUILTIN_XRSTORS64, IX86_BUILTIN_CLFLUSHOPT.
(bdesc_special_args): Add __builtin_ia32_xsaves, __builtin_ia32_xrstors,
__builtin_ia32_xsavec, __builtin_ia32_xsaves64, 
__builtin_ia32_xrstors64,
__builtin_ia32_xsavec64.
(ix86_init_mmx_sse_builtins): Add __builtin_ia32_clflushopt.
(ix86_expand_builtin): Handle new builtins.
* config/i386/i386.h (TARGET_CLFLUSHOPT): Define.
(TARGET_CLFLUSHOPT_P): Ditto.
(TARGET_XSAVEC): Ditto.
(TARGET_XSAVEC_P): Ditto.
(TARGET_XSAVES): Ditto.
(TARGET_XSAVES_P): Ditto.
* config/i386/i386.md (ANY_XSAVE): Add UNSPECV_XSAVEC, UNSPECV_XSAVES.
(ANY_XSAVE64): Add UNSPECV_XSAVEC64, UNSPECV_XSAVES64.
(attr xsave): Add xsavec, xsavec64, xsaves, xsaves64.
(ANY_XRSTOR): New.
(ANY_XRSTOR64): Ditto.
(xrstor): Ditto.
(xrstor): Change into xrstor.
(xrstor_rex64): Change into xrstor_rex64.
(xrstor64): Change into xrstor64
(clflushopt): New.
* config/i386/i386.opt (mclflushopt): New.
(mxsavec): Ditto.
(mxsaves): Ditto.
* config/i386/x86intrin.h: Add clflushoptintrin.h, xsavesintrin.h,
xsavecintrin.h.
* doc/invoke.texi: Document new options.

And for tests:

2014-05-12  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/clflushopt-1.c: New.
* gcc.target/i386/xsavec-1.c: Ditto.
* gcc.target/i386/xsavec64-1.c: Ditto.
* gcc.target/i386/xsaves-1.c: Ditto.
* gcc.target/i386/xsaves64-1.c: Ditto.


 gcc/common/config/i386/i386-common.c | 47 
 gcc/config.gcc   |  6 +-
 gcc/config/i386/clflushoptintrin.h   | 49 
 gcc/config/i386/cpuid.h  |  3 +
 gcc/config/i386/driver-i386.c| 12 +++-
 gcc/config/i386/i386-c.c |  6 ++
 gcc/config/i386/i386.c   | 83 +++-
 gcc/config/i386/i386.h   |  6 ++
 gcc/config/i386/i386.md  | 64 +
 gcc/config/i386/i386.opt | 12 
 gcc/config/i386/x86intrin.h  |  6 ++
 gcc/config/i386/xsavecintrin.h   | 58 +++
 gcc/config/i386/xsavesintrin.h   | 72 
 gcc/doc/invoke.texi  |  7 +++
 gcc/testsuite/gcc.target/i386/clflushopt-1.c | 11 
 gcc/testsuite/gcc.target/i386/xsavec-1.c | 11 
 gcc/testsuite/gcc.target/i386/xsavec64-1.c   | 11 
 gcc/testsuite/gcc.target/i386/xsaves-1.c | 13 +
 gcc/testsuite/gcc.target/i386/xsaves64-1.c   | 13 +
 19 files changed, 474 insertions(+), 16 deletions(-)
 create mode 100644 gcc/config/i386/clflushoptintrin.h
 create mode 100644 gcc/config/i386/xsavecintrin.h
 create mode 100644 gcc/config/i386/xsavesintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/clflushopt-1.c
 create mode 100644 gcc

Re: [PATCH] x86: Define _mm_undefined_

2014-03-18 Thread Ilya Tocar

On 17 Mar 22:18, Ulrich Drepper wrote:
 On Mon, Mar 17, 2014 at 7:39 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  undefined is similar in behavior to setzero, but it also clobbers
  flags. Maybe just define it to setzero for now?
 
 
 What do you mean by clobbers flags?  Do you have an example?

I've used the following example:

#include <x86intrin.h>

extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_undefined_ps (void)
{
  __m512 __Y;
  __asm__ ("" : "=x" (__Y));
  return __Y;
}


__m512 foo1(__m512 __A)
{
return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
  (__v16sf)
 _mm512_undefined_ps (),
  (__mmask16) -1);
}

__m512 foo2(__m512 __A)
{
return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
  (__v16sf)
 _mm512_setzero_ps (),
  (__mmask16) -1);
}


In foo1 the asm statement is expanded into the following RTL:

(insn 6 3 7 2 (parallel [
(set (reg:V16SF 87 [ __Y ])
(asm_operands:V16SF ("") ("=x") 0 []
 []
 [] foo.c:8))
(clobber (reg:QI 18 fpsr))
(clobber (reg:QI 17 flags))
]) foo.c:8 -1

As you can see, flags are clobbered by the asm statement, while in the
setzero case (foo2) I have just:
(insn 7 6 8 2 (set (reg:V16SF 88)
(const_vector:V16SF [
(const_double:SF 0.0 [0x0.0p+0])
(const_double:SF 0.0 [0x0.0p+0])
//rest of zeroes skipped.

Re: [PATCH] x86: Define _mm_undefined_

2014-03-17 Thread Ilya Tocar

On 16 Mar 07:12, Ulrich Drepper wrote:
[This patch is so far really meant for commenting. I haven't tested it
at all yet.]

Intel's intrinsic specification includes one set which currently is not
defined in gcc's headers: the _mm*_undefined_* intrinsics.

What specification are you talking about? As far as I know they are present
in ICC headers, but not in manuals such as:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

The purpose of these intrinsics (currently three classes, three formats
each) is to create a pseudo-value the compiler does not assume is
uninitialized without incurring any code doing so. The purpose is to
use these intrinsics in places where it is known the value of a register
is never used. This is already important with AVX2 and becomes really
crucial with AVX512.

Currently three different techniques are used:

- _mm*_setzero_*() is used. Even though the XOR operation does not
cost anything it still messes with the instruction scheduling and
more code is generated.

- another parameter is duplicated. This leads most of the time to
one additional move instruction.

- uninitialized variables are used (this is in new AVX512 code). The
compiler should generate warnings for these headers. I haven't
tried it.

Uninitialized variables certainly are bad. Replacing them with
setzero/undefined is a good idea.
Also in most AVX512 cases those values shouldn't be present in code.
They are either optimized away in case of -1 mask or result in
zero-masking being applied. Do you know of any cases where xor is
generated (except for destination in gather/scatter)?

Using the _mm*_undefined_*() intrinsics is much cleaner and also
potentially allows to generate better code.

For now the implementation uses an inline asm to suggest to the compiler
that the variable is initialized. This does not prevent a real register
to be allocated for this purpose but it saves the XOR instruction.

The correct and optimal implementation will require a compiler built-in
which will do something different based on how the value is used:

- if the value is never modified then any register should be picked.
In function/intrinsic calls the parameter simply need not be loaded at
all.

- if the value is modified (and allocated to a register or memory
location) no initialization for the variable is needed (equivalent
to the asm now).

The questions are:

- is there interest in adding the necessary compiler built-in?

- if yes, anyone interested in working on this?

- and: is it worth adding a patch like the one here in the meantime?

As it stands now gcc's intrinsics are not complete and programs following
Intel's manuals can fail to compile.

Compatibility with ICC is certainly good. I tried your patch, and
undefined is similar in behavior to setzero, but it also clobbers
flags. Maybe just define it to setzero for now?
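
The two options being compared can be sketched with a scalar analogue in C. This is only an illustration under assumptions: it relies on GCC-style extended asm (also accepted by Clang), uses a general register constraint instead of a vector one, and all sketch_* names are hypothetical. The masked operation mimics the all-ones mask case, where the placeholder value never reaches the result.

```c
#include <assert.h>

/* Placeholder the compiler treats as initialized: the empty asm claims to
   write Y, so no instruction is needed to set it (GCC extension).  */
static inline unsigned
sketch_undefined_u32 (void)
{
  unsigned y;
  __asm__ ("" : "=r" (y));
  return y;
}

/* The setzero alternative: a real, defined value.  */
static inline unsigned
sketch_setzero_u32 (void)
{
  return 0u;
}

/* Scalar analogue of a masked operation: bits selected by MASK come from
   ~A, the remaining bits are taken from SRC.  */
static unsigned
sketch_masked_not (unsigned a, unsigned src, unsigned mask)
{
  return (~a & mask) | (src & ~mask);
}

/* With an all-ones mask both placeholders give the same result.  */
static void
sketch_undefined_demo (void)
{
  unsigned r1 = sketch_masked_not (5u, sketch_undefined_u32 (), ~0u);
  unsigned r2 = sketch_masked_not (5u, sketch_setzero_u32 (), ~0u);
  assert (r1 == r2);
}
```

The functional result is the same in both cases; the difference discussed in this thread is only in what the compiler has to assume about the asm statement.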

2014-03-16 Ulrich Drepper drep...@gmail.com

* config/i386/avxintrin.h (_mm256_undefined_si256): Define.
(_mm256_undefined_ps): Define.
(_mm256_undefined_pd): Define.
* config/i386/emmintrin.h (_mm_undefined_si128): Define.
(_mm_undefined_pd): Define.
* config/i386/xmmintrin.h (_mm_undefined_ps): Define.
* config/i386/avx512fintrin.h (_mm512_undefined_si512): Define.
(_mm512_undefined_ps): Define.
(_mm512_undefined_pd): Define.
Use _mm*_undefined_*.
* config/i386/avx2intrin.h: Use _mm*_undefined_*.

Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-25 Thread Ilya Tocar

On 21 Feb 18:35, Uros Bizjak wrote:
 On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
   Latest version of AVX512 spec
   http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
   Has a few changes.
  
   1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
   We can either support new CPUID or disable PREFETCHWT1 from generating,
   without removing code, and enable it in 4.9.1/latest version.
   I am not sure that adding new -m flag and related stuff this late
   is a good idea. Should I still add it?
 
  Please submit the patch anyway. We can relax release constraints on
  non-algorithmic patch a bit, weighting in benefits of having gcc
  release that fully conforms to some published specification.
 
  The patch below adds the -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
  and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
  Ok for trunk?
 

  * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
 
 Please also add new switch to gcc.target/i386/sse-{12,13,14}.c and
 g++.dg/other/i386-{2,3} and new options to
 gcc.target/i386/sse-{22,23}.c. Please re-test with new additions and
 repost the patch.


I've added new switch to those tests. However when I add prefetchwt1
to pragma GCC target (sse) sse-22a.c test fails with:
pmmintrin.h: In function ‘_mm_loaddup_pd’:
emmintrin.h:119:1: error: inlining failed in call to always_inline
‘_mm_load1_pd’: target specific option mismatch

I've checked and this isn't a problem with prefetchwt1. I get the same
error when I add any other option (e.g. sha) to #pragma GCC target (sse).
So I haven't added anything there. As that was the only failure,
I'm reposting this patch.
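
For readers unfamiliar with how a prefetch hint can carry "intent to write", here is a hedged sketch in C. It assumes an encoding in which bit 2 of the hint means write intent and the low two bits give the locality. Only the value 3 for the T0 hint is stated in this thread; the other values and all sketch_* / SKETCH_* names are assumptions made for illustration. __builtin_prefetch is the documented GCC builtin taking (address, rw, locality).

```c
#include <assert.h>

/* Hypothetical hint values: low two bits = locality, bit 2 = write.  */
enum sketch_hint
{
  SKETCH_HINT_NTA = 0,
  SKETCH_HINT_T2 = 1,
  SKETCH_HINT_T1 = 2,
  SKETCH_HINT_T0 = 3,
  SKETCH_HINT_ET1 = 6,
  SKETCH_HINT_ET0 = 7
};

/* 1 if the hint asks for a prefetch with intent to write, else 0.  */
static int
sketch_hint_rw (int hint)
{
  return (hint & 0x4) >> 2;
}

/* Temporal locality level, 0 (none) to 3 (keep in all cache levels).  */
static int
sketch_hint_locality (int hint)
{
  return hint & 0x3;
}

/* Both extra arguments must be compile-time constants, hence a macro.  */
#define SKETCH_PREFETCH(p, hint) \
  __builtin_prefetch ((p), ((hint) & 0x4) >> 2, (hint) & 0x3)

/* Issue a write-intent prefetch on a local object; purely a hint.  */
static void
sketch_prefetch_demo (void)
{
  int x = 0;
  SKETCH_PREFETCH (&x, SKETCH_HINT_ET1);
  assert (sketch_hint_rw (SKETCH_HINT_ET1) == 1);
}
```

A prefetch is only a hint, so the demo cannot observe its effect; it merely shows how the two fields are split out of one hint value.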

ChangeLog for GCC:

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
(ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
(PTA_PREFETCHWT1): New.
(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
  New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
(*prefetch_avx512pf_mode_: Change into ...
 (*prefetch_prefetchwt1_mode: This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
(_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

ChangeLog for tests:

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* g++.dg/other/i386-2.C: Add new option.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
* gcc.target/i386/sse-22.c: Add new option.
* gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.

---
 gcc/common/config/i386/i386-common.c  | 15 +++
 gcc/config/i386/cpuid.h   |  4 
 gcc/config/i386/driver-i386.c |  7 +--
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 13 ++---
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/xmmintrin.h   |  6 --
 gcc/doc/invoke.texi   |  4 +++-
 gcc/testsuite/g++.dg/other/i386-2.C   |  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |  2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-12.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  4 ++--
 gcc/testsuite/gcc.target/i386/sse-14.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  4 ++--
 19 files changed, 75 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET

Re: [PATCH][i386][AVX512] Match latest spec.

2014-02-25 Thread Ilya Tocar

On 20 Feb 17:23, Uros Bizjak wrote:
 On Thu, Feb 20, 2014 at 4:39 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.
 
  2)Currently for scatter/gather prefetches intrinsics we accept 1 as
  possible hint parameter. This is consistent with ICC. However as
  GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC
  (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefetches
  are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as
  hint. We can either change gather prefetches to accept 1 instead of 3 and
  hope that everyone will use _MM_HINT_T0 and not the raw value, or we can
  change _MM_HINT_T0 to be consistent with ICC. What solution do you
  prefer?
 
 Builtins, including __builtin_prefetch, are considered as internal
 implementation detail, so we can pass to them whatever we like. The
 published interface is in *.h files, and this includes _MM_HINT_T0.
 For now, I suggest to change prefetches, so they will accept
 _MM_HINT_T0, as this is the least invasive change.

The patch below changes prefetches to accept 3 (_MM_HINT_T0),
and replaces all hint values in tests with the corresponding _MM_HINT.
Testing passes. Ok for trunk?

ChangeLog:

2014-02-25  Ilya Tocar  ilya.to...@intel.com

* config/i386/predicates.md (const1256_operand): Remove.
(const2356_operand): New.
(const_1_to_2_operand): Remove.
* config/i386/sse.md (avx512pf_gatherpfmodesf): Change hint value.
(*avx512pf_gatherpfmodesf_mask): Ditto.
(*avx512pf_gatherpfmodesf): Ditto.
(avx512pf_gatherpfmodedf): Ditto.
(*avx512pf_gatherpfmodedf_mask): Ditto.
(*avx512pf_gatherpfmodedf): Ditto.
(avx512pf_scatterpfmodesf): Ditto.
(*avx512pf_scatterpfmodesf_mask): Ditto.
(*avx512pf_scatterpfmodesf): Ditto.
(avx512pf_scatterpfmodedf): Ditto.
(*avx512pf_scatterpfmodedf_mask): Ditto.
(*avx512pf_scatterpfmodedf): Ditto.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0.

And for tests:

2014-02-25  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in 
__builtin_ia32_gatherpfdps,
__builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps,
__builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd,
__builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd,
__builtin_ia32_scatterpfqpd.
* gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead
of raw ints.
* gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/config/i386/predicates.md  | 11 ++
 gcc/config/i386/sse.md | 40 +++---
 gcc/config/i386/xmmintrin.h|  1 +
 gcc/testsuite/gcc.target/i386/avx-1.c  | 16 -
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0dps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf0qps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1dps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vgatherpf1qps-1.c |  2 +-
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c|  4 +--
 .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c|  4 +--
 gcc/testsuite/gcc.target/i386/sse-14.c | 16

Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Ilya Tocar

  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.
 
  1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
  We can either support new CPUID or disable PREFETCHWT1 from generating,
  without removing code, and enable it in 4.9.1/latest version.
  I am not sure that adding new -m flag and related stuff this late
  is a good idea. Should I still add it?
 
 Please submit the patch anyway. We can relax release constraints on
 non-algorithmic patch a bit, weighting in benefits of having gcc
 release that fully conforms to some published specification.

The patch below adds the -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
Ok for trunk?
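
The option handling added for the new flag follows the usual set/clear pattern: the flag's bit is set or cleared in the ISA flags, and in both cases it is recorded as explicitly given by the user. A minimal standalone sketch in C follows. The struct, the mask value and the sketch_* names are hypothetical and do not match GCC's real gcc_options layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit for the feature; the real mask is generated by GCC.  */
#define SKETCH_MASK_PREFETCHWT1 (UINT64_C (1) << 0)

struct sketch_opts
{
  uint64_t isa_flags;		/* Features currently enabled.  */
  uint64_t isa_flags_explicit;	/* Features the user mentioned at all.  */
};

/* VALUE is nonzero for -mprefetchwt1 and zero for -mno-prefetchwt1.  */
static void
sketch_handle_mprefetchwt1 (struct sketch_opts *opts, int value)
{
  if (value)
    opts->isa_flags |= SKETCH_MASK_PREFETCHWT1;
  else
    opts->isa_flags &= ~SKETCH_MASK_PREFETCHWT1;
  /* Either way, the user has now stated a preference.  */
  opts->isa_flags_explicit |= SKETCH_MASK_PREFETCHWT1;
}

/* Enable, then disable: the explicit bit stays set throughout.  */
static void
sketch_option_demo (void)
{
  struct sketch_opts o = { 0, 0 };
  sketch_handle_mprefetchwt1 (&o, 1);
  assert (o.isa_flags & SKETCH_MASK_PREFETCHWT1);
  sketch_handle_mprefetchwt1 (&o, 0);
  assert (!(o.isa_flags & SKETCH_MASK_PREFETCHWT1));
  assert (o.isa_flags_explicit & SKETCH_MASK_PREFETCHWT1);
}
```

Tracking the explicit bit separately lets later defaulting code (for example a -march selection) avoid overriding a choice the user made on the command line.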

ChangeLog:

2014-02-21  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
(ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
(PTA_PREFETCHWT1): New.
(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
  New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
(*prefetch_avx512pf_mode_: Change into ...
 (*prefetch_prefetchwt1_mode: This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
(_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

And for tests:

2014-02-22  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch.
* gcc.target/i386/sse-23.c: Ditto. 

---
 gcc/common/config/i386/i386-common.c  | 15 +++
 gcc/config/i386/cpuid.h   |  4 
 gcc/config/i386/driver-i386.c |  7 +--
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 13 ++---
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/xmmintrin.h   |  6 --
 gcc/doc/invoke.texi   |  4 +++-
 gcc/testsuite/gcc.target/i386/avx-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  2 +-
 14 files changed, 68 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -154,6 +155,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
as -mno-sse4.1. */
@@ -757,6 +759,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+case OPT_mprefetchwt1:
+  if (value)
+   {
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+   }
+  else
+   {
+ opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+   }
+  return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
--- a/gcc/config

[PATCH][i386][AVX512] Match latest spec.

2014-02-20 Thread Ilya Tocar

Hi,
The latest version of the AVX512 spec
http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
has a few changes.
This patch handles the first of them:
the vptestnmd and vptestnmq instructions now belong to CPUID AVX512F instead of
AVX512CD. This patch changes their CPUID accordingly.
However, I have a question about the other changes:

1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1.
We can either support the new CPUID bit, or stop generating PREFETCHWT1
without removing the code, and enable it in 4.9.1/latest version.
I am not sure that adding a new -m flag and related stuff this late
is a good idea. Should we still add it?

2) Currently, for the scatter/gather prefetch intrinsics we accept 1 as a
possible hint parameter. This is consistent with ICC. However, as
GCC defines _MM_HINT_T0 as 3 and not as 1 like ICC
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefetches
are inconsistent with normal prefetches, as they won't accept _MM_HINT_T0 as a
hint. We can either change gather prefetches to accept 3 instead of 1 and
hope that everyone will use _MM_HINT_T0 and not the raw value, or we can
change _MM_HINT_T0 to be consistent with ICC. Which solution do you
prefer?
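
To make the mismatch in question 2 concrete, here is a minimal sketch. It
only models the two values quoted above (GCC's _MM_HINT_T0 is 3, ICC's is 1)
and the statement that the gather/scatter prefetch intrinsics accept 1. The
enum names and gather_prefetch_accepts are hypothetical, not part of any
header, and whether other hint values are accepted is not modelled.

```c
/* Values quoted in the message: GCC's _MM_HINT_T0 is 3, ICC's is 1.
   These names are illustrative stand-ins, not the real header macros.  */
enum { GCC_HINT_T0 = 3, ICC_HINT_T0 = 1 };

/* Hypothetical model of the hint check in the gather/scatter prefetch
   intrinsics as described above: only the raw value 1 is modelled as
   accepted.  */
static int
gather_prefetch_accepts (int hint)
{
  return hint == 1;
}
```

With this model, gather_prefetch_accepts (ICC_HINT_T0) is true while
gather_prefetch_accepts (GCC_HINT_T0) is false, which is the inconsistency
with normal prefetches described above.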

The patch below changes the CPUID of vptestnmq/vptestnmd and changes some bogus
%v to v. Bootstraps, passes make check. Ok for trunk?

ChangeLog

2014-02-20  Ilya Tocar  ilya.to...@intel.com
 
* config/i386/avx512cdintrin.h (_mm512_testn_epi32_mask),
(_mm512_mask_testn_epi32_mask), (_mm512_testn_epi64_mask),
(_mm512_mask_testn_epi64_mask): Move to ...
* config/i386/avx512fintrin.h: Here.
* config/i386/i386.c (bdesc_args): Change MASK_ISA for testnm.
* config/i386/sse.md (avx512f_vmscalefmoderound_name): Remove %.
(avx512f_scalefmodemask_nameround_name): Ditto.
(avx512f_testnmmode3mask_scalar_merge_name): Change condition to
TARGET_AVX512F from TARGET_AVX512CD.

And for testsuite

2014-02-20  Ilya Tocar  ilya.to...@intel.com
 
* gcc.target/i386/avx512cd-vptestnmd-1.c: Change into ...
* gcc.target/i386/avx512f-vptestnmd-1.c: This.
* gcc.target/i386/avx512cd-vptestnmq-1.c: Change into ...
* gcc.target/i386/avx512f-vptestnmq-1.c: This.
* gcc.target/i386/avx512cd-vptestnmd-2.c: Change into ...
* gcc.target/i386/avx512f-vptestnmd-2.c: This.
* gcc.target/i386/avx512cd-vptestnmq-2.c: Change into ...
* gcc.target/i386/avx512f-vptestnmq-2.c: This.


---
 gcc/config/i386/avx512cdintrin.h   | 34 --
 gcc/config/i386/avx512fintrin.h| 34 ++
 gcc/config/i386/i386.c |  4 +-
 gcc/config/i386/sse.md |  8 ++--
 .../gcc.target/i386/avx512cd-vptestnmd-1.c | 16 ---
 .../gcc.target/i386/avx512cd-vptestnmd-2.c | 52 --
 .../gcc.target/i386/avx512cd-vptestnmq-1.c | 16 ---
 .../gcc.target/i386/avx512cd-vptestnmq-2.c | 52 --
 .../gcc.target/i386/avx512f-vptestnmd-1.c  | 16 +++
 .../gcc.target/i386/avx512f-vptestnmd-2.c  | 52 ++
 .../gcc.target/i386/avx512f-vptestnmq-1.c  | 16 +++
 .../gcc.target/i386/avx512f-vptestnmq-2.c  | 52 ++
 12 files changed, 176 insertions(+), 176 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmd-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmd-2.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmq-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-vptestnmq-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vptestnmq-2.c

diff --git a/gcc/config/i386/avx512cdintrin.h b/gcc/config/i386/avx512cdintrin.h
index 3935b77..a4939f7a 100644
--- a/gcc/config/i386/avx512cdintrin.h
+++ b/gcc/config/i386/avx512cdintrin.h
@@ -176,40 +176,6 @@ _mm512_broadcastmw_epi32 (__mmask16 __A)
   return (__m512i) __builtin_ia32_broadcastmw512 (__A);
 }
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
-{
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-(__v16si) __B,
-(__mmask16) -1);
-}
-
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
-{
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-(__v16si) __B, __U);
-}
-
-extern __inline __mmask8
-__attribute__

[PATCH][i386][AVX512] PR60204 - update abi for large structs.

2014-02-19 Thread Ilya Tocar

Hi everyone,
As the AVX512 ABI for passing/returning structs was recently changed in
https://github.com/hjl-tools/x86-64-psABI/commit/6d7ccd614fe67111d2aecec853c3df0310b372d2
we need to update GCC accordingly. This patch does it.
It bootstraps, passes make check (including updated ABI tests), and spec2006
is ok. Ok for trunk?
ChangeLog below:

2014-02-19  Ilya Tocar  ilya.to...@intel.com

* config/i386/i386.c (classify_argument): Update to reflect abi fix.

And for testsuite:

2014-02-19  Ilya Tocar  ilya.to...@intel.com

* gcc.target/x86_64/abi/avx512f/test_passing_structs.c: Update to
reflect abi fix.
* gcc.target/x86_64/abi/avx512f/test_passing_unions.c: Ditto.
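
The effect of raising the limit can be sketched with a size-only model. This
is an illustration under stated assumptions, not the real classify_argument:
over_size_limit and struct m512_sized are hypothetical names, and the actual
classification also depends on the field types, which this sketch ignores.

```c
#include <stddef.h>

/* Size-only stand-in for a struct wrapping one 512-bit vector
   (16 floats, 64 bytes on the usual 4-byte float).  */
struct m512_sized { float x[16]; };

/* Hypothetical model of the early size check: returns 1 when an
   aggregate of BYTES bytes exceeds LIMIT and so goes on the stack
   without further classification.  */
static int
over_size_limit (size_t bytes, size_t limit)
{
  return bytes > limit;
}
```

With the old limit, over_size_limit (sizeof (struct m512_sized), 32) is true,
so a 64-byte aggregate was always forced onto the stack. With the new limit
of 64 the check no longer rejects it, which is why the updated tests check
register contents instead of stack addresses.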

---
 gcc/config/i386/i386.c |  4 +-
 .../x86_64/abi/avx512f/test_passing_structs.c  | 12 +---
 .../x86_64/abi/avx512f/test_passing_unions.c   | 78 +++---
 3 files changed, 12 insertions(+), 82 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index acfc021..2d16fb9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6431,8 +6431,8 @@ classify_argument (enum machine_mode mode, const_tree 
type,
   tree field;
   enum x86_64_reg_class subclasses[MAX_CLASSES];
 
-  /* On x86-64 we pass structures larger than 32 bytes on the stack.  */
-  if (bytes > 32)
+  /* On x86-64 we pass structures larger than 64 bytes on the stack.  */
+  if (bytes > 64)
return 0;
 
   for (i = 0; i < words; i++)
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_structs.c 
b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_structs.c
index a5e1477..8daa676 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_structs.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_structs.c
@@ -26,16 +26,8 @@ check_struct_passing1 (struct m512_struct ms1 
ATTRIBUTE_UNUSED,
   struct m512_struct ms7 ATTRIBUTE_UNUSED,
   struct m512_struct ms8 ATTRIBUTE_UNUSED)
 {
-  /* Check the passing on the stack by comparing the address of the
- stack elements to the expected place on the stack.  */
-  assert ((unsigned long)&ms1.x == rsp+8);
-  assert ((unsigned long)&ms2.x == rsp+72);
-  assert ((unsigned long)&ms3.x == rsp+136);
-  assert ((unsigned long)&ms4.x == rsp+200);
-  assert ((unsigned long)&ms5.x == rsp+264);
-  assert ((unsigned long)&ms6.x == rsp+328);
-  assert ((unsigned long)&ms7.x == rsp+392);
-  assert ((unsigned long)&ms8.x == rsp+456);
+  /* Check register contents.  */
+  check_m512_arguments;
 }
 
 void
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_unions.c 
b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_unions.c
index 9712290..370d15b6 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_unions.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_unions.c
@@ -52,24 +52,8 @@ check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED,
 union un1 u7 ATTRIBUTE_UNUSED,
 union un1 u8 ATTRIBUTE_UNUSED)
 {
-   /* Check the passing on the stack by comparing the address of the
-  stack elements to the expected place on the stack.  */
-  assert ((unsigned long)&u1.x == rsp+8);
-  assert ((unsigned long)&u1.f == rsp+8);
-  assert ((unsigned long)&u2.x == rsp+72);
-  assert ((unsigned long)&u2.f == rsp+72);
-  assert ((unsigned long)&u3.x == rsp+136);
-  assert ((unsigned long)&u3.f == rsp+136);
-  assert ((unsigned long)&u4.x == rsp+200);
-  assert ((unsigned long)&u4.f == rsp+200);
-  assert ((unsigned long)&u5.x == rsp+264);
-  assert ((unsigned long)&u5.f == rsp+264);
-  assert ((unsigned long)&u6.x == rsp+328);
-  assert ((unsigned long)&u6.f == rsp+328);
-  assert ((unsigned long)&u7.x == rsp+392);
-  assert ((unsigned long)&u7.f == rsp+392);
-  assert ((unsigned long)&u8.x == rsp+456);
-  assert ((unsigned long)&u8.f == rsp+456);
+  /* Check register contents.  */
+  check_m512_arguments;
 }
 
 void
@@ -82,24 +66,8 @@ check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED,
 union un2 u7 ATTRIBUTE_UNUSED,
 union un2 u8 ATTRIBUTE_UNUSED)
 {
-   /* Check the passing on the stack by comparing the address of the
-  stack elements to the expected place on the stack.  */
-  assert ((unsigned long)&u1.x == rsp+8);
-  assert ((unsigned long)&u1.d == rsp+8);
-  assert ((unsigned long)&u2.x == rsp+72);
-  assert ((unsigned long)&u2.d == rsp+72);
-  assert ((unsigned long)&u3.x == rsp+136);
-  assert ((unsigned long)&u3.d == rsp+136);
-  assert ((unsigned long)&u4.x == rsp+200);
-  assert ((unsigned long)&u4.d == rsp+200);
-  assert ((unsigned long)&u5.x == rsp+264);
-  assert ((unsigned long)&u5.d == rsp+264);
-  assert ((unsigned long)&u6.x == rsp+328);
-  assert ((unsigned long)&u6.d == rsp+328);
-  assert ((unsigned long)&u7.x == rsp+392);
-  assert ((unsigned long)&u7.d == rsp+392);
-  assert ((unsigned long)&u8.x == rsp+456

Re: [PATCH][testsuite] Avoid division by zero.

2014-01-31 Thread Ilya Tocar

On 30 Jan 19:24, Uros Bizjak wrote:
 On Thu, Jan 30, 2014 at 5:41 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  This patch removes possible division by zero.
  Make check passes. Ok for trunk?
 
  2014-01-30  Ilya Tocar  ilya.to...@intel.com
 
  * gcc.target/i386/m512-check.h: Use correct rounding values.
 
  ---
   gcc/testsuite/gcc.target/i386/m512-check.h | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
 
  diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h 
  b/gcc/testsuite/gcc.target/i386/m512-check.h
  index 3209039..8441784 100644
  --- a/gcc/testsuite/gcc.target/i386/m512-check.h
  +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
  @@ -58,7 +58,8 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE 
  *v,  \
  \
     for (i = 0; i < ARRAY_SIZE (u.a); i++)   \
   {  \
  -  VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
  +  VALUE_TYPE rel_err;  \
  +  rel_err = v[i] != 0 ? (u.a[i] - v[i]) / v[i] : u.a[i];   \
     if (((rel_err < 0) ? -rel_err : rel_err) > eps)  \
  {   \
err++;\
 
 We won't get zero from exponential function, so expecting zero result
 is flawed anyway.
 
 If we would like to introduce universal epsilon comparisons into the
 testsuite, then please read [1]. Being overly pedantic, the definition
 should be |(v[i] - u.a[i]) / v[i]|, as stated in [2].
 
 [1] 
 http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
 [2] http://en.wikipedia.org/wiki/Relative_error


We get zero from testing zero-masking. Currently we produce 0/0 = NaN.
Comparison with NaN is always false, so the tests pass. But I think that
this should be fixed to avoid division by zero. As for being more
pedantic about the comparison, I doubt that it's useful when we use
0.0001 as eps.
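
The behaviour described above can be sketched in plain C. This is an
illustration only, assuming IEEE floating point without -ffast-math. Both
function names are hypothetical, and the guarded loop is one possible shape
for a zero check, not the code in m512-check.h.

```c
#include <stddef.h>

/* The unguarded comparison: with got == want == 0 the quotient is
   0/0 = NaN, and every ordered comparison with NaN is false, so no
   error is reported.  */
static int
unguarded_flags_error (double got, double want, double eps)
{
  double rel_err = (got - want) / want;
  return ((rel_err < 0) ? -rel_err : rel_err) > eps;
}

/* Guarded variant: lanes whose expected value is exactly zero (as
   produced by zero-masking) are compared directly and never divided.  */
static int
count_rough_mismatches (const double *got, const double *want,
                        size_t n, double eps)
{
  int err = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (want[i] == 0.0)
        {
          if (got[i] != 0.0)
            err++;
          continue;
        }
      double rel_err = (want[i] - got[i]) / want[i];
      if (((rel_err < 0) ? -rel_err : rel_err) > eps)
        err++;
    }
  return err;
}
```

The guarded loop gives the same pass/fail answer for zero lanes as the
unguarded form happens to give, but it gets there without evaluating 0/0.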

Re: [PATCH][testsuite] Avoid division by zero.

2014-01-31 Thread Ilya Tocar

  We won't get zero from exponential function, so expecting zero result
  is flawed anyway.
 
  If we would like to introduce universal epsilon comparisons into the
  testsuite, then please read [1]. Being overly pedantic, the definition
  should be |(v[i] - u.a[i]) / v[i]|, as stated in [2].
 
  [1] 
  http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
  [2] http://en.wikipedia.org/wiki/Relative_error
 
 
  We get zero from testing zero-masking. Currently we produce 0/0 = NaN.
  Comparison with NaN is always false, so the tests pass. But I think that
  this should be fixed to avoid division by zero. As for being more
  pedantic about the comparison, I doubt that it's useful when we use
  0.0001 as eps.
 
 In this case, please add simple check for zero, with the above
 comment. We don't test exp function, but masking.


Something like this?

---
 gcc/testsuite/gcc.target/i386/m512-check.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h 
b/gcc/testsuite/gcc.target/i386/m512-check.h
index 3209039..a96a103 100644
--- a/gcc/testsuite/gcc.target/i386/m512-check.h
+++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -58,6 +58,16 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE *v, 
\
\
   for (i = 0; i < ARRAY_SIZE (u.a); i++)   \
 {  \
+  /* We will always have v[i] == 0 == u.a[i]  for some i,  \
+ when we test zero-masking.  */\
+  if (v[i] == 0.0 && u.a[i] == 0.0)\
+   continue;   \
+  if (v[i] == 0.0 && u.a[i] != 0.0)\
+   {   \
+ err++;\
+ PRINTF ("%i: " FMT " != " FMT "\n",   \
+ i, v[i], u.a[i]); \
+   }   \
   VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
   if (((rel_err < 0) ? -rel_err : rel_err) > eps)  \
{   \
-- 
1.8.3.1

[PATCH][AVX512] Swap Yk and k constraints.

2014-01-30 Thread Ilya Tocar

Hi,

It turns out that for Icc the meaning of the Yk and k constraints
(exposed through inline asm) is the opposite of the current GCC implementation.
As Icc with this behavior has already been released and GCC has not, I propose
to swap the meaning of the Yk and k constraints. The changes are pretty mechanical.
Bootstraps/passes make check/SPEC2006. Ok for trunk?

Here is ChangeLog:

2014-01-30  Ilya Tocar  ilya.to...@intel.com

* config/i386/constraints.md (Yk): Swap meaning with k.
* config/i386/i386.md (movhi_internal): Change Yk to k.
(movqi_internal): Ditto.
(*klogicmode): Ditto.
(*andhi_1): Ditto.
(*andqi_1): Ditto.
(kandnmode): Ditto.
(*codehi_1): Ditto.
(*codeqi_1): Ditto.
(kxnormode): Ditto.
(kortestzhi): Ditto.
(kortestchi): Ditto.
(kunpckhi): Ditto.
(*one_cmplhi2_1): Ditto.
(*one_cmplqi2_1): Ditto.
* config/i386/sse.md (): Change k to Yk.
(avx512f_loadmode_mask): Ditto.
(avx512f_blendmmode): Ditto.
(avx512f_storemode_mask): Ditto.
(avx512f_storeussemodesuffix512_mask): Ditto.
(avx512f_storedqumode_mask): Ditto.
(avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name): Ditto.
(avx512f_ucmpmode3mask_scalar_merge_name): Ditto.
(avx512f_vmcmpmode3round_saeonly_name): Ditto.
(avx512f_vmcmpmode3_maskround_saeonly_name): Ditto.
(avx512f_maskcmpmode3): Ditto.
(avx512f_fmadd_mode_maskround_name): Ditto.
(avx512f_fmadd_mode_mask3round_name): Ditto.
(avx512f_fmsub_mode_maskround_name): Ditto.
(avx512f_fmsub_mode_mask3round_name): Ditto.
(avx512f_fnmadd_mode_maskround_name): Ditto.
(avx512f_fnmadd_mode_mask3round_name): Ditto.
(avx512f_fnmsub_mode_maskround_name): Ditto.
(avx512f_fnmsub_mode_mask3round_name): Ditto.
(avx512f_fmaddsub_mode_maskround_name): Ditto.
(avx512f_fmaddsub_mode_mask3round_name): Ditto.
(avx512f_fmsubadd_mode_maskround_name): Ditto.
(avx512f_fmsubadd_mode_mask3round_name): Ditto.
(avx512f_vextractshuffletype32x4_1_maskm): Ditto.
(vec_extract_lo_mode_maskm): Ditto.
(vec_extract_hi_mode_maskm): Ditto.
(avx512f_vternlogmode_mask): Ditto.
(avx512f_fixupimmmode_maskround_saeonly_name): Ditto.
(avx512f_sfixupimmmode_maskround_saeonly_name): Ditto.
(avx512f_codepmov_src_lowermode2_mask): Ditto.
(avx512f_codev8div16qi2_mask): Ditto.
(avx512f_codev8div16qi2_mask_store): Ditto.
(avx512f_eqmode3mask_scalar_merge_name_1): Ditto.
(avx512f_gtmode3mask_scalar_merge_name): Ditto.
(avx512f_testmmode3mask_scalar_merge_name): Ditto.
(avx512f_testnmmode3mask_scalar_merge_name): Ditto.
(*avx512pf_gatherpfmodesf_mask): Ditto.
(*avx512pf_gatherpfmodedf_mask): Ditto.
(*avx512pf_scatterpfmodesf_mask): Ditto.
(*avx512pf_scatterpfmodedf_mask): Ditto.
(avx512cd_maskb_vec_dupv8di): Ditto.
(avx512cd_maskw_vec_dupv16si): Ditto.
(avx512f_vpermi2varmode3_maskz): Ditto.
(avx512f_vpermi2varmode3_mask): Ditto.
(avx512f_vpermi2varmode3_mask): Ditto.
(avx512f_vpermt2varmode3_maskz): Ditto.
(*avx512f_gathersimode): Ditto.
(*avx512f_gathersimode_2): Ditto.
(*avx512f_gatherdimode): Ditto.
(*avx512f_gatherdimode_2): Ditto.
(*avx512f_scattersimode): Ditto.
(*avx512f_scatterdimode): Ditto.
(avx512f_compressmode_mask): Ditto.
(avx512f_compressstoremode_mask): Ditto.
(avx512f_expandmode_mask): Ditto.
* config/i386/subst.md (mask): Change k to Yk.
(mask_scalar_merge): Ditto.
(sd): Ditto.

And for tests:

2014-01-30  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx512f-inline-asm.c: Swap Yk and k.
* gcc.target/i386/avx512f-kmovw-1.c: Also allow k0.


Patch below:

---
 gcc/config/i386/constraints.md |   4 +-
 gcc/config/i386/i386.md|  72 +++---
 gcc/config/i386/sse.md | 110 ++---
 gcc/config/i386/subst.md   |   6 +-
 gcc/testsuite/gcc.target/i386/avx512f-inline-asm.c |   6 +-
 gcc/testsuite/gcc.target/i386/avx512f-kmovw-1.c|   2 +-
 6 files changed, 100 insertions(+), 100 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 0d61c87..65335f1 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -78,10 +78,10 @@
  TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS
  Second from top of 80387 floating-point stack (@code{%st(1)}).)
 
-(define_register_constraint k TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS
+(define_register_constraint Yk TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS
 @internal Any mask register that can be used as predicate, i.e. k1-k7

Re: [PATCH][AVX512] Swap Yk and k constraints.

2014-01-30 Thread Ilya Tocar

2014-01-30 H.J. Lu hjl.to...@gmail.com:
 On Thu, Jan 30, 2014 at 2:54 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 Hi,

 It turns out that for Icc the meaning of the Yk and k constraints
 (exposed through inline asm) is the opposite of the current GCC implementation.
 As Icc with this behavior has already been released and GCC has not, I propose
 to swap the meaning of the Yk and k constraints. The changes are pretty mechanical.
 Bootstraps/passes make check/SPEC2006. Ok for trunk?


 Does KNC GCC support Yk/k? How are they handled?

In KNC GCC Yk is MASK_WRITE_REGS and k is MASK_REGS.
So it looks like it's consistent with Icc.

[PATCH][testsuite] Avoid division by zero.

2014-01-30 Thread Ilya Tocar

Hi,
This patch removes possible division by zero.
Make check passes. Ok for trunk?

2014-01-30  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/m512-check.h: Use correct rounding values.

---
 gcc/testsuite/gcc.target/i386/m512-check.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h 
b/gcc/testsuite/gcc.target/i386/m512-check.h
index 3209039..8441784 100644
--- a/gcc/testsuite/gcc.target/i386/m512-check.h
+++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -58,7 +58,8 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE *v,  
\
\
   for (i = 0; i < ARRAY_SIZE (u.a); i++)   \
 {  \
-  VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
+  VALUE_TYPE rel_err;  \
+  rel_err = v[i] != 0 ? (u.a[i] - v[i]) / v[i] : u.a[i];   \
   if (((rel_err < 0) ? -rel_err : rel_err) > eps)  \
{   \
  err++;\
-- 
1.8.3.1

Re: [PATCH][AVX512] Add forgotten intrinsics.

2014-01-23 Thread Ilya Tocar

   I found out that we forgot to implement some of AVX512 intrinsics.
   Here is a patch that adds them. Sorry for huge patch, but changes are
   mostly trivial.
   Ok for trunk?
...
 
  This is the same as the second alternative of the
  avx512f_codepmov_src_lower2_mask pattern. Please change the above
  into an expander to reuse existing pattern.
 
  Uros.
 
  Fixed.
 
 OK for mainline if tested properly (you didn't say how the patch was tested).

Yeah, it bootstraps, passes make-check, runs spec2006.
I'll ask Kirill to commit, thanks.

Re: [PATCH i386 9/8] [AVX512] Add forgotten kmovw insn, built-in and test.

2013-12-31 Thread Ilya Tocar

 RA figured out that operation with general registers results in less
 moves (you already have x1 in general reg). This is exactly the reason
 why I think unspecs are not needed. It is the job of the compiler to
 choose most appropriate approach, and its behavior should be adjusted
 with appropriate cost functions.
 
 I guess that if you load x from memory, RA will allocate mask
 registers all the way to add insn.

I tried loading from memory and the result is the same. Without the unspec this
intrinsic is just "return __A" and is completely useless. As for the RA
choosing the best approach, a big concern was generating klogic for normal
instructions, so the current implementation of masks is conservative and the RA
chooses gpr alternatives. So I think that a kmov intrinsic with an unspec
has its uses as a hint to the compiler. If you are against this approach,
here is a version without the unspec.

---
 gcc/config/i386/avx512fintrin.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 0a43b1e..e0e74cf 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -14826,6 +14826,13 @@ _mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 
__A, __m128 __B)
  _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_kmov (__mmask16 __A)
+{
+  return __A;
+}
+
 #ifdef __DISABLE_AVX512F__
 #undef __DISABLE_AVX512F__
 #pragma GCC pop_options
-- 
1.8.3.1

Re: [PATCH i386 9/8] [AVX512] Add forgotten kmovw insn, built-in and test.

2013-12-30 Thread Ilya Tocar

 You don't need an unspec (or corresponding __builtin), generic movhi
 pattern should be able to generate correct insn.
 
 Uros.

Hi,

Generic movhi generates a simple mov.
Actually, the whole purpose of this intrinsic is to let the compiler know
that this variable should be placed in a mask register and modified with
klogic instructions.

For example, when compiling the following with -O2 -mavx512f:

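#include <immintrin.h>

__m512 a, b, c;	/* Operands of the masked add, declared so the example is complete.  */

void
bar (short x1, short y1, short z1, short f1)
{
  short x, y, z, f;
  x = _mm512_kmov (x1);
  y = _mm512_kmov (11);
  x ^= y;	/* The operation whose code generation is compared below.  */
  a = _mm512_mask_add_ps (a, x, b, c);
}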

Version with movhi produces xorl and no kmovw,
while version with unspec produces kxorw and kmovw.

[PATCH][x86] march aliases

2013-12-20 Thread Ilya Tocar

  Perhaps we should add sandybridge, ivybridge and haswell aliases for
  corei7-avx, core-avx-i, core-avx2?  I mean, it is a nightmare to remember
  which one has the i7 in and which doesn't even for me.
 
 Yes please, I think this is a good idea.

I've added aliases for haswell, sandybridge, ivybridge, bonnell,
nehalem and silvermont.
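
For quick reference, the pairing between the new names and the existing
-march values can be written as a lookup table. This is only a sketch:
struct march_alias and resolve_march are hypothetical, and the patch itself
does not resolve names this way but adds full duplicate entries with the same
PTA_* flags to the processor table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: new microarchitecture name paired with
   the existing -march value it duplicates.  */
struct march_alias { const char *alias; const char *existing; };

static const struct march_alias aliases[] = {
  { "nehalem", "corei7" },
  { "sandybridge", "corei7-avx" },
  { "ivybridge", "core-avx-i" },
  { "haswell", "core-avx2" },
  { "bonnell", "atom" },
  { "silvermont", "slm" },
};

/* Return the existing name for NAME, or NAME itself if it is not one
   of the new aliases.  */
static const char *
resolve_march (const char *name)
{
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcmp (name, aliases[i].alias) == 0)
      return aliases[i].existing;
  return name;
}
```

For example, resolve_march ("haswell") yields "core-avx2", while a name that
is not an alias, such as "core2", is returned unchanged.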

BTW, I wonder if we add a bunch of new names to the table it isn't a right
time to also introduce macros for some common PTA_* flag combinations,

IMO full list of PTA_* helps quickly identify what is supported.

2013-12-20  Tocar Ilya  ilya.to...@intel.com 

* config/i386/i386.c (ix86_option_override_internal): Add
haswell, ivybridge, sandybridge, nehalem, bonnell, silvermont.
* doc/invoke.texi: Document them.
---
 gcc/config/i386/i386.c | 27 +++
 gcc/doc/invoke.texi| 32 
 2 files changed, 59 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1710e8c..fcf2afe 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3111,9 +3111,17 @@ ix86_option_override_internal (bool main_args_p,
   {"core2", PROCESSOR_CORE2, CPU_CORE2,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_CX16 | PTA_FXSR},
+  {"nehalem", PROCESSOR_COREI7, CPU_COREI7,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3
+   | PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_POPCNT | PTA_FXSR},
   {"corei7", PROCESSOR_COREI7, CPU_COREI7,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3
| PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_POPCNT | PTA_FXSR},
+  {"sandybridge", PROCESSOR_COREI7_AVX, CPU_COREI7,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+   | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
+   | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL
+   | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
   {"corei7-avx", PROCESSOR_COREI7_AVX, CPU_COREI7,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
@@ -3124,6 +3132,11 @@ ix86_option_override_internal (bool main_args_p,
| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+  {"ivybridge", PROCESSOR_COREI7_AVX, CPU_COREI7,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+   | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
+   | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
+   | PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
   {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
@@ -3131,6 +3144,13 @@ ix86_option_override_internal (bool main_args_p,
| PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
| PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
| PTA_XSAVEOPT},
+  {"haswell", PROCESSOR_HASWELL, CPU_COREI7,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+   | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
+   | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
+   | PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
+   | PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
+   | PTA_XSAVEOPT},
   {"broadwell", PROCESSOR_HASWELL, CPU_COREI7,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
@@ -3138,9 +3158,16 @@ ix86_option_override_internal (bool main_args_p,
| PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
| PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
| PTA_XSAVEOPT | PTA_ADX | PTA_PRFCHW | PTA_RDSEED},
+  {"bonnell", PROCESSOR_ATOM, CPU_ATOM,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+   | PTA_SSSE3 | PTA_CX16 | PTA_MOVBE | PTA_FXSR},
   {"atom", PROCESSOR_ATOM, CPU_ATOM,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_CX16 | PTA_MOVBE | PTA_FXSR},
+  {"silvermont", PROCESSOR_SLM, CPU_SLM,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3
+   | PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_POPCNT | PTA_AES
+   | PTA_PCLMUL | PTA_RDRND | PTA_MOVBE | PTA_FXSR},
   {"slm", PROCESSOR_SLM, CPU_SLM,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3
| PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_POPCNT | PTA_AES
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dcc1893..365ddbf 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14645,19 +14645,41 @@ SSE2 and SSE3 instruction set support.
 Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
 instruction set

[PATCH] Add -march=bdw support

2013-12-19 Thread Ilya Tocar

Hi,
This patch adds -march support for the Broadwell CPU.
-march=bdw is the same as -march=core-avx2 but with support for rdseed,
adcx, prefetchw. OK for trunk?

Thanks.
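
As a rough sketch of the two ideas in this patch: the bdw feature set is the
core-avx2 set plus three extra bits, and host detection has to test for ADX
before AVX2 because a Broadwell host reports both. The FEAT_* bits and
guess_cpu below are hypothetical stand-ins for the PTA_* flags and
host_detect_local_cpu, not the real definitions.

```c
/* Hypothetical feature bits; the real table uses PTA_* flags.  */
enum {
  FEAT_AVX2   = 1 << 0,
  FEAT_ADX    = 1 << 1,
  FEAT_PRFCHW = 1 << 2,
  FEAT_RDSEED = 1 << 3
};

/* core-avx2 reduced to the one bit this sketch cares about.  */
#define FEATS_CORE_AVX2 (FEAT_AVX2)
/* bdw is core-avx2 plus ADX, PRFCHW and RDSEED.  */
#define FEATS_BDW (FEATS_CORE_AVX2 | FEAT_ADX | FEAT_PRFCHW | FEAT_RDSEED)

/* Check the newest distinguishing feature first: a Broadwell host has
   AVX2 as well, so testing AVX2 first would misreport it.  */
static const char *
guess_cpu (unsigned feats)
{
  if (feats & FEAT_ADX)
    return "bdw";
  if (feats & FEAT_AVX2)
    return "core-avx2";
  return "other";
}
```

Here guess_cpu (FEATS_BDW) gives "bdw" and guess_cpu (FEATS_CORE_AVX2) gives
"core-avx2". Swapping the two tests would report "core-avx2" for both.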

2013-12-19  Tocar Ilya  ilya.to...@intel.com 

* config.gcc: Support march=bdw.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect broadwell.
* config/i386/i386.c (ix86_option_override_internal): Add bdw.
* doc/invoke.texi: Document march=bdw.

---
 gcc/config.gcc| 2 +-
 gcc/config/i386/driver-i386.c | 5 -
 gcc/config/i386/i386.c| 7 +++
 gcc/doc/invoke.texi   | 5 +
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 8464d8f..1edbd4d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3676,7 +3676,7 @@ case ${target} in
| opteron-sse3 | athlon-fx | bdver4 | bdver3 | bdver2 \
| bdver1 | btver2 |  btver1 | amdfam10 | barcelona \
| nocona | core2 | corei7 | corei7-avx | core-avx-i \
-   | core-avx2 | atom | slm)
+   | core-avx2 | bdw | atom | slm)
# OK
;;
*)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 0b8af3f..6a5c654 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -689,7 +689,10 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
  if (arch)
{
  /* This is unknown family 0x6 CPU.  */
- if (has_avx2)
+ if (has_adx)
+   /* Assume Broadwell.  */
+   cpu = "bdw";
+ else if (has_avx2)
/* Assume Haswell.  */
 cpu = "core-avx2";
  else if (has_avx)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cdd63e5..5f2f13b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3136,6 +3136,13 @@ ix86_option_override_internal (bool main_args_p,
| PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
| PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
| PTA_XSAVEOPT},
+  {"bdw", PROCESSOR_HASWELL, CPU_COREI7,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+   | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
+   | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
+   | PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
+   | PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
+   | PTA_XSAVEOPT | PTA_ADX | PTA_PRFCHW | PTA_RDSEED},
   {"atom", PROCESSOR_ATOM, CPU_ATOM,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_CX16 | PTA_MOVBE | PTA_FXSR},
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 782a472..0ba03e9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14647,6 +14647,11 @@ Intel Core CPU with 64-bit extensions, MOVBE, MMX, 
SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
 BMI, BMI2 and F16C instruction set support.
 
+@item bdw
+Intel Core CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
+BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW instruction set support.
+
 @item atom
 Intel Atom CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
 instruction set support.
-- 
1.8.3.1

Re: [PATCH] Add -march=bdw support

2013-12-19 Thread Ilya Tocar

  Why not -march=broadwell instead?
 
 
 If people don't mind long names, broadwell works for me.

Done.

  * config/i386/driver-i386.c (host_detect_local_cpu): Detect broadwell.

^Capital B.
Thanks, fixed.

 Just say Intel Broadwell CPU.
Done. Other options report instruction sets, so I left them.

OK for trunk?

2013-12-19  Tocar Ilya  ilya.to...@intel.com 

* config.gcc: Support march=broadwell.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect Broadwell.
* config/i386/i386.c (ix86_option_override_internal): Add broadwell.
* doc/invoke.texi: Document march=broadwell.

---
 gcc/config.gcc| 2 +-
 gcc/config/i386/driver-i386.c | 4 +++-
 gcc/config/i386/i386.c| 7 +++
 gcc/doc/invoke.texi   | 5 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 8464d8f..c066e2a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3676,7 +3676,7 @@ case ${target} in
| opteron-sse3 | athlon-fx | bdver4 | bdver3 | bdver2 \
| bdver1 | btver2 |  btver1 | amdfam10 | barcelona \
| nocona | core2 | corei7 | corei7-avx | core-avx-i \
-   | core-avx2 | atom | slm)
+   | core-avx2 | broadwell | atom | slm)
# OK
;;
*)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 0b8af3f..26ae601 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -689,7 +689,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
  if (arch)
{
  /* This is unknown family 0x6 CPU.  */
- if (has_avx2)
+ if (has_adx)
+   cpu = "broadwell";
+ else if (has_avx2)
/* Assume Haswell.  */
cpu = "core-avx2";
  else if (has_avx)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f82d1a4..1710e8c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3131,6 +3131,13 @@ ix86_option_override_internal (bool main_args_p,
| PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
| PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
| PTA_XSAVEOPT},
+  {"broadwell", PROCESSOR_HASWELL, CPU_COREI7,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+   | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
+   | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
+   | PTA_RDRND | PTA_F16C | PTA_BMI | PTA_BMI2 | PTA_LZCNT
+   | PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE | PTA_FXSR | PTA_XSAVE
+   | PTA_XSAVEOPT | PTA_ADX | PTA_PRFCHW | PTA_RDSEED},
   {"atom", PROCESSOR_ATOM, CPU_ATOM,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_CX16 | PTA_MOVBE | PTA_FXSR},
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6e888bd..dcc1893 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14663,6 +14663,11 @@ Intel Core CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
 BMI, BMI2 and F16C instruction set support.
 
+@item broadwell
+Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
+BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW instruction set support.
+
 @item atom
 Intel Atom CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
 instruction set support.
-- 
1.8.3.1

Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-11-19 Thread Ilya Tocar

On 14 Nov 11:27, Richard Biener wrote:
  +  /* Set when symbol needs to be dumped for lto/offloading.  */
  +  unsigned need_dump : 1;
  +
 
 That's very non-descriptive.  What's offloading?  But yes, something
 like this is what I was asking for.

I've changed it into:
Set when symbol needs to be dumped into LTO bytecode for LTO,
or in pragma omp target case, for separate compilation targeting
a different architecture.

Ok for gomp4 branch now?

2013-11-19 Ilya Tocar  ilya.to...@intel.com 

* cgraph.h (symtab_node): Add need_dump.
* cgraphunit.c (ipa_passes): Run ipa_write_summaries for omp.
(compile): Initialize streamer for omp.
* ipa-inline-analysis.c (inline_generate_summary): Add flag_openmp.
* lto-cgraph.c (lto_set_symtab_encoder_in_partition): Respect
need_dump flag.
(select_what_to_dump): New.
* lto-streamer.c (section_name_prefix): New.
(lto_get_section_name): Use section_name_prefix.
(lto_streamer_init): Add flag_openmp.
* lto-streamer.h (OMP_SECTION_NAME_PREFIX): New.
(section_name_prefix): Ditto.
(select_what_to_dump): Ditto.
* lto/lto-partition.c (add_symbol_to_partition_1): Set need_dump.
(lto_promote_cross_file_statics): Dump everything.
* passes.c (ipa_write_summaries): Add parameter,
call select_what_to_dump.
* tree-pass.h (void ipa_write_summaries): Add parameter.
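
To make the role of need_dump concrete, here is a small stand-alone sketch of the selection rule described in the ChangeLog: in the LTO case every symbol is dumped, in the omp target case only symbols marked for the target are. The toy_symbol type and the toy_* functions are hypothetical stand-ins; the real patch walks the symbol table with FOR_EACH_SYMBOL and tests the "omp declare target" attribute.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for symtab_node, reduced to what the rule needs.  */
struct toy_symbol
{
  const char *name;
  bool omp_declare_target;	/* Models the "omp declare target" attribute.  */
  bool need_dump;
};

/* LTO case: dump everything.  OMP target case: only marked symbols.  */
void
toy_select_what_to_dump (struct toy_symbol *syms, size_t n, bool is_omp)
{
  for (size_t i = 0; i < n; i++)
    syms[i].need_dump = !is_omp || syms[i].omp_declare_target;
}

/* Count the symbols that would be streamed after selection.  */
size_t
toy_count_dumped (const struct toy_symbol *syms, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (syms[i].need_dump)
      count++;
  return count;
}
```

Because the flag is recomputed on every call, the same symbol table can be streamed twice in a row, once per mode, without any state leaking from the first pass into the second.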


---
 gcc/cgraph.h  |  5 +
 gcc/cgraphunit.c  | 15 +--
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/lto-cgraph.c  | 14 ++
 gcc/lto-streamer.c|  5 +++--
 gcc/lto-streamer.h|  6 ++
 gcc/lto/lto-partition.c   |  3 +++
 gcc/passes.c  |  6 --
 gcc/tree-pass.h   |  2 +-
 9 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index fb0fe93..9f799f4 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -105,6 +105,11 @@ public:
   /* Set when symbol has address taken. */
   unsigned address_taken : 1;
 
+  /* Set when symbol needs to be dumped into LTO bytecode for LTO,
+ or in pragma omp target case, for separate compilation targeting
+ a different architecture.  */
+  unsigned need_dump : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index c3a8967..53cd250 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2019,7 +2019,18 @@ ipa_passes (void)
  passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-ipa_write_summaries ();
+{
+  if (flag_openmp)
+   {
+ section_name_prefix = OMP_SECTION_NAME_PREFIX;
+ ipa_write_summaries (true);
+   }
+  if (flag_lto)
+   {
+ section_name_prefix = LTO_SECTION_NAME_PREFIX;
+ ipa_write_summaries (false);
+   }
+}
 
   if (flag_generate_lto)
 targetm.asm_out.lto_end ();
@@ -2110,7 +2121,7 @@ compile (void)
   cgraph_state = CGRAPH_STATE_IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
 lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 4458723..62faa52 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3813,7 +3813,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
  because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
 return;
 
   function_insertion_hook_holder =
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 6a52da8..697c069 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -238,6 +238,9 @@ void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
 symtab_node *node)
 {
+  /* Ignore not needed nodes.  */
+  if (!node->need_dump)
+    return;
   int index = lto_symtab_encoder_encode (encoder, node);
   encoder->nodes[index].in_partition = true;
 }
@@ -751,6 +754,17 @@ add_references (lto_symtab_encoder_t encoder,
   lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be dumped. In lto case dump everything.
+   In omp target case only dump stuff marked with attribute.  */
+void
+select_what_to_dump (bool is_omp)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL(snode)
+    snode->need_dump = !is_omp || lookup_attribute ("omp declare target",
+   DECL_ATTRIBUTES (snode->decl));
+}
+
+
 /* Find all symbols we want to stream into given partition and insert them
to encoders.
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 1540e4c..ffafb0e 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -43,6 +43,7

Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-11-14 Thread Ilya Tocar

On 18 Oct 13:30, Richard Biener wrote:
 Certainly better than the first version.  Jakub should decide for the branch
 and eventually Honza for the merge to trunk.  It still looks somewhat hackish,
 but I suppose that's because we don't have a LTO-state object where we
 can encapsulate all this.
 
 Also I still don't like the attribute lookup
 
  +  /* Ignore non omp target nodes for omp case.  */
  +  if (is_omp && !lookup_attribute ("omp declare target",
  +  DECL_ATTRIBUTES (node->symbol.decl)))
  +return;
 
 can we instead please add a flag in cgraph_node?

You mean symtab_node (we also need to dump marked variables)?
Is the patch below OK for the gomp4 branch?
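
To illustrate why the flag belongs on the common node type rather than on the function node only, here is a rough sketch: both functions and variables share one base record, and the encoder consults a single bit instead of repeating an attribute lookup. All names here (toy_node, toy_encoder, toy_encoder_add, TOY_MAX_NODES) are hypothetical and much simpler than the real symtab_node and lto_symtab_encoder_t.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_FUNCTION, TOY_VARIABLE };

/* Hypothetical common base for functions and variables.  */
struct toy_node
{
  enum toy_kind kind;
  bool need_dump;
};

#define TOY_MAX_NODES 16

struct toy_encoder
{
  const struct toy_node *nodes[TOY_MAX_NODES];
  size_t count;
};

/* Add NODE to ENC unless its flag is clear or the encoder is full.
   Returns true when the node was actually added.  */
bool
toy_encoder_add (struct toy_encoder *enc, const struct toy_node *node)
{
  if (!node->need_dump || enc->count >= TOY_MAX_NODES)
    return false;
  enc->nodes[enc->count++] = node;
  return true;
}
```

With the flag on the shared base, the same early return covers marked variables as well as marked functions, which is the point of the symtab_node question above.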

---
 gcc/cgraph.h  |  3 +++
 gcc/cgraphunit.c  | 15 +--
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/lto-cgraph.c  | 14 ++
 gcc/lto-streamer.c|  5 +++--
 gcc/lto-streamer.h|  6 ++
 gcc/lto/lto-partition.c   |  3 +++
 gcc/passes.c  |  6 --
 gcc/tree-pass.h   |  2 +-
 9 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index fb0fe93..601094a 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -105,6 +105,9 @@ public:
   /* Set when symbol has address taken. */
   unsigned address_taken : 1;
 
+  /* Set when symbol needs to be dumped for lto/offloading.  */
+  unsigned need_dump : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index c3a8967..53cd250 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2019,7 +2019,18 @@ ipa_passes (void)
  passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-ipa_write_summaries ();
+{
+  if (flag_openmp)
+   {
+ section_name_prefix = OMP_SECTION_NAME_PREFIX;
+ ipa_write_summaries (true);
+   }
+  if (flag_lto)
+   {
+ section_name_prefix = LTO_SECTION_NAME_PREFIX;
+ ipa_write_summaries (false);
+   }
+}
 
   if (flag_generate_lto)
 targetm.asm_out.lto_end ();
@@ -2110,7 +2121,7 @@ compile (void)
   cgraph_state = CGRAPH_STATE_IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
 lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 4458723..62faa52 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3813,7 +3813,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
  because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
 return;
 
   function_insertion_hook_holder =
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 6a52da8..697c069 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -238,6 +238,9 @@ void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
 symtab_node *node)
 {
+  /* Ignore not needed nodes.  */
+  if (!node->need_dump)
+    return;
   int index = lto_symtab_encoder_encode (encoder, node);
   encoder->nodes[index].in_partition = true;
 }
@@ -751,6 +754,17 @@ add_references (lto_symtab_encoder_t encoder,
   lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be dumped. In lto case dump everything.
+   In omp target case only dump stuff marked with attribute.  */
+void
+select_what_to_dump (bool is_omp)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL(snode)
+    snode->need_dump = !is_omp || lookup_attribute ("omp declare target",
+   DECL_ATTRIBUTES (snode->decl));
+}
+
+
 /* Find all symbols we want to stream into given partition and insert them
to encoders.
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 1540e4c..ffafb0e 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -43,6 +43,7 @@ struct lto_stats_d lto_stats;
 static bitmap_obstack lto_obstack;
 static bool lto_obstack_initialized;
 
+const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 
 /* Return a string representing LTO tag TAG.  */
 
@@ -172,7 +173,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false));
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+  return concat (section_name_prefix, sep, add, post, NULL);
 }
 
 
@@ -310,7 +311,7 @@ lto_streamer_init (void)
 bool
 gate_lto_out (void)
 {
-  return ((flag_generate_lto || in_lto_p)
+  return ((flag_generate_lto || in_lto_p || flag_openmp)
  /* Don't bother doing anything if the program has errors.  */

Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-10-17 Thread Ilya Tocar

Ping.

On 09 Oct 19:12, Ilya Tocar wrote:
 Ping.
 

Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-10-09 Thread Ilya Tocar

Ping.


Ping Re: [gomp4] Dumping gimple for offload.

2013-10-03 Thread Ilya Tocar

On 26 Sep 21:21, Ilya Tocar wrote:
 On 25 Sep 15:48, Richard Biener wrote:
  On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
   On 24 Sep 11:02, Richard Biener wrote:
   On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com 
   wrote:
thus consider assigning the section
   name in a different place.
  
   Richard.
  
   What do you mean by a different place?
   I can add a global dumping_omp_target variable to choose the correct name,
   depending on its value (patch below). Is that better?
  
  More like passing down a different abstraction, like for
  
   @@ -907,9 +907,15 @@ output_symtab (void)
{
  symtab_node node = lto_symtab_encoder_deref (encoder, i);
  if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
   -lto_output_node (ob, cnode, encoder);
   +   {
   + if (!dumping_omp_target || lookup_attribute ("omp declare target",
   + DECL_ATTRIBUTES (node->symbol.decl)))
   +   lto_output_node (ob, cnode, encoder);
   +   }
  else
   -lto_output_varpool_node (ob, varpool (node), encoder);
   + if (!dumping_omp_target || lookup_attribute ("omp declare target",
   + DECL_ATTRIBUTES (node->symbol.decl)))
   +   lto_output_varpool_node (ob, varpool (node), encoder);
  
}
  
  have the symtab encoder already not contain the varpool nodes you
  don't need.
  
  And instead of looking up attributes, mark the symtab node with a flag.
 
 Good idea!
 I've tried creating two encoders and adding only nodes with the
 "omp declare target" attribute in the omp case. Some is_omp passing
 remains, to control the behavior of lto_set_symtab_encoder_in_partition,
 because I think that is better than a global variable.
 What do you think?

Updated version of the patch. I've checked that it doesn't break LTO on
SPEC 2006. Streaming for omp is enabled by the -fopenmp flag. It works both
with and without LTO enabled. OK for the gomp4 branch?
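
The interaction of the two flags can be sketched as a small planning function: each enabled mode gets its own summary-writing pass, with its own section prefix, and the omp pass runs first. The names toy_pass and toy_plan_passes are hypothetical. The LTO prefix matches GCC's ".gnu.lto_"; the omp prefix is an assumption based on the "gnu.target_lto_" sections mentioned elsewhere in this thread, including the leading dot.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_LTO_PREFIX ".gnu.lto_"
#define TOY_OMP_PREFIX ".gnu.target_lto_"	/* Assumed value.  */

/* One summary-writing pass: which prefix to use and which selection rule.  */
struct toy_pass
{
  const char *prefix;
  bool is_omp;
};

/* Fill OUT (room for two entries) with the passes to run and return how
   many there are.  The omp pass, if any, comes before the LTO pass.  */
size_t
toy_plan_passes (bool flag_openmp, bool flag_lto, struct toy_pass out[2])
{
  size_t n = 0;
  if (flag_openmp)
    out[n++] = (struct toy_pass) { TOY_OMP_PREFIX, true };
  if (flag_lto)
    out[n++] = (struct toy_pass) { TOY_LTO_PREFIX, false };
  return n;
}
```

With only -fopenmp the plan holds a single omp pass, with only -flto a single LTO pass, and with both flags two passes that write to differently named sections, which is the "with and without LTO" behavior described above.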


---
 gcc/cgraphunit.c  | 15 +--
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/lto-cgraph.c  | 15 ++-
 gcc/lto-streamer.c|  5 +++--
 gcc/lto-streamer.h| 10 --
 gcc/lto/lto-partition.c   |  4 ++--
 gcc/passes.c  | 12 ++--
 gcc/tree-pass.h   |  2 +-
 8 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1644ca9..d595475 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2016,7 +2016,18 @@ ipa_passes (void)
  passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-ipa_write_summaries ();
+{
+  if (flag_openmp)
+   {
+ section_name_prefix = OMP_SECTION_NAME_PREFIX;
+ ipa_write_summaries (true);
+   }
+  if (flag_lto)
+   {
+ section_name_prefix = LTO_SECTION_NAME_PREFIX;
+ ipa_write_summaries (false);
+   }
+}
 
   if (flag_generate_lto)
 targetm.asm_out.lto_end ();
@@ -2107,7 +2118,7 @@ compile (void)
   cgraph_state = CGRAPH_STATE_IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
 lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index ba6221e..4420213 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3721,7 +3721,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
  because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
 return;
 
   function_insertion_hook_holder =
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..4a7d179 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder,
 
 void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
-symtab_node node)
+symtab_node node, bool is_omp)
 {
+  /* Ignore non omp target nodes for omp case.  */
+  if (is_omp && !lookup_attribute ("omp declare target",
+  DECL_ATTRIBUTES (node->symbol.decl)))
+    return;
+
   int index = lto_symtab_encoder_encode (encoder, (symtab_node)node);
   encoder->nodes[index].in_partition = true;
 }
@@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder,
ignored by the partitioning logic earlier.  */
 
 lto_symtab_encoder_t 
-compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
+compute_ltrans_boundary (lto_symtab_encoder_t in_encoder, bool is_omp)
 {
   struct cgraph_node *node;
   struct cgraph_edge *edge;
@@ -779,7 +784,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t

Re: [gomp4] Library side of depend clause support

2013-09-30 Thread Ilya Tocar

On 27 Sep 12:08, Jakub Jelinek wrote:

It looks like you forgot some files. I've checked
http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=202968
and, for example, hashtab.h is missing, so the branch currently fails to build
with task.c:46:21: fatal error: hashtab.h: No such file or directory

 Here is what I've committed now, the incremental changes were really only
 using a structure with flex array member for the dependers vectors,
 removing/making redundant earlier !ent-is_in when adding !is_in into the
 chain and addition of new testcases.
 
 Let's improve it incrementally later.
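
As an illustration of the "structure with flex array member" mentioned above, here is a minimal growable vector whose header and elements live in one allocation. The names toy_task, toy_dependers_vec and toy_dependers_push, the initial capacity of 4 and the doubling policy are assumptions for this sketch; they are not taken from libgomp's gomp_dependers_vec.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task record; only its address matters here.  */
struct toy_task
{
  int id;
};

/* Header and elements share one allocation via a flexible array member.  */
struct toy_dependers_vec
{
  size_t n_elem;
  size_t allocated;
  struct toy_task *elem[];
};

/* Append TASK, growing the vector when full.  Returns the (possibly moved)
   vector, or NULL on allocation failure, in which case VEC is unchanged
   and still owned by the caller.  */
struct toy_dependers_vec *
toy_dependers_push (struct toy_dependers_vec *vec, struct toy_task *task)
{
  if (vec == NULL || vec->n_elem == vec->allocated)
    {
      size_t new_cap = vec ? vec->allocated * 2 : 4;
      struct toy_dependers_vec *grown
	= realloc (vec, sizeof *grown + new_cap * sizeof grown->elem[0]);
      if (grown == NULL)
	return NULL;
      if (vec == NULL)
	grown->n_elem = 0;
      grown->allocated = new_cap;
      vec = grown;
    }
  vec->elem[vec->n_elem++] = task;
  return vec;
}
```

Compared with a separate pointer to an element array, this layout needs one allocation per vector and keeps the count next to the data, at the cost that the vector's address can change on growth.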
 
 2013-09-27  Jakub Jelinek  ja...@redhat.com
 
   * libgomp.h: Include stdlib.h.
   (struct gomp_task_depend_entry,
   struct gomp_dependers_vec): New types.
   (struct gomp_task): Add dependers, depend_hash, depend_count,
   num_dependees and depend fields.
   (struct gomp_taskgroup): Add num_children field.
   (gomp_finish_task): Free depend_hash if non-NULL.
   * libgomp_g.h (GOMP_task): Add depend argument.
   * hashtab.h: New file.
   * task.c: Include hashtab.h.
   (hash_entry_type): New typedef.
   (htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
   (gomp_init_task): Clear dependers, depend_hash and depend_count
   fields.
   (GOMP_task): Add depend argument, handle depend clauses.  Increment
   num_children field in taskgroup.
   (gomp_task_run_pre): Don't increment task_running_count here,
   nor clear task_pending bit.
   (gomp_task_run_post_handle_depend_hash,
   gomp_task_run_post_handle_dependers,
   gomp_task_run_post_handle_depend): New functions.
   (gomp_task_run_post_remove_parent): Clear in_taskwait before
   signalling corresponding semaphore.
   (gomp_task_run_post_remove_taskgroup): Decrement num_children
   field and make the decrement to 0 MEMMODEL_RELEASE operation,
   rather than storing NULL to taskgroup-children.  Clear
   in_taskgroup_wait before signalling corresponding semaphore.
   (gomp_barrier_handle_tasks): Move task_running_count increment
   and task_pending bit clearing here.  Call
   gomp_task_run_post_handle_depend.  If more than one new tasks
   have been queued, wake other threads if needed.
   (GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
   than one new tasks have been queued, wake other threads if needed.
   After waiting on taskwait_sem, enter critical section again.
   (GOMP_taskgroup_start): Initialize num_children field.
   (GOMP_taskgroup_end): Check num_children instead of children
   before critical section.  If children is NULL, but num_children
   is non-zero, wait on taskgroup_sem.  Call
   gomp_task_run_post_handle_depend.  If more than one new tasks have
   been queued, wake other threads if needed.  After waiting on
   taskgroup_sem, enter critical section again.
   * testsuite/libgomp.c/depend-1.c: New test.
   * testsuite/libgomp.c/depend-2.c: New test.
   * testsuite/libgomp.c/depend-3.c: New test.
   * testsuite/libgomp.c/depend-4.c: New test.
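
The num_children change in the ChangeLog above can be pictured with a small counter sketch: each finishing child decrements the count with release ordering, and the waiter checks it with an acquire load before deciding whether it must take the slow path. This uses C11 <stdatomic.h> as a stand-in, which is an assumption for illustration; the toy_taskgroup type and both functions are hypothetical and are not libgomp's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical taskgroup reduced to its child counter.  */
struct toy_taskgroup
{
  atomic_size_t num_children;
};

/* Called when one child task finishes.  The release ordering publishes the
   child's earlier writes to whoever later observes the decremented count.
   Returns true when this call dropped the count to zero.  */
bool
toy_child_finished (struct toy_taskgroup *tg)
{
  size_t before = atomic_fetch_sub_explicit (&tg->num_children, 1,
					      memory_order_release);
  return before == 1;
}

/* Cheap check a waiter can make before entering a critical section.  */
bool
toy_all_children_done (struct toy_taskgroup *tg)
{
  return atomic_load_explicit (&tg->num_children,
			       memory_order_acquire) == 0;
}
```

The pairing of a release decrement with an acquire load is what lets the waiter skip locking when the count is already zero while still seeing everything the children wrote.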

Re: [gomp4] Dumping gimple for offload.

2013-09-26 Thread Ilya Tocar

On 25 Sep 15:48, Richard Biener wrote:
 On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  On 24 Sep 11:02, Richard Biener wrote:
  On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com 
  wrote:
   thus consider assigning the section
  name in a different place.
 
  Richard.
 
  What do you mean by a different place?
  I can add a global dumping_omp_target variable to choose the correct name,
  depending on its value (patch below). Is that better?
 
 More like passing down a different abstraction, like for
 
  @@ -907,9 +907,15 @@ output_symtab (void)
   {
 symtab_node node = lto_symtab_encoder_deref (encoder, i);
 if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
  -lto_output_node (ob, cnode, encoder);
  +   {
  + if (!dumping_omp_target || lookup_attribute ("omp declare target",
  + DECL_ATTRIBUTES (node->symbol.decl)))
  +   lto_output_node (ob, cnode, encoder);
  +   }
 else
  -lto_output_varpool_node (ob, varpool (node), encoder);
  + if (!dumping_omp_target || lookup_attribute ("omp declare target",
  + DECL_ATTRIBUTES (node->symbol.decl)))
  +   lto_output_varpool_node (ob, varpool (node), encoder);
 
   }
 
 have the symtab encoder already not contain the varpool nodes you
 don't need.
 
 And instead of looking up attributes, mark the symtab node with a flag.

Good idea!
I've tried creating two encoders and adding only nodes with the
"omp declare target" attribute in the omp case. Some is_omp passing
remains, to control the behavior of lto_set_symtab_encoder_in_partition,
because I think that is better than a global variable.
What do you think?
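
A rough sketch of the two-encoder idea, with is_omp passed down as a parameter instead of being read from a global: the same declaration list is filtered into two separate partitions, one per mode. The types toy_decl and toy_partition and both functions are hypothetical simplifications; in the patch the filter sits in lto_set_symtab_encoder_in_partition and tests the "omp declare target" attribute.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PARTITION_MAX 8

/* Hypothetical declaration record.  */
struct toy_decl
{
  const char *name;
  bool omp_declare_target;
};

/* Hypothetical partition: just the names that were accepted.  */
struct toy_partition
{
  const char *names[TOY_PARTITION_MAX];
  size_t count;
};

/* Record D in PART.  In the omp case, unmarked declarations are ignored.  */
void
toy_set_in_partition (struct toy_partition *part, const struct toy_decl *d,
		      bool is_omp)
{
  if (is_omp && !d->omp_declare_target)
    return;
  if (part->count < TOY_PARTITION_MAX)
    part->names[part->count++] = d->name;
}

/* Build a fresh partition from DECLS for the given mode.  */
void
toy_build_partition (struct toy_partition *part, const struct toy_decl *decls,
		     size_t n, bool is_omp)
{
  part->count = 0;
  for (size_t i = 0; i < n; i++)
    toy_set_in_partition (part, &decls[i], is_omp);
}
```

Passing the mode explicitly means the two partitions can be built side by side from the same input; a global would have to be set and reset around each call.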

---
 gcc/cgraphunit.c| 10 +-
 gcc/lto-cgraph.c| 15 ++-
 gcc/lto-streamer.c  |  3 ++-
 gcc/lto-streamer.h  | 10 --
 gcc/lto/lto-partition.c |  4 ++--
 gcc/passes.c| 10 +-
 gcc/tree-pass.h |  2 +-
 7 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1644ca9..9e0fc77 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2016,7 +2016,15 @@ ipa_passes (void)
  passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-ipa_write_summaries ();
+{
+  if (flag_openmp)
+   {
+ section_name_prefix = OMP_SECTION_NAME_PREFIX;
+ ipa_write_summaries (true);
+   }
+  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+  ipa_write_summaries (false);
+}
 
   if (flag_generate_lto)
 targetm.asm_out.lto_end ();
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..4a7d179 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder,
 
 void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
-symtab_node node)
+symtab_node node, bool is_omp)
 {
+  /* Ignore non omp target nodes for omp case.  */
+  if (is_omp && !lookup_attribute ("omp declare target",
+  DECL_ATTRIBUTES (node->symbol.decl)))
+    return;
+
   int index = lto_symtab_encoder_encode (encoder, (symtab_node)node);
   encoder->nodes[index].in_partition = true;
 }
@@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder,
ignored by the partitioning logic earlier.  */
 
 lto_symtab_encoder_t 
-compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
+compute_ltrans_boundary (lto_symtab_encoder_t in_encoder, bool is_omp)
 {
   struct cgraph_node *node;
   struct cgraph_edge *edge;
@@ -779,7 +784,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 {
   node = lsei_cgraph_node (lsei);
   add_node_to (encoder, node, true);
-  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)node);
+  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)node, is_omp);
   add_references (encoder, node-symbol.ref_list);
   /* For proper debug info, we need to ship the origins, too.  */
   if (DECL_ABSTRACT_ORIGIN (node-symbol.decl))
@@ -794,7 +799,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 {
   struct varpool_node *vnode = lsei_varpool_node (lsei);
 
-  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)vnode);
+  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)vnode, is_omp);
   lto_set_symtab_encoder_encode_initializer (encoder, vnode);
   add_references (encoder, vnode->symbol.ref_list);
   /* For proper debug info, we need to ship the origins, too.  */
@@ -802,7 +807,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
{
  struct varpool_node *origin_node
  = varpool_get_node (DECL_ABSTRACT_ORIGIN (node->symbol.decl));
- lto_set_symtab_encoder_in_partition (encoder

[gomp4] Dumping gimple for offload.

2013-09-23 Thread Ilya Tocar

Hi,

I've rebased my patch.
Is it OK for gomp4?


2013/9/13 Ilya Tocar tocarip.in...@gmail.com:
> Hi,
>
> I'm working on dumping gimple for the omp target pragma into
> gnu.target_lto_ sections.
> I've tried to reuse the current LTO infrastructure as much as possible.
>
> Could you please take a look at the attached patch?
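
For context, here is a minimal sketch of the kind of source this concerns. It assumes that declarations inside a "declare target" region are the ones carrying the "omp declare target" attribute that the patch looks up. The function names are hypothetical, and without -fopenmp the pragmas are ignored, so the file still builds as plain C.

```c
/* Declarations in this region are the ones expected to carry the
   "omp declare target" attribute (an assumption based on the
   lookup_attribute calls in the patch).  */
#pragma omp declare target
int
square (int x)
{
  return x * x;
}
#pragma omp end declare target

/* Outside the region: a host-only function, which the is_omp filter
   would be expected to skip when writing the offload sections.  */
int
host_only (int x)
{
  return x + 1;
}
```

Under that assumption, only square would be streamed into the gnu.target_lto_ sections, while both functions would still go into the regular LTO sections.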


---
 gcc/ipa-inline-analysis.c |   2 +-
 gcc/ipa-profile.c |   2 +-
 gcc/ipa-prop.c|   4 +-
 gcc/ipa-pure-const.c  |   2 +-
 gcc/ipa-reference.c   |   2 +-
 gcc/lto-cgraph.c  |  22 +++--
 gcc/lto-opts.c|   2 +-
 gcc/lto-section-out.c |  14 ++-
 gcc/lto-streamer-out.c| 215 +++---
 gcc/lto-streamer.c|   6 +-
 gcc/lto-streamer.h|  13 +--
 gcc/lto/lto.c |   2 +-
 gcc/passes.c  |   3 +-
 gcc/passes.def|   2 +
 gcc/timevar.def   |   2 +
 gcc/tree-pass.h   |   2 +
 16 files changed, 237 insertions(+), 58 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index ba6221e..ea3fc90 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4023,7 +4023,7 @@ inline_write_summary (void)
}
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, false);
   destroy_output_block (ob);
 
   if (optimize && !flag_ipa_cp)
diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 424e4a6..b16ba6c 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -247,7 +247,7 @@ ipa_profile_write_summary (void)
   streamer_write_uhwi_stream (ob->main_stream, histogram[i]->time);
   streamer_write_uhwi_stream (ob->main_stream, histogram[i]->size);
 }
-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, false);
 }
 
 /* Deserialize the ipa info for lto.  */
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index c09ec2f..69603c9 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -4234,7 +4234,7 @@ ipa_prop_write_jump_functions (void)
 ipa_write_node_info (ob, node);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, false);
   destroy_output_block (ob);
 }
 
@@ -4409,7 +4409,7 @@ ipa_prop_write_all_agg_replacement (void)
write_agg_replacement_chain (ob, node);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, false);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 55b679d..d6bbd52 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -988,7 +988,7 @@ pure_const_write_summary (void)
}
 }
 
-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, false);
 }
 
 
diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index e6f19fd..0593c77 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -1022,7 +1022,7 @@ ipa_reference_write_optimization_summary (void)
  }
   }
   BITMAP_FREE (ltrans_statics);
-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, false);
   splay_tree_delete (reference_vars_to_consider);
 }
 
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..831e74d 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -690,7 +690,7 @@ output_outgoing_cgraph_edges (struct cgraph_edge *edge,
 /* Output the part of the cgraph in SET.  */
 
 static void
-output_refs (lto_symtab_encoder_t encoder)
+output_refs (lto_symtab_encoder_t encoder, bool is_omp)
 {
   lto_symtab_encoder_iterator lsei;
   struct lto_simple_output_block *ob;
@@ -719,7 +719,7 @@ output_refs (lto_symtab_encoder_t encoder)
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
 
-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, is_omp);
 }
 
 /* Add NODE into encoder as well as nodes it is cloned from.
@@ -878,7 +878,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 /* Output the part of the symtab in SET and VSET.  */
 
 void
-output_symtab (void)
+output_symtab (bool is_omp)
 {
   struct cgraph_node *node;
   struct lto_simple_output_block *ob;
@@ -907,9 +907,15 @@ output_symtab (void)
 {
   symtab_node node = lto_symtab_encoder_deref (encoder, i);
   if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
-lto_output_node (ob, cnode, encoder);
+   {
+ if (!is_omp || lookup_attribute ("omp declare target",
+ DECL_ATTRIBUTES (node->symbol.decl)))
+ lto_output_node (ob, cnode, encoder);
+   }
   else
-lto_output_varpool_node (ob, varpool (node), encoder);
+ if (!is_omp || lookup_attribute ("omp declare target",
+ DECL_ATTRIBUTES (node->symbol.decl)))
+   lto_output_varpool_node (ob, varpool (node), encoder);

 }
 
@@ -924,7 +930,7 @@ output_symtab (void

[patch RFC,PR50038]

2011-10-04 Thread Ilya Tocar

Hi everyone,

This patch fixes PR 50038 (redundant zero extensions) by modifying the
implicit-zee pass to also remove unneeded zero extensions from QImode to
SImode.
There is a 6% improvement in the rgbyiqv test from the EEMBC 2.0 benchmark on x86-64.
I am not sure whether modifying the implicit-zee pass is the correct
approach, so comments are welcome.
If this approach is correct, we will also need to enable the implicit-zee
pass on some new targets (for example, 32-bit x86).

It passes bootstrap and make-check.
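
As an illustration of the pattern (a hypothetical kernel, not the PR's testcase and not the EEMBC source): on x86-64 a byte load is normally done with movzbl, which already clears the upper bits of the destination register, so a second QImode-to-SImode zero extension of the same value is redundant. Whether a given compiler actually emits the extra extension depends on its version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-scaling loop.  Each input byte is widened to a
   32-bit unsigned value before the multiply, which is where a
   QImode-to-SImode zero extension shows up in the RTL.  */
void
scale_bytes (const uint8_t *in, uint8_t *out, size_t n, unsigned gain)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned v = in[i];                    /* byte load, zero-extended */
      out[i] = (uint8_t) ((v * gain) >> 8);  /* arithmetic in 32 bits */
    }
}
```

Comparing the assembly from gcc -O2 -S with and without the pass is one way to check whether the extra extension disappears.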

Here is a Changelog:

2011-09-27  Ilya Tocar  ilya.to...@intel.com

* implicit-zee.c: Added 2011 to copyright.
(combine_set_zero_extend): Add QImode.
(merge_def_and_ze): Likewise.
(add_removable_zero_extend): Likewise.
(not_qi_to_si): New.
(make_defs_and_copies_lists): Add check for QImode.


zee.patch
Description: Binary data

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-25 Thread Ilya Tocar

Sorry. Like this?

Changelog:

2011-08-25  Ilya Tocar  ilya.to...@intel.com

             * config/i386/fmaintrin.h: New.
             * config.gcc: Add fmaintrin.h.
             * config/i386/i386.c
 (enum ix86_builtins) IX86_BUILTIN_VFMADDSS3: New.
             IX86_BUILTIN_VFMADDSD3: Likewise.
             * config/i386/sse.md (fmai_vmfmadd_mode): New.
             (*fmai_fmadd_mode): Likewise.
             (*fmai_fmsub_mode): Likewise.
             (*fmai_fnmadd_mode): Likewise.
             (*fmai_fnmsub_mode): Likewise.
             * config/i386/x86intrin.h: Add fmaintrin.h.

And Changelog for testsuite:

2011-08-25  Ilya Tocar ilya.to...@intel.com

             * gcc.target/i386/fma-check.h: New.
             * gcc.target/i386/fma-256-fmaddXX.c: New testcase.
             * gcc.target/i386/fma-256-fmaddsubXX.c: Likewise.
             * gcc.target/i386/fma-256-fmsubXX.c: Likewise.
             * gcc.target/i386/fma-256-fmsubaddXX.c: Likewise.
             * gcc.target/i386/fma-256-fnmaddXX.c: Likewise.
             * gcc.target/i386/fma-256-fnmsubXX.c: Likewise.
             * gcc.target/i386/fma-fmaddXX.c: Likewise.
             * gcc.target/i386/fma-fmaddsubXX.c: Likewise.
             * gcc.target/i386/fma-fmsubXX.c: Likewise.
             * gcc.target/i386/fma-fmsubaddXX.c: Likewise.
             * gcc.target/i386/fma-fnmaddXX.c: Likewise.
             * gcc.target/i386/fma-fnmsubXX.c: Likewise.
             * gcc.target/i386/fma-compile.c: Likewise.
             * gcc.target/i386/i386.exp (check_effective_target_fma): New.
             * gcc.target/i386/sse-12.c: Add -mfma.
             * gcc.target/i386/sse-13.c: Likewise.
             * gcc.target/i386/sse-14.c: Likewise.
             * gcc.target/i386/sse-22.c: Likewise.
             * gcc.target/i386/sse-23.c: Likewise.
             * gcc.target/i386/sse-13.c: Likewise.
             * g++.dg/other/i386-2.c: Likewise.
             * g++.dg/other/i386-3.c: Likewise.

2011/8/25 Uros Bizjak ubiz...@gmail.com:
> On Thu, Aug 25, 2011 at 10:18 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
>> Changelog:
>>
>> 2011-08-25  Ilya Tocar  ilya.to...@intel.com
>>
>>        * config/i386/fmaintrin.h: New.
>>        * config.gcc: Add fmaintrin.h.
>>        * config/i386/i386.c
>>        * ix86_builtins (IX86_BUILTIN_VFMADDSS3): New.
>>        (IX86_BUILTIN_VFMADDSD3): Likewise.
>
> (enum ix86_builtins) IX86_...: New.
> IX86_...: Likewise.
>
>>        * config/i386/sse.md (fmai_vmfmadd_mode): New.
>>        (*fmai_fmadd_mode): Likewise.
>>        (*fmai_fmsub_mode): Likewise.
>>        (*fmai_fnmadd_mode): Likewise.
>>        (*fmai_fnmsub_mode): Likewise.
>>        * config/i386/x86intrin.h: Add fmaintrin.h.
>>
>> And Changelog for testsuite:
>>
>> 2011-08-25  Ilya Tocar ilya.to...@intel.com
>>
>>        * gcc.target/i386/fma-check.h: New.
>>        * gcc.target/i386/fma-256-fmaddXX.c: New testcase.
>>        * gcc.target/i386/fma-256-fmaddsubXX.c: Likewise.
>>        * gcc.target/i386/fma-256-fmsubXX.c: Likewise.
>>        * gcc.target/i386/fma-256-fmsubaddXX.c: Likewise.
>>        * gcc.target/i386/fma-256-fnmaddXX.c: Likewise.
>>        * gcc.target/i386/fma-256-fnmsubXX.c: Likewise.
>>        * gcc.target/i386/fma-fmaddXX.c: Likewise.
>>        * gcc.target/i386/fma-fmaddsubXX.c: Likewise.
>>        * gcc.target/i386/fma-fmsubXX.c: Likewise.
>>        * gcc.target/i386/fma-fmsubaddXX.c: Likewise.
>>        * gcc.target/i386/fma-fnmaddXX.c: Likewise.
>>        * gcc.target/i386/fma-fnmsubXX.c: Likewise.
>>        * gcc.target/i386/fma-compile.c: Likewise.
>>        * gcc.target/i386/i386.exp (check_effective_target_fma): New.
>>        * gcc.target/i386/sse-12.c: Add -mfma.
>>        * gcc.target/i386/sse-13.c: Likewise.
>>        * gcc.target/i386/sse-14.c: Likewise.
>>        * gcc.target/i386/sse-22.c: Likewise.
>>        * gcc.target/i386/sse-23.c: Likewise.
>>        * gcc.target/i386/sse-13.c: Likewise.
>
> Duplicate.
>
>>        * g++.dg/other/i386-2.c: Likewise.
>
> *g++.dg/other/i386-2.C
>
>>        * g++.dg/other/i386-2.c: Likewise.
>
> * g++.dg/other/i386-3.C
>
> Uros.

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-25 Thread Ilya Tocar

Fixed.

Changelog:

2011-08-25  Ilya Tocar  ilya.to...@intel.com

 * config/i386/fmaintrin.h: New.
 * config.gcc: Add fmaintrin.h.
 * config/i386/i386.c
(enum ix86_builtins) IX86_BUILTIN_VFMADDSS3: New.
 IX86_BUILTIN_VFMADDSD3: Likewise.
 * config/i386/sse.md (fmai_vmfmadd_mode): New.
 (*fmai_fmadd_mode): Likewise.
 (*fmai_fmsub_mode): Likewise.
 (*fmai_fnmadd_mode): Likewise.
 (*fmai_fnmsub_mode): Likewise.
 * config/i386/x86intrin.h: Add fmaintrin.h.

And Changelog for testsuite:

2011-08-25  Ilya Tocar ilya.to...@intel.com

 * gcc.target/i386/fma-check.h: New.
 * gcc.target/i386/fma-256-fmaddXX.c: New testcase.
 * gcc.target/i386/fma-256-fmaddsubXX.c: Likewise.
 * gcc.target/i386/fma-256-fmsubXX.c: Likewise.
 * gcc.target/i386/fma-256-fmsubaddXX.c: Likewise.
 * gcc.target/i386/fma-256-fnmaddXX.c: Likewise.
 * gcc.target/i386/fma-256-fnmsubXX.c: Likewise.
 * gcc.target/i386/fma-fmaddXX.c: Likewise.
 * gcc.target/i386/fma-fmaddsubXX.c: Likewise.
 * gcc.target/i386/fma-fmsubXX.c: Likewise.
 * gcc.target/i386/fma-fmsubaddXX.c: Likewise.
 * gcc.target/i386/fma-fnmaddXX.c: Likewise.
 * gcc.target/i386/fma-fnmsubXX.c: Likewise.
 * gcc.target/i386/fma-compile.c: Likewise.
 * gcc.target/i386/i386.exp (check_effective_target_fma): New.
 * gcc.target/i386/sse-12.c: Add -mfma.
 * gcc.target/i386/sse-13.c: Likewise.
 * gcc.target/i386/sse-14.c: Likewise.
 * gcc.target/i386/sse-22.c: Likewise.
 * gcc.target/i386/sse-23.c: Likewise.
 * g++.dg/other/i386-2.C: Likewise.
 * g++.dg/other/i386-3.C: Likewise.
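
For readers who want to try the new header, here is a minimal usage sketch. It assumes a GCC-compatible compiler on x86 and that the intrinsic _mm_fmadd_ss is reachable through immintrin.h, as in current GCC releases. The wrapper names, the target attribute and the runtime check are illustrative choices and are not part of this patch.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_FMA_PATH 1

/* Compiled with FMA enabled for this one function only, so the file
   does not need -mfma on the command line.  */
__attribute__ ((target ("fma")))
static float
fmadd_scalar_fma (float a, float b, float c)
{
  /* Computes a * b + c on the low element with a single rounding.  */
  __m128 r = _mm_fmadd_ss (_mm_set_ss (a), _mm_set_ss (b), _mm_set_ss (c));
  return _mm_cvtss_f32 (r);
}
#endif

/* Hypothetical wrapper: uses the FMA path when the CPU supports it,
   otherwise falls back to a separate multiply and add.  */
float
fmadd_scalar (float a, float b, float c)
{
#ifdef HAVE_X86_FMA_PATH
  if (__builtin_cpu_supports ("fma"))
    return fmadd_scalar_fma (a, b, c);
#endif
  return a * b + c;
}
```

The two paths can differ in the last bit for inputs where the intermediate product is not exactly representable, since the fused form rounds only once.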

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-24 Thread Ilya Tocar

Removed the extra blank lines and passed the tests through indent.

2011/8/23 Uros Bizjak ubiz...@gmail.com:
> On Tue, Aug 23, 2011 at 4:19 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
>> I removed unnecessary expands/builtins and tests are now compiled with -O2.
>> Is this version ok?
>
> OK with minor comments:
>
> - Please remove extra blank lines you introduced in sse.md
> - Also, I'd recommend you to pass new testcases through indent
> command to fix formatting.
>
> Thanks,
> Uros.



patch
Description: Binary data

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-24 Thread Ilya Tocar

2011/8/24 Jakub Jelinek ja...@redhat.com:
> On Wed, Aug 24, 2011 at 12:48:06PM +0400, Ilya Tocar wrote:
>> Removed extra blank lines and pass tests through indent.
>
> You haven't:
Ah, sorry, I only noticed the one in sse.md.

> @@ -25113,6 +25125,9 @@ static const struct builtin_description bdesc_multi_arg[] =
>     "__builtin_ia32_vfmaddpd256", IX86_BUILTIN_VFMADDPD256,
>     UNKNOWN, (int)MULTI_ARG_3_DF2 },
>
> +
> +
> +
>   { OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf,
>     "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS,
>     UNKNOWN, (int)MULTI_ARG_3_SF },
>
> Also, why is fmaintrin.h including immintrin.h?  You can't include fmaintrin.h
> directly and x86intrin.h has already included it before including fmaintrin.h.
Makes sense. Removed it.

>        Jakub



patch
Description: Binary data
