Re: [PATCH] Fix find_base_term in 32-bit -fpic code (PR lto/64025)

2014-11-25 Thread Uros Bizjak
On Tue, Nov 25, 2014 at 8:40 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Nov 25, 2014 at 12:25 AM, Jakub Jelinek ja...@redhat.com wrote:

 The fallback delegitimization I've added as last option mainly for
 debug info purposes, when we don't know if the base is a PIC register
 or say a PIC register plus some addend, unfortunately in some tests
 broke find_base_term, which for PLUS looks only at the first operand
 and recursion on it finds a base term, it returns it immediately.
 So, it found base term of _GLOBAL_OFFSET_TABLE_, when the right base
 term is actually in the second operand.

 This patch fixes it by swapping the operands, debug info doesn't care about
 the order, it won't match in any instruction anyway, but helps alias.c.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

 2014-11-24  Jakub Jelinek  ja...@redhat.com

 PR lto/64025
 * config/i386/i386.c (ix86_delegitimize_address): Ensure result
 comes before (addend - _GLOBAL_OFFSET_TABLE_) term.

 Can you also swap operands of (%ecx - %ebx) + foo? There is no point
 digging into RTX involving registers only when we know that we are
 looking for foo. This will also be consistent with the code you
 patched below.

Something like attached prototype patch.

Uros.
Index: i386.c
===
--- i386.c  (revision 218037)
+++ i386.c  (working copy)
@@ -14847,19 +14847,20 @@ ix86_delegitimize_address (rtx x)
 leal (%ebx, %ecx, 4), %ecx
 ...
 movl foo@GOTOFF(%ecx), %edx
-in which case we return (%ecx - %ebx) + foo
-or (%ecx - _GLOBAL_OFFSET_TABLE_) + foo if pseudo_pic_reg
+in which case we return foo + (%ecx - %ebx)
+or foo + (%ecx - _GLOBAL_OFFSET_TABLE_) if pseudo_pic_reg
 and reload has completed.  */
   if (pic_offset_table_rtx
   (!reload_completed || !ix86_use_pseudo_pic_reg ()))
-result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
-pic_offset_table_rtx),
-  result);
+result = gen_rtx_PLUS (Pmode, result,
+  gen_rtx_MINUS (Pmode, copy_rtx (addend),
+ pic_offset_table_rtx));
   else if (pic_offset_table_rtx  !TARGET_MACHO  !TARGET_VXWORKS_RTP)
{
  rtx tmp = gen_rtx_SYMBOL_REF (Pmode, GOT_SYMBOL_NAME);
- tmp = gen_rtx_MINUS (Pmode, copy_rtx (addend), tmp);
- result = gen_rtx_PLUS (Pmode, tmp, result);
+ result = gen_rtx_PLUS (Pmode, result,
+gen_rtx_MINUS (Pmode, copy_rtx (addend),
+   tmp));
}
   else
return orig_x;


Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c

2014-11-25 Thread Uros Bizjak
On Tue, Nov 25, 2014 at 10:23 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Nov 24, 2014 at 10:33 PM, Jeff Law l...@redhat.com wrote:
 On 11/22/14 11:50, Uros Bizjak wrote:

 Hello!

 These two tests fix PR target/63847 [1], where x87 excess precision
 causes testcase to fail. The problem was triggered by -fpic, please
 see the PR for analysis.

 The patch adds -ffloat-store for 32bit x86 target, a standard and well
 tested solution for this problem.

 2014-11-22  Uros Bizjak  ubiz...@gmail.com

  PR target/63847
  * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add -ffloat-store
  for 32bit x86 targets.
  * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto.

 OK.

 Don't we have -fexcess-precision=standard for this now?

Oh ... indeed. I will update the patch to enable it for all x86 targets.

Thanks,
Uros.


Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c

2014-11-25 Thread Uros Bizjak
On Tue, Nov 25, 2014 at 10:38 AM, Uros Bizjak ubiz...@gmail.com wrote:

 These two tests fix PR target/63847 [1], where x87 excess precision
 causes testcase to fail. The problem was triggered by -fpic, please
 see the PR for analysis.

 The patch adds -ffloat-store for 32bit x86 target, a standard and well
 tested solution for this problem.

 2014-11-22  Uros Bizjak  ubiz...@gmail.com

  PR target/63847
  * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add -ffloat-store
  for 32bit x86 targets.
  * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto.

 OK.

 Don't we have -fexcess-precision=standard for this now?

 Oh ... indeed. I will update the patch to enable it for all x86 targets.

cc1plus: sorry, unimplemented: -fexcess-precision=standard for C++

Uros.


[PATCH, libgfortran]: Remove unused variable

2014-11-25 Thread Uros Bizjak
Hello!

2014-11-25  Uros Bizjak  ubiz...@gmail.com

* intrinsics/env.c (getenv): Remove unused variable res_len.

Bootstrapped on x86_64-linux-gnu.

Almost trivial, but ... OK for mainline?

Uros.

Index: intrinsics/env.c
===
--- intrinsics/env.c(revision 218056)
+++ intrinsics/env.c(working copy)
@@ -42,7 +42,6 @@ PREFIX(getenv) (char * name, char * value, gfc_cha
 {
   char *name_nt;
   char *res = NULL;
-  int res_len;

   if (name == NULL || value == NULL)
 runtime_error (Both arguments to getenv are mandatory.);


[PATCH, libobjc]: Remove ‘...’ is static but used in inline function ‘...’ which is not static

2014-11-25 Thread Uros Bizjak
Hello!

Recently, gcc bootstrap started to emit following warnings when
building libobjc:

libobjc/sendmsg.c:338:13: warning: ‘get_implementation’ is static but
used in inline function ‘get_imp’ which is not static
libobjc/sendmsg.c:335:15: warning: ‘sarray_get_safe’ is static but
used in inline function ‘get_imp’ which is not static
libobjc/sendmsg.c:143:21: warning: ‘__objc_word_forward’ is static but
used in inline function ‘__objc_get_forward_imp’ which is not static
libobjc/sendmsg.c:141:21: warning: ‘__objc_double_forward’ is static
but used in inline function ‘__objc_get_forward_imp’ which is not
static
libobjc/sendmsg.c:139:21: warning: ‘__objc_block_forward’ is static
but used in inline function ‘__objc_get_forward_imp’ which is not
static


2014-11-25  Uros Bizjak  ubiz...@gmail.com

* sendmsg.c (get_imp): Declare as static inline.
(__objc_get_forward_imp): Ditto.

Bootstrapped on x86_64-linux-gnu.

OK for mainline?

Uros.

Index: sendmsg.c
===
--- sendmsg.c   (revision 218056)
+++ sendmsg.c   (working copy)
@@ -105,7 +105,7 @@
 id nil_method (id, SEL);

 /* Given a selector, return the proper forwarding implementation.  */
-inline
+static inline
 IMP
 __objc_get_forward_imp (id rcv, SEL sel)
 {
@@ -320,7 +320,7 @@
   return res;
 }

-inline
+static inline
 IMP
 get_imp (Class class, SEL sel)
 {


Re: [PATCH 04/08] PR jit/63854: Remove xstrdup from ipa/cgraph fprintf calls

2014-11-26 Thread Uros Bizjak
Hello!

 cgraph*.c and ipa-*.c use xstrdup on strings when dumping them via
 fprintf, leaking all of the duplicated buffers.

 Is/was there a reason for doing this?

Yes, please see [1] and PR 53136 [2]. As said in [1]:

There is a problem with multiple calls of cgraph_node_name in fprintf
dumps. Please note that C++ uses caching in
cxx_printable_name_internal (aka LANG_HOOKS_DECL_PRINTABLE_NAME), so
when cxx_printable_name_internal is called multiple times from printf
(i.e. fprintf %s/%i - %s/%i), it can happen that the first string
gets evicted by the second call, before fprintf is fully evaluated.

 Taking them out fixes these leaks (seen when dumping is enabled):

But you will get Invalid read of size X instead.

The patch at [1] fixed these, but introduced memory leaks, which were
tolerable at the time:

I think that small memory leak is tolerable here (the changes are
exclusively in the dump code), and follows the same approach as in
java frontend.

It seems that these assumptions are not valid anymore.

[1] https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01904.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53136

Uros.


Re: [PATCH, i386] Add new arg values for __builtin_cpu_supports

2014-11-26 Thread Uros Bizjak
Hello!

 I've added avx512f support to __builtin_cpu_supports.
 I'm not sure about bw+vl, i think for compound values like
 avx512bd+dq+vl, arch is better. Also for such cases prority is unclear,
 what should we choose bw+vl  or e. g. avx512f+er?
 I've left MPX bits in cpuid.h, in case we will need them later (e. g.
 for runtime mpx tests enabling).

 Ok for trunk?

 gcc/

 * config/i386/cpuid.h (bit_MPX, bit_BNDREGS, bit_BNDCSR):
 Define.
 * config/i386/i386.c (get_builtin_code_for_version): Add avx512f.
 (fold_builtin_cpu): Ditto.
 * doc/extend.texi: Documment it.


 gcc/testsuite/

 * g++.dg/ext/mv2.C: Add test for target (avx512f).
 * gcc.target/i386/builtin_target.c: Ditto.


 libgcc/

 * config/i386/cpuinfo.c (processor_features): Add FEATURE_AVX512F.
 * config/i386/cpuinfo.c (get_available_features): Detect it.

OK.

Thanks,
Uros.


[RFC PATCH, i386]: Prefer %ebx in set_got patterns

2014-11-27 Thread Uros Bizjak
Hello!

Attached patch helps RA to choose the most appropriate PIC register by
changing the register preference for set_got patterns. Using this
patch, there should really be a reason for RA to avoid ABI mandated
hard PIC reg.

This patch avoids many mov %exx,%ebx in front of the calls, that
happen with unpatched compiler even with Vladimir's latest RA patch to
avoid duplicated PIC registers.

As a smoke test, I have checked 32bit libgo.so.6.0.0 library, where now we have:

[uros@omen7 .libs]$ grep thunk.bx aaa | wc -l
7693
[uros@omen7 .libs]$ grep thunk.ax aaa | wc -l
10
[uros@omen7 .libs]$ grep thunk.cx aaa | wc -l
4
[uros@omen7 .libs]$ grep thunk.dx aaa | wc -l
8
[uros@omen7 .libs]$ grep thunk.bp aaa | wc -l
497
[uros@omen7 .libs]$ grep thunk.si aaa | wc -l
145
[uros@omen7 .libs]$ grep thunk.di aaa | wc -l
198

2014-11-27  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (set_got): Use =b,?r constraint for operand 0.
(set_got_labelled): Ditto.
(set_got_rex64): Ditto.
(set_rip_rex64): Ditto.
(set_got_offset_rex64): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Thoughts?

Uros
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 218111)
+++ config/i386/i386.md (working copy)
@@ -12101,7 +12101,7 @@
   ix86_expand_prologue (); DONE;)
 
 (define_insn set_got
-  [(set (match_operand:SI 0 register_operand =r)
+  [(set (match_operand:SI 0 register_operand =b,?r)
(unspec:SI [(const_int 0)] UNSPEC_SET_GOT))
(clobber (reg:CC FLAGS_REG))]
   !TARGET_64BIT
@@ -12110,7 +12110,7 @@
(set_attr length 12)])
 
 (define_insn set_got_labelled
-  [(set (match_operand:SI 0 register_operand =r)
+  [(set (match_operand:SI 0 register_operand =b,?r)
(unspec:SI [(label_ref (match_operand 1))]
 UNSPEC_SET_GOT))
(clobber (reg:CC FLAGS_REG))]
@@ -12120,7 +12120,7 @@
(set_attr length 12)])
 
 (define_insn set_got_rex64
-  [(set (match_operand:DI 0 register_operand =r)
+  [(set (match_operand:DI 0 register_operand =b,?r)
(unspec:DI [(const_int 0)] UNSPEC_SET_GOT))]
   TARGET_64BIT
   lea{q}\t{_GLOBAL_OFFSET_TABLE_(%%rip), %0|%0, _GLOBAL_OFFSET_TABLE_[rip]}
@@ -12129,7 +12129,7 @@
(set_attr mode DI)])
 
 (define_insn set_rip_rex64
-  [(set (match_operand:DI 0 register_operand =r)
+  [(set (match_operand:DI 0 register_operand =b,?r)
(unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_RIP))]
   TARGET_64BIT
   lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]}
@@ -12138,7 +12138,7 @@
(set_attr mode DI)])
 
 (define_insn set_got_offset_rex64
-  [(set (match_operand:DI 0 register_operand =r)
+  [(set (match_operand:DI 0 register_operand =b,?r)
(unspec:DI
  [(label_ref (match_operand 1))]
  UNSPEC_SET_GOT_OFFSET))]


Re: PATCH: PR target/63833: REAL_PIC_OFFSET_TABLE_REGNUM is wrong for x86-64

2014-11-27 Thread Uros Bizjak
On Wed, Nov 12, 2014 at 3:53 PM, H.J. Lu hongjiu...@intel.com wrote:
 We have been using the wrong register to hold GOT in 64-bit large model,
 which is used by the large model PLT.  The only reason we haven't run into
 any problem is linker doesn't support the large model PLT. I am looking
 into linker issue.  This patch corrects REAL_PIC_OFFSET_TABLE_REGNUM for
 64-bit large model.  OK to install?

 Thanks.

 H.J.
 ---
 2014-11-12  H.J. Lu  hongjiu...@intel.com

  PR target/63833
  * config/i386/i386.h (REAL_PIC_OFFSET_TABLE_REGNUM): Use
  R15_REG for 64-bit.
  * config/i386/rdos64.h (REAL_PIC_OFFSET_TABLE_REGNUM): Removed.

OK, the ABI documentation has just been fixed.

Thanks,
Uros.


[PATCH, i386]: Use preferred_for_{size,speed} instead of Yx and Yd register constraints

2014-11-27 Thread Uros Bizjak
Hello!

Yx and Yd register constraints depend on optimize_function_for_speed.
However, optimize_function_for_size is not stable during the
compilation, so this can result in unrecognized insn (as was the case
with *floatSWI48:modeMODEF:mode2_sse insn failure).

The patch uses preferred_for_{size,speed} infrastructure instead to
achieve the same functionality as with Yx and Yd register constraint.

2014-11-27  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (preferred_for_size): New attribute
(*pushxf): Split Yx*r constraints to r,*r.  Use preferred_for_size
attribute to conditionally disable alternative 1.
(*pushdf): Split Yd*r constraints to r,*r.  Use preferred_for_size
and prefered_for_speed attributes to conditionally disable
alternative 1.
(*movxf_internal): Split Yx*r constraints to r,*r.  Use
preferred_for_size attribute to conditionally disable
alternatives 3 and 4.
(*movdf_internal): Split Yd*r constraints to r,*r.  Use
preferred_for_size and prefered_for_speed attributes to conditionally
disable alternatives 3 and 4.
* config/i386/constraints.md (Yd, Yx): Remove register constraints.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32} and was committed to mainline SVN.

Uros.
Index: config/i386/constraints.md
===
--- config/i386/constraints.md  (revision 218129)
+++ config/i386/constraints.md  (working copy)
@@ -105,8 +105,6 @@
 ;;  n  MMX inter-unit moves from MMX register enabled
 ;;  a  Integer register when zero extensions with AND are disabled
 ;;  p  Integer register when TARGET_PARTIAL_REG_STALL is disabled
-;;  d  Integer register when integer DFmode moves are enabled
-;;  x  Integer register when integer XFmode moves are enabled
 ;;  f  x87 register when 80387 floating point arithmetic is enabled
 
 (define_register_constraint Yz TARGET_SSE ? SSE_FIRST_REG : NO_REGS
@@ -137,15 +135,6 @@
   ? NO_REGS : GENERAL_REGS
  @internal Any integer register when zero extensions with AND are disabled.)
 
-(define_register_constraint Yd
- TARGET_INTEGER_DFMODE_MOVES  optimize_function_for_speed_p (cfun)
-  ? GENERAL_REGS : NO_REGS
- @internal Any integer register when integer DFmode moves are enabled.)
-
-(define_register_constraint Yx
- optimize_function_for_speed_p (cfun) ? GENERAL_REGS : NO_REGS
- @internal Any integer register when integer XFmode moves are enabled.)
-
 (define_register_constraint Yf
  (ix86_fpmath  FPMATH_387) ? FLOAT_REGS : NO_REGS
  @internal Any x87 register when 80387 FP arithmetic is enabled.)
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 218129)
+++ config/i386/i386.md (working copy)
@@ -816,6 +816,7 @@
]
(const_int 1)))
 
+(define_attr preferred_for_size  (const_int 1))
 (define_attr preferred_for_speed  (const_int 1))
 
 ;; Describe a user's asm statement.
@@ -2811,8 +2812,8 @@
 })
 
 (define_insn *pushxf
-  [(set (match_operand:XF 0 push_operand =,)
-   (match_operand:XF 1 general_no_elim_operand f,Yx*roF))]
+  [(set (match_operand:XF 0 push_operand =,,,)
+   (match_operand:XF 1 general_no_elim_operand f,r,*r,oF))]
   
 {
   /* This insn should be already split before reg-stack.  */
@@ -2819,14 +2820,18 @@
   gcc_unreachable ();
 }
   [(set_attr type multi)
-   (set_attr unit i387,*)
+   (set_attr unit i387,*,*,*)
(set (attr mode)
-   (cond [(eq_attr alternative 1)
+   (cond [(eq_attr alternative 1,2,3)
 (if_then_else (match_test TARGET_64BIT)
   (const_string DI)
   (const_string SI))
  ]
- (const_string XF)))])
+ (const_string XF)))
+   (set (attr preferred_for_size)
+ (cond [(eq_attr alternative 1)
+  (symbol_ref false)]
+   (symbol_ref true)))])
 
 ;; %%% Kill this when call knows how to work this out.
 (define_split
@@ -2842,18 +2847,26 @@
 })
 
 (define_insn *pushdf
-  [(set (match_operand:DF 0 push_operand =,,,)
-   (match_operand:DF 1 general_no_elim_operand f,Yd*roF,rmF,x))]
+  [(set (match_operand:DF 0 push_operand =,)
+   (match_operand:DF 1 general_no_elim_operand f,r,*r,oF,rmF,x))]
   
 {
   /* This insn should be already split before reg-stack.  */
   gcc_unreachable ();
 }
-  [(set_attr isa *,nox64,x64,sse2)
+  [(set_attr isa *,nox64,nox64,nox64,x64,sse2)
(set_attr type multi)
-   (set_attr unit i387,*,*,sse)
-   (set_attr mode DF,SI,DI,DF)])
-
+   (set_attr unit i387,*,*,*,*,sse)
+   (set_attr mode DF,SI,SI,SI,DI,DF)
+   (set (attr preferred_for_size)
+ (cond [(eq_attr alternative 1)
+  (symbol_ref false)]
+   (symbol_ref true)))
+   (set (attr preferred_for_speed)
+ (cond [(eq_attr alternative 1)
+  (symbol_ref TARGET_INTEGER_DFMODE_MOVES)]
+   (symbol_ref true)))])
+   
 ;; %%% Kill this when call knows how to work this out

Re: RFA: one more version of the patch for PR61360

2014-09-28 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 10:31 PM, Vladimir Makarov vmaka...@redhat.com wrote:
 I guess we achieved the consensus about the following patch to fix PR61360

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

 The patch was successfully bootstrapped and tested (w/wo -march=amdfam10) on
 x86/x86-64.

 Is it ok to commit to trunk?

 2014-09-26  Vladimir Makarov  vmaka...@redhat.com

 PR target/61360
 * lra.c (lra): Remove call of recog_init.
 * recog.c (constrain_operands): Permit reg for memory constraint
 when LRA is used.
 * config/i386/i386.md (*floatSWI48:modeMODEF:mode2_sse):
 Enable first alternative independently on RA stage.

Please also mention that this patch reverts:

2014-04-01  Richard Henderson  r...@redhat.com

PR target/60704
* config/i386/i386.md (*floatSWI48MODEF2_sse): Leave the second
alternative enabled before register allocation.

The x86 part is OK.

BTW: I think that this patch should also be backported to 4.9 branch,
since the original patch and the PR60704 fix were also installed
there.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [57/n] Extend blend/cmp/brodcast insn patterns.

2014-09-29 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 11:04 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 Patch in the bottom extends blend/cmp/brodcast
 insn patterns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn avx512f_blendmmode): Delete.
 (define_insn avx512_blendmVI48_AVX512VL:mode): New.
 (define_insn avx512_blendmVI12_AVX512VL:mode): Ditto..
 (define_mode_attr cmp_imm_predicate): Add V8SF, V4DF, V8SI, V4DI, 
 V4SF,
 V2DF, V4SI, V2DI, V32HI, V64QI, V16HI, V32QI, V8HI, V16QI modes.
 (define_insn
 avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name):
 Remove.
 (define_insn
 
 avx512_cmpVI48_AVX512VL:mode3mask_scalar_merge_nameround_saeonly_name):
 New.
 (define_insn
 
 avx512_cmpVI12_AVX512VL:mode3mask_scalar_merge_nameround_saeonly_name):
 Ditto.
 (define_insn mask_codeforavx512f_vec_dupmodemask_name): 
 Delete.
 (define_insn avx512_vec_dupV48_AVX512VL:modemask_name): New.
 (define_insn avx512_vec_dupV12_AVX512VL:modemask_name): Ditto.
 (define_insn mask_codeforavx512f_vec_dup_gprmodemask_name):
 Delete.
 (define_insn
 mask_codeforavx512_vec_dup_gprVI48_AVX512VL:modemask_name):
 New.
 (define_insn
 mask_codeforavx512_vec_dup_gprVI12_AVX512VL:modemask_name):
 Ditto.
 (define_insn·mask_codeforavx512f_vec_dup_memmodemask_name):
 Delete.
 (define_insn
 mask_codeforavx512_vec_dup_memVI48_AVX512VL:modemask_name):
 New.
 (define_insn
 mask_codeforavx512_vec_dup_memVI12_AVX512VL:modemask_name):
 Ditto.

OK with a small fix below.

Thanks,
Uros.

 --
 Thanks, K

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 9edfebc..43d6655 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -954,14 +954,26 @@
 (set_attr memory none,load)
 (set_attr mode sseinsnmode)])

 -(define_insn avx512f_blendmmode
 -  [(set (match_operand:VI48F_512 0 register_operand =v)
 -   (vec_merge:VI48F_512
 - (match_operand:VI48F_512 2 nonimmediate_operand vm)
 - (match_operand:VI48F_512 1 register_operand v)
 +(define_insn avx512_blendmmode
 +  [(set (match_operand:V48_AVX512VL 0 register_operand =v)
 +   (vec_merge:V48_AVX512VL
 + (match_operand:V48_AVX512VL 2 nonimmediate_operand vm)
 + (match_operand:V48_AVX512VL 1 register_operand v)
   (match_operand:avx512fmaskmode 3 register_operand Yk)))]
TARGET_AVX512F
 -  vsseintprefixblendmssemodesuffix\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}
 +  vblendmssemodesuffix\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}
 +  [(set_attr type ssemov)
 +   (set_attr prefix evex)
 +   (set_attr mode sseinsnmode)])
 +
 +(define_insn avx512_blendmmode
 +  [(set (match_operand:VI12_AVX512VL 0 register_operand =v)
 +   (vec_merge:VI12_AVX512VL
 + (match_operand:VI12_AVX512VL 2 nonimmediate_operand vm)
 + (match_operand:VI12_AVX512VL 1 register_operand v)
 + (match_operand:avx512fmaskmode 3 register_operand Yk)))]
 +  TARGET_AVX512BW
 +  vpblendmssemodesuffix\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}
[(set_attr type ssemov)
 (set_attr prefix evex)
 (set_attr mode sseinsnmode)])
 @@ -2467,14 +2479,21 @@
 (set_attr mode ssescalarmode)])

  (define_mode_attr cmp_imm_predicate
 -  [(V16SF const_0_to_31_operand) (V8DF const_0_to_31_operand)
 -  (V16SI const_0_to_7_operand) (V8DI const_0_to_7_operand)])
 -
 -(define_insn avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name
 +  [(V16SF const_0_to_31_operand)  (V8DF const_0_to_31_operand)
 +   (V16SI const_0_to_7_operand)   (V8DI const_0_to_7_operand)
 +   (V8SF const_0_to_31_operand)   (V4DF const_0_to_31_operand)
 +   (V8SI const_0_to_7_operand)(V4DI const_0_to_7_operand)
 +   (V4SF const_0_to_31_operand)   (V2DF const_0_to_31_operand)
 +   (V4SI const_0_to_7_operand)(V2DI const_0_to_7_operand)
 +   (V32HI const_0_to_7_operand)   (V64QI const_0_to_7_operand)
 +   (V16HI const_0_to_7_operand)   (V32QI const_0_to_7_operand)
 +   (V8HI const_0_to_7_operand)(V16QI const_0_to_7_operand)])
 +
 +(define_insn 
 avx512_cmpmode3mask_scalar_merge_nameround_saeonly_name
[(set (match_operand:avx512fmaskmode 0 register_operand =Yk)
 (unspec:avx512fmaskmode
 - [(match_operand:VI48F_512 1 register_operand v)
 -  (match_operand:VI48F_512 2 round_saeonly_nimm_predicate 
 round_saeonly_constraint)
 + [(match_operand:V48_AVX512VL 1 register_operand v)
 +  (match_operand:V48_AVX512VL 2 nonimmediate_operand 
 round_saeonly_constraint)
(match_operand:SI 3 cmp_imm_predicate n)]
   UNSPEC_PCMP))]
TARGET_AVX512F  round_saeonly_mode512bit_condition
 @@ -2484,6 +2503,20 @@
 (set_attr prefix evex)
 (set_attr mode sseinsnmode)])

 

Re: [PATCH i386 AVX512] [58/n] Add vpmul[u]dq insn patterns.

2014-09-29 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 12:33 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 Patch in the bottom adds support for vpmul[u]dq insn
 patterns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_expand vec_widen_umult_even_v8simask_name): Add masking.
 (define_insn *vec_widen_umult_even_v8simask_name): Ditto.
 (define_expand vec_widen_umult_even_v4simask_name): Ditto.
 (define_insn *vec_widen_umult_even_v4simask_name): Ditto.
 (define_expand vec_widen_smult_even_v8simask_name): Ditto.
 (define_insn *vec_widen_smult_even_v8simask_name): Ditto.
 (define_expand sse4_1_mulv2siv2di3mask_name): Ditto.
 (define_insn *sse4_1_mulv2siv2di3mask_name): Ditto.
 (define_insn avx512dq_mulmode3mask_name): New.

OK.

Thanks,
Uros.

 --
 Thanks, K

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 43d6655..e52d40c 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -9286,7 +9286,7 @@
 (set_attr prefix evex)
 (set_attr mode XI)])

 -(define_expand vec_widen_umult_even_v8si
 +(define_expand vec_widen_umult_even_v8simask_name
[(set (match_operand:V4DI 0 register_operand)
 (mult:V4DI
   (zero_extend:V4DI
 @@ -9299,29 +9299,30 @@
   (match_operand:V8SI 2 nonimmediate_operand)
   (parallel [(const_int 0) (const_int 2)
  (const_int 4) (const_int 6)])]
 -  TARGET_AVX2
 +  TARGET_AVX2  mask_avx512vl_condition
ix86_fixup_binary_operands_no_copy (MULT, V8SImode, operands);)

 -(define_insn *vec_widen_umult_even_v8si
 -  [(set (match_operand:V4DI 0 register_operand =x)
 +(define_insn *vec_widen_umult_even_v8simask_name
 +  [(set (match_operand:V4DI 0 register_operand =v)
 (mult:V4DI
   (zero_extend:V4DI
 (vec_select:V4SI
 - (match_operand:V8SI 1 nonimmediate_operand %x)
 + (match_operand:V8SI 1 nonimmediate_operand %v)
   (parallel [(const_int 0) (const_int 2)
  (const_int 4) (const_int 6)])))
   (zero_extend:V4DI
 (vec_select:V4SI
 - (match_operand:V8SI 2 nonimmediate_operand xm)
 + (match_operand:V8SI 2 nonimmediate_operand vm)
   (parallel [(const_int 0) (const_int 2)
  (const_int 4) (const_int 6)])]
 -  TARGET_AVX2  ix86_binary_operator_ok (MULT, V8SImode, operands)
 -  vpmuludq\t{%2, %1, %0|%0, %1, %2}
 +  TARGET_AVX2  mask_avx512vl_condition
 +ix86_binary_operator_ok (MULT, V8SImode, operands)
 +  vpmuludq\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
[(set_attr type sseimul)
 -   (set_attr prefix vex)
 +   (set_attr prefix maybe_evex)
 (set_attr mode OI)])

 -(define_expand vec_widen_umult_even_v4si
 +(define_expand vec_widen_umult_even_v4simask_name
[(set (match_operand:V2DI 0 register_operand)
 (mult:V2DI
   (zero_extend:V2DI
 @@ -9332,28 +9333,29 @@
 (vec_select:V2SI
   (match_operand:V4SI 2 nonimmediate_operand)
   (parallel [(const_int 0) (const_int 2)])]
 -  TARGET_SSE2
 +  TARGET_SSE2  mask_avx512vl_condition
ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);)

 -(define_insn *vec_widen_umult_even_v4si
 -  [(set (match_operand:V2DI 0 register_operand =x,x)
 +(define_insn *vec_widen_umult_even_v4simask_name
 +  [(set (match_operand:V2DI 0 register_operand =x,v)
 (mult:V2DI
   (zero_extend:V2DI
 (vec_select:V2SI
 - (match_operand:V4SI 1 nonimmediate_operand %0,x)
 + (match_operand:V4SI 1 nonimmediate_operand %0,v)
   (parallel [(const_int 0) (const_int 2)])))
   (zero_extend:V2DI
 (vec_select:V2SI
 - (match_operand:V4SI 2 nonimmediate_operand xm,xm)
 + (match_operand:V4SI 2 nonimmediate_operand xm,vm)
   (parallel [(const_int 0) (const_int 2)])]
 -  TARGET_SSE2  ix86_binary_operator_ok (MULT, V4SImode, operands)
 +  TARGET_SSE2  mask_avx512vl_condition
 +ix86_binary_operator_ok (MULT, V4SImode, operands)
@
 pmuludq\t{%2, %0|%0, %2}
 -   vpmuludq\t{%2, %1, %0|%0, %1, %2}
 +   vpmuludq\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
[(set_attr isa noavx,avx)
 (set_attr type sseimul)
 (set_attr prefix_data16 1,*)
 -   (set_attr prefix orig,vex)
 +   (set_attr prefix orig,maybe_evex)
 (set_attr mode TI)])

  (define_expand vec_widen_smult_even_v16simask_name
 @@ -9401,7 +9403,7 @@
 (set_attr prefix evex)
 (set_attr mode XI)])

 -(define_expand vec_widen_smult_even_v8si
 +(define_expand vec_widen_smult_even_v8simask_name
[(set (match_operand:V4DI 0 register_operand)
 (mult:V4DI
   (sign_extend:V4DI
 @@ -9414,30 +9416,31 @@
   (match_operand:V8SI 2 nonimmediate_operand)
   

Re: [PATCH i386 AVX512] [59/n] Add vptest[n]m, ucmp, cmpeq insn patterns.

2014-09-29 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 12:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 Patch in the bottom adds support for vptest[n]m, ucmp, cmpeq.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (ix86_expand_args_builtin): Handle CODE_FOR_avx512vl_cmpv4di3_mask,
 CODE_FOR_avx512vl_cmpv8si3_mask, CODE_FOR_avx512vl_ucmpv4di3_mask,
 CODE_FOR_avx512vl_ucmpv8si3_mask, CODE_FOR_avx512vl_cmpv2di3_mask,
 CODE_FOR_avx512vl_cmpv4si3_mask, CODE_FOR_avx512vl_ucmpv2di3_mask,
 CODE_FOR_avx512vl_ucmpv4si3_mask.
 * config/i386/sse.md
 (define_insn

Double define_insn here.

 (define_insn avx512f_ucmpmode3mask_scalar_merge_name): Delete.
 avx512_ucmpVI12_AVX512VL:mode3mask_scalar_merge_name):New.
 (define_insn
 avx512_ucmpVI48_AVX512VL:mode3mask_scalar_merge_name):Ditto.
 (define_expand avx512_eqmode3mask_scalar_merge_name): Ditto.
 (define_insn avx512_eqmode3mask_scalar_merge_name_1): Ditto.
 (define_insn avx512_gtmode3mask_scalar_merge_name): Ditto.
 (define_insn avx512_testmmode3mask_scalar_merge_name): Ditto.
 (define_insn avx512_testnmmode3mask_scalar_merge_name): Ditto.

OK.

Thanks,
Uros.

 --
 Thanks, K

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 1aec70f..352ab81 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -34062,6 +34062,14 @@ ix86_expand_args_builtin (const struct 
 builtin_description *d,
   case CODE_FOR_avx512f_cmpv16si3_mask:
   case CODE_FOR_avx512f_ucmpv8di3_mask:
   case CODE_FOR_avx512f_ucmpv16si3_mask:
 + case CODE_FOR_avx512vl_cmpv4di3_mask:
 + case CODE_FOR_avx512vl_cmpv8si3_mask:
 + case CODE_FOR_avx512vl_ucmpv4di3_mask:
 + case CODE_FOR_avx512vl_ucmpv8si3_mask:
 + case CODE_FOR_avx512vl_cmpv2di3_mask:
 + case CODE_FOR_avx512vl_cmpv4si3_mask:
 + case CODE_FOR_avx512vl_ucmpv2di3_mask:
 + case CODE_FOR_avx512vl_ucmpv4si3_mask:
 error (the last argument must be a 3-bit immediate);
 return const0_rtx;

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index e52d40c..625a2e0 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -2517,11 +2517,25 @@
 (set_attr prefix evex)
 (set_attr mode sseinsnmode)])

 -(define_insn avx512f_ucmpmode3mask_scalar_merge_name
 +(define_insn avx512_ucmpmode3mask_scalar_merge_name
[(set (match_operand:avx512fmaskmode 0 register_operand =Yk)
 (unspec:avx512fmaskmode
 - [(match_operand:VI48_512 1 register_operand v)
 -  (match_operand:VI48_512 2 nonimmediate_operand vm)
 + [(match_operand:VI12_AVX512VL 1 register_operand v)
 +  (match_operand:VI12_AVX512VL 2 nonimmediate_operand vm)
 +  (match_operand:SI 3 const_0_to_7_operand n)]
 + UNSPEC_UNSIGNED_PCMP))]
 +  TARGET_AVX512BW
 +  vpcmpussemodesuffix\t{%3, %2, %1, 
 %0mask_scalar_merge_operand4|%0mask_scalar_merge_operand4, %1, %2, %3}
 +  [(set_attr type ssecmp)
 +   (set_attr length_immediate 1)
 +   (set_attr prefix evex)
 +   (set_attr mode sseinsnmode)])
 +
 +(define_insn avx512_ucmpmode3mask_scalar_merge_name
 +  [(set (match_operand:avx512fmaskmode 0 register_operand =Yk)
 +   (unspec:avx512fmaskmode
 + [(match_operand:VI48_AVX512VL 1 register_operand v)
 +  (match_operand:VI48_AVX512VL 2 nonimmediate_operand vm)
(match_operand:SI 3 const_0_to_7_operand n)]
   UNSPEC_UNSIGNED_PCMP))]
TARGET_AVX512F
 @@ -10265,20 +10279,42 @@
 (set_attr prefix vex)
 (set_attr mode OI)])

 -(define_expand avx512f_eqmode3mask_scalar_merge_name
 +(define_expand avx512_eqmode3mask_scalar_merge_name
 +  [(set (match_operand:avx512fmaskmode 0 register_operand)
 +   (unspec:avx512fmaskmode
 + [(match_operand:VI12_AVX512VL 1 register_operand)
 +  (match_operand:VI12_AVX512VL 2 nonimmediate_operand)]
 + UNSPEC_MASKED_EQ))]
 +  TARGET_AVX512BW
 +  ix86_fixup_binary_operands_no_copy (EQ, MODEmode, operands);)
 +
 +(define_expand avx512_eqmode3mask_scalar_merge_name
[(set (match_operand:avx512fmaskmode 0 register_operand)
 (unspec:avx512fmaskmode
 - [(match_operand:VI48_512 1 register_operand)
 -  (match_operand:VI48_512 2 nonimmediate_operand)]
 + [(match_operand:VI48_AVX512VL 1 register_operand)
 +  (match_operand:VI48_AVX512VL 2 nonimmediate_operand)]
   UNSPEC_MASKED_EQ))]
TARGET_AVX512F
ix86_fixup_binary_operands_no_copy (EQ, MODEmode, operands);)

 -(define_insn avx512f_eqmode3mask_scalar_merge_name_1
 +(define_insn avx512_eqmode3mask_scalar_merge_name_1
[(set (match_operand:avx512fmaskmode 0 register_operand =Yk)
 (unspec:avx512fmaskmode
 - 

Re: [PATCH i386 AVX512] [60/n] Update 128bit ashrv insn pattern.

2014-09-29 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 1:13 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This tiny patch extends 128bit ashrv expander.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_mode_iterator VI128_128 [V16QI V8HI V2DI]): Delete.
 (define_expand vashrmode3mask_name): Add masking,
 use VI12_128 mode iterator.
 (define_expand ashrv2di3mask_name): New.

 --
 Thanks, K

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 625a2e0..91d6778 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -498,7 +498,6 @@
  (define_mode_iterator VI12_128 [V16QI V8HI])
  (define_mode_iterator VI14_128 [V16QI V4SI])
  (define_mode_iterator VI124_128 [V16QI V8HI V4SI])
 -(define_mode_iterator VI128_128 [V16QI V8HI V2DI])
  (define_mode_iterator VI24_128 [V8HI V4SI])
  (define_mode_iterator VI248_128 [V8HI V4SI V2DI])
  (define_mode_iterator VI48_128 [V4SI V2DI])
 @@ -15720,17 +15719,36 @@
   (match_operand:VI48_256 2 nonimmediate_operand)))]
TARGET_AVX2)

 -(define_expand vashrmode3
 -  [(set (match_operand:VI128_128 0 register_operand)
 -   (ashiftrt:VI128_128
 - (match_operand:VI128_128 1 register_operand)
 - (match_operand:VI128_128 2 nonimmediate_operand)))]
 -  TARGET_XOP
 +(define_expand vashrmode3mask_name
 +  [(set (match_operand:VI12_128 0 register_operand)
 +   (ashiftrt:VI12_128
 + (match_operand:VI12_128 1 register_operand)
 + (match_operand:VI12_128 2 nonimmediate_operand)))]
 +  TARGET_XOP || (TARGET_AVX512BW  TARGET_AVX512VL)
  {
 -  rtx neg = gen_reg_rtx (MODEmode);
 -  emit_insn (gen_negmode2 (neg, operands[2]));
 -  emit_insn (gen_xop_shamode3 (operands[0], operands[1], neg));
 -  DONE;
 +  if (TARGET_XOP)
 +{
 +  rtx neg = gen_reg_rtx (MODEmode);
 +  emit_insn (gen_negmode2 (neg, operands[2]));
 +  emit_insn (gen_xop_shamode3 (operands[0], operands[1], neg));
 +  DONE;
 +}
 +})
 +
 +(define_expand vashrv2di3mask_name
 +  [(set (match_operand:V2DI 0 register_operand)
 +   (ashiftrt:V2DI
 + (match_operand:V2DI 1 register_operand)
 + (match_operand:V2DI 2 nonimmediate_operand)))]
 +  TARGET_XOP || TARGET_AVX512VL
 +{
 +  if (!TARGET_XOP)

This condition is wrong. Please re-test the patch.

 +{
 +  rtx neg = gen_reg_rtx (V2DImode);
 +  emit_insn (gen_negv2di2 (neg, operands[2]));
 +  emit_insn (gen_xop_shav2di3 (operands[0], operands[1], neg));
 +  DONE;
 +}
  })

  (define_expand vashrv4si3


Re: [PATCH i386 AVX512] [61/n] Update FP logic insn patterns.

2014-09-29 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 2:32 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends andnot and any_logic insn
 patterns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn sse_andnotVF_128_256:mode3mask_name): Add 
 masking,
 use VF_128_256 mode iterator and update assembler emit code.
 (define_insn sse_andnotVF_512:mode3mask_name): New.
 (define_expand any_logic:codeVF_128_256:mode3mask_name):
 Add masking, use VF_128_256 mode iterator.
 (define_expand any_logic:codeVF_512:mode3mask_name): New.
 (define_insn *any_logic:codeVF_128_256:mode3mask_name):
 Add masking, use VF_128_256 mode iterator and update assembler emit
 code.
 (define_insn *any_logic:codeVF_512:mode3mask_name): New.
 (define_mode_attr avx512flogicsuff): Delete.
 (define_insn avx512f_logicmode): Ditto.
 (define_insn *andnotmode3mask_name): Update MODE_XI, MODE_OI,
 MODE_TI.
 (define_insn mask_codeforcodemode3mask_name): Ditto.

 --
 Thanks, K

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 91d6778..9835234 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -2687,15 +2687,15 @@
  ;;
  ;

 -(define_insn sse_andnotmode3
 -  [(set (match_operand:VF 0 register_operand =x,v)
 -   (and:VF
 - (not:VF
 -   (match_operand:VF 1 register_operand 0,v))
 - (match_operand:VF 2 nonimmediate_operand xm,vm)))]
 -  TARGET_SSE
 +(define_insn sse_andnotmode3mask_name
 +  [(set (match_operand:VF_128_256 0 register_operand =x,v)
 +   (and:VF_128_256
 + (not:VF_128_256
 +   (match_operand:VF_128_256 1 register_operand 0,v))
 + (match_operand:VF_128_256 2 nonimmediate_operand xm,vm)))]
 +  TARGET_SSE  mask_avx512vl_condition
  {
 -  static char buf[32];
 +  static char buf[128];
const char *ops;
const char *suffix;

 @@ -2715,17 +2715,17 @@
ops = andn%s\t{%%2, %%0|%%0, %%2};
break;
  case 1:
 -  ops = vandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2};
 +  ops = vandn%s\t{%%2, %%1, %%0mask_operand3_1|%%0mask_operand3_1, 
 %%1, %%2};
break;
  default:
gcc_unreachable ();
  }

 -  /* There is no vandnp[sd].  Use vpandnq.  */
 -  if (MODE_SIZE == 64)
 +  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
 +  if (mask_applied  !TARGET_AVX512DQ)
  {
 -  suffix = q;
 -  ops = vpandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2};
 +  suffix = GET_MODE_INNER (MODEmode) == DFmode ? q : d;
 +  ops = vpandn%s\t{%%2, %%1, %%0mask_operand3_1|%%0mask_operand3_1, 
 %%1, %%2};
  }

snprintf (buf, sizeof (buf), ops, suffix);
 @@ -2745,30 +2745,63 @@
]
(const_string MODE)))])

 -(define_expand codemode3
 +
 +(define_insn sse_andnotmode3mask_name
 +  [(set (match_operand:VF_512 0 register_operand =v)
 +   (and:VF_512
 + (not:VF_512
 +   (match_operand:VF_512 1 register_operand v))
 + (match_operand:VF_512 2 nonimmediate_operand vm)))]
 +  TARGET_AVX512F
 +{
 +  static char buf[128];
 +  const char *ops;
 +  const char *suffix;
 +
 +  suffix = ssemodesuffix;
 +  ops = ;
 +
 +  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
 +  if (!TARGET_AVX512DQ)

All other patterns also have mask_applied  condition here. Is
the above condition correct?

 +{
 +  suffix = GET_MODE_INNER (MODEmode) == DFmode ? q : d;
 +  ops = p;
 +}
 +
 +  snprintf (buf, sizeof (buf),
 +   v%sandn%s\t{%%2, %%1, %%0mask_operand3_1|%%0mask_operand3_1, 
 %%1, %%2},
 +   ops, suffix);
 +  return buf;
 +}
 +  [(set_attr type sselog)
 +   (set_attr prefix evex)
 +   (set_attr mode sseinsnmode)])
 +
 +(define_expand codemode3mask_name
[(set (match_operand:VF_128_256 0 register_operand)
 -   (any_logic:VF_128_256
 - (match_operand:VF_128_256 1 nonimmediate_operand)
 - (match_operand:VF_128_256 2 nonimmediate_operand)))]
 -  TARGET_SSE
 +   (any_logic:VF_128_256
 + (match_operand:VF_128_256 1 nonimmediate_operand)
 + (match_operand:VF_128_256 2 nonimmediate_operand)))]
 +  TARGET_SSE  mask_avx512vl_condition
ix86_fixup_binary_operands_no_copy (CODE, MODEmode, operands);)

 -(define_expand codemode3
 +(define_expand codemode3mask_name
[(set (match_operand:VF_512 0 register_operand)
 -   (fpint_logic:VF_512
 +   (any_logic:VF_512
   (match_operand:VF_512 1 nonimmediate_operand)
   (match_operand:VF_512 2 nonimmediate_operand)))]
TARGET_AVX512F
ix86_fixup_binary_operands_no_copy (CODE, MODEmode, operands);)

 -(define_insn *codemode3
 -  [(set (match_operand:VF 0 register_operand =x,v)
 -   (any_logic:VF
 - (match_operand:VF 1 nonimmediate_operand %0,v)
 -  

Re: [PATCH i386 AVX512] [62/n] Add vpmaddubsw,vdbpsadbw insn patterns.

2014-09-29 Thread Uros Bizjak
On Fri, Sep 26, 2014 at 4:09 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch introduces patterns for vpmaddubsw and vdbpsadbw
 insn.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_c_enum unspec): Add UNSPEC_DBPSADBW, UNSPEC_PMADDUBSW512.
 (define_insn avx512bw_pmaddubsw512modemask_name): New.
 (define_insn mask_codeforavx512bw_dbpsadbwmodemask_name):
 Ditto.
 --
 Thanks, K

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 9835234..601373b 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -130,6 +130,8 @@
UNSPEC_SHA256RNDS2

;; For AVX512BW support
 +  UNSPEC_DBPSADBW
 +  UNSPEC_PMADDUBSW512
UNSPEC_PSHUFHW
UNSPEC_PSHUFLW
UNSPEC_CVTINT2MASK
 @@ -13401,6 +13403,19 @@
 (set_attr prefix vex)
 (set_attr mode OI)])

 +;; Unspec version for intrinsics.
 +(define_insn avx512bw_pmaddubsw512modemask_name
 +  [(set (match_operand:VI2_AVX512VL 0 register_operand =v)
 +  (unspec:VI2_AVX512VL
 +[(match_operand:dbpsadbwmode 1 register_operand v)
 + (match_operand:dbpsadbwmode 2 nonimmediate_operand vm)]
 + UNSPEC_PMADDUBSW512))]
 +   TARGET_AVX512BW
 +   vpmaddubsw\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2};
 +  [(set_attr type sseiadd)
 +   (set_attr prefix evex)
 +   (set_attr mode XI)])
 +

Can the one above be described using standard RTX, perhaps something
similar to avx2_pmaddubsw256?

  (define_insn ssse3_pmaddubsw128
[(set (match_operand:V8HI 0 register_operand =x,x)
 (ss_plus:V8HI
 @@ -18097,6 +18112,21 @@
 [(set_attr prefix evex)
 (set_attr mode ssescalarmode)])

 +(define_insn mask_codeforavx512bw_dbpsadbwmodemask_name
 +  [(set (match_operand:VI2_AVX512VL 0 register_operand =v)
 +   (unspec:VI2_AVX512VL
 + [(match_operand:dbpsadbwmode 1 register_operand v)
 +  (match_operand:dbpsadbwmode 2 nonimmediate_operand vm)
 +  (match_operand:SI 3 const_0_to_255_operand)]
 + UNSPEC_DBPSADBW))]
 +   TARGET_AVX512BW
 +  vdbpsadbw\t{%3, %2, %1, %0mask_operand4|%0mask_operand4, %1, %2, %3}
 +  [(set_attr isa avx)
 +   (set_attr type sselog1)
 +   (set_attr length_immediate 1)
 +   (set_attr prefix evex)
 +   (set_attr mode sseinsnmode)])
 +
  (define_insn clzmode2mask_name
[(set (match_operand:VI48_AVX512VL 0 register_operand =v)
 (clz:VI48_AVX512VL


[PATCH, i386]: Enable reminder{sd,df,xf} and fmod{sf,df,xf} only for flag_finite_math_only.

2014-09-30 Thread Uros Bizjak
Hello!

According to C99, reminder function returns:

   If x or y is a NaN, a NaN is returned.

   If x is an infinity, and y is not a NaN, a domain error occurs,
and a NaN is returned.

   If y is zero, and x is not a NaN, a domain error occurs, and a
NaN is returned.

and fmod returns:

   If x or y is a NaN, a NaN is returned.

   If x is an infinity, a domain error occurs, and a NaN is returned.

   If y is zero, a domain error occurs, and a NaN is returned.

   If x is +0 (-0), and y is not zero, +0 (-0) is returned.

However, x87 fprem and fprem1 instructions that are used to implement
these builtin functions do not return NaN for infinities, but generate
invalid-arithmetic-operand exception.

Attached patch enables these builtins for finite math only, consistent
with gcc documentation:

'-ffinite-math-only':

 Allow optimizations for floating-point arithmetic that assume that
 arguments and results are not NaNs or +-Infs.

 This option is not turned on by any '-O' option since it can result
 in incorrect output for programs that depend on an exact
 implementation of IEEE or ISO rules/specifications for math
 functions.  It may, however, yield faster code for programs that do
 not require the guarantees of these specifications.

2014-09-30  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only only.
(fmodmode3): Ditto.
(fpremxf4_i387): Ditto.
(reminderxf3): Ditto.
(remindermode3): Ditto.
(fprem1xf4_i387): Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}. The patch also fixes ieee_2.f90 testsuite failure with FX's
pending IEEE support improvement patch.

2014-09-30  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only only.
(fmodmode3): Ditto.
(fpremxf4_i387): Ditto.
(reminderxf3): Ditto.
(remindermode3): Ditto.
(fprem1xf4_i387): Ditto.

The patch will be committed to mainline and other release branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 215705)
+++ config/i386/i386.md (working copy)
@@ -13813,7 +13813,8 @@
(set (reg:CCFP FPSR_REG)
(unspec:CCFP [(match_dup 2) (match_dup 3)]
 UNSPEC_C2_FLAG))]
-  TARGET_USE_FANCY_MATH_387
+  TARGET_USE_FANCY_MATH_387
+flag_finite_math_only
   fprem
   [(set_attr type fpspc)
(set_attr mode XF)])
@@ -13822,7 +13823,8 @@
   [(use (match_operand:XF 0 register_operand))
(use (match_operand:XF 1 general_operand))
(use (match_operand:XF 2 general_operand))]
-  TARGET_USE_FANCY_MATH_387
+  TARGET_USE_FANCY_MATH_387
+flag_finite_math_only
 {
   rtx_code_label *label = gen_label_rtx ();
 
@@ -13845,7 +13847,8 @@
   [(use (match_operand:MODEF 0 register_operand))
(use (match_operand:MODEF 1 general_operand))
(use (match_operand:MODEF 2 general_operand))]
-  TARGET_USE_FANCY_MATH_387
+  TARGET_USE_FANCY_MATH_387
+flag_finite_math_only
 {
   rtx (*gen_truncxf) (rtx, rtx);
 
@@ -13884,7 +13887,8 @@
(set (reg:CCFP FPSR_REG)
(unspec:CCFP [(match_dup 2) (match_dup 3)]
 UNSPEC_C2_FLAG))]
-  TARGET_USE_FANCY_MATH_387
+  TARGET_USE_FANCY_MATH_387
+flag_finite_math_only
   fprem1
   [(set_attr type fpspc)
(set_attr mode XF)])
@@ -13893,7 +13897,8 @@
   [(use (match_operand:XF 0 register_operand))
(use (match_operand:XF 1 general_operand))
(use (match_operand:XF 2 general_operand))]
-  TARGET_USE_FANCY_MATH_387
+  TARGET_USE_FANCY_MATH_387
+flag_finite_math_only
 {
   rtx_code_label *label = gen_label_rtx ();
 
@@ -13916,7 +13921,8 @@
   [(use (match_operand:MODEF 0 register_operand))
(use (match_operand:MODEF 1 general_operand))
(use (match_operand:MODEF 2 general_operand))]
-  TARGET_USE_FANCY_MATH_387
+  TARGET_USE_FANCY_MATH_387
+flag_finite_math_only
 {
   rtx (*gen_truncxf) (rtx, rtx);
 


Re: [PATCH X86, PR62128] Rotate pattern for AVX2

2014-09-30 Thread Uros Bizjak
On Tue, Sep 30, 2014 at 6:47 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 Patch resubmitted from 
 https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01400.html

 The patch fix PR62128 and  gcc.target/i386/pr52252-atom.c in
 core-avx2 make check.
 The test in pr62128 is exactly TEST 22 from
 gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
 or not.
 The patch developed similar to define_insn_and_split
 *avx_vperm_broadcast_mode.
 The patch passed x86 bootstrap and make check (+2 new passes for
 -march=core-avx2).
 Is it ok?

 Evgeny

 ChangeLog:

 2014-09-30  Evgeny Stupachenko  evstu...@gmail.com

 * config/i386/sse.md (avx2_palignrv4di): New.
 * config/i386/sse.md (avx2_rotatemode_perm): New.

+(define_insn avx2_palignrv4di
+  [(set (match_operand:V4DI 0 register_operand =x)
+ (unspec:V4DI
+  [(match_operand:V4DI 1 register_operand x)
+   (match_operand:V4DI 2 nonimmediate_operand xm)
+   (match_operand:SI 3 const_0_to_255_operand n)]
+  UNSPEC_VPALIGNRDI))]
+  TARGET_AVX2
+  vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3}
+  [(set_attr type sselog)
+   (set_attr prefix vex)
+   (set_attr mode OI)])

Just reuse UNSPEC_PALIGNR, no need for a new unspec.

+(define_insn_and_split avx2_rotatemode_perm
+  [(set (match_operand:V_256 0 register_operand =x)
+  (vec_select:V_256
+ (match_operand:V_256 1 register_operand x)
+ (match_parallel 2 palignr_operand
+  [(match_operand 3 const_int_operand n)])))]
+  TARGET_AVX2
+  #
+   reload_completed
+  [(const_int 0)]

This should be a define_expand. There is nothing that requires hard
registers. You can achieve mode-changes by using gen_lowpart, see many
examples in sse.md

+  if (shift  16)
+ emit_insn (gen_avx2_palignrv4di (op0,
+ op0,
+ op1,
+ GEN_INT (shift)));
+  else if (shift  16)
+ emit_insn (gen_avx2_palignrv4di (op0,
+ op1,
+ op0,
+ GEN_INT (shift - 16)));

What happens when shift == 16?

Uros.


Re: [PATCH X86, PR62128] Rotate pattern for AVX2

2014-09-30 Thread Uros Bizjak
On Tue, Sep 30, 2014 at 7:06 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Sep 30, 2014 at 6:47 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 Patch resubmitted from 
 https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01400.html

 The patch fix PR62128 and  gcc.target/i386/pr52252-atom.c in
 core-avx2 make check.
 The test in pr62128 is exactly TEST 22 from
 gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
 or not.
 The patch developed similar to define_insn_and_split
 *avx_vperm_broadcast_mode.
 The patch passed x86 bootstrap and make check (+2 new passes for
 -march=core-avx2).
 Is it ok?

Please try following (totally untested) expander:

--cut here--
(define_expand avx2_rotatemode_perm
  [(set (match_operand:V_256 0 register_operand)
  (vec_select:V_256
(match_operand:V_256 1 register_operand)
(match_parallel 2 palignr_operand
  [(match_operand 3 const_int_operand n)])))]
  TARGET_AVX2
{
  int shift = INTVAL (operands[3]) * ssescalarsize;
  rtx insn;

  rtx op0 = gen_lowpart (V4DImode, operands[0]);
  rtx op1 = gen_lowpart (V4DImode, operands[1]);

  emit_insn (gen_avx2_permv2ti (op0, op1, op1, GEN_INT (33)));

  op0 = gen_lowpart (V2TImode, operands[0]);
  op1 = gen_lowpart (V2TImode, operands[1]);

  if (shift  GET_MODE_SIZE (TImode))
insn = gen_avx2_palignrv2ti (op0, op0, op1, GEN_INT (shift)));
  else
insn = gen_avx2_palignrv2ti (op0, op1, op0, GEN_INT (shift - 16)));

  emit_insn (insn);
  DONE;
}
--cut here--

BTW: Looking at the code above, it looks to me that avx2_permv2ti
should accept V2TImode operands, not V4DImode.

Uros.


Re: [PATCH X86, PR62128] Rotate pattern for AVX2

2014-09-30 Thread Uros Bizjak
On Tue, Sep 30, 2014 at 8:08 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Sep 30, 2014 at 7:06 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Sep 30, 2014 at 6:47 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 Patch resubmitted from 
 https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01400.html

 The patch fix PR62128 and  gcc.target/i386/pr52252-atom.c in
 core-avx2 make check.
 The test in pr62128 is exactly TEST 22 from
 gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
 or not.
 The patch developed similar to define_insn_and_split
 *avx_vperm_broadcast_mode.
 The patch passed x86 bootstrap and make check (+2 new passes for
 -march=core-avx2).
 Is it ok?

 Please try following (totally untested) expander:

As usual, the wrong version was pasted. This should read:

--cut here--
(define_expand avx2_rotatemode_perm
  [(set (match_operand:V_256 0 register_operand)
  (vec_select:V_256
(match_operand:V_256 1 register_operand)
(match_parallel 2 palignr_operand
  [(match_operand 3 const_int_operand n)])))]
  TARGET_AVX2
{
  int shift = INTVAL (operands[3]) * ssescalarsize;
  rtx insn;

  rtx op1 = gen_lowpart (V4DImode, operands[1]);
  rtx t2 = gen_reg_rtx (V4DImode);

  emit_insn (gen_avx2_permv2ti (t2, op1, op1, GEN_INT (33)));

  op0 = gen_lowpart (V2TImode, operands[0]);
  op1 = gen_lowpart (V2TImode, operands[1]);
  t2 = gen_lowpart (V2TImode, t2);

  if (shift  GET_MODE_SIZE (TImode))
insn = gen_avx2_palignrv2ti (op0, t2, op1, GEN_INT (shift)));
  else
insn = gen_avx2_palignrv2ti (op0, op1, t2, GEN_INT (shift - 16)));

  emit_insn (insn);
  DONE;
}
--cut here--

Uros.


Re: [PATCH X86, PR62128] Rotate pattern for AVX2

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 12:13 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 expand_vselect for some reason ignores the expander.
 Does it work with expanders?
 The comment talks about insn only:
 /* Construct (set target (vec_select op0 (parallel perm))) and
return true if that's a valid instruction in the active ISA.  */

It looks to me that the whole approach is wrong from the beginning.

There is already a function that generates perm/palignr sequence,
conveniently named expand_vec_perm_palignr. This function should be
extended to handle AVX2 sequence. You don't have to add any new
patterns, existing avx2_permv2ti and avx2_palignr2ti should do the
trick.

And, as said by H.J., please add the testcase from the PR that will
exercise the code path. Without the testcase included, the patch is
unreviewable, and this is the reason why no maintainer (including me)
wants to approve it in its current form.

Uros.


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Getting back to initial patch, is it ok?

IMO, we should start with Jakub's proposed patch [1]

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

Uros.

 It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
 X86 bootstrap is also ok.

 2014-10-01  Evgeny Stupachenko  evstu...@gmail.com

 * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
 PALINGR instruction.
 * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
 for AVX2.

 On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 The test in pr62128 is exactly TEST 22 from
 gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
 or not.
 Resubmitting patch looks good as current mail thread is already too 
 complicated.

 On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 It fix gcc.target/i386/pr52252-atom.c in core-avx2 make check and 
 pr62128.


 I suggest you resubmit the patch as a bug fix for pr62128 with
 testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.


 --
 H.J.


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 1:38 PM, Jakub Jelinek ja...@redhat.com wrote:

 That doesn't compile, will post a new version; got interrupted when
 I found that in
 GCC_TEST_RUN_EXPENSIVE=1 make check-gcc 
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 one test is miscompiled even with unpatched compiler, debugging that now.

 Let's start with the bugfix.  The || doesn't make any sense, and we really
 want to fill in 4 bits (0, 1, 4, 5) of the immediate, not just two, anyway.
 valid_perm_using_mode_p (V2TImode, d) should already guarantee that
 it is possible to permutate it as V2TI, so all we care about are the
 values of d-perm[0] and d-perm[nelt / 2], but we care not just which
 lane it is, but also which operand (src1 or src2).

 Tested with
 GCC_TEST_RUN_EXPENSIVE=1 make check-gcc 
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 Ok for trunk/4.9/4.8?

 2014-10-01  Jakub Jelinek  ja...@redhat.com

 PR target/63428
 * config/i386/i386.c (expand_vec_perm_pshufb): Fix up rperm[0]
 argument to avx2_permv2ti.

 * gcc.dg/torture/vshuf-4.inc: Move test 122 from EXPTESTS
 to test 24 in TESTS.

OK.

Thanks,
Uros.


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 2:17 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote:
 OK.

 Thanks.  Second step is a tiny optimization, for the
 simplified 122 (now 24) vshuf-v4di.c testcase:
 typedef unsigned long long V __attribute__ ((vector_size (32)));
 V a, b, c, d;

 int
 main ()
 {
   int i;
   for (i = 0; i  4; ++i)
 {
   a[i] = i + 2;
   b[i] = 4 + i + 2;
 }
   asm volatile ( : : : memory);
   c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 });
   d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 }, (V) { 2, 5, 
 6, 3 });
   if (__builtin_memcmp (c, d, sizeof (c)))
 __builtin_abort ();
   return 0;
 }

 this patch allows better code to be generated:
 -   vmovdqa b(%rip), %ymm0
 +   vpermq  $238, a(%rip), %ymm1
 movl$32, %edx
 -   movl$d, %esi
 -   vmovdqa a(%rip), %ymm1
 +   vmovdqa b(%rip), %ymm0
 +   movl$d, %esi
 movl$c, %edi
 -   vperm2i128  $17, %ymm0, %ymm1, %ymm1
 vpblendd$195, %ymm1, %ymm0, %ymm0
 vmovdqa %ymm0, c(%rip)

 That is because vperm2i128 $17 unnecessarily uses
 two operands when all the data it grabs are from a single one.
 So, by canonicalizing the permutation we can emit
 vpermq $238 instead.  Perhaps more places might benefit from
 extra canonicalize_perm calls (two spots already use that beyond
 the single one on the expansion/testing entry point).

 Tested again with
 GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 on x86_64-linux.  Ok for trunk?

 2014-10-01  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize
 dfirst permutation.

OK.

Thanks,
Uros.


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek ja...@redhat.com wrote:

 And now the expand_vec_perm_palignr improvement, tested
 with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 E.g.
 typedef unsigned long long V __attribute__ ((vector_size (32)));
 extern void abort (void);
 V a, b, c, d;
 void test_14 (void)
 {
   V mask = { 6, 1, 3, 4 };
   int i;
   c = __builtin_shuffle (a, mask);
   d = __builtin_shuffle (a, b, mask);
 }
 (distilled from test 15 in vshuf-v4di.c) results in:
 -   vmovdqa a(%rip), %ymm0
 -   vpermq  $54, %ymm0, %ymm1
 -   vpshufb .LC1(%rip), %ymm0, %ymm0
 -   vmovdqa %ymm1, c(%rip)
 -   vmovdqa b(%rip), %ymm1
 -   vpshufb .LC0(%rip), %ymm1, %ymm1
 -   vpermq  $78, %ymm1, %ymm1
 -   vpor%ymm1, %ymm0, %ymm0
 +   vmovdqa a(%rip), %ymm1
 +   vpermq  $54, %ymm1, %ymm0
 +   vmovdqa %ymm0, c(%rip)
 +   vmovdqa b(%rip), %ymm0
 +   vpalignr$8, %ymm1, %ymm0, %ymm0
 +   vpermq  $99, %ymm0, %ymm0
 vmovdqa %ymm0, d(%rip)
 vzeroupper
 ret
 change (and two fewer .rodata constants).

 Ok for trunk?

 2014-10-01  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (expand_vec_perm_palignr): Handle
 256-bit vectors for TARGET_AVX2.

Please mention PR 62128 and include the testcase from the PR. Also,
please add a version of gcc.target/i386/pr52252-atom.c, compiled with
-mavx2 (perhaps named pr52252-avx2.c)

OK with a small adjustment below.

Thanks,
Uros.

 --- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
 +++ gcc/config/i386/i386.c  2014-10-01 14:27:53.577222011 +0200
 @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
rtx shift, target;
struct expand_vec_perm_d dcopy;

 -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
 -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16)
 +  /* Even with AVX, palignr only operates on 128-bit vectors,
 + in AVX2 palignr operates on both 128-bit lanes.  */
 +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16)
 +   (!TARGET_AVX2 || GET_MODE_SIZE (d-vmode) != 32))

Please simplify the above condition ...

  return false;

 -  min = nelt, max = 0;
 +  min = 2 * nelt, max = 0;
for (i = 0; i  nelt; ++i)
  {
unsigned e = d-perm[i];
 +  if (GET_MODE_SIZE (d-vmode) == 32)
 +   e = (e  ((nelt / 2) - 1)) | ((e  nelt)  1);
if (e  min)
 min = e;
if (e  max)
 max = e;
  }
 -  if (min == 0 || max - min = nelt)
 +  if (min == 0
 +  || max - min = (GET_MODE_SIZE (d-vmode) == 32 ? nelt / 2 : nelt))
  return false;

/* Given that we have SSSE3, we know we'll be able to implement the
 - single operand permutation after the palignr with pshufb.  */
 -  if (d-testing_p)
 + single operand permutation after the palignr with pshufb for
 + 128-bit vectors.  */
 +  if (d-testing_p  GET_MODE_SIZE (d-vmode) == 16)
  return true;

dcopy = *d;
 -  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d-vmode)));
 -  target = gen_reg_rtx (TImode);
 -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d-op1),
 - gen_lowpart (TImode, d-op0), shift));
 -
 -  dcopy.op0 = dcopy.op1 = gen_lowpart (d-vmode, target);
 -  dcopy.one_operand_p = true;

in_order = true;
for (i = 0; i  nelt; ++i)
  {
 -  unsigned e = dcopy.perm[i] - min;
 +  unsigned e = dcopy.perm[i];
 +  if (GET_MODE_SIZE (d-vmode) == 32
 +  e = nelt
 +  (e  (nelt / 2 - 1))  min)
 +   e = e - min - (nelt / 2);
 +  else
 +   e = e - min;
if (e != i)
 in_order = false;
dcopy.perm[i] = e;
  }
 +  dcopy.one_operand_p = true;
 +
 +  /* For AVX2, test whether we can permute the result in one instruction.  */
 +  if (d-testing_p)
 +{
 +  if (in_order)
 +   return true;
 +  dcopy.op1 = dcopy.op0;
 +  return expand_vec_perm_1 (dcopy);
 +}
 +
 +  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d-vmode)));
 +  if (GET_MODE_SIZE (d-vmode) == 16)
 +{
 +  target = gen_reg_rtx (TImode);
 +  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d-op1),
 + gen_lowpart (TImode, d-op0), shift));
 +}
 +  else
 +{
 +  target = gen_reg_rtx (V2TImode);
 +  emit_insn (gen_avx2_palignrv2ti (target, gen_lowpart (V2TImode, 
 d-op1),
 +  gen_lowpart (V2TImode, d-op0), 
 shift));
 +}
 +
 +  dcopy.op0 = dcopy.op1 = gen_lowpart (d-vmode, target);

/* Test for the degenerate case where the alignment by itself
   produces the desired permutation.  */
 @@ -43345,7 +43376,7 @@ expand_vec_perm_palignr (struct expand_v
  }

ok = expand_vec_perm_1 (dcopy);
 -  gcc_assert (ok);
 +  gcc_assert (ok || GET_MODE_SIZE (d-vmode) == 32);

return ok;
  }


  

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 4:12 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Oct 01, 2014 at 03:09:59PM +0200, Uros Bizjak wrote:
  2014-10-01  Jakub Jelinek  ja...@redhat.com
 
  * config/i386/i386.c (expand_vec_perm_palignr): Handle
  256-bit vectors for TARGET_AVX2.

 Please mention PR 62128 and include the testcase from the PR. Also,
 please add a version of gcc.target/i386/pr52252-atom.c, compiled with
 -mavx2 (perhaps named pr52252-avx2.c)

 This patch doesn't fix PR62128, and it is already tested (even without
 GCC_RUN_EXPENSIVE_TESTS=1) in the vshuf*.c torture tests (several of them).

Ah, OK then.

 If you want coverage not just for the default flags, I'd say we should
 say for -O2 only just add gcc.target/i386/{avx,avx2,avx512}-vshuf-*.c
 tests that would include ../../gcc.dg/torture/vshuf-*.c and be compiled/run
 with -mavx/-mavx2/-mavx512 options instead of the default ones.

No, the above should be good for now. The failure was triggered by the
target that defaults to -mavx2, and for avx512f we can run this suite
in the way you suggest.

 For PR62128, IMHO the right fix is attached.  Note, this is again covered
 in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
 With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes:
 -   vpshufb .LC0(%rip), %ymm0, %ymm1
 -   vpshufb .LC1(%rip), %ymm0, %ymm0
 -   vpermq  $78, %ymm1, %ymm1
 -   vpor%ymm1, %ymm0, %ymm0
 +   vpermq  $78, %ymm0, %ymm1
 +   vpalignr$1, %ymm0, %ymm1, %ymm0
 ret

  --- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
  +++ gcc/config/i386/i386.c  2014-10-01 14:27:53.577222011 +0200
  @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
 rtx shift, target;
 struct expand_vec_perm_d dcopy;
 
  -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
  -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16)
  +  /* Even with AVX, palignr only operates on 128-bit vectors,
  + in AVX2 palignr operates on both 128-bit lanes.  */
  +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16)
  +   (!TARGET_AVX2 || GET_MODE_SIZE (d-vmode) != 32))

 Please simplify the above condition ...

 How?  I don't see how that can be simplified, it can
 be transformed into
   if (!((TARGET_SSSE3  GET_MODE_SIZE (d-vmode) == 16)
 || (TARGET_AVX2  GET_MODE_SIZE (d-vmode) == 32)))
 but I don't find that simpler.

I was thinking of the above, but you are correct, the change doesn't
bring us anything.

So, the patch is OK as it was.

Thanks,
Uros.


Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 4:10 PM, Ilya Enkovich enkovich@gmail.com wrote:

 +;; Return true if size of VALUE can be stored in a sign
 +;; extended immediate field.
 +(define_predicate x86_64_immediate_size_operand
 +  (match_code symbol_ref)
 +{
 +  if (!TARGET_64BIT)
 +return true;
 +
 +  /* For 64 bit target we may assume size of object fits
 + immediate only when code model guarantees that.  */
 +  return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
 +})
 +

 This predicate causes bootstrap error:
 predicates.md:362:38: error: unused parameter 'op' [-Werror=unused-parameter]

Huh? How is this predicate different from e.g.

(define_predicate compare_operator
  (match_code compare))

?

Can you please show generated code from gcc/insn-preds.c?

Uros.


Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 7:02 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-10-01 19:17 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Wed, Oct 1, 2014 at 4:10 PM, Ilya Enkovich enkovich@gmail.com wrote:

 +;; Return true if size of VALUE can be stored in a sign
 +;; extended immediate field.
 +(define_predicate x86_64_immediate_size_operand
 +  (match_code symbol_ref)
 +{
 +  if (!TARGET_64BIT)
 +return true;
 +
 +  /* For 64 bit target we may assume size of object fits
 + immediate only when code model guarantees that.  */
 +  return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
 +})
 +

 This predicate causes bootstrap error:
 predicates.md:362:38: error: unused parameter 'op' 
 [-Werror=unused-parameter]

 Huh? How is this predicate different from e.g.

 (define_predicate compare_operator
   (match_code compare))

 ?

 Can you please show generated code from gcc/insn-preds.c?

 Uros.

 It is different because it has a code block which is used to generate
 additional function. Here is what generated for the predicate:

 static inline int
 x86_64_immediate_size_operand_1 (rtx op, enum machine_mode mode
 ATTRIBUTE_UNUSED)
 {
   if (!TARGET_64BIT)
 return true;

   /* For 64 bit target we may assume size of object fits
  immediate only when code model guarantees that.  */
   return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
 }

 int
 x86_64_immediate_size_operand (rtx op, enum machine_mode mode 
 ATTRIBUTE_UNUSED)
 {
   return ((GET_CODE (op) == SYMBOL_REF)  (
 (x86_64_immediate_size_operand_1 (op, mode  (
 (mode == VOIDmode || GET_MODE (op) == mode));
 }

Well,

--cut here--
(define_predicate x86_64_immediate_size_operand
  (and (match_code symbol_ref)
   (ior (not (match_test TARGET_64BIT))
(and (match_test (ix86_cmodel == CM_SMALL))
 (match_test (ix86_cmodel == CM_KERNEL))
--cut here--

Uros.


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-01 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
 OK.

 And now the expand_vec_perm_palignr improvement, tested
 with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 E.g.
 typedef unsigned long long V __attribute__ ((vector_size (32)));
 extern void abort (void);
 V a, b, c, d;
 void test_14 (void)
 {
   V mask = { 6, 1, 3, 4 };
   int i;
   c = __builtin_shuffle (a, mask);
   d = __builtin_shuffle (a, b, mask);
 }
 (distilled from test 15 in vshuf-v4di.c) results in:
 -   vmovdqa a(%rip), %ymm0
 -   vpermq  $54, %ymm0, %ymm1
 -   vpshufb .LC1(%rip), %ymm0, %ymm0
 -   vmovdqa %ymm1, c(%rip)
 -   vmovdqa b(%rip), %ymm1
 -   vpshufb .LC0(%rip), %ymm1, %ymm1
 -   vpermq  $78, %ymm1, %ymm1
 -   vpor%ymm1, %ymm0, %ymm0
 +   vmovdqa a(%rip), %ymm1
 +   vpermq  $54, %ymm1, %ymm0
 +   vmovdqa %ymm0, c(%rip)
 +   vmovdqa b(%rip), %ymm0
 +   vpalignr$8, %ymm1, %ymm0, %ymm0
 +   vpermq  $99, %ymm0, %ymm0
 vmovdqa %ymm0, d(%rip)
 vzeroupper
 ret
 change (and two fewer .rodata constants).

On a related note, I would like to point out that
gcc.target/i386/pr61403.c also fails to generate blend insn with
-mavx2. The new insn sequence includes lots of new vpshufb insns with
memory access.

Uros.


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-10-02 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 10:43 PM, Jakub Jelinek ja...@redhat.com wrote:

  For PR62128, IMHO the right fix is attached.  Note, this is again covered
  in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
  With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes:
  -   vpshufb .LC0(%rip), %ymm0, %ymm1
  -   vpshufb .LC1(%rip), %ymm0, %ymm0
  -   vpermq  $78, %ymm1, %ymm1
  -   vpor%ymm1, %ymm0, %ymm0
  +   vpermq  $78, %ymm0, %ymm1
  +   vpalignr$1, %ymm0, %ymm1, %ymm0
  ret
 
  2014-10-01  Jakub Jelinek  ja...@redhat.com
 
PR target/62128
* config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
if it expands to a single insn only.
(expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
fail unless in_order is true.  Add forward declaration.
(expand_vec_perm_vperm2f128): Fix up comment about which permutation
is useful for one_operand_p.
(ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.

 Now bootstrapped/regtested on x86_64-linux and i686-linux (and additionally
 tested also with --target_board=unix/-mavx2), ok for trunk?


OK.

Thanks,
Uros.


Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation

2014-10-02 Thread Uros Bizjak
On Thu, Oct 2, 2014 at 10:23 AM, Ilya Enkovich enkovich@gmail.com wrote:

 2014-10-01  Ilya Enkovich  ilya.enkov...@intel.com

 * config/i386/i386.md (UNSPEC_SIZEOF): New.
 (move_size_reloc_mode): New.
 * config/i386/predicates.md (symbol_operand): New.
 (x86_64_immediate_size_operand): New.

OK with a trivial adjustment.

Thanks,
Uros.

 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index 65990b1..1901023 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -79,6 +79,7 @@
UNSPEC_PLTOFF
UNSPEC_MACHOPIC_OFFSET
UNSPEC_PCREL
 +  UNSPEC_SIZEOF

;; Prologue support
UNSPEC_STACK_ALLOC
 @@ -18789,6 +18790,21 @@
bndstx\t{%2, %3|%3, %2}
[(set_attr type mpxst)])

 +(define_insn move_size_reloc_mode
 +  [(set (match_operand:SWI48 0 register_operand =r)
 +   (unspec:SWI48
 +[(match_operand:SWI48 1 symbol_operand)]
 +UNSPEC_SIZEOF))]
 +  TARGET_MPX
 +{
 +  if (x86_64_immediate_size_operand (operands[1], VOIDmode))
 +return mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE};
 +  else
 +return movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE};
 +}
 +  [(set_attr type imov)
 +   (set_attr mode MODE)])
 +
  (include mmx.md)
  (include sse.md)
  (include sync.md)
 diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
 index fea7754..1875339 100644
 --- a/gcc/config/i386/predicates.md
 +++ b/gcc/config/i386/predicates.md
 @@ -124,6 +124,10 @@
 (match_test TARGET_64BIT)
 (match_test REGNO (op)  BX_REG)))

 +;; Return true if VALUE is symbol reference
 +(define_predicate symbol_operand
 +  (match_code symbol_ref))
 +
  ;; Return true if VALUE can be stored in a sign extended immediate field.
  (define_predicate x86_64_immediate_operand
(match_code const_int,symbol_ref,label_ref,const)
 @@ -336,6 +340,14 @@
return false;
  })

 +;; Return true if size of VALUE can be stored in a sign
 +;; extended immediate field.
 +(define_predicate x86_64_immediate_size_operand
 +  (and (match_code symbol_ref)
 +   (ior (not (match_test TARGET_64BIT))
 +   (ior (match_test (ix86_cmodel == CM_SMALL))
 +(match_test (ix86_cmodel == CM_KERNEL))

Uh yes, I did a trivial thinko. Please note that above (ior) can be
rewritten with a multiple arguments:

(define_predicate x86_64_immediate_size_operand
  (and (match_code symbol_ref)
   (ior (not (match_test TARGET_64BIT))
   (match_test ix86_cmodel == CM_SMALL)
   (match_test ix86_cmodel == CM_KERNEL

Also, there were unneded parenthesis for match_test removed.

  ;; Return true if OP is general operand representable on x86_64.
  (define_predicate x86_64_general_operand
(if_then_else (match_test TARGET_64BIT)


[RFC, RFH PATCH, i386] Fix gcc.target/i386/pr61403.c FAIL with -mavx2

2014-10-02 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 9:03 PM, Uros Bizjak ubiz...@gmail.com wrote:

 And now the expand_vec_perm_palignr improvement, tested
 with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 E.g.
 typedef unsigned long long V __attribute__ ((vector_size (32)));
 extern void abort (void);
 V a, b, c, d;
 void test_14 (void)
 {
   V mask = { 6, 1, 3, 4 };
   int i;
   c = __builtin_shuffle (a, mask);
   d = __builtin_shuffle (a, b, mask);
 }
 (distilled from test 15 in vshuf-v4di.c) results in:
 -   vmovdqa a(%rip), %ymm0
 -   vpermq  $54, %ymm0, %ymm1
 -   vpshufb .LC1(%rip), %ymm0, %ymm0
 -   vmovdqa %ymm1, c(%rip)
 -   vmovdqa b(%rip), %ymm1
 -   vpshufb .LC0(%rip), %ymm1, %ymm1
 -   vpermq  $78, %ymm1, %ymm1
 -   vpor%ymm1, %ymm0, %ymm0
 +   vmovdqa a(%rip), %ymm1
 +   vpermq  $54, %ymm1, %ymm0
 +   vmovdqa %ymm0, c(%rip)
 +   vmovdqa b(%rip), %ymm0
 +   vpalignr$8, %ymm1, %ymm0, %ymm0
 +   vpermq  $99, %ymm0, %ymm0
 vmovdqa %ymm0, d(%rip)
 vzeroupper
 ret
 change (and two fewer .rodata constants).

 On a related note, I would like to point out that
 gcc.target/i386/pr61403.c also fails to generate blend insn with
 -mavx2. The new insn sequence includes lots of new vpshufb insns with
 memory access.

Following patch fixes the failure:

--cut here--
Index: i386.c
===
--- i386.c  (revision 215802)
+++ i386.c  (working copy)
@@ -43407,8 +43407,10 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d
  AVX and AVX2 as they require more than 2 instructions.  */
   if (d-one_operand_p)
 return false;
-  if (TARGET_SSE4_1  GET_MODE_SIZE (vmode) == 16)
+  if (TARGET_AVX2  GET_MODE_SIZE (vmode) == 32)
 ;
+  else if (TARGET_SSE4_1  GET_MODE_SIZE (vmode) == 16)
+;
   else
 return false;

--cut here--

The comment above expand_vec_perm_pblendv claims that:

  /* Use the same checks as in expand_vec_perm_blend, but skipping
 AVX and AVX2 as they require more than 2 instructions.  */

However, I see a significant reduction in vpshufb and vpor
instructions (33-16 and 22-11), and 6 new vblendps insns.

BTW: I have no access to avx2 target, so I can't test the patch with a
runtime tests. OTOH, it doesn't ICE for GCC_TEST_RUN_EXPENSIVE=1 make
check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2
dg-torture.exp=vshuf*.c'.

Jakub, what do you think?

Uros.


Re: RFA: one more version of the patch for PR61360

2014-10-03 Thread Uros Bizjak
On Thu, Oct 2, 2014 at 10:13 PM, Vladimir Makarov vmaka...@redhat.com wrote:

 I guess we achieved the consensus about the following patch to fix
 PR61360

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

 The patch was successfully bootstrapped and tested (w/wo
 -march=amdfam10) on x86/x86-64.

 Is it ok to commit to trunk?


 I've tested your patch and unfortunately it doesn't work:

 In file included from
 /var/tmp/moz-build-dir/js/src/shell/Unified_cpp_js_src_shell0.cpp:15:0:
 /var/tmp/mozilla-central/js/src/shell/js.cpp: In function ‘void
 Process(JSContext*, JSObject*, const char*, bool)’:
 /var/tmp/mozilla-central/js/src/shell/js.cpp:592:1: internal compiler
 error: in lra_update_insn_recog_data, at lra.c:1221
   }
   ^
 0xa9d9ec lra_update_insn_recog_data(rtx_insn*)
  ../../gcc/gcc/lra.c:1220
 0xab450f eliminate_regs_in_insn
  ../../gcc/gcc/lra-eliminations.c:1077
 0xab450f process_insn_for_elimination
  ../../gcc/gcc/lra-eliminations.c:1344
 0xab450f lra_eliminate(bool, bool)
  ../../gcc/gcc/lra-eliminations.c:1408
 0xa9f2da lra(_IO_FILE*)
  ../../gcc/gcc/lra.c:2270
 0xa5d659 do_reload
  ../../gcc/gcc/ira.c:5311
 0xa5d659 execute
  ../../gcc/gcc/ira.c:5470


 Testcase is attached:

   % g++ -c -march=amdfam10 -w -O2 js.ii
 js.ii: In function ‘void RunFile(C)’:
 js.ii:64:1: internal compiler error: in lra_update_insn_recog_data, at
 lra.c:1221


 Thanks for reporting this, Marcus.  The problem now is in
 optimize_function_for_size_p.  It is false, when we define and cache enable
 attributes at early stages (instantation of virtual regs) and true later.

 It is returning us to the same problem.  I believe that we should not have
 enable attributes depending on optimization options or on the current
 running pass if we don't want the current solution in the trunk (with
 recog_init).  Setting right value for optimize_function_for_size_p does not
 solve the problem as we can have different options for different functions
 in the same compilation file.

 So minimal solution would be removing optimize_function_for_size_p from the
 attribute definition.  But I guess we could remove all condition.
 Unfortunately, Ganesh did not post is it really beneficial to switch off
 this alternative for AMD CPUs even if AMD optimization guide recommends it.

I propose to split the pattern into size-optimized and normal pattern.
The patch implements this proposal.

Uros.
Index: i386.md
===
--- i386.md (revision 215797)
+++ i386.md (working copy)
@@ -4766,6 +4766,38 @@
 }
 })
 
+(define_insn *floatSWI48:modeMODEF:mode2_sse_size
+  [(set (match_operand:MODEF 0 register_operand =f,x,x)
+   (float:MODEF
+ (match_operand:SWI48 1 nonimmediate_operand m,r,m)))]
+  SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
+optimize_function_for_size_p (cfun)
+ @
+   fild%Z1\t%1
+   %vcvtsi2MODEF:ssemodesuffixSWI48:rex64suffix\t{%1, %d0|%d0, %1}
+   %vcvtsi2MODEF:ssemodesuffixSWI48:rex64suffix\t{%1, %d0|%d0, %1}
+  [(set_attr type fmov,sseicvt,sseicvt)
+   (set_attr prefix orig,maybe_vex,maybe_vex)
+   (set_attr mode MODEF:MODE)
+   (set (attr prefix_rex)
+ (if_then_else
+   (and (eq_attr prefix maybe_vex)
+   (match_test SWI48:MODEmode == DImode))
+   (const_string 1)
+   (const_string *)))
+   (set_attr unit i387,*,*)
+   (set_attr athlon_decode *,double,direct)
+   (set_attr amdfam10_decode *,vector,double)
+   (set_attr bdver1_decode *,double,direct)
+   (set_attr fp_int_src true)
+   (set (attr enabled)
+ (cond [(eq_attr alternative 0)
+  (symbol_ref TARGET_MIX_SSE_I387
+X87_ENABLE_FLOAT (MODEF:MODEmode,
+SWI48:MODEmode))
+   ]
+   (const_int 1)))])
+
 (define_insn *floatSWI48:modeMODEF:mode2_sse
   [(set (match_operand:MODEF 0 register_operand =f,x,x)
(float:MODEF
@@ -4795,16 +4827,9 @@
 X87_ENABLE_FLOAT (MODEF:MODEmode,
 SWI48:MODEmode))
 (eq_attr alternative 1)
-  /* ??? For sched1 we need constrain_operands to be able to
- select an alternative.  Leave this enabled before RA.  */
-  (symbol_ref TARGET_INTER_UNIT_CONVERSIONS
-   || optimize_function_for_size_p (cfun)
-   || !(reload_completed
-|| reload_in_progress
-|| lra_in_progress))
+  (symbol_ref TARGET_INTER_UNIT_CONVERSIONS)
]
-   (symbol_ref true)))
-   ])
+   (const_int 1)))])
 
 (define_insn *floatSWI48x:modeMODEF:mode2_i387
   [(set (match_operand:MODEF 0 register_operand =f)


Re: [PATCH] palignr improvement (PR target/62128)

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 8:52 AM, Jakub Jelinek ja...@redhat.com wrote:

 Tested with
 GCC_TEST_RUN_EXPENSIVE=1 make -k check-gcc \
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
 on x86_64-linux, ok for trunk if it passes bootstrap?

 As for the previous testcase with distilled pr52252-atom.c permutations,
 f1/f4 is now vpunpcklbw/vpunpckhbw/vperm2i128, f2 2x vpshufb/vpermq/vpor,
 f3/f5/f6 vperm2i128/vpalignr, suggestions how to improve that?

 2014-10-02  Jakub Jelinek  ja...@redhat.com

   PR target/62128
   * config/i386/i386.c (expand_vec_perm_palignr): If op1, op0 order
   of palignr arguments can't be used due to min 0 or max - min
   too high, try also op0, op1 order of palignr arguments.

   * gcc.dg/torture/vshuf-16.inc (TESTS): Add 2 new permutations.
   * gcc.dg/torture/vshuf-32.inc (TESTS): Add 5 new permutations.

 Now successfully bootstrapped/regtested on x86_64-linux and i686-linux
 (without ada, as that doesn't bootstrap right now, Honza is looking into
 that).  Ok for trunk?

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [60/n] Update 128bit ashrv insn pattern.

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 12:26 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 On 29 Sep 09:54, Uros Bizjak wrote:
  +(define_expand vashrv2di3mask_name
  +  [(set (match_operand:V2DI 0 register_operand)
  +   (ashiftrt:V2DI
  + (match_operand:V2DI 1 register_operand)
  + (match_operand:V2DI 2 nonimmediate_operand)))]
  +  TARGET_XOP || TARGET_AVX512VL
  +{
  +  if (!TARGET_XOP)

 This condition is wrong. Please re-test the patch.
 Great catch! Didn't tested whole i386.exp, so XOP tests didn't run.
 Fixed. Patch in the bottom. XOP tests are now pass.
 Is it ok for trunk now?

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [61/n] Update FP logic insn patterns.

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 12:49 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 On 29 Sep 10:00, Uros Bizjak wrote:
  +  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
  +  if (!TARGET_AVX512DQ)

 All other patterns also have mask_applied  condition here. Is
 the above condition correct?
 I think this is correct since in this pattern we use AVX-512 only modes
 in iterator, so no chance to emit anything else but EVEX insn.
 In say, previous pattern we use modes are enabled for previous ISA extensions,
 so we emit this hack when masking (AVX-512 specific feature) is used.

Thanks for the explanation, the patch is OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [62/n] Add vpmaddubsw,vdbpsadbw insn patterns.

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 1:03 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 On 29 Sep 10:08, Uros Bizjak wrote:
 On Fri, Sep 26, 2014 at 4:09 PM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
  +(define_insn avx512bw_pmaddubsw512modemask_name
  +  [(set (match_operand:VI2_AVX512VL 0 register_operand =v)
  +  (unspec:VI2_AVX512VL
  +[(match_operand:dbpsadbwmode 1 register_operand v)
  + (match_operand:dbpsadbwmode 2 nonimmediate_operand vm)]
  + UNSPEC_PMADDUBSW512))]
  +   TARGET_AVX512BW
  +   vpmaddubsw\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2};
  +  [(set_attr type sseiadd)
  +   (set_attr prefix evex)
  +   (set_attr mode XI)])
  +
 Can the one above be described using standard RTX, perhaps something
 similar to avx2_pmaddubsw256?
 Definetely, it can. We didn't do that because final pattern will be
 twice as long as 256-bit variant resulting to 96 (!) lines and I think
 in near future auto-vect won't pick MADD at all.
 But if you think that it will be better to have explicit RTX, I ready to do 
 that!

Uh, no. Let's be reasonable, and put only a comment explaining the
situation, as in case of sse2_avx2_psadbw.

Uros.


Re: [RFC, RFH PATCH, i386] Fix gcc.target/i386/pr61403.c FAIL with -mavx2

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 1:11 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Oct 02, 2014 at 08:34:40PM +0200, Uros Bizjak wrote:
 Index: i386.c
 ===
 --- i386.c  (revision 215802)
 +++ i386.c  (working copy)
 @@ -43407,8 +43407,10 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d
   AVX and AVX2 as they require more than 2 instructions.  */
if (d-one_operand_p)
  return false;
 -  if (TARGET_SSE4_1  GET_MODE_SIZE (vmode) == 16)
 +  if (TARGET_AVX2  GET_MODE_SIZE (vmode) == 32)
  ;
 +  else if (TARGET_SSE4_1  GET_MODE_SIZE (vmode) == 16)
 +;
else
  return false;

 --cut here--

 The comment above expand_vec_perm_pblendv claims that:

   /* Use the same checks as in expand_vec_perm_blend, but skipping
  AVX and AVX2 as they require more than 2 instructions.  */

 The comment is mostly right though, I'd say as they sometimes require
 more than 2 instructions.

 BTW: I have no access to avx2 target, so I can't test the patch with a
 runtime tests. OTOH, it doesn't ICE for GCC_TEST_RUN_EXPENSIVE=1 make
 check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2
 dg-torture.exp=vshuf*.c'.

 Even the expensive testsuite has very limited coverage.
 As I wanted to prove your patch will ICE, I wrote following generator:

 #ifndef ITYPE
 #define ITYPE TYPE
 #endif
 #define S2(X) #X
 #define S(X) S2(X)

 int
 main ()
 {
   int i, j, nelt = 32 / sizeof (TYPE);
   printf (
 typedef  S(TYPE)  V __attribute__ ((vector_size (32)));\n
 typedef  S(ITYPE)  VI __attribute__ ((vector_size (32)));\n
 V a, b, c;\n
 \n
 #define T(n, m...) void foo##n (void) { c = __builtin_shuffle (a, b, (VI) 
 m); }\n
 #define S(n, m...) T(n, m)\n);
   for (i = 0; i  10; i++)
 {
   printf (S (__LINE__, { );
   for (j = 0; j  nelt; j++)
 {
   int k = random ()  3;
   int v = j;
   if (k  1)
 v = ((k  2) ? nelt : 0) + (random ()  (nelt - 1));
   printf (%d%s, v, j  (nelt - 1) ? ,  :  })\n);
 }
 }
 }

 which can be compiled e.g. with
 -DTYPE=char
 -DTYPE=short
 -DTYPE=int
 -DTYPE=long
 -DTYPE=float -DITYPE=int
 -DTYPE=double -DITYPE=long
 and then in each case generate 10 tests (sort -u on it plus manual fixup
 can decrease that, for the V4DI/V4DF cases substantially).  The first one
 triggered almost immediately an ICE, added to vshuf-32.inc (non-expensive).

 With the following updated patch all those generated testcases don't ICE
 (-mavx2 for the first four, -mavx for the last two).

 Also tested with
 GCC_TEST_RUN_EXPENSIVE=1 make check-gcc 
 RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'

I have had some problems testing with TARGET_AVX part of the change
and /-mavx tests. I assume that your patch survives these tests.

 The pr61403.c testcase can be simplified into:
 typedef float V __attribute__ ((vector_size (32)));
 typedef int VI __attribute__ ((vector_size (32)));
 V a, b, c;

 #define T(n, m...) void foo##n (void) { c = __builtin_shuffle (a, b, (VI) m); 
 }
 T (0, { 0, 1, 2, 3, 4, 5, 10, 13 })
 T (1, { 0, 1, 2, 3, 4, 8, 11, 14 })
 T (2, { 0, 1, 2, 3, 4, 9, 12, 15 })
 T (3, { 0, 13, 2, 3, 14, 5, 6, 15 })
 T (4, { 0, 1, 8, 3, 4, 9, 6, 7 })
 T (5, { 0, 3, 11, 0, 4, 12, 0, 5 })
 T (6, { 0, 3, 6, 9, 12, 15, 0, 0 })
 T (7, { 0, 8, 0, 1, 9, 0, 2, 10 })
 T (8, { 10, 1, 2, 11, 4, 5, 12, 7 })
 T (9, { 13, 0, 6, 14, 0, 7, 15, 0 })
 T (10, { 1, 4, 7, 10, 13, 0, 0, 0 })
 T (11, { 2, 5, 8, 11, 14, 0, 0, 0 })
 permutations, where both your and my patch optimize
 foo{0,1,2,3,4,8}.

 2014-10-03  Jakub Jelinek  ja...@redhat.com
 Uros Bizjak  ubiz...@gmail.com

 PR tree-optimization/61403
 * config/i386/i386.c (expand_vec_perm_palignr): Fix a spelling
 error in comment.  Also optimize 256-bit vectors for AVX2
 or AVX (floating vectors only), provided the first permutation
 can be performed in one insn.

 * gcc.dg/torture/vshuf-32.inc: Add a new test 29.

OK if the patch bootstraps and regtests without problems.

Thanks,
Uros.


Re: [PATCH] Small pre-AVX512F optimization

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 4:32 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 I've noticed that expand_vec_perm_1 completely uselessly builds GC garbage
 (CONST_VECTOR at least) when AVX512F isn't enabled at all.

 Ok to just call it for AVX512F?

OK.

 Even better would be to check the modes first too depending on target
 (AVX512F will only handle V{8D,16S}{I,F}mode, AVX512BW would handle
 also V32HImode (not yet implemented?), AVX512VL could handle
 V{2D,4D,4S,8S}{I,F}mode (not yet implemented?, though is there any
 const permutation not handled yet earlier?), and AVX512VL+AVX512BW
 could handle V{8,16}HImode (not yet implemented?) before creating a
 CONST_VECTOR.

AVX512BW and AVX512VL are currently moving targets, I propose to look
at this issue once all changes are committed to the repository.

 2014-10-03  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Fix up 
 formatting.
 (ix86_expand_vec_perm): Only call ix86_expand_vec_perm_vpermi2 if
 TARGET_AVX512F.
 (expand_vec_perm_1): Likewise.

OK.

Thanks,
Uros.


Re: [PATCH] 512-bit gcc.dg/torture/vshuf*.c

2014-10-03 Thread Uros Bizjak
On Fri, Oct 3, 2014 at 4:25 PM, Jakub Jelinek ja...@redhat.com wrote:

 This patch extends the gcc.dg/torture/ testsuite for
 512-bit vectors.  Tested with
 GCC_TEST_RUN_EXPENSIVE=1 make -j32 check-gcc \
 RUNTESTFLAGS='--target_board=unix\{-mavx2,-mavx,-mavx512f,-mavx512bw/-mavx512vl\}
  dg-torture.exp=vshuf*.c'
 (of course with expected AVX512* execution test failures,
 as I don't have hw and didn't bother with simulator).

 As before in vshuf-{8,16,32}.inc, some of the permutations
 are hand written (subset in the TESTS portion), the rest
 are random permutations.

 Ok for trunk?

 2014-10-03  Jakub Jelinek  ja...@redhat.com

 * gcc.dg/torture/vshuf-v8df.c: New test.
 * gcc.dg/torture/vshuf-v8di.c: New test.
 * gcc.dg/torture/vshuf-v16sf.c: New test.
 * gcc.dg/torture/vshuf-v16si.c: New test.
 * gcc.dg/torture/vshuf-v32hi.c: New test.
 * gcc.dg/torture/vshuf-v64qi.c: New test.
 * gcc.dg/torture/vshuf-64.inc: New file.

Please note that -mavx512{bw,vl} are moving target ATM, quite some
patches are still pending.

That said, the patch is OK from x86 side, but a testsuite maintainer
should OK it.

Uros.


[PATCH, RTX]: Additional fix for PR 57003

2014-10-03 Thread Uros Bizjak
Hello!

My r215428 change exposed another PR 57003 problem on x86_64. When
compiling gcc.target/i386/pr57003.c we refer to clobbered %rdi
register after the call to memcpy:

--- pr57003.s   2014-10-03 15:08:24.0 +0200
+++ pr57003_.s  2014-10-03 15:08:19.0 +0200
@@ -78,7 +78,7 @@
leaq-20(%rbx), %rdx
movq%rax, %rdi
callmemcpy
-   movq%rdi, c(%rip)
+   movq%rax, c(%rip)
 .L8:
movaps  (%rsp), %xmm6
movaps  16(%rsp), %xmm7
@@ -321,5 +321,5 @@
.byte   0xb
.align 8
 .LEFDE7:
-   .ident  GCC: (GNU) 5.0.0 20141002 (experimental) [trunk
revision 215797]
+   .ident  GCC: (GNU) 4.9.2 20141001 (prerelease)
[gcc-4_9-branch revision 215749]
.section.note.GNU-stack,,@progbits

The runtime failure happens only on CentOS5 (and not in Fedora20),
which supports findings in Comment #17 of the PR.

The difference is, that now we emit memcpy for MS_ABI-ELF_ABI
cross-ABI call as:

#(call_insn:TI 24 23 27 3 (set (reg:DI 0 ax)
#(call (mem:QI (symbol_ref:DI (memcpy) [flags 0x41]
function_decl 0x7fce6f586438 memcpy) [0 memcpy S1 A8])
#(const_int 0 [0]))) pr57003.c:32 661 {*call_value}
# (expr_list:REG_DEAD (reg:DI 5 di)
#(expr_list:REG_DEAD (reg:DI 4 si)
#(expr_list:REG_DEAD (reg:DI 1 dx)
#(expr_list:REG_UNUSED (reg:DI 0 ax)
#(expr_list:REG_RETURNED (reg/v/f:DI 2 cx [orig:87 e ] [87])
#(expr_list:REG_CALL_DECL (symbol_ref:DI
(memcpy) [flags 0x41] function_decl 0x7fce6f586438 memcpy)
#(expr_list:REG_EH_REGION (const_int 0 [0])
#(nil
#(expr_list (clobber (reg:TI 52 xmm15))
#(expr_list (clobber (reg:TI 51 xmm14))
#(expr_list (clobber (reg:TI 50 xmm13))
#(expr_list (clobber (reg:TI 49 xmm12))
#(expr_list (clobber (reg:TI 48 xmm11))
#(expr_list (clobber (reg:TI 47 xmm10))
#(expr_list (clobber (reg:TI 46 xmm9))
#(expr_list (clobber (reg:TI 45 xmm8))
#(expr_list (clobber (reg:TI 28 xmm7))
#(expr_list (clobber (reg:TI 27 xmm6))
#(expr_list (clobber (reg:DI 5 di))
#(expr_list (clobber
(reg:DI 4 si))
#(expr_list:DI
(set (reg:DI 0 ax)
#(reg:DI 5 di))
#(expr_list:DI
(use (reg:DI 5 di))
#
(expr_list:DI (use (reg:DI 4 si))
#
(expr_list:DI (use (reg:DI 1 dx))
#
(nil))

which is alternate, but equivalent form of what was generated previously:

#(call_insn:TI 24 23 27 3 (parallel [
#(set (reg:DI 0 ax)
#(call (mem:QI (symbol_ref:DI (memcpy) [flags 0x41]
function_decl 0x7fd91824a800 memcpy) [0 memcpy S1 A8])
#(const_int 0 [0])))
#(unspec [
#(const_int 0 [0])
#] UNSPEC_MS_TO_SYSV_CALL)
#(clobber (reg:DI 4 si))
#(clobber (reg:DI 5 di))
#(clobber (reg:TI 27 xmm6))
#(clobber (reg:TI 28 xmm7))
#(clobber (reg:TI 45 xmm8))
#(clobber (reg:TI 46 xmm9))
#(clobber (reg:TI 47 xmm10))
#(clobber (reg:TI 48 xmm11))
#(clobber (reg:TI 49 xmm12))
#(clobber (reg:TI 50 xmm13))
#(clobber (reg:TI 51 xmm14))
#(clobber (reg:TI 52 xmm15))
#]) pr57003.c:32 652 {*call_value_rex64_ms_sysv}
# (expr_list:REG_DEAD (reg:DI 5 di)
#(expr_list:REG_DEAD (reg:DI 4 si)
#(expr_list:REG_DEAD (reg:DI 1 dx)
#(expr_list:REG_RETURNED (reg/v/f:DI 2 cx [orig:87 e ] [87])
#(expr_list:REG_EH_REGION (const_int 0 [0])
#(nil))
#(expr_list:DI (set (reg:DI 0 ax)
#(reg:DI 5 di))
#(expr_list:DI (use (reg:DI 5 di))
#(expr_list:DI (use (reg:DI 4 si))
#(expr_list:DI (use (reg:DI 1 dx))
#(nil))

It looks that Jakub's patch, proposed in Comment #21 doesn't cover
alternative form, so it doesn't record clobbers properly.

Attached patch fixes this omission.

2014-10-03  Uros Bizjak  ubiz...@gmail.com

PR rtl-optimization/57003
* regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg,
also check CALL_INSN_FUNCTION_USAGE for clobbers again after
killing regs_invalidated_by_call.

Tested on x86_64-linux-gnu {,-m32}.

OK for mainline and release branches?

Uros.
Index: regcprop.c
===
--- regcprop.c  (revision 215861)
+++ regcprop.c  (working copy

Re: RFA: one more version of the patch for PR61360

2014-10-04 Thread Uros Bizjak
On Sat, Oct 4, 2014 at 12:49 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Uros Bizjak ubiz...@gmail.com writes:
 On Thu, Oct 2, 2014 at 10:13 PM, Vladimir Makarov vmaka...@redhat.com 
 wrote:

 I guess we achieved the consensus about the following patch to fix
 PR61360

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

 The patch was successfully bootstrapped and tested (w/wo
 -march=amdfam10) on x86/x86-64.

 Is it ok to commit to trunk?


 I've tested your patch and unfortunately it doesn't work:

 In file included from
 /var/tmp/moz-build-dir/js/src/shell/Unified_cpp_js_src_shell0.cpp:15:0:
 /var/tmp/mozilla-central/js/src/shell/js.cpp: In function ‘void
 Process(JSContext*, JSObject*, const char*, bool)’:
 /var/tmp/mozilla-central/js/src/shell/js.cpp:592:1: internal compiler
 error: in lra_update_insn_recog_data, at lra.c:1221
   }
   ^
 0xa9d9ec lra_update_insn_recog_data(rtx_insn*)
  ../../gcc/gcc/lra.c:1220
 0xab450f eliminate_regs_in_insn
  ../../gcc/gcc/lra-eliminations.c:1077
 0xab450f process_insn_for_elimination
  ../../gcc/gcc/lra-eliminations.c:1344
 0xab450f lra_eliminate(bool, bool)
  ../../gcc/gcc/lra-eliminations.c:1408
 0xa9f2da lra(_IO_FILE*)
  ../../gcc/gcc/lra.c:2270
 0xa5d659 do_reload
  ../../gcc/gcc/ira.c:5311
 0xa5d659 execute
  ../../gcc/gcc/ira.c:5470


 Testcase is attached:

   % g++ -c -march=amdfam10 -w -O2 js.ii
 js.ii: In function ‘void RunFile(C)’:
 js.ii:64:1: internal compiler error: in lra_update_insn_recog_data, at
 lra.c:1221


 Thanks for reporting this, Marcus.  The problem now is in
 optimize_function_for_size_p.  It is false, when we define and cache enable
 attributes at early stages (instantation of virtual regs) and true later.

 It is returning us to the same problem.  I believe that we should not have
 enable attributes depending on optimization options or on the current
 running pass if we don't want the current solution in the trunk (with
 recog_init).  Setting right value for optimize_function_for_size_p does not
 solve the problem as we can have different options for different functions
 in the same compilation file.

 So minimal solution would be removing optimize_function_for_size_p from the
 attribute definition.  But I guess we could remove all condition.
 Unfortunately, Ganesh did not post is it really beneficial to switch off
 this alternative for AMD CPUs even if AMD optimization guide recommends it.

 I propose to split the pattern into size-optimized and normal pattern.
 The patch implements this proposal.

 An alternative would be to add two new enabled-like attributes,
 good_for_size and good_for_speed, that say whether it is efficient
 to use a particular alternative.  These attributes wouldn't ever stop
 an existing instruction from being recognised; they would just say
 whether the RA and optimisers should consider the alternative when
 optimising for size or speed respectively.  The attributes would have
 the same restrictions as the enabled attribute and could be cached in
 the same way.

 The enabled attribute would then be purely about whether an
 instruction is available, not whether it's efficient in a particular
 situation.

 The main advantage is that it would be possible to make the size/speed
 choice at a basic block level rather than a function level.  In the worst
 case we might move an instruction from a hot block to a cold block or
 vice versa, but with intelligent spilling that shouldn't happen too
 often and at least it wouldn't trigger an ICE.

 If that sounds OK, I'll try to get something together next week.
 I think Ramana said he had a use for this on ARM too.

I think that this would be way better than duplication of patterns.
Perhaps we can name these attributes enable_for_size and
enable_for_speed, and have them in addition to enable attribute.
The final enable condition would be an union of enable,
enable_for_speed and enable_for_size attributes.

Uros.


Re: [PATCH,i386] Fix adxintrin on mingw.

2014-10-06 Thread Uros Bizjak
On Thu, Oct 2, 2014 at 4:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 Hi,

 sizeof (long) == 4 on windows, so we should use long long as param type.
 Patch below does it.
 Ok for trunk?

 2014-10-02  Ilya Tocar  ilya.to...@intel.com

 * config/i386/adxintrin.h (_subborrow_u64): Use long long for param
 type.
 (_addcarry_u64): Ditto.
 (_addcarryx_u64): Ditto.

OK everywhere.

Thanks,
Uros.


[PATCH, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference

2014-10-08 Thread Uros Bizjak
Hello!

This message revives an old thread [1], where the miscompilation of gfortran
on alpha was found that that resulted in:

FAIL: gfortran.dg/assumed_rank_3.f90:15.20:

print *, ubound(x,dim=3)  !  wrong dim
1
Error: Assumed-rank variable x at (1) may only be used as actual argument

The problem was in the miscompilation of resolve_actual_arglist from
resolve.c.  This function initializes two nearby global bool variables
with the following sequence:

  actual_arg = true;
  first_actual_arg = true;

but due to the miscompilation, the actual_arg was never set to true.
This happened due to the way stores to QImode and HImode locations are
implemented on non-BWX targets. The sequence reads full word, does its
magic to the part and stores the full word with changed part back to
the memory. However - the scheduler mixed two sequences, violating the
atomicity of RMW sequence.

As demostrated in the great detail in the PR [2], the problem is in
early exit  for MEM_READOLNY_P in true_dependence_1 in alias.c. This
early exit declares all MEM_READONLY_P references as non-aliasing,
which is not true when possibly aliasing references with alignment
ANDs are involved.

Proposed patch prevents MEM_READONLY_P memory references to be moved
over possibly aliased memory (with alignment ANDs). The patch prevents
early exit for MEM_READONLY_P references when alignment ANDs are
involved. The aliasing is then determined later in the function. In
effect, it changes early exit to:

-  if (MEM_READONLY_P (x))
-return 0;
+  if (MEM_READONLY_P (x)
+   GET_CODE (x_addr) != AND
+   GET_CODE (mem_addr) != AND)
+return 0;

The comment also mentions ... We don't expect to find read-only set
on MEM, but stupid user tricks can produce them, so don't die.. We
certainly don't die anymore, as confirmed by a native alpha-linux-gnu
(please note - not alphaev6!) bootstrap and regression test.

2014-10-08  Uros Bizjak  ubiz...@gmail.com

* alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P
references when alignment ANDs are involved.

The patch was bootstrapped and regression tested on alpha-linux-gnu.

OK for mainline and release branches?

Please note that the patch by itself is not enough to fix the original
problem with gfortran miscompilation. Another problem in this area is
summarised in PR 63475 [3], where postreload CSE propagates aliased
memory operand.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02251.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475

Uros.
Index: alias.c
===
--- alias.c (revision 215966)
+++ alias.c (working copy)
@@ -2458,18 +2458,6 @@ true_dependence_1 (const_rtx mem, enum machine_mod
   || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
 return 1;
 
-  /* Read-only memory is by definition never modified, and therefore can't
- conflict with anything.  We don't expect to find read-only set on MEM,
- but stupid user tricks can produce them, so don't die.  */
-  if (MEM_READONLY_P (x))
-return 0;
-
-  /* If we have MEMs referring to different address spaces (which can
- potentially overlap), we cannot easily tell from the addresses
- whether the references overlap.  */
-  if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
-return 1;
-
   if (! mem_addr)
 {
   mem_addr = XEXP (mem, 0);
@@ -2493,6 +2481,22 @@ true_dependence_1 (const_rtx mem, enum machine_mod
}
 }
 
+  /* Read-only memory is by definition never modified, and therefore can't
+ conflict with anything.  However, don't assume anything when AND
+ addresses are involved and leave to the code below to determine
+ dependence.  We don't expect to find read-only set on MEM, but
+ stupid user tricks can produce them, so don't die.  */
+  if (MEM_READONLY_P (x)
+   GET_CODE (x_addr) != AND
+   GET_CODE (mem_addr) != AND)
+return 0;
+
+  /* If we have MEMs referring to different address spaces (which can
+ potentially overlap), we cannot easily tell from the addresses
+ whether the references overlap.  */
+  if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
+return 1;
+
   base = find_base_term (x_addr);
   if (base  (GET_CODE (base) == LABEL_REF
   || (GET_CODE (base) == SYMBOL_REF


Re: [PATCH, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference

2014-10-08 Thread Uros Bizjak
On Wed, Oct 8, 2014 at 12:51 PM, Richard Biener rguent...@suse.de wrote:

 This message revives an old thread [1], where the miscompilation of gfortran
 on alpha was found that that resulted in:

[...]

 As said in the audit trail of the bugreport I think that the caller
 of alpha_set_memflags is wrong in applying MEM flags from the _source_
 operand to MEMs generated for the RMW sequence for the destination.

 It would be better to fix that instead.

Please see comment #6 of the referred PR [1] for further analysis and
ammended testcase. The testcase and analysis will show a native read
passing possibly aliasing store.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483#c6

Uros.


Re: [patch] Excessive alignment in ix86_data_alignment

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 On 08 Oct 23:02, Petr Murzin wrote:
 Hi,
 I have measured performance impact on Haswell platform according to this
 input:
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html

What about older processors?

The optimization was introduced well before Haswell for then current
processors, and it was based on the recommendation from Intel
optimization guide. If this optimization doesn't apply for new
processors, then tune option should be introduced and set accordingly.

Uros.


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse marc.gli...@inria.fr wrote:
 Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html

 (another part of the discussion is around
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )

 Most people who commented seem cautiously in favor. The least favorable was
 Ulrich who suggested to go with it but keep the old behavior accessible if
 the user defines some macro (which imho would lose a large part of the
 simplification benefits of the patch)
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html

 If this is accepted, I will gladly prepare patches removing the unused
 builtins and extending this to a few more operations (integer vectors in
 particular). If this is not the direction we want to go, I'd like to hear it
 clearly so I can move on...

Well, I'm undecided.

The current approach is proven to work OK, there is no bugs reported
in this area and the performance is apparently OK. There should be
clear benefits in order to change something that ain't broken, and
at least some proof that we won't regress in this area with the new
approach.

On the other hand, if the new approach opens new optimization
opportunities (without regression!), I'm in favor of it, including the
fact that new code won't produce equivalent assembly - as long as
functionality of the optimized asm stays the same (obviously, I'd
say).

Please also note that this is quite big project. There are plenty of
intrinsics and I for one don't want another partial transition ...

TL/DR: If there are benefits, no regressions and you think you'll
finish the transition, let's go for it.

Uros.


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Thu, 9 Oct 2014, Uros Bizjak wrote:

 On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse marc.gli...@inria.fr wrote:

 Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html

 (another part of the discussion is around
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )

 Most people who commented seem cautiously in favor. The least favorable
 was
 Ulrich who suggested to go with it but keep the old behavior accessible
 if
 the user defines some macro (which imho would lose a large part of the
 simplification benefits of the patch)
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html

 If this is accepted, I will gladly prepare patches removing the unused
 builtins and extending this to a few more operations (integer vectors in
 particular). If this is not the direction we want to go, I'd like to hear
 it
 clearly so I can move on...


 Well, I'm undecided.


 First, thanks for answering, it helps me a lot to know what others think.

 The current approach is proven to work OK, there is no bugs reported
 in this area and the performance is apparently OK. There should be
 clear benefits in order to change something that ain't broken, and
 at least some proof that we won't regress in this area with the new
 approach.


 There are quite a few enhancement PRs asking for more performance, but
 indeed no (or very few) complaints about correctness or about gcc turning
 their code into something worse than what they wrote, which I completely
 agree weighs more.

 On the other hand, if the new approach opens new optimization
 opportunities (without regression!), I'm in favor of it, including the
 fact that new code won't produce equivalent assembly - as long as
 functionality of the optimized asm stays the same (obviously, I'd
 say).

 Please also note that this is quite big project. There are plenty of
 intrinsics and I for one don't want another partial transition ...


 That might be an issue : this transition is partial by nature. Many
 intrinsics cannot (easily) be expressed in GIMPLE, and among those that can
 be represented, we only want to change those for which we are confident that
 we will not regress the quality of the code. From the reactions, I would
 assume that we want to be quite conservative at the beginning, and maybe we
 can reconsider some other intrinsics later.

 The best I can offer is consistency: if addition of v2df is changed,
 addition of v4df is changed as well (and say any +-*/ of float/double
 vectors of any supported size). Another block would be +-*/% for integer
 vectors. And construction / access (most construction is already
 builtin-free). And remove the unused builtins in the same patch that makes
 them unused. If you don't like those blocks, I can write one mega-patch that
 does all these, if we roughly agree on the list beforehand, so it goes in
 all at once.

 Would that be good enough?

OK, let's go in the proposed way, more detailed:

- we begin with +-*/ of float/double vectors. IMO, this would result
in a relatively small and easily reviewable patch to iron out the
details of the approach. Alternatively, we can begin with floats only.
- commit the patch and wait for the sky to fall down.
- we play a bit with the compiler to check generated code and corner
cases (some kind of Q/A) and wait if someone finds a problem (say, a
couple of weeks).
- if there are no problems, continue with integer builtins following
the established approach, otherwise we revert everything and go back
to the drawing board.
- repeat the procedure for other builtins.

I propose to wait a couple of days for possible comments before we get
the ball rolling.

Uros.


Re: [PATCH i386 AVX512] [64/n] Add rest of VI1-AVX2: vpack[us]wb.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:09 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch adds rest of vpack instruction patterns.

 Bootstrapped.
 gcc.target/i386.exp tests on top of patch-set show no regressions.
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn sse2_avx2_packsswbmask_name): Add masking.
 (define_insn sse2_avx2_packuswbmask_name): Ditto.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [65/n] Add rest of VI1-AVX2: mul insn pattern.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:19 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This tiny patch extend mulmode insn pattern to support
 masking.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_expand mulmode3mask_name): Add masking.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [66/n] Extend vpalignr insn patterns.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:28 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends vpalignr insn patterns.
 It also introduces dedicated `masked' version of pattern
 w/o substing.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_mode_iterator SSESCALARMODE): Add V4TI mode.
 (define_insn ssse3_avx2_palignrmode_mask): New.
 (define_insn ssse3_avx2_palignrmode): Add EVEX version.

OK, although SSESCALARMODE became even more messy ...

Just FYI: V1TI in VIMAX_AVX2 iterator is used to prevent moves of
TImode values from SSE to general regs on x86_64. The same reasoning
applies to V1DI MMX mode on x86_32.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [67/n] Update constraints in vec_dup insn pattern.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:34 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This tiny patch updates constraints in vec_dup insn
 pattern.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn vec_dupmode): Update constraints.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:07 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends vpmullw, vpacksdw and pmaddwd
 insn patterns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_c_enum unspec): Add UNSPEC_PMADDWD512.
 (define_mode_iterator VI2_AVX2): Add V32HI mode.
 (define_expand mulmode3mask_name): Add masking.
 (define_insn *mulmode3mask_name): Ditto.
 (define_expand smulmode3_highpartmask_name): Ditto.
 (define_insn *smulmode3_highpartmask_name): Ditto.
 (define_insn avx512bw_pmaddwd512modemask_name): New.
 (define_mode_attr SDOT_PMADD_SUF): Ditto.
 (define_expand sdot_prodmode): Add SDOT_PMADD_SUF.
 (define_insn sse2_avx2_packssdwmask_name): Add masking.
 (define_insn *ssse3_avx2_pmulhrswmode3mask_name): Ditto.
 (define_insn avx2_packusdw): Delete.
 (define_insn sse4_1_packusdw): Ditto.
 (define_insn sse4_1_avx2_packusdwmask_name): New.

OK.

 +  TARGET_SSE2
 +ix86_binary_operator_ok (MULT, MODEmode, operands)
 +mask_mode512bit_condition  mask_avx512bw_condition

 Just noticed, that need to swap target check with operads check.

No need to worry for minor issues now, but looking at the sse.md, it
looks to me like a case for a quick cleanup patch to correct these
inconsistencies.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [69/n] Add vpmulhrsw insn support.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:12 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch adds support for vpmulhrsw insn.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn avx512bw_umulhrswv32hi3mask_name): New.
 (define_expand ssse3_avx2_pmulhrswmode3_mask): Ditto.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [70/n]

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:36 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch further extends maxmin patterns.

You didn't update Subject field ;)

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn *codemode3_finitemask_nameround_saeonly_name):
 Fix pattern conditions order.

No, not yet. As recommended earlier, this change should be part of a
later cleanup patch. Do not mix functional changes and cleanups
together, it makes review and eventual bisections harder.

 (define_insn *sse4_1_codemode3mask_name): Add masking.
 (define_insn *sse4_1_codemode3mask_name0: Ditto.

OK without the part, mentioned above.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [71/n] [Obvious?] Remove redudant iterator attribute.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:39 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This obvious patch removes redundant iterator attribute

 Bootstrapped.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md (define_mode_attr avx2_avx512f): Remove.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [72/n] Extend VI itterator.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends VI mode iterator.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (ix86_expand_vector_logical_operator): Handle V16SF and V8DF modes.
 * config/i386/sse.md
 (define_mode_iterator VI): Add V64QI and V32HI modes.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [73/n] Extend reduc min/max autogen.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:55 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends pattern for reducation maxmin autogen.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_mode_iterator REDUC_SMINMAX_MODE): Add V64QI and V32HI modes.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [74/n] Add byte/word max/mix reduction.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 2:02 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (emit_reduc_half): Handle V64QI and V32HI mode.
 * config/i386/sse.md
 (define_mode_iterator VI_AVX512BW): New.
 (define_expand reduc_code_mode): Use VI512_48F_12BW.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [75/n] Update vec_init.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 2:13 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends vec_init-related routines/patterns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (ix86_expand_vector_init_duplicate): Handle V64QI and V32HI modes,
 update V8HI, V16QI, V32QI modes handling.
 (ix86_expand_vector_init_general): Handle V64QI and V32HI modes.
 * config/i386/sse.md
 (define_mode_iterator VI48F_512): Rename to ...
 (define_mode_iterator VI48F_I12_AVX512BW): ... this. Extend to
 AVX-512BW modes.
 (define_expand vec_initmode): Use VI48F_I12_AVX512BW.

LGTM, but I'd like to ask Jakub for his opinion on vec_init stuff.

Uros.


Re: [PATCH i386 AVX512] [76/n] Extend int 2 float conversions.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 5:01 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends autogeneration of SI-2-SF
 conversions.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (ix86_expand_vector_convert_uns_vsivsf): Handle V16SI mode and
 TARGET_AVX512VL.
 --
 Thanks, K

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 7c34431..8a7853e 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -18811,6 +18811,19 @@ ix86_expand_vector_convert_uns_vsivsf (rtx target, 
 rtx val)
enum machine_mode fltmode = GET_MODE (target);
rtx (*cvt) (rtx, rtx);

Please handle this directly in floatunssseintvecmodelowermode2
expander. The V16SImode is already handled from there.

Uros.

 +  if (intmode == V16SImode)
 +{
 +  emit_insn (gen_ufloatv16siv16sf2 (target, val));
 +  return;
 +}
 +  if (TARGET_AVX512VL)
 +{
 +  if (intmode == V4SImode)
 +emit_insn (gen_ufloatv4siv4sf2 (target, val));
 +  else
 +emit_insn (gen_ufloatv8siv8sf2 (target, val));
 +  return;
 +}
if (intmode == V4SImode)
  cvt = gen_floatv4siv4sf2;
else


Re: [PATCH i386 AVX512] [77/n] Use blend for cond-set V32HI and V64QI.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 5:07 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends movcc/vcond autogen.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (ix86_expand_sse_movcc): Handle V64QI and V32HI mode.
 (ix86_expand_int_vcond): Ditto.

OK.

Thanks,
Uros.


Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote:
 It appeared I changed a semantics of BNDMK expand when replaced tree 
 operations with rtl ones.

 Original code:

 +  op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1),
 +   arg1, integer_minus_one_node));
 +  op1 = force_reg (Pmode, op1);

 Modified code:

 +  op1 = expand_normal (arg1);
 +
 +  if (!register_operand (op1, Pmode))
 +   op1 = ix86_zero_extend_to_Pmode (op1);
 +
 +  /* Builtin arg1 is size of block but instruction op1 should
 +be (size - 1).  */
 +  op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
 +op1, 1, OPTAB_DIRECT);

 The problem is that in the fixed version we may modify value of a pseudo 
 register into which arg1 is expanded which means incorrect value for all 
 following usages of arg1.

 Didn't reveal it early because programs surprisingly rarely hit this bug.  I 
 do following change to fix it:

op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
 -op1, 1, OPTAB_DIRECT);
 +NULL, 1, OPTAB_DIRECT);

 Similar problem was also fixed for BNDNARROW.  Does it look OK?

I'm not aware of this type of limitation, and there are quite some
similar constructs in i386.c. It is hard to say without the testcase
what happens, perhaps one of RTX experts (CC'd) can advise what is
recommended here.

Uros.


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 7:46 PM, Marc Glisse marc.gli...@inria.fr wrote:

 If this is accepted, I will gladly prepare patches removing the unused
 builtins and extending this to a few more operations (integer vectors in
 particular). If this is not the direction we want to go, I'd like to hear it
 clearly so I can move on...


 As we discussed offlist, removing all the builtins would be problematic
 for
 Ada as they are the only medium allowing flexible access to vector
 instructions
 (aside autovectorization) for users.

 Today, the model is very simple: people who want to build on top of vector
 operations just bind to the builtins they need and expose higher level
 interfaces if they like, provided proper type definitions (see g-sse.ads
 for
 example).


 It is sad that this prevents us from removing the builtins, but I agree that
 we can't just drop ada+sse users like that. Well, less work for me if I
 don't have to remove the builtins, and my main motivation is optimization,
 even if I tried to sell the clean up to convince people.

 Uros, is it still ok if I change the intrinsics without removing the
 builtins? (with testcases for HJ and not before Kirill says it is ok)

Given that this will be a substantial work and considering the request
from Kirill, what do you think about separate development branch until
AVXn stuff is finished? This will give a couple of weeks and a
playground to finalize the approach for the conversion. Maybe even ada
can be tested there to not regress with the compatibility stuff.

Uros.


[4.9 PATCH, testsuite]: Fix g++.dg/cpp1y/feat-cxx14.C testsuite errors

2014-10-09 Thread Uros Bizjak
Hello!

2014-10-09  Uros Bizjak  ubiz...@gmail.com

* g++.dg/cpp1y/feat-cxx14.C: Variable templates not in yet.
(dg-do): Use c++1y target.

Tested on x86_64. OK for branch?

Uros.
Index: g++.dg/cpp1y/feat-cxx14.C
===
--- g++.dg/cpp1y/feat-cxx14.C   (revision 216044)
+++ g++.dg/cpp1y/feat-cxx14.C   (working copy)
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++14 } }
+// { dg-do compile { target c++1y } }
 // { dg-options -I${srcdir}/g++.dg/cpp1y -I${srcdir}/g++.dg/cpp1y/testinc }
 
 // Begin C++11 tests.
@@ -125,10 +125,9 @@
 #  error __cpp_aggregate_nsdmi
 #endif
 
-#ifndef __cpp_variable_templates
+//  Variable templates not in yet.
+#ifdef __cpp_variable_templates
 #  error __cpp_variable_templates
-#elif __cpp_variable_templates != 201304
-#  error __cpp_variable_templates != 201304
 #endif
 
 #ifndef __cpp_digit_separators


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove irrelevant code.
 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 (PIC_OFFSET_TABLE_REGNUM): New check.

Couple of nits below.

+/* Set regs_ever_live for PIC base address register
+   to true if required.  */
+static void
+set_pic_reg_ever_alive ()

Please rename this function to set_pic_reg_ever_live.

-#define PIC_OFFSET_TABLE_REGNUM \
-  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
- || TARGET_PECOFF)) \
-   || !flag_pic ? INVALID_REGNUM \
-   : reload_completed ? REGNO (pic_offset_table_rtx) \
+#define PIC_OFFSET_TABLE_REGNUM \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
+ || TARGET_PECOFF)) \
+   || !flag_pic ? INVALID_REGNUM \
+   : pic_offset_table_rtx ? INVALID_REGNUM \
: REAL_PIC_OFFSET_TABLE_REGNUM)

No negative conditions, please. Also, please follow established
multi-level condition format, please see e.g. HARD_REGNO_NREGS
definition in i386.h.

OK for mainline after infrastructure patch is approved.

Thanks,
Uros.


[PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference

2014-10-10 Thread Uros Bizjak
On Wed, Oct 8, 2014 at 1:56 PM, Uros Bizjak ubiz...@gmail.com wrote:

 This message revives an old thread [1], where the miscompilation of gfortran
 on alpha was found that that resulted in:

 [...]

 As said in the audit trail of the bugreport I think that the caller
 of alpha_set_memflags is wrong in applying MEM flags from the _source_
 operand to MEMs generated for the RMW sequence for the destination.

 It would be better to fix that instead.

 Please see comment #6 of the referred PR [1] for further analysis and
 ammended testcase. The testcase and analysis will show a native read
 passing possibly aliasing store.

Attached v2 patch implements the same approach in all alias.c places
that declare MEM_READONLY_P operands as never aliasing.

2014-10-10  Uros Bizjak  ubiz...@gmail.com

* alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P
references when alignment ANDs are involved.
(write_dependence_p): Ditto.
(may_alias_p): Ditto.

Patch was boostrapped and regression tested on x86_64-linux-gnu and
alpha-linux-gnu. Unfortunately, there are still failures remaining in
gfortran testsuite due to independent RTL infrastructure problem with
VALUEs leaking into aliasing detecting functions [2], [3].

The patch was discussed and OK'd by Richi in the PR audit trail. If
there are no objections from RTL maintainers, I plan to commit it to
the mainline early next week.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483
[2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475

Uros.
Index: alias.c
===
--- alias.c (revision 216025)
+++ alias.c (working copy)
@@ -2458,18 +2458,6 @@ true_dependence_1 (const_rtx mem, enum machine_mod
   || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
 return 1;
 
-  /* Read-only memory is by definition never modified, and therefore can't
- conflict with anything.  We don't expect to find read-only set on MEM,
- but stupid user tricks can produce them, so don't die.  */
-  if (MEM_READONLY_P (x))
-return 0;
-
-  /* If we have MEMs referring to different address spaces (which can
- potentially overlap), we cannot easily tell from the addresses
- whether the references overlap.  */
-  if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
-return 1;
-
   if (! mem_addr)
 {
   mem_addr = XEXP (mem, 0);
@@ -2493,6 +2481,22 @@ true_dependence_1 (const_rtx mem, enum machine_mod
}
 }
 
+  /* Read-only memory is by definition never modified, and therefore can't
+ conflict with anything.  However, don't assume anything when AND
+ addresses are involved and leave to the code below to determine
+ dependence.  We don't expect to find read-only set on MEM, but
+ stupid user tricks can produce them, so don't die.  */
+  if (MEM_READONLY_P (x)
+   GET_CODE (x_addr) != AND
+   GET_CODE (mem_addr) != AND)
+return 0;
+
+  /* If we have MEMs referring to different address spaces (which can
+ potentially overlap), we cannot easily tell from the addresses
+ whether the references overlap.  */
+  if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
+return 1;
+
   base = find_base_term (x_addr);
   if (base  (GET_CODE (base) == LABEL_REF
   || (GET_CODE (base) == SYMBOL_REF
@@ -2576,16 +2580,6 @@ write_dependence_p (const_rtx mem,
   || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
 return 1;
 
-  /* A read from read-only memory can't conflict with read-write memory.  */
-  if (!writep  MEM_READONLY_P (mem))
-return 0;
-
-  /* If we have MEMs referring to different address spaces (which can
- potentially overlap), we cannot easily tell from the addresses
- whether the references overlap.  */
-  if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
-return 1;
-
   mem_addr = XEXP (mem, 0);
   if (!x_addr)
 {
@@ -2603,6 +2597,21 @@ write_dependence_p (const_rtx mem,
}
 }
 
+  /* A read from read-only memory can't conflict with read-write memory.
+ Don't assume anything when AND addresses are involved and leave to
+ the code below to determine dependence.  */
+  if (!writep
+   MEM_READONLY_P (mem)
+   GET_CODE (x_addr) != AND
+   GET_CODE (mem_addr) != AND)
+return 0;
+
+  /* If we have MEMs referring to different address spaces (which can
+ potentially overlap), we cannot easily tell from the addresses
+ whether the references overlap.  */
+  if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
+return 1;
+
   base = find_base_term (mem_addr);
   if (! writep
base
@@ -2690,18 +2699,6 @@ may_alias_p (const_rtx mem, const_rtx x)
   || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
 return 1;
 
-  /* Read-only memory is by definition never modified, and therefore can't
- conflict with anything.  We don't expect to find read-only set on MEM,
- but stupid user tricks can

Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 the patch improves performance when previous are applied.
 It makes RTL loop invariant behavior for GOT loads same as it was
 before the 2 previous patches.

 The patch fixes x86 address cost so that cost for addresses with GOT
 register becomes less, how it was before enabling EBX.

 In x86_address_cost the result of “REGNO (parts.base) =
 FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes
 condition result
 the same when parts.base is GOT register (the same for parts.index).

 2014-10-08  Evgeny Stupachenko  evstu...@gmail.com
 * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for
 when address contains GOT register.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index b43e870..9d8cfd1 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode,
 addr_space_t, bool)
  cost++;

Please add a short comment here, explaining the reason for new condition.

if (parts.base
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
 (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER)
 parts.index
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
 (!REG_P (parts.index) || REGNO (parts.index) = 
 FIRST_PSEUDO_REGISTER)
 parts.base != parts.index)
  cost++;

Otherwise LGTM, but please repost the patch with a comment.

Uros.


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove irrelevant code.

Please mention *which* code you removed here.

 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 (PIC_OFFSET_TABLE_REGNUM): New check.

This is not New check, but changed one. Please mention *what* changed.

-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+   (!reload_completed || !ix86_use_pseudo_pic_reg ()))

Hm, can you please add a comment for this change?

Uros.


Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 5:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 We've met several performance issues (up to 15%) on Silvermont caused
 by the PARTIAL_REG_DEPENDENCY tuning.
 Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY.

 The patch passed bootstrap, make check.

 Is it ok for trunk?

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 5:47 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 My strong preference would be:
   enum machine_mode maskmode = mode;
   rtx (*gen) (rtx, rtx, rtx, rtx);
 right below the enum machine_mode mode = GET_MODE (d ? d-op0 : op0);
 line and then inside of the first switch just do:
 ...
 case V16SImode:
   if (!TARGET_AVX512F)
   return false;
   gen = gen_avx512f_vpermi2varv16si3;
   break;
 case V4SFmode:
   if (!TARGET_AVX512VL)
   return false;
   gen = gen_avx512vl_vpermi2varv4sf3;
   maskmode = V4SImode;
   break;
 ...
 etc., then in the mask = line use:
 mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d-nelt, vec));
 and finally instead of the second switch do:
   emit_insn (gen (target, op0, force_reg (maskmode, mask), op1));
   return true;

 Updated patch below.

Please recode that horrible first switch statement to:

--cut here--
  rtx (*gen) (rtx, rtx, rtx, rtx) = NULL;

  switch (mode)
{
case V8HImode:
  if (TARGET_AVX512VL  TARGET_AVX152BW)
gen = gen_avx512vl_vpermi2varv8hi3;
  break;

...

case V2DFmode:
  if (TARGET_AVX512VL)
{
  gen = gen_avx512vl_vpermi2varv2df3;
  maskmode = V2DImode;
}
  break;

default:
  break;
}

  if (gen == NULL)
return false;
--cut here--

The patch is OK with the above improvement.

(Please also note that the patch has a bunch of i386.md changes that
will clash with followup patch series).

Thanks,
Uros.


Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI

2014-10-10 Thread Uros Bizjak
On Wed, Oct 1, 2014 at 8:57 PM, Vladimir Makarov vmaka...@redhat.com wrote:

 The problem is in code introduced by Bernd in IRA and caller-saves.c in
 2012.  It is basically an optimization for functions returning always the
 same result as one argument (e.g. memcpy returning 1st argument).

 There are two possible solutions.  First one is to prohibit the
 optimizations when there is a parallel in SET.  Second one is to go deeper
 if the call result is guaranteed in the first element which is true for the
 patch.

I suspect that the first solution will regress
gcc.target/i386/retarg.c on i686 - that testcase test if referred
optimization is effective. All things equal, I think we should go with
the second solution.

 For the first solution, the patch would be

 Index: lra-constraints.c
 ===
 --- lra-constraints.c   (revision 215748)
 +++ lra-constraints.c   (working copy)
 @@ -5348,16 +5348,19 @@
   if (GET_CODE (pat) == PARALLEL)
 pat = XVECEXP (pat, 0, 0);
   dest = SET_DEST (pat);
 - start_sequence ();
 - emit_move_insn (cheap, copy_rtx (dest));
 - restore = get_insns ();
 - end_sequence ();
 - lra_process_new_insns (curr_insn, NULL, restore,
 -Inserting call parameter
 restore);
 - /* We don't need to save/restore of the pseudo from
 -this call.  */
 - usage_insns[regno].calls_num = calls_num;
 - bitmap_set_bit (check_only_regs, regno);
 + if (REG_P (dest))
 +   {
 + start_sequence ();
 + emit_move_insn (cheap, copy_rtx (dest));
 + restore = get_insns ();
 + end_sequence ();
 + lra_process_new_insns (curr_insn, NULL, restore,
 +Inserting call parameter
 restore);
 + /* We don't need to save/restore of the pseudo
 +from this call.  */
 + usage_insns[regno].calls_num = calls_num;
 + bitmap_set_bit (check_only_regs, regno);
 +   }
 }
 }
   to_inherit_num = 0;


 For the second solution, the patch is


 Index: lra-constraints.c
 ===
 --- lra-constraints.c   (revision 215748)
 +++ lra-constraints.c   (working copy)
 @@ -5348,16 +5348,25 @@
   if (GET_CODE (pat) == PARALLEL)
 pat = XVECEXP (pat, 0, 0);
   dest = SET_DEST (pat);
 - start_sequence ();
 - emit_move_insn (cheap, copy_rtx (dest));
 - restore = get_insns ();
 - end_sequence ();
 - lra_process_new_insns (curr_insn, NULL, restore,
 -Inserting call parameter
 restore);
 - /* We don't need to save/restore of the pseudo from
 -this call.  */
 - usage_insns[regno].calls_num = calls_num;
 - bitmap_set_bit (check_only_regs, regno);
 + if (GET_CODE (dest) == PARALLEL)
 +   {
 + dest = XVECEXP (dest, 0, 0);
 + if (GET_CODE (dest) == EXPR_LIST)
 +   dest = XEXP (dest, 0);
 +   }
 + if (REG_P (dest))
 +   {
 + start_sequence ();
 + emit_move_insn (cheap, copy_rtx (dest));
 + restore = get_insns ();
 + end_sequence ();
 + lra_process_new_insns (curr_insn, NULL, restore,
 +Inserting call parameter
 restore);
 + /* We don't need to save/restore of the pseudo from
 +this call.  */
 + usage_insns[regno].calls_num = calls_num;
 + bitmap_set_bit (check_only_regs, regno);
 +   }
 }
 }


 The first patch is safer but the second one is ok too.  I have no particular
 preferences.  Whatever we choose, analogous code in caller-saves.c should be
 changed too.

Uros.


Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 7:29 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-10-10 21:10 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Wed, Oct 1, 2014 at 8:57 PM, Vladimir Makarov vmaka...@redhat.com wrote:

 The problem is in code introduced by Bernd in IRA and caller-saves.c in
 2012.  It is basically an optimization for functions returning always the
 same result as one argument (e.g. memcpy returning 1st argument).

 There are two possible solutions.  First one is to prohibit the
 optimizations when there is a parallel in SET.  Second one is to go deeper
 if the call result is guaranteed in the first element which is true for the
 patch.

 I suspect that the first solution will regress
 gcc.target/i386/retarg.c on i686 - that testcase test if referred
 optimization is effective. All things equal, I think we should go with
 the second solution.

 The first solutions is in trunk since October 3
 (https://gcc.gnu.org/ml/gcc-cvs/2014-10/msg00094.html) and I don't see
 such fail.  Patch actually just checks for a case which never occurs
 right now and therefore should be quite safe.

True, but after MPX patches are committed, PARALLELs will be passed as
call targets. I wonder if the testcase fails then.

Uros.


Re: [PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 7:25 PM, Jeff Law l...@redhat.com wrote:

 This message revives an old thread [1], where the miscompilation of
 gfortran
 on alpha was found that that resulted in:


 [...]

 As said in the audit trail of the bugreport I think that the caller
 of alpha_set_memflags is wrong in applying MEM flags from the _source_
 operand to MEMs generated for the RMW sequence for the destination.

 It would be better to fix that instead.


 Please see comment #6 of the referred PR [1] for further analysis and
 ammended testcase. The testcase and analysis will show a native read
 passing possibly aliasing store.


 Attached v2 patch implements the same approach in all alias.c places
 that declare MEM_READONLY_P operands as never aliasing.

 2014-10-10  Uros Bizjak  ubiz...@gmail.com

  * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P
  references when alignment ANDs are involved.
  (write_dependence_p): Ditto.
  (may_alias_p): Ditto.

 Patch was boostrapped and regression tested on x86_64-linux-gnu and
 alpha-linux-gnu. Unfortunately, there are still failures remaining in
 gfortran testsuite due to independent RTL infrastructure problem with
 VALUEs leaking into aliasing detecting functions [2], [3].

 The patch was discussed and OK'd by Richi in the PR audit trail. If
 there are no objections from RTL maintainers, I plan to commit it to
 the mainline early next week.

 [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483
 [2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html
 [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475

 No objection.  In fact, after reading everything it's pretty obvious to me
 that a /u MEM must be considered as potentially conflicting with writes that
 are implemented as RMW sequences to deal with the lack of byte access
 support.

Thanks, I went ahead and commit the patch to SVN mainline. I wonder,
if they should be also committed to release branches?

 The escaping VALUE stuff is still in my queue.

Great, I can test them on alpha native, there are many gfortran
testsuite failures due to this problem.

Thanks,
Uros.


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 5:01 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
-#define PIC_OFFSET_TABLE_REGNUM \
-  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
- || TARGET_PECOFF)) \
-   || !flag_pic ? INVALID_REGNUM \
-   : reload_completed ? REGNO (pic_offset_table_rtx) \
+#define PIC_OFFSET_TABLE_REGNUM \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
+ || TARGET_PECOFF)) \
+   || !flag_pic ? INVALID_REGNUM \
+   : pic_offset_table_rtx ? INVALID_REGNUM \
: REAL_PIC_OFFSET_TABLE_REGNUM)

No negative conditions, please. Also, please follow established
multi-level condition format, please see e.g. HARD_REGNO_NREGS
definition in i386.h.
 I don't see how we can avoid negative condition here. If we remove
 not from !flag_pic we'll need to add not to TARGET_64BIT and
 TARGET_PECOFF.

 I've done it this way:
 +#define PIC_OFFSET_TABLE_REGNUM
  \
 +  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
 + || TARGET_PECOFF))
  \
 +   || !flag_pic
  \
 +   ? INVALID_REGNUM\
 +   : pic_offset_table_rtx  \
 + ? INVALID_REGNUM  \
 + : REAL_PIC_OFFSET_TABLE_REGNUM)

 Is it ok?

Oh, indeed. I missed the logical or. Maybe put the first condition
into parenthesis, to avoid confusion.

OK in any case.

Thanks,
Uros.


Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 5:17 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Patch updated with the comment:

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 2a64d2d..5fd6a82 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12455,9 +12455,18 @@ ix86_address_cost (rtx x, enum machine_mode,
 addr_space_t, bool)
   || REGNO (parts.index) = FIRST_PSEUDO_REGISTER)))
  cost++;

 +  /* When address base or index is pic_offset_table_rtx we don't increase
 + address cost.  When a memop with pic_offset_table_rtx is not invariant
 + itself it most likely means that base or index is not invariant.
 + Therefore only pic_offset_table_rtx could be hoisted out, which is not
 + profitable for x86.  */
if (parts.base
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
 (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER)
 parts.index
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
 (!REG_P (parts.index) || REGNO (parts.index) = 
 FIRST_PSEUDO_REGISTER)
 parts.base != parts.index)
  cost++;

LGTM.

OK.

Thanks,
Uros.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

Reversed patch was attached. Please repost.

Uros.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 6:32 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 Reversed patch was attached. Please repost.

OK.

Thanks,
Uros.


[PATCH, rtl-optimization] Fix PR63475, Postreload CSE propagates aliased memory operand

2014-10-14 Thread Uros Bizjak
Hello!

Attached patch fixes PR63475, where postreload CSE propagates aliased
memory operand.

The core of the problem was with the call to base_alias_check when
VALUE RTXes are involved. Before the call, find_base_term is used to
extract the base of x_addr and mem_addr. Please note that
find_base_term is able to extract the bases from VALUE RTXes. These
extracted bases were passed to base_alias_check, together with
original VALUE RTXes x_addr and mem_addr.

The problem begins here. base_alias_check doesn't handle VALUE RTXes,
and uses e.g. canon_rtx on VALUEs and various GET_CODE accessors to
determine various properties of passed x_addr and mem_addr. One of
these check checks for the AND alignment addresses to prevent:

  /* Differing symbols not accessed via AND never alias.  */
  if (GET_CODE (x_base) != ADDRESS  GET_CODE (y_base) != ADDRESS)
return 0;

early exit. However, when x and y are passed as VALUE RTXes (that
corresponds and hides the address with AND), and preceding calls to
find_base_term are nevertheless able to extract the bases of x and y,
this condition fires erroneously and invalid return value is returned
(with 0 meaning that the addresses X and Y are known to point to
different objects).

The solution is to always extract values for x_addr and mem_addr and
use them in the calls to find_base_term and base_alias_check.

[It can happen that get_addr is not able to match VALUE RTX with some
address, so it is not possible to simply add a bunch of GET_CODE (x)
!= VALUE asserts in base_alias_check. But in this case find_base_term
returns ADDRESS RTX, so we stay in sync as far as base_alias_check is
concerned (see the quoted code above).]

Added benefit of the patch is, that canon_rtx now works as expected.
canon_rtx does NOT handle VALUE RTXes.

A small optimization is also present. If the address is already
canonicalized, we pass original address to memrefs_conflict_p, but we
have to extract original address for preceding functions nevertheless.
Also, we use extracted original address in recently added check for
AND aligned addresses when checking for MEM_READONLY_P.

The patch also removes a couple of unneeded and unused calls to
canon_rtx, also to show the level of bitrot in this area ...

2014-10-14  Uros Bizjak  ubiz...@gmail.com

PR rtl-optimization/63475
* alias.c (true_dependence_1): Always use get_addr to extract
true address operands from x_addr and mem_addr.  Use extracted
address operands to check for references with alignment ANDs.
Use extracted address operands with find_base_term and
base_alis_check. For noncanonicalized operands call canon_rtx with
extracted address operand.
(write_dependence_1): Ditto.
(may_alias_p): Ditto.  Remove unused calls to canon_rtx.

Patch was thoroughly tested on x86_64-linux-gnu {,-m32} and
alpha-linux-gnu for all default languages plus obj-c++ and go. While
there was no differences on x86_64-linux-gnu (as expected),
alpha-linux-gnu improved the result [1] for some hundred of PASSes in
gfortran testsuite [2].

OK for mainline?

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01151.html
[2] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01478.html

Uros.
Index: alias.c
===
--- alias.c (revision 216149)
+++ alias.c (working copy)
@@ -2439,6 +2439,7 @@ static int
 true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
   const_rtx x, rtx x_addr, bool mem_canonicalized)
 {
+  rtx true_mem_addr;
   rtx base;
   int ret;
 
@@ -2458,6 +2459,10 @@ true_dependence_1 (const_rtx mem, enum machine_mod
   || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
 return 1;
 
+  if (! x_addr)
+x_addr = XEXP (x, 0);
+  x_addr = get_addr (x_addr);
+
   if (! mem_addr)
 {
   mem_addr = XEXP (mem, 0);
@@ -2464,23 +2469,8 @@ true_dependence_1 (const_rtx mem, enum machine_mod
   if (mem_mode == VOIDmode)
mem_mode = GET_MODE (mem);
 }
+  true_mem_addr = get_addr (mem_addr);
 
-  if (! x_addr)
-{
-  x_addr = XEXP (x, 0);
-  if (!((GET_CODE (x_addr) == VALUE
- GET_CODE (mem_addr) != VALUE
- reg_mentioned_p (x_addr, mem_addr))
-   || (GET_CODE (x_addr) != VALUE
-GET_CODE (mem_addr) == VALUE
-reg_mentioned_p (mem_addr, x_addr
-   {
- x_addr = get_addr (x_addr);
- if (! mem_canonicalized)
-   mem_addr = get_addr (mem_addr);
-   }
-}
-
   /* Read-only memory is by definition never modified, and therefore can't
  conflict with anything.  However, don't assume anything when AND
  addresses are involved and leave to the code below to determine
@@ -2488,7 +2478,7 @@ true_dependence_1 (const_rtx mem, enum machine_mod
  stupid user tricks can produce them, so don't die.  */
   if (MEM_READONLY_P (x)
GET_CODE (x_addr) != AND
-   GET_CODE (mem_addr

Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.

2014-10-14 Thread Uros Bizjak
On Tue, Oct 14, 2014 at 9:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 It seems like I missed to post uppdated patch.
 On 25 Sep 20:11, Uros Bizjak wrote:
 I'd rather go with the second approach, it is less confusing from the
 maintainer POV. All other patterns with masking use some consistent
 template, so I'd suggest using the same approach for everything. If it
 is indeed too many patterns, then please split the patch to smaller
 pieces.
 Goal was not to decrease size of the patch, I wanted to make pattern look
 simpler by hiding masking stuff beyond `subst'.
 Anyway, I've updated the patch.

 Here it is (bootstrapped and regtested).

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md (define_mode_iterator VI_AVX2): Extend
 to support AVX-512BW.
 (define_mode_iterator VI124_AVX2_48_AVX512F): Remove.
 (define_expand plusminus_insnmode3): Remove masking support.
 (define_insn *plusminus_insnmode3): Ditto.
 (define_expand plusminus_insnVI48_AVX512VL:mode3_mask): New.
 (define_expand plusminus_insnVI12_AVX512VL:mode3_mask): Ditto.
 (define_insn *plusminus_insnVI48_AVX512VL:mode3_mask): Ditto.
 (define_insn *plusminus_insnVI12_AVX512VL:mode3_mask): Ditto.
 (define_expand sse2_avx2_andnotmode3): Remove masking support.
 (define_insn *andnotmode3): Ditto.
 (define_expand sse2_avx2_andnotVI48_AVX512VL:mode3_mask): New.
 (define_expand sse2_avx2_andnotVI12_AVX512VL:mode3_mask): Ditto.
 (define_insn *andnotVI48_AVX512VL:mode3mask_name): Ditto.
 (define_insn *andnotVI12_AVX512VL:mode3mask_name): Ditto.
 (define_insn *absmode2): Remove masking support.
 (define_insn absVI48_AVX512VL:mode2_mask): New.
 (define_insn absVI12_AVX512VL:mode2_mask): Ditto.
 (define_expand absmode2): Use VI_AVX2 mode iterator.

IMO, it seems much more readable this way.

OK for mainline.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [76/n] Extend int 2 float conversions.

2014-10-15 Thread Uros Bizjak
On Wed, Oct 15, 2014 at 3:37 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index 7c34431..8a7853e 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -18811,6 +18811,19 @@ ix86_expand_vector_convert_uns_vsivsf (rtx 
  target, rtx val)
 enum machine_mode fltmode = GET_MODE (target);
 rtx (*cvt) (rtx, rtx);

 Please handle this directly in floatunssseintvecmodelowermode2
 expander. The V16SImode is already handled from there.
 Done.

 Bootstrapped.

 gcc/
 * config/i386/sse.md
 (define_expand floatunssseintvecmodelowermode2): Extend to
 support AVX-512VL instructions.

 Is it ok for main trunk?

OK.

Thanks,
Uros.


[PATCH, i386]: Fix PR 59432, sync/atomic FAILs on 32bit x86 systems without .cfi directives

2014-10-15 Thread Uros Bizjak
Hello!

Now that %ebx is no more fixed, we can remove all PIC related
complications in atomic_compare_and_swapdwi_doubleword pattern. The
immediate consequence is, that we avoid hidden xchgs that clobbered
unwinding state.

Earlier fix by Ian [1] partly solved this issue using various .cfi
directives to fixup the mess, but these were not available on the
systems without .cfi directives (e.g. Centos 5 and Solaris).

The patch fixes this problem for good by removing problematic
alternative that tried to skip %ebx allocations.

2014-10-15  Uros Bizjak  ubiz...@gmail.com

PR go/59432
* config/i386/sync.md (atomic_compare_and_swapdwi_doubleword):
Remove the second alternative.
(regprefix): Remove mode attribute.
(atomic_compare_and_swapmode): Do not fixup operand 2.
* config/i386/predicates.md (cmpxchg8b_pic_memory_operand): Remove.

Revert:
2013-11-05  Ian Lance Taylor  i...@google.com

* config/i386/sync.md (atomic_compare_and_swapdwi_doubleword):
If possible, add .cfi directives to record change to bx.
* config/i386/i386.c (ix86_emit_cfi): New function.
* config/i386/i386-protos.h (ix86_emit_cfi): Declare.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32} on Fedora 20 and Centos 5.11, where fixes Go sync/atomic
failure on 32bit targets.

Patch was committed to mainline SVN.

Uros
Index: predicates.md
===
--- predicates.md   (revision 216245)
+++ predicates.md   (working copy)
@@ -1092,43 +1092,6 @@
   return parts.disp != NULL_RTX;
 })
 
-;; Return true if OP is memory operand which will need zero or
-;; one register at most, not counting stack pointer or frame pointer.
-(define_predicate cmpxchg8b_pic_memory_operand
-  (match_operand 0 memory_operand)
-{
-  struct ix86_address parts;
-  int ok;
-
-  if (TARGET_64BIT || !flag_pic)
-return true;
-
-  ok = ix86_decompose_address (XEXP (op, 0), parts);
-  gcc_assert (ok);
-
-  if (parts.base  GET_CODE (parts.base) == SUBREG)
-parts.base = SUBREG_REG (parts.base);
-  if (parts.index  GET_CODE (parts.index) == SUBREG)
-parts.index = SUBREG_REG (parts.index);
-
-  if (parts.base == NULL_RTX
-  || parts.base == arg_pointer_rtx
-  || parts.base == frame_pointer_rtx
-  || parts.base == hard_frame_pointer_rtx
-  || parts.base == stack_pointer_rtx)
-return true;
-
-  if (parts.index == NULL_RTX
-  || parts.index == arg_pointer_rtx
-  || parts.index == frame_pointer_rtx
-  || parts.index == hard_frame_pointer_rtx
-  || parts.index == stack_pointer_rtx)
-return true;
-
-  return false;
-})
-
-
 ;; Return true if OP is memory operand that cannot be represented
 ;; by the modRM array.
 (define_predicate long_memory_operand
Index: i386-protos.h
===
--- i386-protos.h   (revision 216245)
+++ i386-protos.h   (working copy)
@@ -142,7 +142,6 @@ extern void ix86_split_lshr (rtx *, rtx, enum mach
 extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
 extern void ix86_split_idivmod (enum machine_mode, rtx[], bool);
-extern bool ix86_emit_cfi ();
 
 extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot);
 extern int ix86_attr_length_immediate_default (rtx_insn *, bool);
Index: sync.md
===
--- sync.md (revision 216245)
+++ sync.md (working copy)
@@ -351,10 +351,9 @@
   else
 {
   enum machine_mode hmode = CASHMODEmode;
-  rtx lo_o, lo_e, lo_n, hi_o, hi_e, hi_n, mem;
+  rtx lo_o, lo_e, lo_n, hi_o, hi_e, hi_n;
 
   lo_o = operands[1];
-  mem  = operands[2];
   lo_e = operands[3];
   lo_n = operands[4];
   hi_o = gen_highpart (hmode, lo_o);
@@ -364,12 +363,9 @@
   lo_e = gen_lowpart (hmode, lo_e);
   lo_n = gen_lowpart (hmode, lo_n);
 
-  if (!cmpxchg8b_pic_memory_operand (mem, MODEmode))
-   mem = replace_equiv_address (mem, force_reg (Pmode, XEXP (mem, 0)));
-
   emit_insn
(gen_atomic_compare_and_swapmode_doubleword
-(lo_o, hi_o, mem, lo_e, hi_e, lo_n, hi_n, operands[6]));
+(lo_o, hi_o, operands[2], lo_e, hi_e, lo_n, hi_n, operands[6]));
 }
 
   ix86_expand_setcc (operands[0], EQ, gen_rtx_REG (CCZmode, FLAGS_REG),
@@ -398,57 +394,27 @@
 ;; That said, in order to take advantage of possible lower-subreg opts,
 ;; treat all of the integral operands in the same way.
 
-;; Operands 5 and 6 really need to be different registers, which in
-;; this case means op5 must not be ecx.  If op5 and op6 are the same
-;; (like when the input is -1LL) GCC might chose to allocate op5 to ecx,
-;; like op6.  This breaks, as the xchg will move the PIC register
-;; contents to %ecx then -- boom.
-
 (define_mode_attr doublemodesuffix [(SI 8) (DI 16)])
-(define_mode_attr regprefix [(SI e) (DI r)])
 
 (define_insn

Re: [PATCH, i386]: Fix PR 59432, sync/atomic FAILs on 32bit x86 systems without .cfi directives

2014-10-16 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 11:06 AM, Andi Kleen a...@firstfloor.org wrote:

 Now that %ebx is no more fixed, we can remove all PIC related
 complications in atomic_compare_and_swapdwi_doubleword pattern. The
 immediate consequence is, that we avoid hidden xchgs that clobbered
 unwinding state.

 Could also do the same in cpuid.h now

I am just writing the patch submission ;)

Uros.


[RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h

2014-10-16 Thread Uros Bizjak
Hello!

Now that %ebx is also allocatable in PIC modes, we can cleanup
config/i386/cpuid considerably. I propose to remove all PIC related
specializations of __cpuid and __cpuid_count and protect the
compilation with #if __GNUC__ = 5.

The only drawback would be that non-bootstrapped build with gcc  5.0
will ignore -march=native, but I think this should be acceptable.
Bootstrapped build will still work as expected.

2014-10-16  Uros Bizjak  ubiz...@gmail.com

* config/i386/cpuid.h (__cpuid): Remove definitions that handle %ebx
register in a special way.
(__cpuid_count): Ditto.
* config/i386/driver-i386.h: Protect with #if __GNUC__ = 5.
(host_detect_local_cpu): Mention that GCC that is able to handle
%ebx register in PIC and non-PIC modes is required to compile the
function.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Any comments?

Uros.
Index: config/i386/cpuid.h
===
--- config/i386/cpuid.h (revision 216282)
+++ config/i386/cpuid.h (working copy)
@@ -146,56 +146,7 @@
 #define signature_VORTEX_ecx   0x436f5320
 #define signature_VORTEX_edx   0x36387865
 
-#if defined(__i386__)  defined(__PIC__)
-/* %ebx may be the PIC register.  */
-#if __GNUC__ = 3
 #define __cpuid(level, a, b, c, d) \
-  __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchg{l}\t{%%}ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level))
-
-#define __cpuid_count(level, count, a, b, c, d)\
-  __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchg{l}\t{%%}ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level), 2 (count))
-#else
-/* Host GCCs older than 3.0 weren't supporting Intel asm syntax
-   nor alternatives in i386 code.  */
-#define __cpuid(level, a, b, c, d) \
-  __asm__ (xchgl\t%%ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchgl\t%%ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level))
-
-#define __cpuid_count(level, count, a, b, c, d)\
-  __asm__ (xchgl\t%%ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchgl\t%%ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level), 2 (count))
-#endif
-#elif defined(__x86_64__)  (defined(__code_model_medium__) || 
defined(__code_model_large__))  defined(__PIC__)
-/* %rbx may be the PIC register.  */
-#define __cpuid(level, a, b, c, d) \
-  __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \
-  cpuid\n\t  \
-  xchg{q}\t{%%}rbx, %q1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level))
-
-#define __cpuid_count(level, count, a, b, c, d)\
-  __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \
-  cpuid\n\t  \
-  xchg{q}\t{%%}rbx, %q1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level), 2 (count))
-#else
-#define __cpuid(level, a, b, c, d) \
   __asm__ (cpuid\n\t \
   : =a (a), =b (b), =c (c), =d (d) \
   : 0 (level))
@@ -204,8 +155,8 @@
   __asm__ (cpuid\n\t \
   : =a (a), =b (b), =c (c), =d (d) \
   : 0 (level), 2 (count))
-#endif
 
+
 /* Return highest supported input value for cpuid instruction.  ext can
be either 0x0 or 0x800 to return highest supported value for
basic or extended cpuid information.  Function returns 0 if cpuid
Index: config/i386/driver-i386.c
===
--- config/i386/driver-i386.c   (revision 216282)
+++ config/i386/driver-i386.c   (working copy)
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 
 const char *host_detect_local_cpu (int argc, const char **argv);
 
-#ifdef __GNUC__
+#if __GNUC__ = 5
 #include cpuid.h
 
 struct cache_desc
@@ -942,9 +942,10 @@ done:
 }
 #else
 
-/* If we aren't compiling with GCC then the driver will just ignore
-   -march and -mtune native target and will leave to the newly
-   built compiler to generate code for its default target.  */
+/* If we aren't compiling with GCC that is able to handle %EBX
+   register in PIC and non-PIC modes, then the driver will just
+   ignore -march and -mtune native target and will leave to the
+   newly built compiler to generate code for its default target.  */
 
 const char *host_detect_local_cpu (int, const char **)
 {


Re: [RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h

2014-10-16 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 11:36 AM, Jakub Jelinek ja...@redhat.com wrote:

  Now that %ebx is also allocatable in PIC modes, we can cleanup
  config/i386/cpuid considerably. I propose to remove all PIC related
  specializations of __cpuid and __cpuid_count and protect the
  compilation with #if __GNUC__ = 5.
 
  The only drawback would be that non-bootstrapped build with gcc  5.0
  will ignore -march=native, but I think this should be acceptable.
 
  I'm worried about that.
  Can't you instead keep the current cpuid.h stuff as is, just add
   __GNUC__  5
  to that, so it treats GCC 5+ PIC as if __PIC__ wasn't defined?
 
  Or, at least use cpuid.h even for older GCC if __PIC__ is not defined
  (or __x86_64__ is defined and not medium/large PIC model)?

 Do we really care that much about non-bootstrapped build? I don't see

 At least on Linux, driver-i386.c should not be built with PIC normally,
 so at least changing
 #if __GNUC__ = 5
 to
 #if defined(__GNUC__)  (__GNUC__ = 5 || !defined(__PIC__))
 would limit the -march=native change for non-bootstrapped compilers to
 Darwin only (or what other targets use PIC by default?).

Yes, this would work for me - the goal is to keep only one universal
__cpuid (and __cpuid_count) define, and the above condition fits this
goal.

Uros.


Re: [PATCH i386 AVX512] [79/n] Extend expand_mul_widen_hilo.

2014-10-16 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 8:28 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends expand_mul_widen_hilo to 512-bit QI,SI,HI modes.

 Bootstrapped and regtested

 gcc/
 * config/i386/i386.c
 (ix86_expand_mul_widen_hilo): Handle V32HI, V16SI, V64QI modes.

 Is it ok for trunk?

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [78/n] Use blend for inserting.

2014-10-16 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 9:28 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Oct 16, 2014 at 10:24:45AM +0400, Kirill Yukhin wrote:
 Hello,
 This patch extends insertion hook.

 AVX-512* tests on top of patch-set all pass
 under simulator.


 gcc/
   * config/i386/i386.c
   (ix86_expand_vector_set): Handle V8DF, V8DI, V16SF, V16SI, V32HI, V64QI
   modes.

 Just a ChangeLog comment style (seen in several entries you've committed
 and several posted patches).  Please don't put a line break right after
 the filename if the (functionname): part fits nicely on the same line, the
 description can be wrapped anywhere as appropriate.
 In this case,
 * config/i386/i386.c (ix86_expand_vector_set): Handle V8DF, V8DI,
 V16SF, V16SI, V32HI, V64QI modes.
 is shorter and more readable.

 Other than that, this particular patch LGTM (unless we'd want for the 4
 mostly repetitious cases add a common handling spot, which would need the
 gen fnpointer and kmode vars set before goto), but I'll leave it to Uros to
 ack it.

Let's leave this as it is for now.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [80/n] Extend expand_sse2_mulvxdi3.

2014-10-16 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 1:55 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends expand_sse2_mulvxdi3.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Extend
 expand_sse2_mulvxdi3.

 --
 Thanks, K

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 1ee947a..945bc8d 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -45667,7 +45667,19 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
enum machine_mode mode = GET_MODE (op0);
rtx t1, t2, t3, t4, t5, t6;

 -  if (TARGET_XOP  mode == V2DImode)
 +  if (TARGET_AVX512DQ)
 +{
 +  if (mode == V8DImode)
 +   emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2));
 +  else if (TARGET_AVX512VL)
 +   {
 + if (mode == V4DImode)
 +   emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
 + else if (mode == V2DImode)
 +   emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));

Should this be v2di ?

 +   }
 +}
 +  else if (TARGET_XOP  mode == V2DImode)
  {
/* op1: A,B,C,D, op2: E,F,G,H */
op1 = gen_lowpart (V4SImode, op1);

Please use function pointers in the added part.

Thanks,
Uros.


Re: [RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h

2014-10-17 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 12:25 PM, Uros Bizjak ubiz...@gmail.com wrote:

  Now that %ebx is also allocatable in PIC modes, we can cleanup
  config/i386/cpuid considerably. I propose to remove all PIC related
  specializations of __cpuid and __cpuid_count and protect the
  compilation with #if __GNUC__ = 5.
 
  The only drawback would be that non-bootstrapped build with gcc  5.0
  will ignore -march=native, but I think this should be acceptable.
 
  I'm worried about that.
  Can't you instead keep the current cpuid.h stuff as is, just add
   __GNUC__  5
  to that, so it treats GCC 5+ PIC as if __PIC__ wasn't defined?
 
  Or, at least use cpuid.h even for older GCC if __PIC__ is not defined
  (or __x86_64__ is defined and not medium/large PIC model)?

 Do we really care that much about non-bootstrapped build? I don't see

 At least on Linux, driver-i386.c should not be built with PIC normally,
 so at least changing
 #if __GNUC__ = 5
 to
 #if defined(__GNUC__)  (__GNUC__ = 5 || !defined(__PIC__))
 would limit the -march=native change for non-bootstrapped compilers to
 Darwin only (or what other targets use PIC by default?).

 Yes, this would work for me - the goal is to keep only one universal
 __cpuid (and __cpuid_count) define, and the above condition fits this
 goal.

I have committed the attached patch to mainline SVN.

2014-10-17  Uros Bizjak  ubiz...@gmail.com

* config/i386/cpuid.h (__cpuid): Remove definitions that handle %ebx
register in a special way.
(__cpuid_count): Ditto.
* config/i386/driver-i386.h: Protect with
#if defined(__GNUC__)  (__GNUC__ = 5 || !defined(__PIC__)).
(host_detect_local_cpu): Mention that GCC with non-fixed %ebx
is required to compile the function.

Bootstrapped and regression tested on x86_64-linux-gnu.

Uros.
Index: config/i386/cpuid.h
===
--- config/i386/cpuid.h (revision 216298)
+++ config/i386/cpuid.h (working copy)
@@ -146,56 +146,7 @@
 #define signature_VORTEX_ecx   0x436f5320
 #define signature_VORTEX_edx   0x36387865
 
-#if defined(__i386__)  defined(__PIC__)
-/* %ebx may be the PIC register.  */
-#if __GNUC__ = 3
 #define __cpuid(level, a, b, c, d) \
-  __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchg{l}\t{%%}ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level))
-
-#define __cpuid_count(level, count, a, b, c, d)\
-  __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchg{l}\t{%%}ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level), 2 (count))
-#else
-/* Host GCCs older than 3.0 weren't supporting Intel asm syntax
-   nor alternatives in i386 code.  */
-#define __cpuid(level, a, b, c, d) \
-  __asm__ (xchgl\t%%ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchgl\t%%ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level))
-
-#define __cpuid_count(level, count, a, b, c, d)\
-  __asm__ (xchgl\t%%ebx, %k1\n\t \
-  cpuid\n\t  \
-  xchgl\t%%ebx, %k1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level), 2 (count))
-#endif
-#elif defined(__x86_64__)  (defined(__code_model_medium__) || 
defined(__code_model_large__))  defined(__PIC__)
-/* %rbx may be the PIC register.  */
-#define __cpuid(level, a, b, c, d) \
-  __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \
-  cpuid\n\t  \
-  xchg{q}\t{%%}rbx, %q1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level))
-
-#define __cpuid_count(level, count, a, b, c, d)\
-  __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \
-  cpuid\n\t  \
-  xchg{q}\t{%%}rbx, %q1\n\t  \
-  : =a (a), =r (b), =c (c), =d (d)\
-  : 0 (level), 2 (count))
-#else
-#define __cpuid(level, a, b, c, d) \
   __asm__ (cpuid\n\t \
   : =a (a), =b (b), =c (c), =d (d) \
   : 0 (level))
@@ -204,8 +155,8 @@
   __asm__ (cpuid\n\t \
   : =a (a), =b (b), =c (c), =d (d) \
   : 0 (level), 2 (count))
-#endif
 
+
 /* Return highest supported input value for cpuid instruction.  ext can
be either 0x0 or 0x800 to return highest supported value for
basic or extended cpuid information.  Function returns 0 if cpuid
Index: config/i386/driver-i386.c

Re: [PATCH i386 AVX512] [80/n] Extend expand_sse2_mulvxdi3.

2014-10-17 Thread Uros Bizjak
On Fri, Oct 17, 2014 at 2:32 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 On 16 Oct 14:29, Uros Bizjak wrote:
  + if (mode == V4DImode)
  +   emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
  + else if (mode == V2DImode)
  +   emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));

 Should this be v2di ?
 Right, copy-and-paste :(

  +   }
  +}
  +  else if (TARGET_XOP  mode == V2DImode)
   {
 /* op1: A,B,C,D, op2: E,F,G,H */
 op1 = gen_lowpart (V4SImode, op1);

 Please use function pointers in the added part.
 Done.

 Updated patch in the bottom. Is it ok?

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [75/n] Update vec_init.

2014-10-17 Thread Uros Bizjak
On Fri, Oct 17, 2014 at 2:57 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 17, 2014 at 04:28:12PM +0400, Kirill Yukhin wrote:
  I wonder whether for these modes it can ever be beneficial to build them
  through interleaves/concatenations etc., if it wouldn't be better to build
  them by storing all values into memory and just reading it back.
 I've tried this example:
 #include immintrin.h

 unsigned char a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, 
 a14,
   a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29,
   a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44,
   a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59,
   a60, a61, a62, a63;

 __m512i foo ()
 {
   return __extension__ (__m512i)(__v64qi){
 a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14,
   a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, 
 a29,
   a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, 
 a44,
   a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, 
 a59,
   a60, a61, a62, a63 };
 }

 w/ and w/o -mavx512bw (and always -mavx512f).

 When, this code works, we've got 127 lines of assembly to do this init.
 W/o AVX-512BW we've got  300 lines of code (mostly on GPRs, using sal, and 
 etc.)

 Then I've looked into actual assembly w/ -mavx512bw and it turns out that no
 AVX-512BW insn were generated, only AVX-512F (and below). Fixed iterator.

 Ok, if it is shorter than copying all those into memory and reading from
 memory, so be it.

   -(define_mode_iterator VI48F_512 [V16SI V16SF V8DI V8DF])
   +(define_mode_iterator VI48F_I12_AVX512BW
   +  [V16SI V16SF V8DI V8DF
   +  (V32HI TARGET_AVX512BW) (V64QI TARGET_AVX512BW)])
 
  What does the I12 stand for?  Wasn't it meant to be VI48F_512_AVX512BW
  or I512?
 Actually, I am not awere of any name convention for iterators.
 As far as I understand, name [more or less] for vector mode
 should reflect:
   - Type family of the unit: float or int
   - Size of the unit: 1, 2, 4 etc. bytes
   - If possible, target predicates to enable certain modes in
 given iterator.

 The name is:
   - Vector (V)
   - I48F - contains both ints and floats of size 4 and 8
   - I12 - contains ints of size 1 and 2
   - AVX512BW - affected by the target (according to previous note - to be 
 removed)

 Maybe it'll be better to name it: VF48_I1248?

 I'll leave that to Uros, the patch is ok by me.

Don't want to bikeshed, but VF48_I1248 looks somehow better to me.

Anyway, the patch is OK even without this change.

Thanks,
Uros.


Re: [PATCH i386 AVX512 Boostrap] [80/n] Extend expand_sse2_mulvxdi3.

2014-10-17 Thread Uros Bizjak
On Fri, Oct 17, 2014 at 4:25 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This is fix for bootstrap failure.

 Is it OK?

 gcc/
 * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Refactor
 conditions to fix bootstrap.

Well, OK.

Uros.


Re: [PATCH 5/5] Use preferred_for_speed in i386.md

2014-10-17 Thread Uros Bizjak
On Fri, Oct 17, 2014 at 4:54 PM, Richard Sandiford
richard.sandif...@arm.com wrote:
 Undo the original fix for 61630 and use preferred_for_speed in the
 problematic pattern.

 I've not written many gcc.target/i386 tests so the markup might need
 some work.

 Richard


 gcc/
 * lra.c (lra): Remove call to recog_init.
 * config/i386/i386.md (preferred_for_speed): New attribute
 (*floatSWI48:modeMODEF:mode2_sse): Override it instead of
 enabled.

 gcc/testsuite/
 * gcc.target/i386/conversion-2.c: New test.

Please use the attached testcase that is also compatible with 32bit
compiles (and removes unnecessary comments in the asm). Please also
mention the revert  in ChangeLog.

OK for x86 part with these changes.

Thanks,
Uros.
/* { dg-do compile } */
/* { dg-options -O2 -fno-toplevel-reorder -mtune=bdver2 } */
/* { dg-additional-options -mregparm=1 -msse -mfpmath=sse { target ia32 } } */

void __attribute__ ((hot))
f1 (int x)
{
  register float f asm (%xmm0) = x;
  asm volatile ( :: x (f));
}

void __attribute__ ((cold))
f2 (int x)
{
  register float f asm (%xmm1) = x;
  asm volatile ( :: x (f));
}

void __attribute__ ((hot))
f3 (int x)
{
  register float f asm (%xmm2) = x;
  asm volatile ( :: x (f));
}

void __attribute__ ((cold))
f4 (int x)
{
  register float f asm (%xmm3) = x;
  asm volatile ( :: x (f));
}

/* { dg-final { scan-assembler sp\\\), %xmm0 } } */
/* { dg-final { scan-assembler (ax|di), %xmm1 } } */
/* { dg-final { scan-assembler sp\\\), %xmm2 } } */
/* { dg-final { scan-assembler (ax|di), %xmm3 } } */


Re: [x86] Replace builtins with vector extensions

2014-10-17 Thread Uros Bizjak
On Sun, Oct 12, 2014 at 10:36 PM, Marc Glisse marc.gli...@inria.fr wrote:

 for the first patch, it is actually easier to repost the old patch with some
 new testcases. That doesn't mean it has to go in all at once, but you can
 comment on various things. If that's a problem I'll separate them and repost
 separately.

 For simple +-*/ for double, I mostly wonder if __A + __B is good enough or
 if (__m128d)((__v2df)__A + (__v2df)__B) would be better. Note that for
 integer vectors (future patch) those casts will be necessary. Maybe I should
 write the integer patch, for a single header, so you can compare.

This is a header, so I think some more casts won't hurt. Also for
consistency with integer ops, as you mentioned.

 I implemented _mm_cvtsi128_si64 with simply __A[0], but the corresponding
 V4SI operation will need ((__v4si)__A)[0], and I don't know if it is better
 to cast everywhere for uniformity, or just where it is necessary. I doubt it
 matters much for the generated code.

Let's go with more casts, just to avoid surprises. This is a header after all.

 Since we are keeping the builtins for Ada, it would be possible to follow
 Ulrich's suggestion and keep the old version of the intrinsics, protected by
 a macro. I would not like that much...

Nope. I also hope that Ada will convert to ppc's approach someday, so
these builtins will be removed.

Also, please note that builtins are not published interface. They
change from time to time ;)

 Something like _mm_load[hl]_pd is in my opinion roughly where the complexity
 limit should be. They would be nice to have, I expect the compiler will
 almost always generate a sequence at least as good, but we are getting quite
 far from the user's code so the likelihood of pessimizing somehow may
 increase. Also, an alternate implementation would be __A[i] = *__B, but IIRC
 the middle-end optimizers have a harder time handling this form. Those 2
 intrinsics should probably not be included in the first batch, but I wanted
 to show them.

Let's start without them, so we convert simple arithmetic first.

 All testcases fail without the patch. The 3rd testcase almost works, there
 is a note in RTL saying that the result is 0.0, but it doesn't use it. (I
 just noticed that all functions are still called myfma, I'll rename those
 not related to fma to 'f')

 If I omit the avx512fintrin.h part, I think it is very unlikely this can
 conflict with Kirill's work in any way (but I can still wait / use a
 branch).

While looking correct, I am a bit nervous about avx512fintrin.h
changes, mainly because I have not much experience with these
patterns. I have adder Kirill to CC for possible comments.

 Bootstrap+testsuite on x86_64-linux-gnu.

 2014-10-13  Marc Glisse  marc.gli...@inria.fr

 gcc/
 * config/i386/xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps,
 _mm_div_ps, _mm_store_ss, _mm_cvtss_f32): Use vector extensions
 instead of builtins.
 * config/i386/avxintrin.h (_mm256_add_pd, _mm256_add_ps,
 _mm256_div_pd, _mm256_div_ps, _mm256_mul_pd, _mm256_mul_ps,
 _mm256_sub_pd, _mm256_sub_ps): Likewise.
 * config/i386/avx512fintrin.h (_mm512_add_pd, _mm512_add_ps,
 _mm512_sub_pd, _mm512_sub_ps, _mm512_mul_pd, _mm512_mul_ps,
 _mm512_div_pd, _mm512_div_ps): Likewise.
 * config/i386/emmintrin.h (_mm_store_sd, _mm_cvtsd_f64,
 _mm_storeh_pd,
 _mm_cvtsi128_si64, _mm_cvtsi128_si64x, _mm_add_pd, _mm_sub_pd,
 _mm_mul_pd, _mm_div_pd, _mm_storel_epi64, _mm_movepi64_pi64,
 _mm_loadh_pd, _mm_loadl_pd): Likewise.
 (_mm_sqrt_sd): Fix comment.
 gcc/testsuite/
 * gcc.target/i386/intrinsics_opt-1.c: New testcase.
 * gcc.target/i386/intrinsics_opt-2.c: Likewise.
 * gcc.target/i386/intrinsics_opt-3.c: Likewise.
 * gcc.target/i386/intrinsics_opt-4.c: Likewise.

I don't have many comments on simple arithmetic, and changes look
trivial. I'd say that simple arithmetic part is OK for branch.

One last note: scalar builtins preserve highpart of target register.
IIRC, I have tried to convert _mm_frcz_s{s,d} to use scalars, but
resulted in a horrible code. Current approach uses
__builtin_ia32_movs{s,d} to generate optimal code, but I didn't test
if current gcc improved in this part.

Thanks,
Uros.


 --
 Marc Glisse
 Index: gcc/config/i386/avx512fintrin.h
 ===
 --- gcc/config/i386/avx512fintrin.h (revision 216116)
 +++ gcc/config/i386/avx512fintrin.h (working copy)
 @@ -10742,26 +10742,21 @@ _mm512_maskz_sqrt_ps (__mmask16 __U, __m
  (__v16sf)
  _mm512_setzero_ps (),
  (__mmask16) __U,
  _MM_FROUND_CUR_DIRECTION);
  }

  extern __inline __m512d
  __attribute__ ((__gnu_inline__, __always_inline__, 

[PATCH, rtl-optimization]: Remove const_alias_set

2014-10-19 Thread Uros Bizjak
Hello!

The fix that fixed scheduler issues with AND addresses (the fix
prevented early exit for MEM_READONLY_P addresses when AND alignment
addresses were involved) caused some fall-out for libgo testsuite.
These tests triggered an assert in mems_in_disjoint_alias_sets_p,
which checks for zero alias set when flag_strict_aliasing is false. We
have had some off-list discussion with Ian Lance Taylor about this
issue.

The problem was, that Go dynamically switches off flag_strict_aliasing
after compilation started and when unsafe package is imported
(similar to when__attribute__ ((optimize (-fno-strict-aliasing))) is
used in c). To mitigate this issue, the Go frontend called
varasm_init_once again to recalculated (= cleared) const_alias_set in
this case.

As observed in [1], the fix for canon_true_depence [2] that introduced
quick exit for a MEM_READONLY_P operands made const_alias_set
redundant, it is no longer user for anything.

The patch that fixed scheduling of AND operands removed early
MEM_READONLY_P exit for memory operands with AND realignment, so
operands could reach more complex code later in the function that was
able to determine dependence of memory operands. This code includes
the call to mems_in_disjoint_alias_sets_p, and the assert triggered
again for some MEM_READONLY_P operands that have had non-zero alias
set, set from the value, cached in const_alias_set from before
flag_strict_aliasing flag was cleared.

The proposed solution is to remove const_alias_set altogether. The
MEM_READONLY_P successfully supersedes const_alias_set functionality,
and this is also confirmed by the removal of the second
varasm_init_once call in the Go frontend. In an off-list discussion,
Ian agrees that attached patch should also fix the problem.

2014-10-19  Uros Bizjak  ubiz...@gmail.com

* varasm.c (const_alias_set): Remove.
(init_varasm_once): Remove initialization of const_alias_set.
(build_constant_desc): Do not set alias set to const_alias_set.

The patch was tested on alpha-linux-gnu [3], alphaev68-linux-gnu and
x86_64-linux-gnu {,-m32} for all default languages plus Go and
obj-c++.

The patch fixes all mentioned libgo failures on alpha.

OK for mainline?

[1] https://gcc.gnu.org/ml/gcc-patches/2013-07/msg01033.html
[2] https://gcc.gnu.org/ml/gcc-patches/2010-07/msg01758.html
[3] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg02041.html

Uros.
Index: varasm.c
===
--- varasm.c(revision 216362)
+++ varasm.c(working copy)
@@ -98,11 +98,6 @@ tree last_assemble_variable_decl;
 
 bool first_function_block_is_cold;
 
-/* We give all constants their own alias set.  Perhaps redundant with
-   MEM_READONLY_P, but pre-dates it.  */
-
-static alias_set_type const_alias_set;
-
 /* Whether we saw any functions with no_split_stack.  */
 
 static bool saw_no_split_stack;
@@ -3231,7 +3226,6 @@ build_constant_desc (tree exp)
   rtl = gen_const_mem (TYPE_MODE (TREE_TYPE (exp)), symbol);
   set_mem_attributes (rtl, exp, 1);
   set_mem_alias_set (rtl, 0);
-  set_mem_alias_set (rtl, const_alias_set);
 
   /* We cannot share RTX'es in pool entries.
  Mark this piece of RTL as required for unsharing.  */
@@ -5928,7 +5922,6 @@ init_varasm_once (void)
   object_block_htab = hash_tableobject_block_hasher::create_ggc (31);
   const_desc_htab = hash_tabletree_descriptor_hasher::create_ggc (1009);
 
-  const_alias_set = new_alias_set ();
   shared_constant_pool = create_constant_pool ();
 
 #ifdef TEXT_SECTION_ASM_OP


Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.

2014-10-20 Thread Uros Bizjak
On Mon, Oct 20, 2014 at 3:41 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Oct 20, 2014 at 05:30:36PM +0400, Kirill Yukhin wrote:
  Unfortunately this caused PR63600.  The problem is that VI_AVX2
  mode iterator includes V2DI and for AVX2 also V4DI, but for pre-ssse3
  ix86_expand_sse2_abs doesn't handle V2DI (and can't easily, we don't have
  PSRAQ instruction), for ssse3 there is no vpabsq instruction, and for
  avx2 neither.
  We can handle V2DI/V4DI only for TARGET_AVX512VL, and V8DI for
  TARGET_AVX512F.
  Thus, IMHO the mode iterator on at least
  (define_insn *absmode2
  and on
  (define_expand absmode2
  is wrong, should not include V2DI/V4DI unless TARGET_AVX512VL
  (so new (or ressurrected, was that VI124_AVX2_48_AVX512F?)
  specialized mode iterator?).


 This patch removes absq insn patterns for non-AVX-512 targets.


 gcc/
   * config/i386/sse.md (define_mode_iterator VI_AVX2): Restore to 128-,
   256- bit integer modes only.
   (define_mode_iterator VI_AVX2_AVX512): New.
   (define_expand negmode2): Use VI_AVX2_AVX512 mode iterator.
   (define_expand plusminus_insnmode3): Ditto.
   (define_insn *plusminus_insnmode3): Ditto.
   (define_expand sse2_avx2_andnotmode3): Ditto.
   (define_mode_iterator VI1248_AVX512VL_AVX512BW): New.
   (define_insn absVI1248_AVX512VL_AVX512BW:mode2): Ditto.

 Bootstrap in progress. AVX-512 tests pass.

 Is it ok for trunk?

 I'll certainly leave the review to Uros, whatever he prefers.
 That said, I was expecting you'd keep VI_AVX2 as is (because from the patch
 clearly that is what is used most commonly, the V?DI modes are for most
 insns normal integral vector modes, VI* uses the same modes and VI_AVX2
 used to be just like VI, just with TARGET_AVX conditions replaced with
 TARGET_AVX2), and just add a new mode iterator for the two abs patterns
 (*absmode2 and absmode2), it can be specialized mode iterator just
 for the abs with ABS in names or something.

Yes, I like this idea, too.

Just add IV1248_AVX512VL_AVX512BW and use it in abs patterns.

The changed patch is pre-approved, but please still make full
bootstrap and regtest cycle.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-20 Thread Uros Bizjak
On Mon, Oct 20, 2014 at 5:19 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  The patch is OK with the above improvement.
 
 

 Will commit version below, if no objections in 24 hours.


 Sorry,
 I've missed palignr, which should also have v64qi version,
 and lost return in expand_vec_perm_palignr case
 (this caused avx512f-vec-unpack test failures).
 Patch below fixes it. Ok for trunk?

 2014-10-20  Ilya Tocar  ilya.to...@intel.com

 * config/i386/i386.c (expand_vec_perm_1): Fix
 expand_vec_perm_palignr case.
 * config/i386/sse.md (ssse3_avx2_palignrmode_mask): Use
 VI1_AVX512.

OK.

Thanks,
Uros.


[PATCH, fixincludes]: Add pthread.h to glibc_c99_inline_4 fix

2014-10-21 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 2:05 PM, Jakub Jelinek ja...@redhat.com wrote:

  Recent change caused bootstrap failure on CentOS 5.11:
 
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  unwind-dw2-fde-dip_s.o: In function `__pthread_cleanup_routine':
  unwind-dw2-fde-dip.c:(.text+0x1590): multiple definition of
  `__pthread_cleanup_routine'
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  unwind-sjlj_s.o: In function `__pthread_cleanup_routine':
  unwind-sjlj.c:(.text+0x0): multiple definition of 
  `__pthread_cleanup_routine'
  unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  emutls_s.o: In function `__pthread_cleanup_routine':
  emutls.c:(.text+0x170): multiple definition of `__pthread_cleanup_routine'
  unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
  collect2: error: ld returned 1 exit status
  gmake[5]: *** [libgcc_s.so] Error 1
 
  $ ld --version
  GNU ld version 2.17.50.0.6-26.el5 20061020

 It looks like a switch-to-c11 fallout. Older glibc versions have
 issues with c99 (and c11) conformance [1].

 Changing extern __inline void __pthread_cleanup_routine (...) in
 system /usr/include/pthread.h to

 if __STDC_VERSION__  199901L
 extern
 #endif
 __inline__ void __pthread_cleanup_routine (...)

 fixes this issue and allows bootstrap to proceed.

 However, fixincludes is not yet built in stage1 bootstrap. Is there a
 way to fix this issue without changing system headers?

 [1] https://gcc.gnu.org/ml/gcc-patches/2006-11/msg01030.html

 Yeah, old glibcs are totally incompatible with -fno-gnu89-inline.
 Not sure if it is easily fixincludable, if yes, then -fgnu89-inline should
 be used for code like libgcc which is built with the newly built compiler
 before it is fixincluded.
 Or we need -fgnu89-inline by default for old glibcs (that is pretty
 much what we do e.g. in Developer Toolset for RHEL5).

At the end of the day, adding pthread.h to glibc_c99_inline_4 fix
fixes the bootstrap. The fix applies __attribute__((__gnu_inline__))
to the declaration:

extern __inline __attribute__ ((__gnu_inline__)) void
__pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame)

2014-10-21  Uros Bizjak  ubiz...@gmail.com

* inclhack.def (glibc_c99_inline_4): Add pthread.h to files.
* fixincl.x: Regenerate.

Bootstrapped and regression tested on CentOS 5.11 x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
Index: fixincl.x
===
--- fixincl.x   (revision 216501)
+++ fixincl.x   (working copy)
@@ -2,11 +2,11 @@
  * 
  * DO NOT EDIT THIS FILE   (fixincl.x)
  * 
- * It has been AutoGen-ed  August 12, 2014 at 02:09:58 PM by AutoGen 5.12
+ * It has been AutoGen-ed  October 21, 2014 at 10:18:16 AM by AutoGen 5.16.2
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Aug 12 14:09:58 MSK 2014
+/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Oct 21 10:18:17 CEST 2014
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
@@ -3173,7 +3173,7 @@
  *  File name selection pattern
  */
 tSCC zGlibc_C99_Inline_4List[] =
-  sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0;
+  
sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0pthread.h\0*/pthread.h\0;
 /*
  *  Machine/OS name selection pattern
  */
Index: inclhack.def
===
--- inclhack.def(revision 216501)
+++ inclhack.def(working copy)
@@ -1687,7 +1687,8 @@
  */
 fix = {
 hackname  = glibc_c99_inline_4;
-files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h';
+files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h',
+pthread.h, '*/pthread.h';
 bypass= __extern_inline|__gnu_inline__;
 select= (^| )extern __inline;
 c_fix = format;


Re: [PATCH] Fix ubsan i?86 {add,sub,mul}vmode4 patterns

2014-03-25 Thread Uros Bizjak
On Tue, Mar 25, 2014 at 8:18 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Mar 25, 2014 at 04:06:40PM +0100, Jakub Jelinek wrote:
 On Tue, Mar 25, 2014 at 12:39:18PM +0100, Uros Bizjak wrote:
  The patch is OK in principle, but we could follow established practice
  and use separate predicates - please see general_szext_operand mode
  attribute definition.

 So like this?  I've tried to use non-VOIDmode of the predicates that were
 used previously (i.e. general_operand or x86_64_general_operand).

 2014-03-25  Jakub Jelinek  ja...@redhat.com

   * config/i386/i386.md (general_sext_operand): New mode attr.
   (addvmode4, subvmode4, mulvmode4): If operands[2] is CONST_INT,
   don't generate (sign_extend (const_int)).
   (*addvmode4, *subvmode4, *mulvmode4): Disallow CONST_INT_P
   operands[2].  Use We constraint instead of i and 
 general_sext_operand
   predicate instead of general_operand.
   (*addvmode4_1, *subvmode4_1, *mulvmode4_1): New insns.
   * config/i386/constraints.md (We): New constraint.
   * config/i386/predicates.md (x86_64_sext_operand,
   sext_operand): New predicates.

 Now successfully bootstrapped/regtested on x86_64-linux and i686-linux.

The patch is OK for mainline.

Thanks,
Uros.


<    1   2   3   4   5   6   7   8   9   10   >