Re: [PATCH] Fix find_base_term in 32-bit -fpic code (PR lto/64025)
On Tue, Nov 25, 2014 at 8:40 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Nov 25, 2014 at 12:25 AM, Jakub Jelinek ja...@redhat.com wrote: The fallback delegitimization I've added as last option mainly for debug info purposes, when we don't know if the base is a PIC register or say a PIC register plus some addend, unfortunately in some tests broke find_base_term, which for PLUS looks only at the first operand and recursion on it finds a base term, it returns it immediately. So, it found base term of _GLOBAL_OFFSET_TABLE_, when the right base term is actually in the second operand. This patch fixes it by swapping the operands, debug info doesn't care about the order, it won't match in any instruction anyway, but helps alias.c. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-11-24 Jakub Jelinek ja...@redhat.com PR lto/64025 * config/i386/i386.c (ix86_delegitimize_address): Ensure result comes before (addend - _GLOBAL_OFFSET_TABLE_) term. Can you also swap operands of (%ecx - %ebx) + foo? There is no point digging into RTX involving registers only when we know that we are looking for foo. This will also be consistent with the code you patched below. Something like attached prototype patch. Uros. Index: i386.c === --- i386.c (revision 218037) +++ i386.c (working copy) @@ -14847,19 +14847,20 @@ ix86_delegitimize_address (rtx x) leal (%ebx, %ecx, 4), %ecx ... movl foo@GOTOFF(%ecx), %edx -in which case we return (%ecx - %ebx) + foo -or (%ecx - _GLOBAL_OFFSET_TABLE_) + foo if pseudo_pic_reg +in which case we return foo + (%ecx - %ebx) +or foo + (%ecx - _GLOBAL_OFFSET_TABLE_) if pseudo_pic_reg and reload has completed. */ if (pic_offset_table_rtx (!reload_completed || !ix86_use_pseudo_pic_reg ())) -result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), -pic_offset_table_rtx), - result); +result = gen_rtx_PLUS (Pmode, result, + gen_rtx_MINUS (Pmode, copy_rtx (addend), + pic_offset_table_rtx)); else if (pic_offset_table_rtx !TARGET_MACHO !TARGET_VXWORKS_RTP) { rtx tmp = gen_rtx_SYMBOL_REF (Pmode, GOT_SYMBOL_NAME); - tmp = gen_rtx_MINUS (Pmode, copy_rtx (addend), tmp); - result = gen_rtx_PLUS (Pmode, tmp, result); + result = gen_rtx_PLUS (Pmode, result, +gen_rtx_MINUS (Pmode, copy_rtx (addend), + tmp)); } else return orig_x;
Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c
On Tue, Nov 25, 2014 at 10:23 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Nov 24, 2014 at 10:33 PM, Jeff Law l...@redhat.com wrote: On 11/22/14 11:50, Uros Bizjak wrote: Hello! These two tests fix PR target/63847 [1], where x87 excess precision causes testcase to fail. The problem was triggered by -fpic, please see the PR for analysis. The patch adds -ffloat-store for 32bit x86 target, a standard and well tested solution for this problem. 2014-11-22 Uros Bizjak ubiz...@gmail.com PR target/63847 * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add -ffloat-store for 32bit x86 targets. * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto. OK. Don't we have -fexcess-precision=standard for this now? Oh ... indeed. I will update the patch to enable it for all x86 targets. Thanks, Uros.
Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c
On Tue, Nov 25, 2014 at 10:38 AM, Uros Bizjak ubiz...@gmail.com wrote: These two tests fix PR target/63847 [1], where x87 excess precision causes testcase to fail. The problem was triggered by -fpic, please see the PR for analysis. The patch adds -ffloat-store for 32bit x86 target, a standard and well tested solution for this problem. 2014-11-22 Uros Bizjak ubiz...@gmail.com PR target/63847 * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add -ffloat-store for 32bit x86 targets. * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto. OK. Don't we have -fexcess-precision=standard for this now? Oh ... indeed. I will update the patch to enable it for all x86 targets. cc1plus: sorry, unimplemented: -fexcess-precision=standard for C++ Uros.
[PATCH, libgfortran]: Remove unused variable
Hello! 2014-11-25 Uros Bizjak ubiz...@gmail.com * intrinsics/env.c (getenv): Remove unused variable res_len. Bootstrapped on x86_64-linux-gnu. Almost trivial, but ... OK for mainline? Uros. Index: intrinsics/env.c === --- intrinsics/env.c(revision 218056) +++ intrinsics/env.c(working copy) @@ -42,7 +42,6 @@ PREFIX(getenv) (char * name, char * value, gfc_cha { char *name_nt; char *res = NULL; - int res_len; if (name == NULL || value == NULL) runtime_error (Both arguments to getenv are mandatory.);
[PATCH, libobjc]: Remove ‘...’ is static but used in inline function ‘...’ which is not static
Hello! Recently, gcc bootstrap started to emit following warnings when building libobjc: libobjc/sendmsg.c:338:13: warning: ‘get_implementation’ is static but used in inline function ‘get_imp’ which is not static libobjc/sendmsg.c:335:15: warning: ‘sarray_get_safe’ is static but used in inline function ‘get_imp’ which is not static libobjc/sendmsg.c:143:21: warning: ‘__objc_word_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static libobjc/sendmsg.c:141:21: warning: ‘__objc_double_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static libobjc/sendmsg.c:139:21: warning: ‘__objc_block_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static 2014-11-25 Uros Bizjak ubiz...@gmail.com * sendmsg.c (get_imp): Declare as static inline. (__objc_get_forward_imp): Ditto. Bootstrapped on x86_64-linux-gnu. OK for mainline? Uros. Index: sendmsg.c === --- sendmsg.c (revision 218056) +++ sendmsg.c (working copy) @@ -105,7 +105,7 @@ id nil_method (id, SEL); /* Given a selector, return the proper forwarding implementation. */ -inline +static inline IMP __objc_get_forward_imp (id rcv, SEL sel) { @@ -320,7 +320,7 @@ return res; } -inline +static inline IMP get_imp (Class class, SEL sel) {
Re: [PATCH 04/08] PR jit/63854: Remove xstrdup from ipa/cgraph fprintf calls
Hello! cgraph*.c and ipa-*.c use xstrdup on strings when dumping them via fprintf, leaking all of the duplicated buffers. Is/was there a reason for doing this? Yes, please see [1] and PR 53136 [2]. As said in [1]: There is a problem with multiple calls of cgraph_node_name in fprintf dumps. Please note that C++ uses caching in cxx_printable_name_internal (aka LANG_HOOKS_DECL_PRINTABLE_NAME), so when cxx_printable_name_internal is called multiple times from printf (i.e. fprintf %s/%i - %s/%i), it can happen that the first string gets evicted by the second call, before fprintf is fully evaluated. Taking them out fixes these leaks (seen when dumping is enabled): But you will get Invalid read of size X instead. The patch at [1] fixed these, but introduced memory leaks, which were tolerable at the time: I think that small memory leak is tolerable here (the changes are exclusively in the dump code), and follows the same approach as in java frontend. It seems that these assumptions are not valid anymore. [1] https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01904.html [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53136 Uros.
Re: [PATCH, i386] Add new arg values for __builtin_cpu_supports
Hello! I've added avx512f support to __builtin_cpu_supports. I'm not sure about bw+vl, i think for compound values like avx512bd+dq+vl, arch is better. Also for such cases prority is unclear, what should we choose bw+vl or e. g. avx512f+er? I've left MPX bits in cpuid.h, in case we will need them later (e. g. for runtime mpx tests enabling). Ok for trunk? gcc/ * config/i386/cpuid.h (bit_MPX, bit_BNDREGS, bit_BNDCSR): Define. * config/i386/i386.c (get_builtin_code_for_version): Add avx512f. (fold_builtin_cpu): Ditto. * doc/extend.texi: Documment it. gcc/testsuite/ * g++.dg/ext/mv2.C: Add test for target (avx512f). * gcc.target/i386/builtin_target.c: Ditto. libgcc/ * config/i386/cpuinfo.c (processor_features): Add FEATURE_AVX512F. * config/i386/cpuinfo.c (get_available_features): Detect it. OK. Thanks, Uros.
[RFC PATCH, i386]: Prefer %ebx in set_got patterns
Hello! Attached patch helps RA to choose the most appropriate PIC register by changing the register preference for set_got patterns. Using this patch, there should really be a reason for RA to avoid ABI mandated hard PIC reg. This patch avoids many mov %exx,%ebx in front of the calls, that happen with unpatched compiler even with Vladimir's latest RA patch to avoid duplicated PIC registers. As a smoke test, I have checked 32bit libgo.so.6.0.0 library, where now we have: [uros@omen7 .libs]$ grep thunk.bx aaa | wc -l 7693 [uros@omen7 .libs]$ grep thunk.ax aaa | wc -l 10 [uros@omen7 .libs]$ grep thunk.cx aaa | wc -l 4 [uros@omen7 .libs]$ grep thunk.dx aaa | wc -l 8 [uros@omen7 .libs]$ grep thunk.bp aaa | wc -l 497 [uros@omen7 .libs]$ grep thunk.si aaa | wc -l 145 [uros@omen7 .libs]$ grep thunk.di aaa | wc -l 198 2014-11-27 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (set_got): Use =b,?r constraint for operand 0. (set_got_labelled): Ditto. (set_got_rex64): Ditto. (set_rip_rex64): Ditto. (set_got_offset_rex64): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Thoughts? Uros Index: config/i386/i386.md === --- config/i386/i386.md (revision 218111) +++ config/i386/i386.md (working copy) @@ -12101,7 +12101,7 @@ ix86_expand_prologue (); DONE;) (define_insn set_got - [(set (match_operand:SI 0 register_operand =r) + [(set (match_operand:SI 0 register_operand =b,?r) (unspec:SI [(const_int 0)] UNSPEC_SET_GOT)) (clobber (reg:CC FLAGS_REG))] !TARGET_64BIT @@ -12110,7 +12110,7 @@ (set_attr length 12)]) (define_insn set_got_labelled - [(set (match_operand:SI 0 register_operand =r) + [(set (match_operand:SI 0 register_operand =b,?r) (unspec:SI [(label_ref (match_operand 1))] UNSPEC_SET_GOT)) (clobber (reg:CC FLAGS_REG))] @@ -12120,7 +12120,7 @@ (set_attr length 12)]) (define_insn set_got_rex64 - [(set (match_operand:DI 0 register_operand =r) + [(set (match_operand:DI 0 register_operand =b,?r) (unspec:DI [(const_int 0)] UNSPEC_SET_GOT))] TARGET_64BIT lea{q}\t{_GLOBAL_OFFSET_TABLE_(%%rip), %0|%0, _GLOBAL_OFFSET_TABLE_[rip]} @@ -12129,7 +12129,7 @@ (set_attr mode DI)]) (define_insn set_rip_rex64 - [(set (match_operand:DI 0 register_operand =r) + [(set (match_operand:DI 0 register_operand =b,?r) (unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_RIP))] TARGET_64BIT lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]} @@ -12138,7 +12138,7 @@ (set_attr mode DI)]) (define_insn set_got_offset_rex64 - [(set (match_operand:DI 0 register_operand =r) + [(set (match_operand:DI 0 register_operand =b,?r) (unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_GOT_OFFSET))]
Re: PATCH: PR target/63833: REAL_PIC_OFFSET_TABLE_REGNUM is wrong for x86-64
On Wed, Nov 12, 2014 at 3:53 PM, H.J. Lu hongjiu...@intel.com wrote: We have been using the wrong register to hold GOT in 64-bit large model, which is used by the large model PLT. The only reason we haven't run into any problem is linker doesn't support the large model PLT. I am looking into linker issue. This patch corrects REAL_PIC_OFFSET_TABLE_REGNUM for 64-bit large model. OK to install? Thanks. H.J. --- 2014-11-12 H.J. Lu hongjiu...@intel.com PR target/63833 * config/i386/i386.h (REAL_PIC_OFFSET_TABLE_REGNUM): Use R15_REG for 64-bit. * config/i386/rdos64.h (REAL_PIC_OFFSET_TABLE_REGNUM): Removed. OK, the ABI documentation has just been fixed. Thanks, Uros.
[PATCH, i386]: Use preferred_for_{size,speed} instead of Yx and Yd register constraints
Hello! Yx and Yd register constraints depend on optimize_function_for_speed. However, optimize_function_for_size is not stable during the compilation, so this can result in unrecognized insn (as was the case with *floatSWI48:modeMODEF:mode2_sse insn failure). The patch uses preferred_for_{size,speed} infrastructure instead to achieve the same functionality as with Yx and Yd register constraint. 2014-11-27 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (preferred_for_size): New attribute (*pushxf): Split Yx*r constraints to r,*r. Use preferred_for_size attribute to conditionally disable alternative 1. (*pushdf): Split Yd*r constraints to r,*r. Use preferred_for_size and prefered_for_speed attributes to conditionally disable alternative 1. (*movxf_internal): Split Yx*r constraints to r,*r. Use preferred_for_size attribute to conditionally disable alternatives 3 and 4. (*movdf_internal): Split Yd*r constraints to r,*r. Use preferred_for_size and prefered_for_speed attributes to conditionally disable alternatives 3 and 4. * config/i386/constraints.md (Yd, Yx): Remove register constraints. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and was committed to mainline SVN. Uros. Index: config/i386/constraints.md === --- config/i386/constraints.md (revision 218129) +++ config/i386/constraints.md (working copy) @@ -105,8 +105,6 @@ ;; n MMX inter-unit moves from MMX register enabled ;; a Integer register when zero extensions with AND are disabled ;; p Integer register when TARGET_PARTIAL_REG_STALL is disabled -;; d Integer register when integer DFmode moves are enabled -;; x Integer register when integer XFmode moves are enabled ;; f x87 register when 80387 floating point arithmetic is enabled (define_register_constraint Yz TARGET_SSE ? SSE_FIRST_REG : NO_REGS @@ -137,15 +135,6 @@ ? NO_REGS : GENERAL_REGS @internal Any integer register when zero extensions with AND are disabled.) -(define_register_constraint Yd - TARGET_INTEGER_DFMODE_MOVES optimize_function_for_speed_p (cfun) - ? GENERAL_REGS : NO_REGS - @internal Any integer register when integer DFmode moves are enabled.) - -(define_register_constraint Yx - optimize_function_for_speed_p (cfun) ? GENERAL_REGS : NO_REGS - @internal Any integer register when integer XFmode moves are enabled.) - (define_register_constraint Yf (ix86_fpmath FPMATH_387) ? FLOAT_REGS : NO_REGS @internal Any x87 register when 80387 FP arithmetic is enabled.) Index: config/i386/i386.md === --- config/i386/i386.md (revision 218129) +++ config/i386/i386.md (working copy) @@ -816,6 +816,7 @@ ] (const_int 1))) +(define_attr preferred_for_size (const_int 1)) (define_attr preferred_for_speed (const_int 1)) ;; Describe a user's asm statement. @@ -2811,8 +2812,8 @@ }) (define_insn *pushxf - [(set (match_operand:XF 0 push_operand =,) - (match_operand:XF 1 general_no_elim_operand f,Yx*roF))] + [(set (match_operand:XF 0 push_operand =,,,) + (match_operand:XF 1 general_no_elim_operand f,r,*r,oF))] { /* This insn should be already split before reg-stack. */ @@ -2819,14 +2820,18 @@ gcc_unreachable (); } [(set_attr type multi) - (set_attr unit i387,*) + (set_attr unit i387,*,*,*) (set (attr mode) - (cond [(eq_attr alternative 1) + (cond [(eq_attr alternative 1,2,3) (if_then_else (match_test TARGET_64BIT) (const_string DI) (const_string SI)) ] - (const_string XF)))]) + (const_string XF))) + (set (attr preferred_for_size) + (cond [(eq_attr alternative 1) + (symbol_ref false)] + (symbol_ref true)))]) ;; %%% Kill this when call knows how to work this out. (define_split @@ -2842,18 +2847,26 @@ }) (define_insn *pushdf - [(set (match_operand:DF 0 push_operand =,,,) - (match_operand:DF 1 general_no_elim_operand f,Yd*roF,rmF,x))] + [(set (match_operand:DF 0 push_operand =,) + (match_operand:DF 1 general_no_elim_operand f,r,*r,oF,rmF,x))] { /* This insn should be already split before reg-stack. */ gcc_unreachable (); } - [(set_attr isa *,nox64,x64,sse2) + [(set_attr isa *,nox64,nox64,nox64,x64,sse2) (set_attr type multi) - (set_attr unit i387,*,*,sse) - (set_attr mode DF,SI,DI,DF)]) - + (set_attr unit i387,*,*,*,*,sse) + (set_attr mode DF,SI,SI,SI,DI,DF) + (set (attr preferred_for_size) + (cond [(eq_attr alternative 1) + (symbol_ref false)] + (symbol_ref true))) + (set (attr preferred_for_speed) + (cond [(eq_attr alternative 1) + (symbol_ref TARGET_INTEGER_DFMODE_MOVES)] + (symbol_ref true)))]) + ;; %%% Kill this when call knows how to work this out
Re: RFA: one more version of the patch for PR61360
On Fri, Sep 26, 2014 at 10:31 PM, Vladimir Makarov vmaka...@redhat.com wrote: I guess we achieved the consensus about the following patch to fix PR61360 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was successfully bootstrapped and tested (w/wo -march=amdfam10) on x86/x86-64. Is it ok to commit to trunk? 2014-09-26 Vladimir Makarov vmaka...@redhat.com PR target/61360 * lra.c (lra): Remove call of recog_init. * recog.c (constrain_operands): Permit reg for memory constraint when LRA is used. * config/i386/i386.md (*floatSWI48:modeMODEF:mode2_sse): Enable first alternative independently on RA stage. Please also mention that this patch reverts: 2014-04-01 Richard Henderson r...@redhat.com PR target/60704 * config/i386/i386.md (*floatSWI48MODEF2_sse): Leave the second alternative enabled before register allocation. The x86 part is OK. BTW: I think that this patch should also be backported to 4.9 branch, since the original patch and the PR60704 fix were also installed there. Thanks, Uros.
Re: [PATCH i386 AVX512] [57/n] Extend blend/cmp/brodcast insn patterns.
On Fri, Sep 26, 2014 at 11:04 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom extends blend/cmp/brodcast insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn avx512f_blendmmode): Delete. (define_insn avx512_blendmVI48_AVX512VL:mode): New. (define_insn avx512_blendmVI12_AVX512VL:mode): Ditto.. (define_mode_attr cmp_imm_predicate): Add V8SF, V4DF, V8SI, V4DI, V4SF, V2DF, V4SI, V2DI, V32HI, V64QI, V16HI, V32QI, V8HI, V16QI modes. (define_insn avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name): Remove. (define_insn avx512_cmpVI48_AVX512VL:mode3mask_scalar_merge_nameround_saeonly_name): New. (define_insn avx512_cmpVI12_AVX512VL:mode3mask_scalar_merge_nameround_saeonly_name): Ditto. (define_insn mask_codeforavx512f_vec_dupmodemask_name): Delete. (define_insn avx512_vec_dupV48_AVX512VL:modemask_name): New. (define_insn avx512_vec_dupV12_AVX512VL:modemask_name): Ditto. (define_insn mask_codeforavx512f_vec_dup_gprmodemask_name): Delete. (define_insn mask_codeforavx512_vec_dup_gprVI48_AVX512VL:modemask_name): New. (define_insn mask_codeforavx512_vec_dup_gprVI12_AVX512VL:modemask_name): Ditto. (define_insn·mask_codeforavx512f_vec_dup_memmodemask_name): Delete. (define_insn mask_codeforavx512_vec_dup_memVI48_AVX512VL:modemask_name): New. (define_insn mask_codeforavx512_vec_dup_memVI12_AVX512VL:modemask_name): Ditto. OK with a small fix below. Thanks, Uros. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9edfebc..43d6655 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -954,14 +954,26 @@ (set_attr memory none,load) (set_attr mode sseinsnmode)]) -(define_insn avx512f_blendmmode - [(set (match_operand:VI48F_512 0 register_operand =v) - (vec_merge:VI48F_512 - (match_operand:VI48F_512 2 nonimmediate_operand vm) - (match_operand:VI48F_512 1 register_operand v) +(define_insn avx512_blendmmode + [(set (match_operand:V48_AVX512VL 0 register_operand =v) + (vec_merge:V48_AVX512VL + (match_operand:V48_AVX512VL 2 nonimmediate_operand vm) + (match_operand:V48_AVX512VL 1 register_operand v) (match_operand:avx512fmaskmode 3 register_operand Yk)))] TARGET_AVX512F - vsseintprefixblendmssemodesuffix\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2} + vblendmssemodesuffix\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2} + [(set_attr type ssemov) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) + +(define_insn avx512_blendmmode + [(set (match_operand:VI12_AVX512VL 0 register_operand =v) + (vec_merge:VI12_AVX512VL + (match_operand:VI12_AVX512VL 2 nonimmediate_operand vm) + (match_operand:VI12_AVX512VL 1 register_operand v) + (match_operand:avx512fmaskmode 3 register_operand Yk)))] + TARGET_AVX512BW + vpblendmssemodesuffix\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2} [(set_attr type ssemov) (set_attr prefix evex) (set_attr mode sseinsnmode)]) @@ -2467,14 +2479,21 @@ (set_attr mode ssescalarmode)]) (define_mode_attr cmp_imm_predicate - [(V16SF const_0_to_31_operand) (V8DF const_0_to_31_operand) - (V16SI const_0_to_7_operand) (V8DI const_0_to_7_operand)]) - -(define_insn avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name + [(V16SF const_0_to_31_operand) (V8DF const_0_to_31_operand) + (V16SI const_0_to_7_operand) (V8DI const_0_to_7_operand) + (V8SF const_0_to_31_operand) (V4DF const_0_to_31_operand) + (V8SI const_0_to_7_operand)(V4DI const_0_to_7_operand) + (V4SF const_0_to_31_operand) (V2DF const_0_to_31_operand) + (V4SI const_0_to_7_operand)(V2DI const_0_to_7_operand) + (V32HI const_0_to_7_operand) (V64QI const_0_to_7_operand) + (V16HI const_0_to_7_operand) (V32QI const_0_to_7_operand) + (V8HI const_0_to_7_operand)(V16QI const_0_to_7_operand)]) + +(define_insn avx512_cmpmode3mask_scalar_merge_nameround_saeonly_name [(set (match_operand:avx512fmaskmode 0 register_operand =Yk) (unspec:avx512fmaskmode - [(match_operand:VI48F_512 1 register_operand v) - (match_operand:VI48F_512 2 round_saeonly_nimm_predicate round_saeonly_constraint) + [(match_operand:V48_AVX512VL 1 register_operand v) + (match_operand:V48_AVX512VL 2 nonimmediate_operand round_saeonly_constraint) (match_operand:SI 3 cmp_imm_predicate n)] UNSPEC_PCMP))] TARGET_AVX512F round_saeonly_mode512bit_condition @@ -2484,6 +2503,20 @@ (set_attr prefix evex) (set_attr mode sseinsnmode)])
Re: [PATCH i386 AVX512] [58/n] Add vpmul[u]dq insn patterns.
On Fri, Sep 26, 2014 at 12:33 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom adds support for vpmul[u]dq insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand vec_widen_umult_even_v8simask_name): Add masking. (define_insn *vec_widen_umult_even_v8simask_name): Ditto. (define_expand vec_widen_umult_even_v4simask_name): Ditto. (define_insn *vec_widen_umult_even_v4simask_name): Ditto. (define_expand vec_widen_smult_even_v8simask_name): Ditto. (define_insn *vec_widen_smult_even_v8simask_name): Ditto. (define_expand sse4_1_mulv2siv2di3mask_name): Ditto. (define_insn *sse4_1_mulv2siv2di3mask_name): Ditto. (define_insn avx512dq_mulmode3mask_name): New. OK. Thanks, Uros. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 43d6655..e52d40c 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -9286,7 +9286,7 @@ (set_attr prefix evex) (set_attr mode XI)]) -(define_expand vec_widen_umult_even_v8si +(define_expand vec_widen_umult_even_v8simask_name [(set (match_operand:V4DI 0 register_operand) (mult:V4DI (zero_extend:V4DI @@ -9299,29 +9299,30 @@ (match_operand:V8SI 2 nonimmediate_operand) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - TARGET_AVX2 + TARGET_AVX2 mask_avx512vl_condition ix86_fixup_binary_operands_no_copy (MULT, V8SImode, operands);) -(define_insn *vec_widen_umult_even_v8si - [(set (match_operand:V4DI 0 register_operand =x) +(define_insn *vec_widen_umult_even_v8simask_name + [(set (match_operand:V4DI 0 register_operand =v) (mult:V4DI (zero_extend:V4DI (vec_select:V4SI - (match_operand:V8SI 1 nonimmediate_operand %x) + (match_operand:V8SI 1 nonimmediate_operand %v) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))) (zero_extend:V4DI (vec_select:V4SI - (match_operand:V8SI 2 nonimmediate_operand xm) + (match_operand:V8SI 2 nonimmediate_operand vm) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - TARGET_AVX2 ix86_binary_operator_ok (MULT, V8SImode, operands) - vpmuludq\t{%2, %1, %0|%0, %1, %2} + TARGET_AVX2 mask_avx512vl_condition +ix86_binary_operator_ok (MULT, V8SImode, operands) + vpmuludq\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2} [(set_attr type sseimul) - (set_attr prefix vex) + (set_attr prefix maybe_evex) (set_attr mode OI)]) -(define_expand vec_widen_umult_even_v4si +(define_expand vec_widen_umult_even_v4simask_name [(set (match_operand:V2DI 0 register_operand) (mult:V2DI (zero_extend:V2DI @@ -9332,28 +9333,29 @@ (vec_select:V2SI (match_operand:V4SI 2 nonimmediate_operand) (parallel [(const_int 0) (const_int 2)])] - TARGET_SSE2 + TARGET_SSE2 mask_avx512vl_condition ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);) -(define_insn *vec_widen_umult_even_v4si - [(set (match_operand:V2DI 0 register_operand =x,x) +(define_insn *vec_widen_umult_even_v4simask_name + [(set (match_operand:V2DI 0 register_operand =x,v) (mult:V2DI (zero_extend:V2DI (vec_select:V2SI - (match_operand:V4SI 1 nonimmediate_operand %0,x) + (match_operand:V4SI 1 nonimmediate_operand %0,v) (parallel [(const_int 0) (const_int 2)]))) (zero_extend:V2DI (vec_select:V2SI - (match_operand:V4SI 2 nonimmediate_operand xm,xm) + (match_operand:V4SI 2 nonimmediate_operand xm,vm) (parallel [(const_int 0) (const_int 2)])] - TARGET_SSE2 ix86_binary_operator_ok (MULT, V4SImode, operands) + TARGET_SSE2 mask_avx512vl_condition +ix86_binary_operator_ok (MULT, V4SImode, operands) @ pmuludq\t{%2, %0|%0, %2} - vpmuludq\t{%2, %1, %0|%0, %1, %2} + vpmuludq\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2} [(set_attr isa noavx,avx) (set_attr type sseimul) (set_attr prefix_data16 1,*) - (set_attr prefix orig,vex) + (set_attr prefix orig,maybe_evex) (set_attr mode TI)]) (define_expand vec_widen_smult_even_v16simask_name @@ -9401,7 +9403,7 @@ (set_attr prefix evex) (set_attr mode XI)]) -(define_expand vec_widen_smult_even_v8si +(define_expand vec_widen_smult_even_v8simask_name [(set (match_operand:V4DI 0 register_operand) (mult:V4DI (sign_extend:V4DI @@ -9414,30 +9416,31 @@ (match_operand:V8SI 2 nonimmediate_operand)
Re: [PATCH i386 AVX512] [59/n] Add vptest[n]m, ucmp, cmpeq insn patterns.
On Fri, Sep 26, 2014 at 12:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom adds support for vptest[n]m, ucmp, cmpeq. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_args_builtin): Handle CODE_FOR_avx512vl_cmpv4di3_mask, CODE_FOR_avx512vl_cmpv8si3_mask, CODE_FOR_avx512vl_ucmpv4di3_mask, CODE_FOR_avx512vl_ucmpv8si3_mask, CODE_FOR_avx512vl_cmpv2di3_mask, CODE_FOR_avx512vl_cmpv4si3_mask, CODE_FOR_avx512vl_ucmpv2di3_mask, CODE_FOR_avx512vl_ucmpv4si3_mask. * config/i386/sse.md (define_insn Double define_insn here. (define_insn avx512f_ucmpmode3mask_scalar_merge_name): Delete. avx512_ucmpVI12_AVX512VL:mode3mask_scalar_merge_name):New. (define_insn avx512_ucmpVI48_AVX512VL:mode3mask_scalar_merge_name):Ditto. (define_expand avx512_eqmode3mask_scalar_merge_name): Ditto. (define_insn avx512_eqmode3mask_scalar_merge_name_1): Ditto. (define_insn avx512_gtmode3mask_scalar_merge_name): Ditto. (define_insn avx512_testmmode3mask_scalar_merge_name): Ditto. (define_insn avx512_testnmmode3mask_scalar_merge_name): Ditto. OK. Thanks, Uros. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1aec70f..352ab81 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -34062,6 +34062,14 @@ ix86_expand_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_cmpv16si3_mask: case CODE_FOR_avx512f_ucmpv8di3_mask: case CODE_FOR_avx512f_ucmpv16si3_mask: + case CODE_FOR_avx512vl_cmpv4di3_mask: + case CODE_FOR_avx512vl_cmpv8si3_mask: + case CODE_FOR_avx512vl_ucmpv4di3_mask: + case CODE_FOR_avx512vl_ucmpv8si3_mask: + case CODE_FOR_avx512vl_cmpv2di3_mask: + case CODE_FOR_avx512vl_cmpv4si3_mask: + case CODE_FOR_avx512vl_ucmpv2di3_mask: + case CODE_FOR_avx512vl_ucmpv4si3_mask: error (the last argument must be a 3-bit immediate); return const0_rtx; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index e52d40c..625a2e0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2517,11 +2517,25 @@ (set_attr prefix evex) (set_attr mode sseinsnmode)]) -(define_insn avx512f_ucmpmode3mask_scalar_merge_name +(define_insn avx512_ucmpmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand =Yk) (unspec:avx512fmaskmode - [(match_operand:VI48_512 1 register_operand v) - (match_operand:VI48_512 2 nonimmediate_operand vm) + [(match_operand:VI12_AVX512VL 1 register_operand v) + (match_operand:VI12_AVX512VL 2 nonimmediate_operand vm) + (match_operand:SI 3 const_0_to_7_operand n)] + UNSPEC_UNSIGNED_PCMP))] + TARGET_AVX512BW + vpcmpussemodesuffix\t{%3, %2, %1, %0mask_scalar_merge_operand4|%0mask_scalar_merge_operand4, %1, %2, %3} + [(set_attr type ssecmp) + (set_attr length_immediate 1) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) + +(define_insn avx512_ucmpmode3mask_scalar_merge_name + [(set (match_operand:avx512fmaskmode 0 register_operand =Yk) + (unspec:avx512fmaskmode + [(match_operand:VI48_AVX512VL 1 register_operand v) + (match_operand:VI48_AVX512VL 2 nonimmediate_operand vm) (match_operand:SI 3 const_0_to_7_operand n)] UNSPEC_UNSIGNED_PCMP))] TARGET_AVX512F @@ -10265,20 +10279,42 @@ (set_attr prefix vex) (set_attr mode OI)]) -(define_expand avx512f_eqmode3mask_scalar_merge_name +(define_expand avx512_eqmode3mask_scalar_merge_name + [(set (match_operand:avx512fmaskmode 0 register_operand) + (unspec:avx512fmaskmode + [(match_operand:VI12_AVX512VL 1 register_operand) + (match_operand:VI12_AVX512VL 2 nonimmediate_operand)] + UNSPEC_MASKED_EQ))] + TARGET_AVX512BW + ix86_fixup_binary_operands_no_copy (EQ, MODEmode, operands);) + +(define_expand avx512_eqmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand) (unspec:avx512fmaskmode - [(match_operand:VI48_512 1 register_operand) - (match_operand:VI48_512 2 nonimmediate_operand)] + [(match_operand:VI48_AVX512VL 1 register_operand) + (match_operand:VI48_AVX512VL 2 nonimmediate_operand)] UNSPEC_MASKED_EQ))] TARGET_AVX512F ix86_fixup_binary_operands_no_copy (EQ, MODEmode, operands);) -(define_insn avx512f_eqmode3mask_scalar_merge_name_1 +(define_insn avx512_eqmode3mask_scalar_merge_name_1 [(set (match_operand:avx512fmaskmode 0 register_operand =Yk) (unspec:avx512fmaskmode -
Re: [PATCH i386 AVX512] [60/n] Update 128bit ashrv insn pattern.
On Fri, Sep 26, 2014 at 1:13 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This tiny patch extends 128bit ashrv expander. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator VI128_128 [V16QI V8HI V2DI]): Delete. (define_expand vashrmode3mask_name): Add masking, use VI12_128 mode iterator. (define_expand ashrv2di3mask_name): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 625a2e0..91d6778 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -498,7 +498,6 @@ (define_mode_iterator VI12_128 [V16QI V8HI]) (define_mode_iterator VI14_128 [V16QI V4SI]) (define_mode_iterator VI124_128 [V16QI V8HI V4SI]) -(define_mode_iterator VI128_128 [V16QI V8HI V2DI]) (define_mode_iterator VI24_128 [V8HI V4SI]) (define_mode_iterator VI248_128 [V8HI V4SI V2DI]) (define_mode_iterator VI48_128 [V4SI V2DI]) @@ -15720,17 +15719,36 @@ (match_operand:VI48_256 2 nonimmediate_operand)))] TARGET_AVX2) -(define_expand vashrmode3 - [(set (match_operand:VI128_128 0 register_operand) - (ashiftrt:VI128_128 - (match_operand:VI128_128 1 register_operand) - (match_operand:VI128_128 2 nonimmediate_operand)))] - TARGET_XOP +(define_expand vashrmode3mask_name + [(set (match_operand:VI12_128 0 register_operand) + (ashiftrt:VI12_128 + (match_operand:VI12_128 1 register_operand) + (match_operand:VI12_128 2 nonimmediate_operand)))] + TARGET_XOP || (TARGET_AVX512BW TARGET_AVX512VL) { - rtx neg = gen_reg_rtx (MODEmode); - emit_insn (gen_negmode2 (neg, operands[2])); - emit_insn (gen_xop_shamode3 (operands[0], operands[1], neg)); - DONE; + if (TARGET_XOP) +{ + rtx neg = gen_reg_rtx (MODEmode); + emit_insn (gen_negmode2 (neg, operands[2])); + emit_insn (gen_xop_shamode3 (operands[0], operands[1], neg)); + DONE; +} +}) + +(define_expand vashrv2di3mask_name + [(set (match_operand:V2DI 0 register_operand) + (ashiftrt:V2DI + (match_operand:V2DI 1 register_operand) + (match_operand:V2DI 2 nonimmediate_operand)))] + TARGET_XOP || TARGET_AVX512VL +{ + if (!TARGET_XOP) This condition is wrong. Please re-test the patch. +{ + rtx neg = gen_reg_rtx (V2DImode); + emit_insn (gen_negv2di2 (neg, operands[2])); + emit_insn (gen_xop_shav2di3 (operands[0], operands[1], neg)); + DONE; +} }) (define_expand vashrv4si3
Re: [PATCH i386 AVX512] [61/n] Update FP logic insn patterns.
On Fri, Sep 26, 2014 at 2:32 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends andnot and any_logic insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn sse_andnotVF_128_256:mode3mask_name): Add masking, use VF_128_256 mode iterator and update assembler emit code. (define_insn sse_andnotVF_512:mode3mask_name): New. (define_expand any_logic:codeVF_128_256:mode3mask_name): Add masking, use VF_128_256 mode iterator. (define_expand any_logic:codeVF_512:mode3mask_name): New. (define_insn *any_logic:codeVF_128_256:mode3mask_name): Add masking, use VF_128_256 mode iterator and update assembler emit code. (define_insn *any_logic:codeVF_512:mode3mask_name): New. (define_mode_attr avx512flogicsuff): Delete. (define_insn avx512f_logicmode): Ditto. (define_insn *andnotmode3mask_name): Update MODE_XI, MODE_OI, MODE_TI. (define_insn mask_codeforcodemode3mask_name): Ditto. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 91d6778..9835234 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2687,15 +2687,15 @@ ;; ; -(define_insn sse_andnotmode3 - [(set (match_operand:VF 0 register_operand =x,v) - (and:VF - (not:VF - (match_operand:VF 1 register_operand 0,v)) - (match_operand:VF 2 nonimmediate_operand xm,vm)))] - TARGET_SSE +(define_insn sse_andnotmode3mask_name + [(set (match_operand:VF_128_256 0 register_operand =x,v) + (and:VF_128_256 + (not:VF_128_256 + (match_operand:VF_128_256 1 register_operand 0,v)) + (match_operand:VF_128_256 2 nonimmediate_operand xm,vm)))] + TARGET_SSE mask_avx512vl_condition { - static char buf[32]; + static char buf[128]; const char *ops; const char *suffix; @@ -2715,17 +2715,17 @@ ops = andn%s\t{%%2, %%0|%%0, %%2}; break; case 1: - ops = vandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}; + ops = vandn%s\t{%%2, %%1, %%0mask_operand3_1|%%0mask_operand3_1, %%1, %%2}; break; default: gcc_unreachable (); } - /* There is no vandnp[sd]. Use vpandnq. */ - if (MODE_SIZE == 64) + /* There is no vandnp[sd] in avx512f. Use vpandn[qd]. */ + if (mask_applied !TARGET_AVX512DQ) { - suffix = q; - ops = vpandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}; + suffix = GET_MODE_INNER (MODEmode) == DFmode ? q : d; + ops = vpandn%s\t{%%2, %%1, %%0mask_operand3_1|%%0mask_operand3_1, %%1, %%2}; } snprintf (buf, sizeof (buf), ops, suffix); @@ -2745,30 +2745,63 @@ ] (const_string MODE)))]) -(define_expand codemode3 + +(define_insn sse_andnotmode3mask_name + [(set (match_operand:VF_512 0 register_operand =v) + (and:VF_512 + (not:VF_512 + (match_operand:VF_512 1 register_operand v)) + (match_operand:VF_512 2 nonimmediate_operand vm)))] + TARGET_AVX512F +{ + static char buf[128]; + const char *ops; + const char *suffix; + + suffix = ssemodesuffix; + ops = ; + + /* There is no vandnp[sd] in avx512f. Use vpandn[qd]. */ + if (!TARGET_AVX512DQ) All other patterns also have mask_applied condition here. Is the above condition correct? +{ + suffix = GET_MODE_INNER (MODEmode) == DFmode ? q : d; + ops = p; +} + + snprintf (buf, sizeof (buf), + v%sandn%s\t{%%2, %%1, %%0mask_operand3_1|%%0mask_operand3_1, %%1, %%2}, + ops, suffix); + return buf; +} + [(set_attr type sselog) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) + +(define_expand codemode3mask_name [(set (match_operand:VF_128_256 0 register_operand) - (any_logic:VF_128_256 - (match_operand:VF_128_256 1 nonimmediate_operand) - (match_operand:VF_128_256 2 nonimmediate_operand)))] - TARGET_SSE + (any_logic:VF_128_256 + (match_operand:VF_128_256 1 nonimmediate_operand) + (match_operand:VF_128_256 2 nonimmediate_operand)))] + TARGET_SSE mask_avx512vl_condition ix86_fixup_binary_operands_no_copy (CODE, MODEmode, operands);) -(define_expand codemode3 +(define_expand codemode3mask_name [(set (match_operand:VF_512 0 register_operand) - (fpint_logic:VF_512 + (any_logic:VF_512 (match_operand:VF_512 1 nonimmediate_operand) (match_operand:VF_512 2 nonimmediate_operand)))] TARGET_AVX512F ix86_fixup_binary_operands_no_copy (CODE, MODEmode, operands);) -(define_insn *codemode3 - [(set (match_operand:VF 0 register_operand =x,v) - (any_logic:VF - (match_operand:VF 1 nonimmediate_operand %0,v) -
Re: [PATCH i386 AVX512] [62/n] Add vpmaddubsw,vdbpsadbw insn patterns.
On Fri, Sep 26, 2014 at 4:09 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch introduces patterns for vpmaddubsw and vdbpsadbw insn. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_c_enum unspec): Add UNSPEC_DBPSADBW, UNSPEC_PMADDUBSW512. (define_insn avx512bw_pmaddubsw512modemask_name): New. (define_insn mask_codeforavx512bw_dbpsadbwmodemask_name): Ditto. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9835234..601373b 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -130,6 +130,8 @@ UNSPEC_SHA256RNDS2 ;; For AVX512BW support + UNSPEC_DBPSADBW + UNSPEC_PMADDUBSW512 UNSPEC_PSHUFHW UNSPEC_PSHUFLW UNSPEC_CVTINT2MASK @@ -13401,6 +13403,19 @@ (set_attr prefix vex) (set_attr mode OI)]) +;; Unspec version for intrinsics. +(define_insn avx512bw_pmaddubsw512modemask_name + [(set (match_operand:VI2_AVX512VL 0 register_operand =v) + (unspec:VI2_AVX512VL +[(match_operand:dbpsadbwmode 1 register_operand v) + (match_operand:dbpsadbwmode 2 nonimmediate_operand vm)] + UNSPEC_PMADDUBSW512))] + TARGET_AVX512BW + vpmaddubsw\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}; + [(set_attr type sseiadd) + (set_attr prefix evex) + (set_attr mode XI)]) + Can the one above be described using standard RTX, perhaps something similar to avx2_pmaddubsw256? (define_insn ssse3_pmaddubsw128 [(set (match_operand:V8HI 0 register_operand =x,x) (ss_plus:V8HI @@ -18097,6 +18112,21 @@ [(set_attr prefix evex) (set_attr mode ssescalarmode)]) +(define_insn mask_codeforavx512bw_dbpsadbwmodemask_name + [(set (match_operand:VI2_AVX512VL 0 register_operand =v) + (unspec:VI2_AVX512VL + [(match_operand:dbpsadbwmode 1 register_operand v) + (match_operand:dbpsadbwmode 2 nonimmediate_operand vm) + (match_operand:SI 3 const_0_to_255_operand)] + UNSPEC_DBPSADBW))] + TARGET_AVX512BW + vdbpsadbw\t{%3, %2, %1, %0mask_operand4|%0mask_operand4, %1, %2, %3} + [(set_attr isa avx) + (set_attr type sselog1) + (set_attr length_immediate 1) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) + (define_insn clzmode2mask_name [(set (match_operand:VI48_AVX512VL 0 register_operand =v) (clz:VI48_AVX512VL
[PATCH, i386]: Enable reminder{sd,df,xf} and fmod{sf,df,xf} only for flag_finite_math_only.
Hello! According to C99, reminder function returns: If x or y is a NaN, a NaN is returned. If x is an infinity, and y is not a NaN, a domain error occurs, and a NaN is returned. If y is zero, and x is not a NaN, a domain error occurs, and a NaN is returned. and fmod returns: If x or y is a NaN, a NaN is returned. If x is an infinity, a domain error occurs, and a NaN is returned. If y is zero, a domain error occurs, and a NaN is returned. If x is +0 (-0), and y is not zero, +0 (-0) is returned. However, x87 fprem and fprem1 instructions that are used to implement these builtin functions do not return NaN for infinities, but generate invalid-arithmetic-operand exception. Attached patch enables these builtins for finite math only, consistent with gcc documentation: '-ffinite-math-only': Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. This option is not turned on by any '-O' option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. 2014-09-30 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only only. (fmodmode3): Ditto. (fpremxf4_i387): Ditto. (reminderxf3): Ditto. (remindermode3): Ditto. (fprem1xf4_i387): Ditto. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. The patch also fixes ieee_2.f90 testsuite failure with FX's pending IEEE support improvement patch. 2014-09-30 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only only. (fmodmode3): Ditto. (fpremxf4_i387): Ditto. (reminderxf3): Ditto. (remindermode3): Ditto. (fprem1xf4_i387): Ditto. The patch will be committed to mainline and other release branches. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 215705) +++ config/i386/i386.md (working copy) @@ -13813,7 +13813,8 @@ (set (reg:CCFP FPSR_REG) (unspec:CCFP [(match_dup 2) (match_dup 3)] UNSPEC_C2_FLAG))] - TARGET_USE_FANCY_MATH_387 + TARGET_USE_FANCY_MATH_387 +flag_finite_math_only fprem [(set_attr type fpspc) (set_attr mode XF)]) @@ -13822,7 +13823,8 @@ [(use (match_operand:XF 0 register_operand)) (use (match_operand:XF 1 general_operand)) (use (match_operand:XF 2 general_operand))] - TARGET_USE_FANCY_MATH_387 + TARGET_USE_FANCY_MATH_387 +flag_finite_math_only { rtx_code_label *label = gen_label_rtx (); @@ -13845,7 +13847,8 @@ [(use (match_operand:MODEF 0 register_operand)) (use (match_operand:MODEF 1 general_operand)) (use (match_operand:MODEF 2 general_operand))] - TARGET_USE_FANCY_MATH_387 + TARGET_USE_FANCY_MATH_387 +flag_finite_math_only { rtx (*gen_truncxf) (rtx, rtx); @@ -13884,7 +13887,8 @@ (set (reg:CCFP FPSR_REG) (unspec:CCFP [(match_dup 2) (match_dup 3)] UNSPEC_C2_FLAG))] - TARGET_USE_FANCY_MATH_387 + TARGET_USE_FANCY_MATH_387 +flag_finite_math_only fprem1 [(set_attr type fpspc) (set_attr mode XF)]) @@ -13893,7 +13897,8 @@ [(use (match_operand:XF 0 register_operand)) (use (match_operand:XF 1 general_operand)) (use (match_operand:XF 2 general_operand))] - TARGET_USE_FANCY_MATH_387 + TARGET_USE_FANCY_MATH_387 +flag_finite_math_only { rtx_code_label *label = gen_label_rtx (); @@ -13916,7 +13921,8 @@ [(use (match_operand:MODEF 0 register_operand)) (use (match_operand:MODEF 1 general_operand)) (use (match_operand:MODEF 2 general_operand))] - TARGET_USE_FANCY_MATH_387 + TARGET_USE_FANCY_MATH_387 +flag_finite_math_only { rtx (*gen_truncxf) (rtx, rtx);
Re: [PATCH X86, PR62128] Rotate pattern for AVX2
On Tue, Sep 30, 2014 at 6:47 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Patch resubmitted from https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01400.html The patch fix PR62128 and gcc.target/i386/pr52252-atom.c in core-avx2 make check. The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct or not. The patch developed similar to define_insn_and_split *avx_vperm_broadcast_mode. The patch passed x86 bootstrap and make check (+2 new passes for -march=core-avx2). Is it ok? Evgeny ChangeLog: 2014-09-30 Evgeny Stupachenko evstu...@gmail.com * config/i386/sse.md (avx2_palignrv4di): New. * config/i386/sse.md (avx2_rotatemode_perm): New. +(define_insn avx2_palignrv4di + [(set (match_operand:V4DI 0 register_operand =x) + (unspec:V4DI + [(match_operand:V4DI 1 register_operand x) + (match_operand:V4DI 2 nonimmediate_operand xm) + (match_operand:SI 3 const_0_to_255_operand n)] + UNSPEC_VPALIGNRDI))] + TARGET_AVX2 + vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3} + [(set_attr type sselog) + (set_attr prefix vex) + (set_attr mode OI)]) Just reuse UNSPEC_PALIGNR, no need for a new unspec. +(define_insn_and_split avx2_rotatemode_perm + [(set (match_operand:V_256 0 register_operand =x) + (vec_select:V_256 + (match_operand:V_256 1 register_operand x) + (match_parallel 2 palignr_operand + [(match_operand 3 const_int_operand n)])))] + TARGET_AVX2 + # + reload_completed + [(const_int 0)] This should be a define_expand. There is nothing that requires hard registers. You can achieve mode-changes by using gen_lowpart, see many examples in sse.md + if (shift 16) + emit_insn (gen_avx2_palignrv4di (op0, + op0, + op1, + GEN_INT (shift))); + else if (shift 16) + emit_insn (gen_avx2_palignrv4di (op0, + op1, + op0, + GEN_INT (shift - 16))); What happens when shift == 16? Uros.
Re: [PATCH X86, PR62128] Rotate pattern for AVX2
On Tue, Sep 30, 2014 at 7:06 PM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Sep 30, 2014 at 6:47 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Patch resubmitted from https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01400.html The patch fix PR62128 and gcc.target/i386/pr52252-atom.c in core-avx2 make check. The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct or not. The patch developed similar to define_insn_and_split *avx_vperm_broadcast_mode. The patch passed x86 bootstrap and make check (+2 new passes for -march=core-avx2). Is it ok? Please try following (totally untested) expander: --cut here-- (define_expand avx2_rotatemode_perm [(set (match_operand:V_256 0 register_operand) (vec_select:V_256 (match_operand:V_256 1 register_operand) (match_parallel 2 palignr_operand [(match_operand 3 const_int_operand n)])))] TARGET_AVX2 { int shift = INTVAL (operands[3]) * ssescalarsize; rtx insn; rtx op0 = gen_lowpart (V4DImode, operands[0]); rtx op1 = gen_lowpart (V4DImode, operands[1]); emit_insn (gen_avx2_permv2ti (op0, op1, op1, GEN_INT (33))); op0 = gen_lowpart (V2TImode, operands[0]); op1 = gen_lowpart (V2TImode, operands[1]); if (shift GET_MODE_SIZE (TImode)) insn = gen_avx2_palignrv2ti (op0, op0, op1, GEN_INT (shift))); else insn = gen_avx2_palignrv2ti (op0, op1, op0, GEN_INT (shift - 16))); emit_insn (insn); DONE; } --cut here-- BTW: Looking at the code above, it looks to me that avx2_permv2ti should accept V2TImode operands, not V4DImode. Uros.
Re: [PATCH X86, PR62128] Rotate pattern for AVX2
On Tue, Sep 30, 2014 at 8:08 PM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Sep 30, 2014 at 7:06 PM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Sep 30, 2014 at 6:47 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Patch resubmitted from https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01400.html The patch fix PR62128 and gcc.target/i386/pr52252-atom.c in core-avx2 make check. The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct or not. The patch developed similar to define_insn_and_split *avx_vperm_broadcast_mode. The patch passed x86 bootstrap and make check (+2 new passes for -march=core-avx2). Is it ok? Please try following (totally untested) expander: As usual, the wrong version was pasted. This should read: --cut here-- (define_expand avx2_rotatemode_perm [(set (match_operand:V_256 0 register_operand) (vec_select:V_256 (match_operand:V_256 1 register_operand) (match_parallel 2 palignr_operand [(match_operand 3 const_int_operand n)])))] TARGET_AVX2 { int shift = INTVAL (operands[3]) * ssescalarsize; rtx insn; rtx op1 = gen_lowpart (V4DImode, operands[1]); rtx t2 = gen_reg_rtx (V4DImode); emit_insn (gen_avx2_permv2ti (t2, op1, op1, GEN_INT (33))); op0 = gen_lowpart (V2TImode, operands[0]); op1 = gen_lowpart (V2TImode, operands[1]); t2 = gen_lowpart (V2TImode, t2); if (shift GET_MODE_SIZE (TImode)) insn = gen_avx2_palignrv2ti (op0, t2, op1, GEN_INT (shift))); else insn = gen_avx2_palignrv2ti (op0, op1, t2, GEN_INT (shift - 16))); emit_insn (insn); DONE; } --cut here-- Uros.
Re: [PATCH X86, PR62128] Rotate pattern for AVX2
On Wed, Oct 1, 2014 at 12:13 AM, Evgeny Stupachenko evstu...@gmail.com wrote: expand_vselect for some reason ignores the expander. Does it work with expanders? The comment talks about insn only: /* Construct (set target (vec_select op0 (parallel perm))) and return true if that's a valid instruction in the active ISA. */ It looks to me that the whole approach is wrong from the beginning. There is already a function that generates perm/palignr sequence, conveniently named expand_vec_perm_palignr. This function should be extended to handle AVX2 sequence. You don't have to add any new patterns, existing avx2_permv2ti and avx2_palignr2ti should do the trick. And, as said by H.J., please add the testcase from the PR that will exercise the code path. Without the testcase included, the patch is unreviewable, and this is the reason why no maintainer (including me) wants to approve it in its current form. Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Getting back to initial patch, is it ok? IMO, we should start with Jakub's proposed patch [1] [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html Uros. It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check. X86 bootstrap is also ok. 2014-10-01 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2 PALINGR instruction. * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try for AVX2. On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko evstu...@gmail.com wrote: The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct or not. Resubmitting patch looks good as current mail thread is already too complicated. On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko evstu...@gmail.com wrote: It fix gcc.target/i386/pr52252-atom.c in core-avx2 make check and pr62128. I suggest you resubmit the patch as a bug fix for pr62128 with testcases from pr62128 as well as gcc.target/i386/pr52252-atom.c. -- H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 1:38 PM, Jakub Jelinek ja...@redhat.com wrote: That doesn't compile, will post a new version; got interrupted when I found that in GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' one test is miscompiled even with unpatched compiler, debugging that now. Let's start with the bugfix. The || doesn't make any sense, and we really want to fill in 4 bits (0, 1, 4, 5) of the immediate, not just two, anyway. valid_perm_using_mode_p (V2TImode, d) should already guarantee that it is possible to permutate it as V2TI, so all we care about are the values of d-perm[0] and d-perm[nelt / 2], but we care not just which lane it is, but also which operand (src1 or src2). Tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' Ok for trunk/4.9/4.8? 2014-10-01 Jakub Jelinek ja...@redhat.com PR target/63428 * config/i386/i386.c (expand_vec_perm_pshufb): Fix up rperm[0] argument to avx2_permv2ti. * gcc.dg/torture/vshuf-4.inc: Move test 122 from EXPTESTS to test 24 in TESTS. OK. Thanks, Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:17 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote: OK. Thanks. Second step is a tiny optimization, for the simplified 122 (now 24) vshuf-v4di.c testcase: typedef unsigned long long V __attribute__ ((vector_size (32))); V a, b, c, d; int main () { int i; for (i = 0; i 4; ++i) { a[i] = i + 2; b[i] = 4 + i + 2; } asm volatile ( : : : memory); c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 }); d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 }, (V) { 2, 5, 6, 3 }); if (__builtin_memcmp (c, d, sizeof (c))) __builtin_abort (); return 0; } this patch allows better code to be generated: - vmovdqa b(%rip), %ymm0 + vpermq $238, a(%rip), %ymm1 movl$32, %edx - movl$d, %esi - vmovdqa a(%rip), %ymm1 + vmovdqa b(%rip), %ymm0 + movl$d, %esi movl$c, %edi - vperm2i128 $17, %ymm0, %ymm1, %ymm1 vpblendd$195, %ymm1, %ymm0, %ymm0 vmovdqa %ymm0, c(%rip) That is because vperm2i128 $17 unnecessarily uses two operands when all the data it grabs are from a single one. So, by canonicalizing the permutation we can emit vpermq $238 instead. Perhaps more places might benefit from extra canonicalize_perm calls (two spots already use that beyond the single one on the expansion/testing entry point). Tested again with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \ RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' on x86_64-linux. Ok for trunk? 2014-10-01 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize dfirst permutation. OK. Thanks, Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek ja...@redhat.com wrote: And now the expand_vec_perm_palignr improvement, tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \ RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' E.g. typedef unsigned long long V __attribute__ ((vector_size (32))); extern void abort (void); V a, b, c, d; void test_14 (void) { V mask = { 6, 1, 3, 4 }; int i; c = __builtin_shuffle (a, mask); d = __builtin_shuffle (a, b, mask); } (distilled from test 15 in vshuf-v4di.c) results in: - vmovdqa a(%rip), %ymm0 - vpermq $54, %ymm0, %ymm1 - vpshufb .LC1(%rip), %ymm0, %ymm0 - vmovdqa %ymm1, c(%rip) - vmovdqa b(%rip), %ymm1 - vpshufb .LC0(%rip), %ymm1, %ymm1 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vmovdqa a(%rip), %ymm1 + vpermq $54, %ymm1, %ymm0 + vmovdqa %ymm0, c(%rip) + vmovdqa b(%rip), %ymm0 + vpalignr$8, %ymm1, %ymm0, %ymm0 + vpermq $99, %ymm0, %ymm0 vmovdqa %ymm0, d(%rip) vzeroupper ret change (and two fewer .rodata constants). Ok for trunk? 2014-10-01 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (expand_vec_perm_palignr): Handle 256-bit vectors for TARGET_AVX2. Please mention PR 62128 and include the testcase from the PR. Also, please add a version of gcc.target/i386/pr52252-atom.c, compiled with -mavx2 (perhaps named pr52252-avx2.c) OK with a small adjustment below. Thanks, Uros. --- gcc/config/i386/i386.c.jj 2014-10-01 14:24:16.483138899 +0200 +++ gcc/config/i386/i386.c 2014-10-01 14:27:53.577222011 +0200 @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v rtx shift, target; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16) + /* Even with AVX, palignr only operates on 128-bit vectors, + in AVX2 palignr operates on both 128-bit lanes. */ + if ((!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16) + (!TARGET_AVX2 || GET_MODE_SIZE (d-vmode) != 32)) Please simplify the above condition ... return false; - min = nelt, max = 0; + min = 2 * nelt, max = 0; for (i = 0; i nelt; ++i) { unsigned e = d-perm[i]; + if (GET_MODE_SIZE (d-vmode) == 32) + e = (e ((nelt / 2) - 1)) | ((e nelt) 1); if (e min) min = e; if (e max) max = e; } - if (min == 0 || max - min = nelt) + if (min == 0 + || max - min = (GET_MODE_SIZE (d-vmode) == 32 ? nelt / 2 : nelt)) return false; /* Given that we have SSSE3, we know we'll be able to implement the - single operand permutation after the palignr with pshufb. */ - if (d-testing_p) + single operand permutation after the palignr with pshufb for + 128-bit vectors. */ + if (d-testing_p GET_MODE_SIZE (d-vmode) == 16) return true; dcopy = *d; - shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d-vmode))); - target = gen_reg_rtx (TImode); - emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d-op1), - gen_lowpart (TImode, d-op0), shift)); - - dcopy.op0 = dcopy.op1 = gen_lowpart (d-vmode, target); - dcopy.one_operand_p = true; in_order = true; for (i = 0; i nelt; ++i) { - unsigned e = dcopy.perm[i] - min; + unsigned e = dcopy.perm[i]; + if (GET_MODE_SIZE (d-vmode) == 32 + e = nelt + (e (nelt / 2 - 1)) min) + e = e - min - (nelt / 2); + else + e = e - min; if (e != i) in_order = false; dcopy.perm[i] = e; } + dcopy.one_operand_p = true; + + /* For AVX2, test whether we can permute the result in one instruction. */ + if (d-testing_p) +{ + if (in_order) + return true; + dcopy.op1 = dcopy.op0; + return expand_vec_perm_1 (dcopy); +} + + shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d-vmode))); + if (GET_MODE_SIZE (d-vmode) == 16) +{ + target = gen_reg_rtx (TImode); + emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d-op1), + gen_lowpart (TImode, d-op0), shift)); +} + else +{ + target = gen_reg_rtx (V2TImode); + emit_insn (gen_avx2_palignrv2ti (target, gen_lowpart (V2TImode, d-op1), + gen_lowpart (V2TImode, d-op0), shift)); +} + + dcopy.op0 = dcopy.op1 = gen_lowpart (d-vmode, target); /* Test for the degenerate case where the alignment by itself produces the desired permutation. */ @@ -43345,7 +43376,7 @@ expand_vec_perm_palignr (struct expand_v } ok = expand_vec_perm_1 (dcopy); - gcc_assert (ok); + gcc_assert (ok || GET_MODE_SIZE (d-vmode) == 32); return ok; }
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 4:12 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Oct 01, 2014 at 03:09:59PM +0200, Uros Bizjak wrote: 2014-10-01 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (expand_vec_perm_palignr): Handle 256-bit vectors for TARGET_AVX2. Please mention PR 62128 and include the testcase from the PR. Also, please add a version of gcc.target/i386/pr52252-atom.c, compiled with -mavx2 (perhaps named pr52252-avx2.c) This patch doesn't fix PR62128, and it is already tested (even without GCC_RUN_EXPENSIVE_TESTS=1) in the vshuf*.c torture tests (several of them). Ah, OK then. If you want coverage not just for the default flags, I'd say we should say for -O2 only just add gcc.target/i386/{avx,avx2,avx512}-vshuf-*.c tests that would include ../../gcc.dg/torture/vshuf-*.c and be compiled/run with -mavx/-mavx2/-mavx512 options instead of the default ones. No, the above should be good for now. The failure was triggered by the target that defaults to -mavx2, and for avx512f we can run this suite in the way you suggest. For PR62128, IMHO the right fix is attached. Note, this is again covered in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c). With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes: - vpshufb .LC0(%rip), %ymm0, %ymm1 - vpshufb .LC1(%rip), %ymm0, %ymm0 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vpermq $78, %ymm0, %ymm1 + vpalignr$1, %ymm0, %ymm1, %ymm0 ret --- gcc/config/i386/i386.c.jj 2014-10-01 14:24:16.483138899 +0200 +++ gcc/config/i386/i386.c 2014-10-01 14:27:53.577222011 +0200 @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v rtx shift, target; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16) + /* Even with AVX, palignr only operates on 128-bit vectors, + in AVX2 palignr operates on both 128-bit lanes. */ + if ((!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16) + (!TARGET_AVX2 || GET_MODE_SIZE (d-vmode) != 32)) Please simplify the above condition ... How? I don't see how that can be simplified, it can be transformed into if (!((TARGET_SSSE3 GET_MODE_SIZE (d-vmode) == 16) || (TARGET_AVX2 GET_MODE_SIZE (d-vmode) == 32))) but I don't find that simpler. I was thinking of the above, but you are correct, the change doesn't bring us anything. So, the patch is OK as it was. Thanks, Uros.
Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation
On Wed, Oct 1, 2014 at 4:10 PM, Ilya Enkovich enkovich@gmail.com wrote: +;; Return true if size of VALUE can be stored in a sign +;; extended immediate field. +(define_predicate x86_64_immediate_size_operand + (match_code symbol_ref) +{ + if (!TARGET_64BIT) +return true; + + /* For 64 bit target we may assume size of object fits + immediate only when code model guarantees that. */ + return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL); +}) + This predicate causes bootstrap error: predicates.md:362:38: error: unused parameter 'op' [-Werror=unused-parameter] Huh? How is this predicate different from e.g. (define_predicate compare_operator (match_code compare)) ? Can you please show generated code from gcc/insn-preds.c? Uros.
Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation
On Wed, Oct 1, 2014 at 7:02 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-10-01 19:17 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Wed, Oct 1, 2014 at 4:10 PM, Ilya Enkovich enkovich@gmail.com wrote: +;; Return true if size of VALUE can be stored in a sign +;; extended immediate field. +(define_predicate x86_64_immediate_size_operand + (match_code symbol_ref) +{ + if (!TARGET_64BIT) +return true; + + /* For 64 bit target we may assume size of object fits + immediate only when code model guarantees that. */ + return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL); +}) + This predicate causes bootstrap error: predicates.md:362:38: error: unused parameter 'op' [-Werror=unused-parameter] Huh? How is this predicate different from e.g. (define_predicate compare_operator (match_code compare)) ? Can you please show generated code from gcc/insn-preds.c? Uros. It is different because it has a code block which is used to generate additional function. Here is what generated for the predicate: static inline int x86_64_immediate_size_operand_1 (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED) { if (!TARGET_64BIT) return true; /* For 64 bit target we may assume size of object fits immediate only when code model guarantees that. */ return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL); } int x86_64_immediate_size_operand (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED) { return ((GET_CODE (op) == SYMBOL_REF) ( (x86_64_immediate_size_operand_1 (op, mode ( (mode == VOIDmode || GET_MODE (op) == mode)); } Well, --cut here-- (define_predicate x86_64_immediate_size_operand (and (match_code symbol_ref) (ior (not (match_test TARGET_64BIT)) (and (match_test (ix86_cmodel == CM_SMALL)) (match_test (ix86_cmodel == CM_KERNEL)) --cut here-- Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote: OK. And now the expand_vec_perm_palignr improvement, tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \ RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' E.g. typedef unsigned long long V __attribute__ ((vector_size (32))); extern void abort (void); V a, b, c, d; void test_14 (void) { V mask = { 6, 1, 3, 4 }; int i; c = __builtin_shuffle (a, mask); d = __builtin_shuffle (a, b, mask); } (distilled from test 15 in vshuf-v4di.c) results in: - vmovdqa a(%rip), %ymm0 - vpermq $54, %ymm0, %ymm1 - vpshufb .LC1(%rip), %ymm0, %ymm0 - vmovdqa %ymm1, c(%rip) - vmovdqa b(%rip), %ymm1 - vpshufb .LC0(%rip), %ymm1, %ymm1 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vmovdqa a(%rip), %ymm1 + vpermq $54, %ymm1, %ymm0 + vmovdqa %ymm0, c(%rip) + vmovdqa b(%rip), %ymm0 + vpalignr$8, %ymm1, %ymm0, %ymm0 + vpermq $99, %ymm0, %ymm0 vmovdqa %ymm0, d(%rip) vzeroupper ret change (and two fewer .rodata constants). On a related note, I would like to point out that gcc.target/i386/pr61403.c also fails to generate blend insn with -mavx2. The new insn sequence includes lots of new vpshufb insns with memory access. Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 10:43 PM, Jakub Jelinek ja...@redhat.com wrote: For PR62128, IMHO the right fix is attached. Note, this is again covered in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c). With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes: - vpshufb .LC0(%rip), %ymm0, %ymm1 - vpshufb .LC1(%rip), %ymm0, %ymm0 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vpermq $78, %ymm0, %ymm1 + vpalignr$1, %ymm0, %ymm1, %ymm0 ret 2014-10-01 Jakub Jelinek ja...@redhat.com PR target/62128 * config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr if it expands to a single insn only. (expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument. If true, fail unless in_order is true. Add forward declaration. (expand_vec_perm_vperm2f128): Fix up comment about which permutation is useful for one_operand_p. (ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller. Now bootstrapped/regtested on x86_64-linux and i686-linux (and additionally tested also with --target_board=unix/-mavx2), ok for trunk? OK. Thanks, Uros.
Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation
On Thu, Oct 2, 2014 at 10:23 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-10-01 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386.md (UNSPEC_SIZEOF): New. (move_size_reloc_mode): New. * config/i386/predicates.md (symbol_operand): New. (x86_64_immediate_size_operand): New. OK with a trivial adjustment. Thanks, Uros. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 65990b1..1901023 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -79,6 +79,7 @@ UNSPEC_PLTOFF UNSPEC_MACHOPIC_OFFSET UNSPEC_PCREL + UNSPEC_SIZEOF ;; Prologue support UNSPEC_STACK_ALLOC @@ -18789,6 +18790,21 @@ bndstx\t{%2, %3|%3, %2} [(set_attr type mpxst)]) +(define_insn move_size_reloc_mode + [(set (match_operand:SWI48 0 register_operand =r) + (unspec:SWI48 +[(match_operand:SWI48 1 symbol_operand)] +UNSPEC_SIZEOF))] + TARGET_MPX +{ + if (x86_64_immediate_size_operand (operands[1], VOIDmode)) +return mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}; + else +return movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}; +} + [(set_attr type imov) + (set_attr mode MODE)]) + (include mmx.md) (include sse.md) (include sync.md) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index fea7754..1875339 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -124,6 +124,10 @@ (match_test TARGET_64BIT) (match_test REGNO (op) BX_REG))) +;; Return true if VALUE is symbol reference +(define_predicate symbol_operand + (match_code symbol_ref)) + ;; Return true if VALUE can be stored in a sign extended immediate field. (define_predicate x86_64_immediate_operand (match_code const_int,symbol_ref,label_ref,const) @@ -336,6 +340,14 @@ return false; }) +;; Return true if size of VALUE can be stored in a sign +;; extended immediate field. +(define_predicate x86_64_immediate_size_operand + (and (match_code symbol_ref) + (ior (not (match_test TARGET_64BIT)) + (ior (match_test (ix86_cmodel == CM_SMALL)) +(match_test (ix86_cmodel == CM_KERNEL)) Uh yes, I did a trivial thinko. Please note that above (ior) can be rewritten with a multiple arguments: (define_predicate x86_64_immediate_size_operand (and (match_code symbol_ref) (ior (not (match_test TARGET_64BIT)) (match_test ix86_cmodel == CM_SMALL) (match_test ix86_cmodel == CM_KERNEL Also, there were unneded parenthesis for match_test removed. ;; Return true if OP is general operand representable on x86_64. (define_predicate x86_64_general_operand (if_then_else (match_test TARGET_64BIT)
[RFC, RFH PATCH, i386] Fix gcc.target/i386/pr61403.c FAIL with -mavx2
On Wed, Oct 1, 2014 at 9:03 PM, Uros Bizjak ubiz...@gmail.com wrote: And now the expand_vec_perm_palignr improvement, tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \ RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' E.g. typedef unsigned long long V __attribute__ ((vector_size (32))); extern void abort (void); V a, b, c, d; void test_14 (void) { V mask = { 6, 1, 3, 4 }; int i; c = __builtin_shuffle (a, mask); d = __builtin_shuffle (a, b, mask); } (distilled from test 15 in vshuf-v4di.c) results in: - vmovdqa a(%rip), %ymm0 - vpermq $54, %ymm0, %ymm1 - vpshufb .LC1(%rip), %ymm0, %ymm0 - vmovdqa %ymm1, c(%rip) - vmovdqa b(%rip), %ymm1 - vpshufb .LC0(%rip), %ymm1, %ymm1 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vmovdqa a(%rip), %ymm1 + vpermq $54, %ymm1, %ymm0 + vmovdqa %ymm0, c(%rip) + vmovdqa b(%rip), %ymm0 + vpalignr$8, %ymm1, %ymm0, %ymm0 + vpermq $99, %ymm0, %ymm0 vmovdqa %ymm0, d(%rip) vzeroupper ret change (and two fewer .rodata constants). On a related note, I would like to point out that gcc.target/i386/pr61403.c also fails to generate blend insn with -mavx2. The new insn sequence includes lots of new vpshufb insns with memory access. Following patch fixes the failure: --cut here-- Index: i386.c === --- i386.c (revision 215802) +++ i386.c (working copy) @@ -43407,8 +43407,10 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d AVX and AVX2 as they require more than 2 instructions. */ if (d-one_operand_p) return false; - if (TARGET_SSE4_1 GET_MODE_SIZE (vmode) == 16) + if (TARGET_AVX2 GET_MODE_SIZE (vmode) == 32) ; + else if (TARGET_SSE4_1 GET_MODE_SIZE (vmode) == 16) +; else return false; --cut here-- The comment above expand_vec_perm_pblendv claims that: /* Use the same checks as in expand_vec_perm_blend, but skipping AVX and AVX2 as they require more than 2 instructions. */ However, I see a significant reduction in vpshufb and vpor instructions (33-16 and 22-11), and 6 new vblendps insns. BTW: I have no access to avx2 target, so I can't test the patch with a runtime tests. OTOH, it doesn't ICE for GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'. Jakub, what do you think? Uros.
Re: RFA: one more version of the patch for PR61360
On Thu, Oct 2, 2014 at 10:13 PM, Vladimir Makarov vmaka...@redhat.com wrote: I guess we achieved the consensus about the following patch to fix PR61360 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was successfully bootstrapped and tested (w/wo -march=amdfam10) on x86/x86-64. Is it ok to commit to trunk? I've tested your patch and unfortunately it doesn't work: In file included from /var/tmp/moz-build-dir/js/src/shell/Unified_cpp_js_src_shell0.cpp:15:0: /var/tmp/mozilla-central/js/src/shell/js.cpp: In function ‘void Process(JSContext*, JSObject*, const char*, bool)’: /var/tmp/mozilla-central/js/src/shell/js.cpp:592:1: internal compiler error: in lra_update_insn_recog_data, at lra.c:1221 } ^ 0xa9d9ec lra_update_insn_recog_data(rtx_insn*) ../../gcc/gcc/lra.c:1220 0xab450f eliminate_regs_in_insn ../../gcc/gcc/lra-eliminations.c:1077 0xab450f process_insn_for_elimination ../../gcc/gcc/lra-eliminations.c:1344 0xab450f lra_eliminate(bool, bool) ../../gcc/gcc/lra-eliminations.c:1408 0xa9f2da lra(_IO_FILE*) ../../gcc/gcc/lra.c:2270 0xa5d659 do_reload ../../gcc/gcc/ira.c:5311 0xa5d659 execute ../../gcc/gcc/ira.c:5470 Testcase is attached: % g++ -c -march=amdfam10 -w -O2 js.ii js.ii: In function ‘void RunFile(C)’: js.ii:64:1: internal compiler error: in lra_update_insn_recog_data, at lra.c:1221 Thanks for reporting this, Marcus. The problem now is in optimize_function_for_size_p. It is false, when we define and cache enable attributes at early stages (instantation of virtual regs) and true later. It is returning us to the same problem. I believe that we should not have enable attributes depending on optimization options or on the current running pass if we don't want the current solution in the trunk (with recog_init). Setting right value for optimize_function_for_size_p does not solve the problem as we can have different options for different functions in the same compilation file. So minimal solution would be removing optimize_function_for_size_p from the attribute definition. But I guess we could remove all condition. Unfortunately, Ganesh did not post is it really beneficial to switch off this alternative for AMD CPUs even if AMD optimization guide recommends it. I propose to split the pattern into size-optimized and normal pattern. The patch implements this proposal. Uros. Index: i386.md === --- i386.md (revision 215797) +++ i386.md (working copy) @@ -4766,6 +4766,38 @@ } }) +(define_insn *floatSWI48:modeMODEF:mode2_sse_size + [(set (match_operand:MODEF 0 register_operand =f,x,x) + (float:MODEF + (match_operand:SWI48 1 nonimmediate_operand m,r,m)))] + SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH +optimize_function_for_size_p (cfun) + @ + fild%Z1\t%1 + %vcvtsi2MODEF:ssemodesuffixSWI48:rex64suffix\t{%1, %d0|%d0, %1} + %vcvtsi2MODEF:ssemodesuffixSWI48:rex64suffix\t{%1, %d0|%d0, %1} + [(set_attr type fmov,sseicvt,sseicvt) + (set_attr prefix orig,maybe_vex,maybe_vex) + (set_attr mode MODEF:MODE) + (set (attr prefix_rex) + (if_then_else + (and (eq_attr prefix maybe_vex) + (match_test SWI48:MODEmode == DImode)) + (const_string 1) + (const_string *))) + (set_attr unit i387,*,*) + (set_attr athlon_decode *,double,direct) + (set_attr amdfam10_decode *,vector,double) + (set_attr bdver1_decode *,double,direct) + (set_attr fp_int_src true) + (set (attr enabled) + (cond [(eq_attr alternative 0) + (symbol_ref TARGET_MIX_SSE_I387 +X87_ENABLE_FLOAT (MODEF:MODEmode, +SWI48:MODEmode)) + ] + (const_int 1)))]) + (define_insn *floatSWI48:modeMODEF:mode2_sse [(set (match_operand:MODEF 0 register_operand =f,x,x) (float:MODEF @@ -4795,16 +4827,9 @@ X87_ENABLE_FLOAT (MODEF:MODEmode, SWI48:MODEmode)) (eq_attr alternative 1) - /* ??? For sched1 we need constrain_operands to be able to - select an alternative. Leave this enabled before RA. */ - (symbol_ref TARGET_INTER_UNIT_CONVERSIONS - || optimize_function_for_size_p (cfun) - || !(reload_completed -|| reload_in_progress -|| lra_in_progress)) + (symbol_ref TARGET_INTER_UNIT_CONVERSIONS) ] - (symbol_ref true))) - ]) + (const_int 1)))]) (define_insn *floatSWI48x:modeMODEF:mode2_i387 [(set (match_operand:MODEF 0 register_operand =f)
Re: [PATCH] palignr improvement (PR target/62128)
On Fri, Oct 3, 2014 at 8:52 AM, Jakub Jelinek ja...@redhat.com wrote: Tested with GCC_TEST_RUN_EXPENSIVE=1 make -k check-gcc \ RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' on x86_64-linux, ok for trunk if it passes bootstrap? As for the previous testcase with distilled pr52252-atom.c permutations, f1/f4 is now vpunpcklbw/vpunpckhbw/vperm2i128, f2 2x vpshufb/vpermq/vpor, f3/f5/f6 vperm2i128/vpalignr, suggestions how to improve that? 2014-10-02 Jakub Jelinek ja...@redhat.com PR target/62128 * config/i386/i386.c (expand_vec_perm_palignr): If op1, op0 order of palignr arguments can't be used due to min 0 or max - min too high, try also op0, op1 order of palignr arguments. * gcc.dg/torture/vshuf-16.inc (TESTS): Add 2 new permutations. * gcc.dg/torture/vshuf-32.inc (TESTS): Add 5 new permutations. Now successfully bootstrapped/regtested on x86_64-linux and i686-linux (without ada, as that doesn't bootstrap right now, Honza is looking into that). Ok for trunk? OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [60/n] Update 128bit ashrv insn pattern.
On Fri, Oct 3, 2014 at 12:26 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, On 29 Sep 09:54, Uros Bizjak wrote: +(define_expand vashrv2di3mask_name + [(set (match_operand:V2DI 0 register_operand) + (ashiftrt:V2DI + (match_operand:V2DI 1 register_operand) + (match_operand:V2DI 2 nonimmediate_operand)))] + TARGET_XOP || TARGET_AVX512VL +{ + if (!TARGET_XOP) This condition is wrong. Please re-test the patch. Great catch! Didn't tested whole i386.exp, so XOP tests didn't run. Fixed. Patch in the bottom. XOP tests are now pass. Is it ok for trunk now? OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [61/n] Update FP logic insn patterns.
On Fri, Oct 3, 2014 at 12:49 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, On 29 Sep 10:00, Uros Bizjak wrote: + /* There is no vandnp[sd] in avx512f. Use vpandn[qd]. */ + if (!TARGET_AVX512DQ) All other patterns also have mask_applied condition here. Is the above condition correct? I think this is correct since in this pattern we use AVX-512 only modes in iterator, so no chance to emit anything else but EVEX insn. In say, previous pattern we use modes are enabled for previous ISA extensions, so we emit this hack when masking (AVX-512 specific feature) is used. Thanks for the explanation, the patch is OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [62/n] Add vpmaddubsw,vdbpsadbw insn patterns.
On Fri, Oct 3, 2014 at 1:03 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, On 29 Sep 10:08, Uros Bizjak wrote: On Fri, Sep 26, 2014 at 4:09 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: +(define_insn avx512bw_pmaddubsw512modemask_name + [(set (match_operand:VI2_AVX512VL 0 register_operand =v) + (unspec:VI2_AVX512VL +[(match_operand:dbpsadbwmode 1 register_operand v) + (match_operand:dbpsadbwmode 2 nonimmediate_operand vm)] + UNSPEC_PMADDUBSW512))] + TARGET_AVX512BW + vpmaddubsw\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}; + [(set_attr type sseiadd) + (set_attr prefix evex) + (set_attr mode XI)]) + Can the one above be described using standard RTX, perhaps something similar to avx2_pmaddubsw256? Definetely, it can. We didn't do that because final pattern will be twice as long as 256-bit variant resulting to 96 (!) lines and I think in near future auto-vect won't pick MADD at all. But if you think that it will be better to have explicit RTX, I ready to do that! Uh, no. Let's be reasonable, and put only a comment explaining the situation, as in case of sse2_avx2_psadbw. Uros.
Re: [RFC, RFH PATCH, i386] Fix gcc.target/i386/pr61403.c FAIL with -mavx2
On Fri, Oct 3, 2014 at 1:11 PM, Jakub Jelinek ja...@redhat.com wrote: On Thu, Oct 02, 2014 at 08:34:40PM +0200, Uros Bizjak wrote: Index: i386.c === --- i386.c (revision 215802) +++ i386.c (working copy) @@ -43407,8 +43407,10 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d AVX and AVX2 as they require more than 2 instructions. */ if (d-one_operand_p) return false; - if (TARGET_SSE4_1 GET_MODE_SIZE (vmode) == 16) + if (TARGET_AVX2 GET_MODE_SIZE (vmode) == 32) ; + else if (TARGET_SSE4_1 GET_MODE_SIZE (vmode) == 16) +; else return false; --cut here-- The comment above expand_vec_perm_pblendv claims that: /* Use the same checks as in expand_vec_perm_blend, but skipping AVX and AVX2 as they require more than 2 instructions. */ The comment is mostly right though, I'd say as they sometimes require more than 2 instructions. BTW: I have no access to avx2 target, so I can't test the patch with a runtime tests. OTOH, it doesn't ICE for GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'. Even the expensive testsuite has very limited coverage. As I wanted to prove your patch will ICE, I wrote following generator: #ifndef ITYPE #define ITYPE TYPE #endif #define S2(X) #X #define S(X) S2(X) int main () { int i, j, nelt = 32 / sizeof (TYPE); printf ( typedef S(TYPE) V __attribute__ ((vector_size (32)));\n typedef S(ITYPE) VI __attribute__ ((vector_size (32)));\n V a, b, c;\n \n #define T(n, m...) void foo##n (void) { c = __builtin_shuffle (a, b, (VI) m); }\n #define S(n, m...) T(n, m)\n); for (i = 0; i 10; i++) { printf (S (__LINE__, { ); for (j = 0; j nelt; j++) { int k = random () 3; int v = j; if (k 1) v = ((k 2) ? nelt : 0) + (random () (nelt - 1)); printf (%d%s, v, j (nelt - 1) ? , : })\n); } } } which can be compiled e.g. with -DTYPE=char -DTYPE=short -DTYPE=int -DTYPE=long -DTYPE=float -DITYPE=int -DTYPE=double -DITYPE=long and then in each case generate 10 tests (sort -u on it plus manual fixup can decrease that, for the V4DI/V4DF cases substantially). The first one triggered almost immediately an ICE, added to vshuf-32.inc (non-expensive). With the following updated patch all those generated testcases don't ICE (-mavx2 for the first four, -mavx for the last two). Also tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' I have had some problems testing with TARGET_AVX part of the change and /-mavx tests. I assume that your patch survives these tests. The pr61403.c testcase can be simplified into: typedef float V __attribute__ ((vector_size (32))); typedef int VI __attribute__ ((vector_size (32))); V a, b, c; #define T(n, m...) void foo##n (void) { c = __builtin_shuffle (a, b, (VI) m); } T (0, { 0, 1, 2, 3, 4, 5, 10, 13 }) T (1, { 0, 1, 2, 3, 4, 8, 11, 14 }) T (2, { 0, 1, 2, 3, 4, 9, 12, 15 }) T (3, { 0, 13, 2, 3, 14, 5, 6, 15 }) T (4, { 0, 1, 8, 3, 4, 9, 6, 7 }) T (5, { 0, 3, 11, 0, 4, 12, 0, 5 }) T (6, { 0, 3, 6, 9, 12, 15, 0, 0 }) T (7, { 0, 8, 0, 1, 9, 0, 2, 10 }) T (8, { 10, 1, 2, 11, 4, 5, 12, 7 }) T (9, { 13, 0, 6, 14, 0, 7, 15, 0 }) T (10, { 1, 4, 7, 10, 13, 0, 0, 0 }) T (11, { 2, 5, 8, 11, 14, 0, 0, 0 }) permutations, where both your and my patch optimize foo{0,1,2,3,4,8}. 2014-10-03 Jakub Jelinek ja...@redhat.com Uros Bizjak ubiz...@gmail.com PR tree-optimization/61403 * config/i386/i386.c (expand_vec_perm_palignr): Fix a spelling error in comment. Also optimize 256-bit vectors for AVX2 or AVX (floating vectors only), provided the first permutation can be performed in one insn. * gcc.dg/torture/vshuf-32.inc: Add a new test 29. OK if the patch bootstraps and regtests without problems. Thanks, Uros.
Re: [PATCH] Small pre-AVX512F optimization
On Fri, Oct 3, 2014 at 4:32 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! I've noticed that expand_vec_perm_1 completely uselessly builds GC garbage (CONST_VECTOR at least) when AVX512F isn't enabled at all. Ok to just call it for AVX512F? OK. Even better would be to check the modes first too depending on target (AVX512F will only handle V{8D,16S}{I,F}mode, AVX512BW would handle also V32HImode (not yet implemented?), AVX512VL could handle V{2D,4D,4S,8S}{I,F}mode (not yet implemented?, though is there any const permutation not handled yet earlier?), and AVX512VL+AVX512BW could handle V{8,16}HImode (not yet implemented?) before creating a CONST_VECTOR. AVX512BW and AVX512VL are currently moving targets, I propose to look at this issue once all changes are committed to the repository. 2014-10-03 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Fix up formatting. (ix86_expand_vec_perm): Only call ix86_expand_vec_perm_vpermi2 if TARGET_AVX512F. (expand_vec_perm_1): Likewise. OK. Thanks, Uros.
Re: [PATCH] 512-bit gcc.dg/torture/vshuf*.c
On Fri, Oct 3, 2014 at 4:25 PM, Jakub Jelinek ja...@redhat.com wrote: This patch extends the gcc.dg/torture/ testsuite for 512-bit vectors. Tested with GCC_TEST_RUN_EXPENSIVE=1 make -j32 check-gcc \ RUNTESTFLAGS='--target_board=unix\{-mavx2,-mavx,-mavx512f,-mavx512bw/-mavx512vl\} dg-torture.exp=vshuf*.c' (of course with expected AVX512* execution test failures, as I don't have hw and didn't bother with simulator). As before in vshuf-{8,16,32}.inc, some of the permutations are hand written (subset in the TESTS portion), the rest are random permutations. Ok for trunk? 2014-10-03 Jakub Jelinek ja...@redhat.com * gcc.dg/torture/vshuf-v8df.c: New test. * gcc.dg/torture/vshuf-v8di.c: New test. * gcc.dg/torture/vshuf-v16sf.c: New test. * gcc.dg/torture/vshuf-v16si.c: New test. * gcc.dg/torture/vshuf-v32hi.c: New test. * gcc.dg/torture/vshuf-v64qi.c: New test. * gcc.dg/torture/vshuf-64.inc: New file. Please note that -mavx512{bw,vl} are moving target ATM, quite some patches are still pending. That said, the patch is OK from x86 side, but a testsuite maintainer should OK it. Uros.
[PATCH, RTX]: Additional fix for PR 57003
Hello! My r215428 change exposed another PR 57003 problem on x86_64. When compiling gcc.target/i386/pr57003.c we refer to clobbered %rdi register after the call to memcpy: --- pr57003.s 2014-10-03 15:08:24.0 +0200 +++ pr57003_.s 2014-10-03 15:08:19.0 +0200 @@ -78,7 +78,7 @@ leaq-20(%rbx), %rdx movq%rax, %rdi callmemcpy - movq%rdi, c(%rip) + movq%rax, c(%rip) .L8: movaps (%rsp), %xmm6 movaps 16(%rsp), %xmm7 @@ -321,5 +321,5 @@ .byte 0xb .align 8 .LEFDE7: - .ident GCC: (GNU) 5.0.0 20141002 (experimental) [trunk revision 215797] + .ident GCC: (GNU) 4.9.2 20141001 (prerelease) [gcc-4_9-branch revision 215749] .section.note.GNU-stack,,@progbits The runtime failure happens only on CentOS5 (and not in Fedora20), which supports findings in Comment #17 of the PR. The difference is, that now we emit memcpy for MS_ABI-ELF_ABI cross-ABI call as: #(call_insn:TI 24 23 27 3 (set (reg:DI 0 ax) #(call (mem:QI (symbol_ref:DI (memcpy) [flags 0x41] function_decl 0x7fce6f586438 memcpy) [0 memcpy S1 A8]) #(const_int 0 [0]))) pr57003.c:32 661 {*call_value} # (expr_list:REG_DEAD (reg:DI 5 di) #(expr_list:REG_DEAD (reg:DI 4 si) #(expr_list:REG_DEAD (reg:DI 1 dx) #(expr_list:REG_UNUSED (reg:DI 0 ax) #(expr_list:REG_RETURNED (reg/v/f:DI 2 cx [orig:87 e ] [87]) #(expr_list:REG_CALL_DECL (symbol_ref:DI (memcpy) [flags 0x41] function_decl 0x7fce6f586438 memcpy) #(expr_list:REG_EH_REGION (const_int 0 [0]) #(nil #(expr_list (clobber (reg:TI 52 xmm15)) #(expr_list (clobber (reg:TI 51 xmm14)) #(expr_list (clobber (reg:TI 50 xmm13)) #(expr_list (clobber (reg:TI 49 xmm12)) #(expr_list (clobber (reg:TI 48 xmm11)) #(expr_list (clobber (reg:TI 47 xmm10)) #(expr_list (clobber (reg:TI 46 xmm9)) #(expr_list (clobber (reg:TI 45 xmm8)) #(expr_list (clobber (reg:TI 28 xmm7)) #(expr_list (clobber (reg:TI 27 xmm6)) #(expr_list (clobber (reg:DI 5 di)) #(expr_list (clobber (reg:DI 4 si)) #(expr_list:DI (set (reg:DI 0 ax) #(reg:DI 5 di)) #(expr_list:DI (use (reg:DI 5 di)) # (expr_list:DI (use (reg:DI 4 si)) # (expr_list:DI (use (reg:DI 1 dx)) # (nil)) which is alternate, but equivalent form of what was generated previously: #(call_insn:TI 24 23 27 3 (parallel [ #(set (reg:DI 0 ax) #(call (mem:QI (symbol_ref:DI (memcpy) [flags 0x41] function_decl 0x7fd91824a800 memcpy) [0 memcpy S1 A8]) #(const_int 0 [0]))) #(unspec [ #(const_int 0 [0]) #] UNSPEC_MS_TO_SYSV_CALL) #(clobber (reg:DI 4 si)) #(clobber (reg:DI 5 di)) #(clobber (reg:TI 27 xmm6)) #(clobber (reg:TI 28 xmm7)) #(clobber (reg:TI 45 xmm8)) #(clobber (reg:TI 46 xmm9)) #(clobber (reg:TI 47 xmm10)) #(clobber (reg:TI 48 xmm11)) #(clobber (reg:TI 49 xmm12)) #(clobber (reg:TI 50 xmm13)) #(clobber (reg:TI 51 xmm14)) #(clobber (reg:TI 52 xmm15)) #]) pr57003.c:32 652 {*call_value_rex64_ms_sysv} # (expr_list:REG_DEAD (reg:DI 5 di) #(expr_list:REG_DEAD (reg:DI 4 si) #(expr_list:REG_DEAD (reg:DI 1 dx) #(expr_list:REG_RETURNED (reg/v/f:DI 2 cx [orig:87 e ] [87]) #(expr_list:REG_EH_REGION (const_int 0 [0]) #(nil)) #(expr_list:DI (set (reg:DI 0 ax) #(reg:DI 5 di)) #(expr_list:DI (use (reg:DI 5 di)) #(expr_list:DI (use (reg:DI 4 si)) #(expr_list:DI (use (reg:DI 1 dx)) #(nil)) It looks that Jakub's patch, proposed in Comment #21 doesn't cover alternative form, so it doesn't record clobbers properly. Attached patch fixes this omission. 2014-10-03 Uros Bizjak ubiz...@gmail.com PR rtl-optimization/57003 * regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg, also check CALL_INSN_FUNCTION_USAGE for clobbers again after killing regs_invalidated_by_call. Tested on x86_64-linux-gnu {,-m32}. OK for mainline and release branches? Uros. Index: regcprop.c === --- regcprop.c (revision 215861) +++ regcprop.c (working copy
Re: RFA: one more version of the patch for PR61360
On Sat, Oct 4, 2014 at 12:49 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Uros Bizjak ubiz...@gmail.com writes: On Thu, Oct 2, 2014 at 10:13 PM, Vladimir Makarov vmaka...@redhat.com wrote: I guess we achieved the consensus about the following patch to fix PR61360 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was successfully bootstrapped and tested (w/wo -march=amdfam10) on x86/x86-64. Is it ok to commit to trunk? I've tested your patch and unfortunately it doesn't work: In file included from /var/tmp/moz-build-dir/js/src/shell/Unified_cpp_js_src_shell0.cpp:15:0: /var/tmp/mozilla-central/js/src/shell/js.cpp: In function ‘void Process(JSContext*, JSObject*, const char*, bool)’: /var/tmp/mozilla-central/js/src/shell/js.cpp:592:1: internal compiler error: in lra_update_insn_recog_data, at lra.c:1221 } ^ 0xa9d9ec lra_update_insn_recog_data(rtx_insn*) ../../gcc/gcc/lra.c:1220 0xab450f eliminate_regs_in_insn ../../gcc/gcc/lra-eliminations.c:1077 0xab450f process_insn_for_elimination ../../gcc/gcc/lra-eliminations.c:1344 0xab450f lra_eliminate(bool, bool) ../../gcc/gcc/lra-eliminations.c:1408 0xa9f2da lra(_IO_FILE*) ../../gcc/gcc/lra.c:2270 0xa5d659 do_reload ../../gcc/gcc/ira.c:5311 0xa5d659 execute ../../gcc/gcc/ira.c:5470 Testcase is attached: % g++ -c -march=amdfam10 -w -O2 js.ii js.ii: In function ‘void RunFile(C)’: js.ii:64:1: internal compiler error: in lra_update_insn_recog_data, at lra.c:1221 Thanks for reporting this, Marcus. The problem now is in optimize_function_for_size_p. It is false, when we define and cache enable attributes at early stages (instantation of virtual regs) and true later. It is returning us to the same problem. I believe that we should not have enable attributes depending on optimization options or on the current running pass if we don't want the current solution in the trunk (with recog_init). Setting right value for optimize_function_for_size_p does not solve the problem as we can have different options for different functions in the same compilation file. So minimal solution would be removing optimize_function_for_size_p from the attribute definition. But I guess we could remove all condition. Unfortunately, Ganesh did not post is it really beneficial to switch off this alternative for AMD CPUs even if AMD optimization guide recommends it. I propose to split the pattern into size-optimized and normal pattern. The patch implements this proposal. An alternative would be to add two new enabled-like attributes, good_for_size and good_for_speed, that say whether it is efficient to use a particular alternative. These attributes wouldn't ever stop an existing instruction from being recognised; they would just say whether the RA and optimisers should consider the alternative when optimising for size or speed respectively. The attributes would have the same restrictions as the enabled attribute and could be cached in the same way. The enabled attribute would then be purely about whether an instruction is available, not whether it's efficient in a particular situation. The main advantage is that it would be possible to make the size/speed choice at a basic block level rather than a function level. In the worst case we might move an instruction from a hot block to a cold block or vice versa, but with intelligent spilling that shouldn't happen too often and at least it wouldn't trigger an ICE. If that sounds OK, I'll try to get something together next week. I think Ramana said he had a use for this on ARM too. I think that this would be way better than duplication of patterns. Perhaps we can name these attributes enable_for_size and enable_for_speed, and have them in addition to enable attribute. The final enable condition would be an union of enable, enable_for_speed and enable_for_size attributes. Uros.
Re: [PATCH,i386] Fix adxintrin on mingw.
On Thu, Oct 2, 2014 at 4:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, sizeof (long) == 4 on windows, so we should use long long as param type. Patch below does it. Ok for trunk? 2014-10-02 Ilya Tocar ilya.to...@intel.com * config/i386/adxintrin.h (_subborrow_u64): Use long long for param type. (_addcarry_u64): Ditto. (_addcarryx_u64): Ditto. OK everywhere. Thanks, Uros.
[PATCH, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
Hello! This message revives an old thread [1], where the miscompilation of gfortran on alpha was found that that resulted in: FAIL: gfortran.dg/assumed_rank_3.f90:15.20: print *, ubound(x,dim=3) ! wrong dim 1 Error: Assumed-rank variable x at (1) may only be used as actual argument The problem was in the miscompilation of resolve_actual_arglist from resolve.c. This function initializes two nearby global bool variables with the following sequence: actual_arg = true; first_actual_arg = true; but due to the miscompilation, the actual_arg was never set to true. This happened due to the way stores to QImode and HImode locations are implemented on non-BWX targets. The sequence reads full word, does its magic to the part and stores the full word with changed part back to the memory. However - the scheduler mixed two sequences, violating the atomicity of RMW sequence. As demostrated in the great detail in the PR [2], the problem is in early exit for MEM_READOLNY_P in true_dependence_1 in alias.c. This early exit declares all MEM_READONLY_P references as non-aliasing, which is not true when possibly aliasing references with alignment ANDs are involved. Proposed patch prevents MEM_READONLY_P memory references to be moved over possibly aliased memory (with alignment ANDs). The patch prevents early exit for MEM_READONLY_P references when alignment ANDs are involved. The aliasing is then determined later in the function. In effect, it changes early exit to: - if (MEM_READONLY_P (x)) -return 0; + if (MEM_READONLY_P (x) + GET_CODE (x_addr) != AND + GET_CODE (mem_addr) != AND) +return 0; The comment also mentions ... We don't expect to find read-only set on MEM, but stupid user tricks can produce them, so don't die.. We certainly don't die anymore, as confirmed by a native alpha-linux-gnu (please note - not alphaev6!) bootstrap and regression test. 2014-10-08 Uros Bizjak ubiz...@gmail.com * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P references when alignment ANDs are involved. The patch was bootstrapped and regression tested on alpha-linux-gnu. OK for mainline and release branches? Please note that the patch by itself is not enough to fix the original problem with gfortran miscompilation. Another problem in this area is summarised in PR 63475 [3], where postreload CSE propagates aliased memory operand. [1] https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02251.html [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483 [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475 Uros. Index: alias.c === --- alias.c (revision 215966) +++ alias.c (working copy) @@ -2458,18 +2458,6 @@ true_dependence_1 (const_rtx mem, enum machine_mod || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can produce them, so don't die. */ - if (MEM_READONLY_P (x)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - if (! mem_addr) { mem_addr = XEXP (mem, 0); @@ -2493,6 +2481,22 @@ true_dependence_1 (const_rtx mem, enum machine_mod } } + /* Read-only memory is by definition never modified, and therefore can't + conflict with anything. However, don't assume anything when AND + addresses are involved and leave to the code below to determine + dependence. We don't expect to find read-only set on MEM, but + stupid user tricks can produce them, so don't die. */ + if (MEM_READONLY_P (x) + GET_CODE (x_addr) != AND + GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (x_addr); if (base (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF
Re: [PATCH, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On Wed, Oct 8, 2014 at 12:51 PM, Richard Biener rguent...@suse.de wrote: This message revives an old thread [1], where the miscompilation of gfortran on alpha was found that that resulted in: [...] As said in the audit trail of the bugreport I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and ammended testcase. The testcase and analysis will show a native read passing possibly aliasing store. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483#c6 Uros.
Re: [patch] Excessive alignment in ix86_data_alignment
On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: On 08 Oct 23:02, Petr Murzin wrote: Hi, I have measured performance impact on Haswell platform according to this input: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html What about older processors? The optimization was introduced well before Haswell for then current processors, and it was based on the recommendation from Intel optimization guide. If this optimization doesn't apply for new processors, then tune option should be introduced and set accordingly. Uros.
Re: [i386] Replace builtins with vector extensions
On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse marc.gli...@inria.fr wrote: Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html (another part of the discussion is around https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) Most people who commented seem cautiously in favor. The least favorable was Ulrich who suggested to go with it but keep the old behavior accessible if the user defines some macro (which imho would lose a large part of the simplification benefits of the patch) https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html If this is accepted, I will gladly prepare patches removing the unused builtins and extending this to a few more operations (integer vectors in particular). If this is not the direction we want to go, I'd like to hear it clearly so I can move on... Well, I'm undecided. The current approach is proven to work OK, there is no bugs reported in this area and the performance is apparently OK. There should be clear benefits in order to change something that ain't broken, and at least some proof that we won't regress in this area with the new approach. On the other hand, if the new approach opens new optimization opportunities (without regression!), I'm in favor of it, including the fact that new code won't produce equivalent assembly - as long as functionality of the optimized asm stays the same (obviously, I'd say). Please also note that this is quite big project. There are plenty of intrinsics and I for one don't want another partial transition ... TL/DR: If there are benefits, no regressions and you think you'll finish the transition, let's go for it. Uros.
Re: [i386] Replace builtins with vector extensions
On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse marc.gli...@inria.fr wrote: On Thu, 9 Oct 2014, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse marc.gli...@inria.fr wrote: Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html (another part of the discussion is around https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) Most people who commented seem cautiously in favor. The least favorable was Ulrich who suggested to go with it but keep the old behavior accessible if the user defines some macro (which imho would lose a large part of the simplification benefits of the patch) https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html If this is accepted, I will gladly prepare patches removing the unused builtins and extending this to a few more operations (integer vectors in particular). If this is not the direction we want to go, I'd like to hear it clearly so I can move on... Well, I'm undecided. First, thanks for answering, it helps me a lot to know what others think. The current approach is proven to work OK, there is no bugs reported in this area and the performance is apparently OK. There should be clear benefits in order to change something that ain't broken, and at least some proof that we won't regress in this area with the new approach. There are quite a few enhancement PRs asking for more performance, but indeed no (or very few) complaints about correctness or about gcc turning their code into something worse than what they wrote, which I completely agree weighs more. On the other hand, if the new approach opens new optimization opportunities (without regression!), I'm in favor of it, including the fact that new code won't produce equivalent assembly - as long as functionality of the optimized asm stays the same (obviously, I'd say). Please also note that this is quite big project. There are plenty of intrinsics and I for one don't want another partial transition ... That might be an issue : this transition is partial by nature. Many intrinsics cannot (easily) be expressed in GIMPLE, and among those that can be represented, we only want to change those for which we are confident that we will not regress the quality of the code. From the reactions, I would assume that we want to be quite conservative at the beginning, and maybe we can reconsider some other intrinsics later. The best I can offer is consistency: if addition of v2df is changed, addition of v4df is changed as well (and say any +-*/ of float/double vectors of any supported size). Another block would be +-*/% for integer vectors. And construction / access (most construction is already builtin-free). And remove the unused builtins in the same patch that makes them unused. If you don't like those blocks, I can write one mega-patch that does all these, if we roughly agree on the list beforehand, so it goes in all at once. Would that be good enough? OK, let's go in the proposed way, more detailed: - we begin with +-*/ of float/double vectors. IMO, this would result in a relatively small and easily reviewable patch to iron out the details of the approach. Alternatively, we can begin with floats only. - commit the patch and wait for the sky to fall down. - we play a bit with the compiler to check generated code and corner cases (some kind of Q/A) and wait if someone finds a problem (say, a couple of weeks). - if there are no problems, continue with integer builtins following the established approach, otherwise we revert everything and go back to the drawing board. - repeat the procedure for other builtins. I propose to wait a couple of days for possible comments before we get the ball rolling. Uros.
Re: [PATCH i386 AVX512] [64/n] Add rest of VI1-AVX2: vpack[us]wb.
On Thu, Oct 9, 2014 at 12:09 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch adds rest of vpack instruction patterns. Bootstrapped. gcc.target/i386.exp tests on top of patch-set show no regressions. under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn sse2_avx2_packsswbmask_name): Add masking. (define_insn sse2_avx2_packuswbmask_name): Ditto. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [65/n] Add rest of VI1-AVX2: mul insn pattern.
On Thu, Oct 9, 2014 at 12:19 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This tiny patch extend mulmode insn pattern to support masking. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand mulmode3mask_name): Add masking. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [66/n] Extend vpalignr insn patterns.
On Thu, Oct 9, 2014 at 12:28 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends vpalignr insn patterns. It also introduces dedicated `masked' version of pattern w/o substing. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator SSESCALARMODE): Add V4TI mode. (define_insn ssse3_avx2_palignrmode_mask): New. (define_insn ssse3_avx2_palignrmode): Add EVEX version. OK, although SSESCALARMODE became even more messy ... Just FYI: V1TI in VIMAX_AVX2 iterator is used to prevent moves of TImode values from SSE to general regs on x86_64. The same reasoning applies to V1DI MMX mode on x86_32. Thanks, Uros.
Re: [PATCH i386 AVX512] [67/n] Update constraints in vec_dup insn pattern.
On Thu, Oct 9, 2014 at 12:34 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This tiny patch updates constraints in vec_dup insn pattern. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn vec_dupmode): Update constraints. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.
On Thu, Oct 9, 2014 at 1:07 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends vpmullw, vpacksdw and pmaddwd insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_c_enum unspec): Add UNSPEC_PMADDWD512. (define_mode_iterator VI2_AVX2): Add V32HI mode. (define_expand mulmode3mask_name): Add masking. (define_insn *mulmode3mask_name): Ditto. (define_expand smulmode3_highpartmask_name): Ditto. (define_insn *smulmode3_highpartmask_name): Ditto. (define_insn avx512bw_pmaddwd512modemask_name): New. (define_mode_attr SDOT_PMADD_SUF): Ditto. (define_expand sdot_prodmode): Add SDOT_PMADD_SUF. (define_insn sse2_avx2_packssdwmask_name): Add masking. (define_insn *ssse3_avx2_pmulhrswmode3mask_name): Ditto. (define_insn avx2_packusdw): Delete. (define_insn sse4_1_packusdw): Ditto. (define_insn sse4_1_avx2_packusdwmask_name): New. OK. + TARGET_SSE2 +ix86_binary_operator_ok (MULT, MODEmode, operands) +mask_mode512bit_condition mask_avx512bw_condition Just noticed, that need to swap target check with operads check. No need to worry for minor issues now, but looking at the sse.md, it looks to me like a case for a quick cleanup patch to correct these inconsistencies. Thanks, Uros.
Re: [PATCH i386 AVX512] [69/n] Add vpmulhrsw insn support.
On Thu, Oct 9, 2014 at 1:12 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch adds support for vpmulhrsw insn. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn avx512bw_umulhrswv32hi3mask_name): New. (define_expand ssse3_avx2_pmulhrswmode3_mask): Ditto. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [70/n]
On Thu, Oct 9, 2014 at 1:36 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch further extends maxmin patterns. You didn't update Subject field ;) Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn *codemode3_finitemask_nameround_saeonly_name): Fix pattern conditions order. No, not yet. As recommended earlier, this change should be part of a later cleanup patch. Do not mix functional changes and cleanups together, it makes review and eventual bisections harder. (define_insn *sse4_1_codemode3mask_name): Add masking. (define_insn *sse4_1_codemode3mask_name0: Ditto. OK without the part, mentioned above. Thanks, Uros.
Re: [PATCH i386 AVX512] [71/n] [Obvious?] Remove redudant iterator attribute.
On Thu, Oct 9, 2014 at 1:39 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This obvious patch removes redundant iterator attribute Bootstrapped. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_attr avx2_avx512f): Remove. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [72/n] Extend VI itterator.
On Thu, Oct 9, 2014 at 1:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends VI mode iterator. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_vector_logical_operator): Handle V16SF and V8DF modes. * config/i386/sse.md (define_mode_iterator VI): Add V64QI and V32HI modes. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [73/n] Extend reduc min/max autogen.
On Thu, Oct 9, 2014 at 1:55 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends pattern for reducation maxmin autogen. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator REDUC_SMINMAX_MODE): Add V64QI and V32HI modes. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [74/n] Add byte/word max/mix reduction.
On Thu, Oct 9, 2014 at 2:02 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (emit_reduc_half): Handle V64QI and V32HI mode. * config/i386/sse.md (define_mode_iterator VI_AVX512BW): New. (define_expand reduc_code_mode): Use VI512_48F_12BW. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [75/n] Update vec_init.
On Thu, Oct 9, 2014 at 2:13 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends vec_init-related routines/patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_vector_init_duplicate): Handle V64QI and V32HI modes, update V8HI, V16QI, V32QI modes handling. (ix86_expand_vector_init_general): Handle V64QI and V32HI modes. * config/i386/sse.md (define_mode_iterator VI48F_512): Rename to ... (define_mode_iterator VI48F_I12_AVX512BW): ... this. Extend to AVX-512BW modes. (define_expand vec_initmode): Use VI48F_I12_AVX512BW. LGTM, but I'd like to ask Jakub for his opinion on vec_init stuff. Uros.
Re: [PATCH i386 AVX512] [76/n] Extend int 2 float conversions.
On Thu, Oct 9, 2014 at 5:01 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends autogeneration of SI-2-SF conversions. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_vector_convert_uns_vsivsf): Handle V16SI mode and TARGET_AVX512VL. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7c34431..8a7853e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18811,6 +18811,19 @@ ix86_expand_vector_convert_uns_vsivsf (rtx target, rtx val) enum machine_mode fltmode = GET_MODE (target); rtx (*cvt) (rtx, rtx); Please handle this directly in floatunssseintvecmodelowermode2 expander. The V16SImode is already handled from there. Uros. + if (intmode == V16SImode) +{ + emit_insn (gen_ufloatv16siv16sf2 (target, val)); + return; +} + if (TARGET_AVX512VL) +{ + if (intmode == V4SImode) +emit_insn (gen_ufloatv4siv4sf2 (target, val)); + else +emit_insn (gen_ufloatv8siv8sf2 (target, val)); + return; +} if (intmode == V4SImode) cvt = gen_floatv4siv4sf2; else
Re: [PATCH i386 AVX512] [77/n] Use blend for cond-set V32HI and V64QI.
On Thu, Oct 9, 2014 at 5:07 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends movcc/vcond autogen. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_sse_movcc): Handle V64QI and V32HI mode. (ix86_expand_int_vcond): Ditto. OK. Thanks, Uros.
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote: It appeared I changed a semantics of BNDMK expand when replaced tree operations with rtl ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the fixed version we may modify value of a pseudo register into which arg1 is expanded which means incorrect value for all following usages of arg1. Didn't reveal it early because programs surprisingly rarely hit this bug. I do following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); Similar problem was also fixed for BNDNARROW. Does it look OK? I'm not aware of this type of limitation, and there are quite some similar constructs in i386.c. It is hard to say without the testcase what happens, perhaps one of RTX experts (CC'd) can advise what is recommended here. Uros.
Re: [i386] Replace builtins with vector extensions
On Thu, Oct 9, 2014 at 7:46 PM, Marc Glisse marc.gli...@inria.fr wrote: If this is accepted, I will gladly prepare patches removing the unused builtins and extending this to a few more operations (integer vectors in particular). If this is not the direction we want to go, I'd like to hear it clearly so I can move on... As we discussed offlist, removing all the builtins would be problematic for Ada as they are the only medium allowing flexible access to vector instructions (aside autovectorization) for users. Today, the model is very simple: people who want to build on top of vector operations just bind to the builtins they need and expose higher level interfaces if they like, provided proper type definitions (see g-sse.ads for example). It is sad that this prevents us from removing the builtins, but I agree that we can't just drop ada+sse users like that. Well, less work for me if I don't have to remove the builtins, and my main motivation is optimization, even if I tried to sell the clean up to convince people. Uros, is it still ok if I change the intrinsics without removing the builtins? (with testcases for HJ and not before Kirill says it is ok) Given that this will be a substantial work and considering the request from Kirill, what do you think about separate development branch until AVXn stuff is finished? This will give a couple of weeks and a playground to finalize the approach for the conversion. Maybe even ada can be tested there to not regress with the compatibility stuff. Uros.
[4.9 PATCH, testsuite]: Fix g++.dg/cpp1y/feat-cxx14.C testsuite errors
Hello! 2014-10-09 Uros Bizjak ubiz...@gmail.com * g++.dg/cpp1y/feat-cxx14.C: Variable templates not in yet. (dg-do): Use c++1y target. Tested on x86_64. OK for branch? Uros. Index: g++.dg/cpp1y/feat-cxx14.C === --- g++.dg/cpp1y/feat-cxx14.C (revision 216044) +++ g++.dg/cpp1y/feat-cxx14.C (working copy) @@ -1,4 +1,4 @@ -// { dg-do compile { target c++14 } } +// { dg-do compile { target c++1y } } // { dg-options -I${srcdir}/g++.dg/cpp1y -I${srcdir}/g++.dg/cpp1y/testinc } // Begin C++11 tests. @@ -125,10 +125,9 @@ # error __cpp_aggregate_nsdmi #endif -#ifndef __cpp_variable_templates +// Variable templates not in yet. +#ifdef __cpp_variable_templates # error __cpp_variable_templates -#elif __cpp_variable_templates != 201304 -# error __cpp_variable_templates != 201304 #endif #ifndef __cpp_digit_separators
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. Couple of nits below. +/* Set regs_ever_live for PIC base address register + to true if required. */ +static void +set_pic_reg_ever_alive () Please rename this function to set_pic_reg_ever_live. -#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. OK for mainline after infrastructure patch is approved. Thanks, Uros.
[PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On Wed, Oct 8, 2014 at 1:56 PM, Uros Bizjak ubiz...@gmail.com wrote: This message revives an old thread [1], where the miscompilation of gfortran on alpha was found that that resulted in: [...] As said in the audit trail of the bugreport I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and ammended testcase. The testcase and analysis will show a native read passing possibly aliasing store. Attached v2 patch implements the same approach in all alias.c places that declare MEM_READONLY_P operands as never aliasing. 2014-10-10 Uros Bizjak ubiz...@gmail.com * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P references when alignment ANDs are involved. (write_dependence_p): Ditto. (may_alias_p): Ditto. Patch was boostrapped and regression tested on x86_64-linux-gnu and alpha-linux-gnu. Unfortunately, there are still failures remaining in gfortran testsuite due to independent RTL infrastructure problem with VALUEs leaking into aliasing detecting functions [2], [3]. The patch was discussed and OK'd by Richi in the PR audit trail. If there are no objections from RTL maintainers, I plan to commit it to the mainline early next week. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483 [2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475 Uros. Index: alias.c === --- alias.c (revision 216025) +++ alias.c (working copy) @@ -2458,18 +2458,6 @@ true_dependence_1 (const_rtx mem, enum machine_mod || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can produce them, so don't die. */ - if (MEM_READONLY_P (x)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - if (! mem_addr) { mem_addr = XEXP (mem, 0); @@ -2493,6 +2481,22 @@ true_dependence_1 (const_rtx mem, enum machine_mod } } + /* Read-only memory is by definition never modified, and therefore can't + conflict with anything. However, don't assume anything when AND + addresses are involved and leave to the code below to determine + dependence. We don't expect to find read-only set on MEM, but + stupid user tricks can produce them, so don't die. */ + if (MEM_READONLY_P (x) + GET_CODE (x_addr) != AND + GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (x_addr); if (base (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF @@ -2576,16 +2580,6 @@ write_dependence_p (const_rtx mem, || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* A read from read-only memory can't conflict with read-write memory. */ - if (!writep MEM_READONLY_P (mem)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - mem_addr = XEXP (mem, 0); if (!x_addr) { @@ -2603,6 +2597,21 @@ write_dependence_p (const_rtx mem, } } + /* A read from read-only memory can't conflict with read-write memory. + Don't assume anything when AND addresses are involved and leave to + the code below to determine dependence. */ + if (!writep + MEM_READONLY_P (mem) + GET_CODE (x_addr) != AND + GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (mem_addr); if (! writep base @@ -2690,18 +2699,6 @@ may_alias_p (const_rtx mem, const_rtx x) || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote: the patch improves performance when previous are applied. It makes RTL loop invariant behavior for GOT loads same as it was before the 2 previous patches. The patch fixes x86 address cost so that cost for addresses with GOT register becomes less, how it was before enabling EBX. In x86_address_cost the result of “REGNO (parts.base) = FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes condition result the same when parts.base is GOT register (the same for parts.index). 2014-10-08 Evgeny Stupachenko evstu...@gmail.com * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for when address contains GOT register. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b43e870..9d8cfd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) cost++; Please add a short comment here, explaining the reason for new condition. if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++; Otherwise LGTM, but please repost the patch with a comment. Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. Please mention *which* code you removed here. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. This is not New check, but changed one. Please mention *what* changed. - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) Hm, can you please add a comment for this change? Uros.
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
On Fri, Oct 10, 2014 at 5:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote: We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On Fri, Oct 10, 2014 at 5:47 PM, Ilya Tocar tocarip.in...@gmail.com wrote: My strong preference would be: enum machine_mode maskmode = mode; rtx (*gen) (rtx, rtx, rtx, rtx); right below the enum machine_mode mode = GET_MODE (d ? d-op0 : op0); line and then inside of the first switch just do: ... case V16SImode: if (!TARGET_AVX512F) return false; gen = gen_avx512f_vpermi2varv16si3; break; case V4SFmode: if (!TARGET_AVX512VL) return false; gen = gen_avx512vl_vpermi2varv4sf3; maskmode = V4SImode; break; ... etc., then in the mask = line use: mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d-nelt, vec)); and finally instead of the second switch do: emit_insn (gen (target, op0, force_reg (maskmode, mask), op1)); return true; Updated patch below. Please recode that horrible first switch statement to: --cut here-- rtx (*gen) (rtx, rtx, rtx, rtx) = NULL; switch (mode) { case V8HImode: if (TARGET_AVX512VL TARGET_AVX152BW) gen = gen_avx512vl_vpermi2varv8hi3; break; ... case V2DFmode: if (TARGET_AVX512VL) { gen = gen_avx512vl_vpermi2varv2df3; maskmode = V2DImode; } break; default: break; } if (gen == NULL) return false; --cut here-- The patch is OK with the above improvement. (Please also note that the patch has a bunch of i386.md changes that will clash with followup patch series). Thanks, Uros.
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
On Wed, Oct 1, 2014 at 8:57 PM, Vladimir Makarov vmaka...@redhat.com wrote: The problem is in code introduced by Bernd in IRA and caller-saves.c in 2012. It is basically an optimization for functions returning always the same result as one argument (e.g. memcpy returning 1st argument). There are two possible solutions. First one is to prohibit the optimizations when there is a parallel in SET. Second one is to go deeper if the call result is guaranteed in the first element which is true for the patch. I suspect that the first solution will regress gcc.target/i386/retarg.c on i686 - that testcase test if referred optimization is effective. All things equal, I think we should go with the second solution. For the first solution, the patch would be Index: lra-constraints.c === --- lra-constraints.c (revision 215748) +++ lra-constraints.c (working copy) @@ -5348,16 +5348,19 @@ if (GET_CODE (pat) == PARALLEL) pat = XVECEXP (pat, 0, 0); dest = SET_DEST (pat); - start_sequence (); - emit_move_insn (cheap, copy_rtx (dest)); - restore = get_insns (); - end_sequence (); - lra_process_new_insns (curr_insn, NULL, restore, -Inserting call parameter restore); - /* We don't need to save/restore of the pseudo from -this call. */ - usage_insns[regno].calls_num = calls_num; - bitmap_set_bit (check_only_regs, regno); + if (REG_P (dest)) + { + start_sequence (); + emit_move_insn (cheap, copy_rtx (dest)); + restore = get_insns (); + end_sequence (); + lra_process_new_insns (curr_insn, NULL, restore, +Inserting call parameter restore); + /* We don't need to save/restore of the pseudo +from this call. */ + usage_insns[regno].calls_num = calls_num; + bitmap_set_bit (check_only_regs, regno); + } } } to_inherit_num = 0; For the second solution, the patch is Index: lra-constraints.c === --- lra-constraints.c (revision 215748) +++ lra-constraints.c (working copy) @@ -5348,16 +5348,25 @@ if (GET_CODE (pat) == PARALLEL) pat = XVECEXP (pat, 0, 0); dest = SET_DEST (pat); - start_sequence (); - emit_move_insn (cheap, copy_rtx (dest)); - restore = get_insns (); - end_sequence (); - lra_process_new_insns (curr_insn, NULL, restore, -Inserting call parameter restore); - /* We don't need to save/restore of the pseudo from -this call. */ - usage_insns[regno].calls_num = calls_num; - bitmap_set_bit (check_only_regs, regno); + if (GET_CODE (dest) == PARALLEL) + { + dest = XVECEXP (dest, 0, 0); + if (GET_CODE (dest) == EXPR_LIST) + dest = XEXP (dest, 0); + } + if (REG_P (dest)) + { + start_sequence (); + emit_move_insn (cheap, copy_rtx (dest)); + restore = get_insns (); + end_sequence (); + lra_process_new_insns (curr_insn, NULL, restore, +Inserting call parameter restore); + /* We don't need to save/restore of the pseudo from +this call. */ + usage_insns[regno].calls_num = calls_num; + bitmap_set_bit (check_only_regs, regno); + } } } The first patch is safer but the second one is ok too. I have no particular preferences. Whatever we choose, analogous code in caller-saves.c should be changed too. Uros.
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
On Fri, Oct 10, 2014 at 7:29 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-10-10 21:10 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Wed, Oct 1, 2014 at 8:57 PM, Vladimir Makarov vmaka...@redhat.com wrote: The problem is in code introduced by Bernd in IRA and caller-saves.c in 2012. It is basically an optimization for functions returning always the same result as one argument (e.g. memcpy returning 1st argument). There are two possible solutions. First one is to prohibit the optimizations when there is a parallel in SET. Second one is to go deeper if the call result is guaranteed in the first element which is true for the patch. I suspect that the first solution will regress gcc.target/i386/retarg.c on i686 - that testcase test if referred optimization is effective. All things equal, I think we should go with the second solution. The first solutions is in trunk since October 3 (https://gcc.gnu.org/ml/gcc-cvs/2014-10/msg00094.html) and I don't see such fail. Patch actually just checks for a case which never occurs right now and therefore should be quite safe. True, but after MPX patches are committed, PARALLELs will be passed as call targets. I wonder if the testcase fails then. Uros.
Re: [PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On Fri, Oct 10, 2014 at 7:25 PM, Jeff Law l...@redhat.com wrote: This message revives an old thread [1], where the miscompilation of gfortran on alpha was found that that resulted in: [...] As said in the audit trail of the bugreport I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and ammended testcase. The testcase and analysis will show a native read passing possibly aliasing store. Attached v2 patch implements the same approach in all alias.c places that declare MEM_READONLY_P operands as never aliasing. 2014-10-10 Uros Bizjak ubiz...@gmail.com * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P references when alignment ANDs are involved. (write_dependence_p): Ditto. (may_alias_p): Ditto. Patch was boostrapped and regression tested on x86_64-linux-gnu and alpha-linux-gnu. Unfortunately, there are still failures remaining in gfortran testsuite due to independent RTL infrastructure problem with VALUEs leaking into aliasing detecting functions [2], [3]. The patch was discussed and OK'd by Richi in the PR audit trail. If there are no objections from RTL maintainers, I plan to commit it to the mainline early next week. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483 [2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475 No objection. In fact, after reading everything it's pretty obvious to me that a /u MEM must be considered as potentially conflicting with writes that are implemented as RMW sequences to deal with the lack of byte access support. Thanks, I went ahead and commit the patch to SVN mainline. I wonder, if they should be also committed to release branches? The escaping VALUE stuff is still in my queue. Great, I can test them on alpha native, there are many gfortran testsuite failures due to this problem. Thanks, Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 5:01 PM, Evgeny Stupachenko evstu...@gmail.com wrote: -#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. I don't see how we can avoid negative condition here. If we remove not from !flag_pic we'll need to add not to TARGET_64BIT and TARGET_PECOFF. I've done it this way: +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic \ + ? INVALID_REGNUM\ + : pic_offset_table_rtx \ + ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) Is it ok? Oh, indeed. I missed the logical or. Maybe put the first condition into parenthesis, to avoid confusion. OK in any case. Thanks, Uros.
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 5:17 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Patch updated with the comment: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2a64d2d..5fd6a82 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12455,9 +12455,18 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER))) cost++; + /* When address base or index is pic_offset_table_rtx we don't increase + address cost. When a memop with pic_offset_table_rtx is not invariant + itself it most likely means that base or index is not invariant. + Therefore only pic_offset_table_rtx could be hoisted out, which is not + profitable for x86. */ if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++; LGTM. OK. Thanks, Uros.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. Uros.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 6:32 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. OK. Thanks, Uros.
[PATCH, rtl-optimization] Fix PR63475, Postreload CSE propagates aliased memory operand
Hello! Attached patch fixes PR63475, where postreload CSE propagates aliased memory operand. The core of the problem was with the call to base_alias_check when VALUE RTXes are involved. Before the call, find_base_term is used to extract the base of x_addr and mem_addr. Please note that find_base_term is able to extract the bases from VALUE RTXes. These extracted bases were passed to base_alias_check, together with original VALUE RTXes x_addr and mem_addr. The problem begins here. base_alias_check doesn't handle VALUE RTXes, and uses e.g. canon_rtx on VALUEs and various GET_CODE accessors to determine various properties of passed x_addr and mem_addr. One of these check checks for the AND alignment addresses to prevent: /* Differing symbols not accessed via AND never alias. */ if (GET_CODE (x_base) != ADDRESS GET_CODE (y_base) != ADDRESS) return 0; early exit. However, when x and y are passed as VALUE RTXes (that corresponds and hides the address with AND), and preceding calls to find_base_term are nevertheless able to extract the bases of x and y, this condition fires erroneously and invalid return value is returned (with 0 meaning that the addresses X and Y are known to point to different objects). The solution is to always extract values for x_addr and mem_addr and use them in the calls to find_base_term and base_alias_check. [It can happen that get_addr is not able to match VALUE RTX with some address, so it is not possible to simply add a bunch of GET_CODE (x) != VALUE asserts in base_alias_check. But in this case find_base_term returns ADDRESS RTX, so we stay in sync as far as base_alias_check is concerned (see the quoted code above).] Added benefit of the patch is, that canon_rtx now works as expected. canon_rtx does NOT handle VALUE RTXes. A small optimization is also present. If the address is already canonicalized, we pass original address to memrefs_conflict_p, but we have to extract original address for preceding functions nevertheless. Also, we use extracted original address in recently added check for AND aligned addresses when checking for MEM_READONLY_P. The patch also removes a couple of unneeded and unused calls to canon_rtx, also to show the level of bitrot in this area ... 2014-10-14 Uros Bizjak ubiz...@gmail.com PR rtl-optimization/63475 * alias.c (true_dependence_1): Always use get_addr to extract true address operands from x_addr and mem_addr. Use extracted address operands to check for references with alignment ANDs. Use extracted address operands with find_base_term and base_alis_check. For noncanonicalized operands call canon_rtx with extracted address operand. (write_dependence_1): Ditto. (may_alias_p): Ditto. Remove unused calls to canon_rtx. Patch was thoroughly tested on x86_64-linux-gnu {,-m32} and alpha-linux-gnu for all default languages plus obj-c++ and go. While there was no differences on x86_64-linux-gnu (as expected), alpha-linux-gnu improved the result [1] for some hundred of PASSes in gfortran testsuite [2]. OK for mainline? [1] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01151.html [2] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01478.html Uros. Index: alias.c === --- alias.c (revision 216149) +++ alias.c (working copy) @@ -2439,6 +2439,7 @@ static int true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr, const_rtx x, rtx x_addr, bool mem_canonicalized) { + rtx true_mem_addr; rtx base; int ret; @@ -2458,6 +2459,10 @@ true_dependence_1 (const_rtx mem, enum machine_mod || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; + if (! x_addr) +x_addr = XEXP (x, 0); + x_addr = get_addr (x_addr); + if (! mem_addr) { mem_addr = XEXP (mem, 0); @@ -2464,23 +2469,8 @@ true_dependence_1 (const_rtx mem, enum machine_mod if (mem_mode == VOIDmode) mem_mode = GET_MODE (mem); } + true_mem_addr = get_addr (mem_addr); - if (! x_addr) -{ - x_addr = XEXP (x, 0); - if (!((GET_CODE (x_addr) == VALUE - GET_CODE (mem_addr) != VALUE - reg_mentioned_p (x_addr, mem_addr)) - || (GET_CODE (x_addr) != VALUE -GET_CODE (mem_addr) == VALUE -reg_mentioned_p (mem_addr, x_addr - { - x_addr = get_addr (x_addr); - if (! mem_canonicalized) - mem_addr = get_addr (mem_addr); - } -} - /* Read-only memory is by definition never modified, and therefore can't conflict with anything. However, don't assume anything when AND addresses are involved and leave to the code below to determine @@ -2488,7 +2478,7 @@ true_dependence_1 (const_rtx mem, enum machine_mod stupid user tricks can produce them, so don't die. */ if (MEM_READONLY_P (x) GET_CODE (x_addr) != AND - GET_CODE (mem_addr
Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.
On Tue, Oct 14, 2014 at 9:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, It seems like I missed to post uppdated patch. On 25 Sep 20:11, Uros Bizjak wrote: I'd rather go with the second approach, it is less confusing from the maintainer POV. All other patterns with masking use some consistent template, so I'd suggest using the same approach for everything. If it is indeed too many patterns, then please split the patch to smaller pieces. Goal was not to decrease size of the patch, I wanted to make pattern look simpler by hiding masking stuff beyond `subst'. Anyway, I've updated the patch. Here it is (bootstrapped and regtested). Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator VI_AVX2): Extend to support AVX-512BW. (define_mode_iterator VI124_AVX2_48_AVX512F): Remove. (define_expand plusminus_insnmode3): Remove masking support. (define_insn *plusminus_insnmode3): Ditto. (define_expand plusminus_insnVI48_AVX512VL:mode3_mask): New. (define_expand plusminus_insnVI12_AVX512VL:mode3_mask): Ditto. (define_insn *plusminus_insnVI48_AVX512VL:mode3_mask): Ditto. (define_insn *plusminus_insnVI12_AVX512VL:mode3_mask): Ditto. (define_expand sse2_avx2_andnotmode3): Remove masking support. (define_insn *andnotmode3): Ditto. (define_expand sse2_avx2_andnotVI48_AVX512VL:mode3_mask): New. (define_expand sse2_avx2_andnotVI12_AVX512VL:mode3_mask): Ditto. (define_insn *andnotVI48_AVX512VL:mode3mask_name): Ditto. (define_insn *andnotVI12_AVX512VL:mode3mask_name): Ditto. (define_insn *absmode2): Remove masking support. (define_insn absVI48_AVX512VL:mode2_mask): New. (define_insn absVI12_AVX512VL:mode2_mask): Ditto. (define_expand absmode2): Use VI_AVX2 mode iterator. IMO, it seems much more readable this way. OK for mainline. Thanks, Uros.
Re: [PATCH i386 AVX512] [76/n] Extend int 2 float conversions.
On Wed, Oct 15, 2014 at 3:37 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7c34431..8a7853e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18811,6 +18811,19 @@ ix86_expand_vector_convert_uns_vsivsf (rtx target, rtx val) enum machine_mode fltmode = GET_MODE (target); rtx (*cvt) (rtx, rtx); Please handle this directly in floatunssseintvecmodelowermode2 expander. The V16SImode is already handled from there. Done. Bootstrapped. gcc/ * config/i386/sse.md (define_expand floatunssseintvecmodelowermode2): Extend to support AVX-512VL instructions. Is it ok for main trunk? OK. Thanks, Uros.
[PATCH, i386]: Fix PR 59432, sync/atomic FAILs on 32bit x86 systems without .cfi directives
Hello! Now that %ebx is no more fixed, we can remove all PIC related complications in atomic_compare_and_swapdwi_doubleword pattern. The immediate consequence is, that we avoid hidden xchgs that clobbered unwinding state. Earlier fix by Ian [1] partly solved this issue using various .cfi directives to fixup the mess, but these were not available on the systems without .cfi directives (e.g. Centos 5 and Solaris). The patch fixes this problem for good by removing problematic alternative that tried to skip %ebx allocations. 2014-10-15 Uros Bizjak ubiz...@gmail.com PR go/59432 * config/i386/sync.md (atomic_compare_and_swapdwi_doubleword): Remove the second alternative. (regprefix): Remove mode attribute. (atomic_compare_and_swapmode): Do not fixup operand 2. * config/i386/predicates.md (cmpxchg8b_pic_memory_operand): Remove. Revert: 2013-11-05 Ian Lance Taylor i...@google.com * config/i386/sync.md (atomic_compare_and_swapdwi_doubleword): If possible, add .cfi directives to record change to bx. * config/i386/i386.c (ix86_emit_cfi): New function. * config/i386/i386-protos.h (ix86_emit_cfi): Declare. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32} on Fedora 20 and Centos 5.11, where fixes Go sync/atomic failure on 32bit targets. Patch was committed to mainline SVN. Uros Index: predicates.md === --- predicates.md (revision 216245) +++ predicates.md (working copy) @@ -1092,43 +1092,6 @@ return parts.disp != NULL_RTX; }) -;; Return true if OP is memory operand which will need zero or -;; one register at most, not counting stack pointer or frame pointer. -(define_predicate cmpxchg8b_pic_memory_operand - (match_operand 0 memory_operand) -{ - struct ix86_address parts; - int ok; - - if (TARGET_64BIT || !flag_pic) -return true; - - ok = ix86_decompose_address (XEXP (op, 0), parts); - gcc_assert (ok); - - if (parts.base GET_CODE (parts.base) == SUBREG) -parts.base = SUBREG_REG (parts.base); - if (parts.index GET_CODE (parts.index) == SUBREG) -parts.index = SUBREG_REG (parts.index); - - if (parts.base == NULL_RTX - || parts.base == arg_pointer_rtx - || parts.base == frame_pointer_rtx - || parts.base == hard_frame_pointer_rtx - || parts.base == stack_pointer_rtx) -return true; - - if (parts.index == NULL_RTX - || parts.index == arg_pointer_rtx - || parts.index == frame_pointer_rtx - || parts.index == hard_frame_pointer_rtx - || parts.index == stack_pointer_rtx) -return true; - - return false; -}) - - ;; Return true if OP is memory operand that cannot be represented ;; by the modRM array. (define_predicate long_memory_operand Index: i386-protos.h === --- i386-protos.h (revision 216245) +++ i386-protos.h (working copy) @@ -142,7 +142,6 @@ extern void ix86_split_lshr (rtx *, rtx, enum mach extern rtx ix86_find_base_term (rtx); extern bool ix86_check_movabs (rtx, int); extern void ix86_split_idivmod (enum machine_mode, rtx[], bool); -extern bool ix86_emit_cfi (); extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot); extern int ix86_attr_length_immediate_default (rtx_insn *, bool); Index: sync.md === --- sync.md (revision 216245) +++ sync.md (working copy) @@ -351,10 +351,9 @@ else { enum machine_mode hmode = CASHMODEmode; - rtx lo_o, lo_e, lo_n, hi_o, hi_e, hi_n, mem; + rtx lo_o, lo_e, lo_n, hi_o, hi_e, hi_n; lo_o = operands[1]; - mem = operands[2]; lo_e = operands[3]; lo_n = operands[4]; hi_o = gen_highpart (hmode, lo_o); @@ -364,12 +363,9 @@ lo_e = gen_lowpart (hmode, lo_e); lo_n = gen_lowpart (hmode, lo_n); - if (!cmpxchg8b_pic_memory_operand (mem, MODEmode)) - mem = replace_equiv_address (mem, force_reg (Pmode, XEXP (mem, 0))); - emit_insn (gen_atomic_compare_and_swapmode_doubleword -(lo_o, hi_o, mem, lo_e, hi_e, lo_n, hi_n, operands[6])); +(lo_o, hi_o, operands[2], lo_e, hi_e, lo_n, hi_n, operands[6])); } ix86_expand_setcc (operands[0], EQ, gen_rtx_REG (CCZmode, FLAGS_REG), @@ -398,57 +394,27 @@ ;; That said, in order to take advantage of possible lower-subreg opts, ;; treat all of the integral operands in the same way. -;; Operands 5 and 6 really need to be different registers, which in -;; this case means op5 must not be ecx. If op5 and op6 are the same -;; (like when the input is -1LL) GCC might chose to allocate op5 to ecx, -;; like op6. This breaks, as the xchg will move the PIC register -;; contents to %ecx then -- boom. - (define_mode_attr doublemodesuffix [(SI 8) (DI 16)]) -(define_mode_attr regprefix [(SI e) (DI r)]) (define_insn
Re: [PATCH, i386]: Fix PR 59432, sync/atomic FAILs on 32bit x86 systems without .cfi directives
On Thu, Oct 16, 2014 at 11:06 AM, Andi Kleen a...@firstfloor.org wrote: Now that %ebx is no more fixed, we can remove all PIC related complications in atomic_compare_and_swapdwi_doubleword pattern. The immediate consequence is, that we avoid hidden xchgs that clobbered unwinding state. Could also do the same in cpuid.h now I am just writing the patch submission ;) Uros.
[RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h
Hello! Now that %ebx is also allocatable in PIC modes, we can cleanup config/i386/cpuid considerably. I propose to remove all PIC related specializations of __cpuid and __cpuid_count and protect the compilation with #if __GNUC__ = 5. The only drawback would be that non-bootstrapped build with gcc 5.0 will ignore -march=native, but I think this should be acceptable. Bootstrapped build will still work as expected. 2014-10-16 Uros Bizjak ubiz...@gmail.com * config/i386/cpuid.h (__cpuid): Remove definitions that handle %ebx register in a special way. (__cpuid_count): Ditto. * config/i386/driver-i386.h: Protect with #if __GNUC__ = 5. (host_detect_local_cpu): Mention that GCC that is able to handle %ebx register in PIC and non-PIC modes is required to compile the function. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Any comments? Uros. Index: config/i386/cpuid.h === --- config/i386/cpuid.h (revision 216282) +++ config/i386/cpuid.h (working copy) @@ -146,56 +146,7 @@ #define signature_VORTEX_ecx 0x436f5320 #define signature_VORTEX_edx 0x36387865 -#if defined(__i386__) defined(__PIC__) -/* %ebx may be the PIC register. */ -#if __GNUC__ = 3 #define __cpuid(level, a, b, c, d) \ - __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \ - cpuid\n\t \ - xchg{l}\t{%%}ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level)) - -#define __cpuid_count(level, count, a, b, c, d)\ - __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \ - cpuid\n\t \ - xchg{l}\t{%%}ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level), 2 (count)) -#else -/* Host GCCs older than 3.0 weren't supporting Intel asm syntax - nor alternatives in i386 code. */ -#define __cpuid(level, a, b, c, d) \ - __asm__ (xchgl\t%%ebx, %k1\n\t \ - cpuid\n\t \ - xchgl\t%%ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level)) - -#define __cpuid_count(level, count, a, b, c, d)\ - __asm__ (xchgl\t%%ebx, %k1\n\t \ - cpuid\n\t \ - xchgl\t%%ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level), 2 (count)) -#endif -#elif defined(__x86_64__) (defined(__code_model_medium__) || defined(__code_model_large__)) defined(__PIC__) -/* %rbx may be the PIC register. */ -#define __cpuid(level, a, b, c, d) \ - __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \ - cpuid\n\t \ - xchg{q}\t{%%}rbx, %q1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level)) - -#define __cpuid_count(level, count, a, b, c, d)\ - __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \ - cpuid\n\t \ - xchg{q}\t{%%}rbx, %q1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level), 2 (count)) -#else -#define __cpuid(level, a, b, c, d) \ __asm__ (cpuid\n\t \ : =a (a), =b (b), =c (c), =d (d) \ : 0 (level)) @@ -204,8 +155,8 @@ __asm__ (cpuid\n\t \ : =a (a), =b (b), =c (c), =d (d) \ : 0 (level), 2 (count)) -#endif + /* Return highest supported input value for cpuid instruction. ext can be either 0x0 or 0x800 to return highest supported value for basic or extended cpuid information. Function returns 0 if cpuid Index: config/i386/driver-i386.c === --- config/i386/driver-i386.c (revision 216282) +++ config/i386/driver-i386.c (working copy) @@ -24,7 +24,7 @@ along with GCC; see the file COPYING3. If not see const char *host_detect_local_cpu (int argc, const char **argv); -#ifdef __GNUC__ +#if __GNUC__ = 5 #include cpuid.h struct cache_desc @@ -942,9 +942,10 @@ done: } #else -/* If we aren't compiling with GCC then the driver will just ignore - -march and -mtune native target and will leave to the newly - built compiler to generate code for its default target. */ +/* If we aren't compiling with GCC that is able to handle %EBX + register in PIC and non-PIC modes, then the driver will just + ignore -march and -mtune native target and will leave to the + newly built compiler to generate code for its default target. */ const char *host_detect_local_cpu (int, const char **) {
Re: [RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h
On Thu, Oct 16, 2014 at 11:36 AM, Jakub Jelinek ja...@redhat.com wrote: Now that %ebx is also allocatable in PIC modes, we can cleanup config/i386/cpuid considerably. I propose to remove all PIC related specializations of __cpuid and __cpuid_count and protect the compilation with #if __GNUC__ = 5. The only drawback would be that non-bootstrapped build with gcc 5.0 will ignore -march=native, but I think this should be acceptable. I'm worried about that. Can't you instead keep the current cpuid.h stuff as is, just add __GNUC__ 5 to that, so it treats GCC 5+ PIC as if __PIC__ wasn't defined? Or, at least use cpuid.h even for older GCC if __PIC__ is not defined (or __x86_64__ is defined and not medium/large PIC model)? Do we really care that much about non-bootstrapped build? I don't see At least on Linux, driver-i386.c should not be built with PIC normally, so at least changing #if __GNUC__ = 5 to #if defined(__GNUC__) (__GNUC__ = 5 || !defined(__PIC__)) would limit the -march=native change for non-bootstrapped compilers to Darwin only (or what other targets use PIC by default?). Yes, this would work for me - the goal is to keep only one universal __cpuid (and __cpuid_count) define, and the above condition fits this goal. Uros.
Re: [PATCH i386 AVX512] [79/n] Extend expand_mul_widen_hilo.
On Thu, Oct 16, 2014 at 8:28 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends expand_mul_widen_hilo to 512-bit QI,SI,HI modes. Bootstrapped and regtested gcc/ * config/i386/i386.c (ix86_expand_mul_widen_hilo): Handle V32HI, V16SI, V64QI modes. Is it ok for trunk? OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [78/n] Use blend for inserting.
On Thu, Oct 16, 2014 at 9:28 AM, Jakub Jelinek ja...@redhat.com wrote: On Thu, Oct 16, 2014 at 10:24:45AM +0400, Kirill Yukhin wrote: Hello, This patch extends insertion hook. AVX-512* tests on top of patch-set all pass under simulator. gcc/ * config/i386/i386.c (ix86_expand_vector_set): Handle V8DF, V8DI, V16SF, V16SI, V32HI, V64QI modes. Just a ChangeLog comment style (seen in several entries you've committed and several posted patches). Please don't put a line break right after the filename if the (functionname): part fits nicely on the same line, the description can be wrapped anywhere as appropriate. In this case, * config/i386/i386.c (ix86_expand_vector_set): Handle V8DF, V8DI, V16SF, V16SI, V32HI, V64QI modes. is shorter and more readable. Other than that, this particular patch LGTM (unless we'd want for the 4 mostly repetitious cases add a common handling spot, which would need the gen fnpointer and kmode vars set before goto), but I'll leave it to Uros to ack it. Let's leave this as it is for now. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [80/n] Extend expand_sse2_mulvxdi3.
On Thu, Oct 16, 2014 at 1:55 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends expand_sse2_mulvxdi3. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Extend expand_sse2_mulvxdi3. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1ee947a..945bc8d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -45667,7 +45667,19 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) enum machine_mode mode = GET_MODE (op0); rtx t1, t2, t3, t4, t5, t6; - if (TARGET_XOP mode == V2DImode) + if (TARGET_AVX512DQ) +{ + if (mode == V8DImode) + emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2)); + else if (TARGET_AVX512VL) + { + if (mode == V4DImode) + emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); + else if (mode == V2DImode) + emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); Should this be v2di ? + } +} + else if (TARGET_XOP mode == V2DImode) { /* op1: A,B,C,D, op2: E,F,G,H */ op1 = gen_lowpart (V4SImode, op1); Please use function pointers in the added part. Thanks, Uros.
Re: [RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h
On Thu, Oct 16, 2014 at 12:25 PM, Uros Bizjak ubiz...@gmail.com wrote: Now that %ebx is also allocatable in PIC modes, we can cleanup config/i386/cpuid considerably. I propose to remove all PIC related specializations of __cpuid and __cpuid_count and protect the compilation with #if __GNUC__ = 5. The only drawback would be that non-bootstrapped build with gcc 5.0 will ignore -march=native, but I think this should be acceptable. I'm worried about that. Can't you instead keep the current cpuid.h stuff as is, just add __GNUC__ 5 to that, so it treats GCC 5+ PIC as if __PIC__ wasn't defined? Or, at least use cpuid.h even for older GCC if __PIC__ is not defined (or __x86_64__ is defined and not medium/large PIC model)? Do we really care that much about non-bootstrapped build? I don't see At least on Linux, driver-i386.c should not be built with PIC normally, so at least changing #if __GNUC__ = 5 to #if defined(__GNUC__) (__GNUC__ = 5 || !defined(__PIC__)) would limit the -march=native change for non-bootstrapped compilers to Darwin only (or what other targets use PIC by default?). Yes, this would work for me - the goal is to keep only one universal __cpuid (and __cpuid_count) define, and the above condition fits this goal. I have committed the attached patch to mainline SVN. 2014-10-17 Uros Bizjak ubiz...@gmail.com * config/i386/cpuid.h (__cpuid): Remove definitions that handle %ebx register in a special way. (__cpuid_count): Ditto. * config/i386/driver-i386.h: Protect with #if defined(__GNUC__) (__GNUC__ = 5 || !defined(__PIC__)). (host_detect_local_cpu): Mention that GCC with non-fixed %ebx is required to compile the function. Bootstrapped and regression tested on x86_64-linux-gnu. Uros. Index: config/i386/cpuid.h === --- config/i386/cpuid.h (revision 216298) +++ config/i386/cpuid.h (working copy) @@ -146,56 +146,7 @@ #define signature_VORTEX_ecx 0x436f5320 #define signature_VORTEX_edx 0x36387865 -#if defined(__i386__) defined(__PIC__) -/* %ebx may be the PIC register. */ -#if __GNUC__ = 3 #define __cpuid(level, a, b, c, d) \ - __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \ - cpuid\n\t \ - xchg{l}\t{%%}ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level)) - -#define __cpuid_count(level, count, a, b, c, d)\ - __asm__ (xchg{l}\t{%%}ebx, %k1\n\t \ - cpuid\n\t \ - xchg{l}\t{%%}ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level), 2 (count)) -#else -/* Host GCCs older than 3.0 weren't supporting Intel asm syntax - nor alternatives in i386 code. */ -#define __cpuid(level, a, b, c, d) \ - __asm__ (xchgl\t%%ebx, %k1\n\t \ - cpuid\n\t \ - xchgl\t%%ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level)) - -#define __cpuid_count(level, count, a, b, c, d)\ - __asm__ (xchgl\t%%ebx, %k1\n\t \ - cpuid\n\t \ - xchgl\t%%ebx, %k1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level), 2 (count)) -#endif -#elif defined(__x86_64__) (defined(__code_model_medium__) || defined(__code_model_large__)) defined(__PIC__) -/* %rbx may be the PIC register. */ -#define __cpuid(level, a, b, c, d) \ - __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \ - cpuid\n\t \ - xchg{q}\t{%%}rbx, %q1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level)) - -#define __cpuid_count(level, count, a, b, c, d)\ - __asm__ (xchg{q}\t{%%}rbx, %q1\n\t \ - cpuid\n\t \ - xchg{q}\t{%%}rbx, %q1\n\t \ - : =a (a), =r (b), =c (c), =d (d)\ - : 0 (level), 2 (count)) -#else -#define __cpuid(level, a, b, c, d) \ __asm__ (cpuid\n\t \ : =a (a), =b (b), =c (c), =d (d) \ : 0 (level)) @@ -204,8 +155,8 @@ __asm__ (cpuid\n\t \ : =a (a), =b (b), =c (c), =d (d) \ : 0 (level), 2 (count)) -#endif + /* Return highest supported input value for cpuid instruction. ext can be either 0x0 or 0x800 to return highest supported value for basic or extended cpuid information. Function returns 0 if cpuid Index: config/i386/driver-i386.c
Re: [PATCH i386 AVX512] [80/n] Extend expand_sse2_mulvxdi3.
On Fri, Oct 17, 2014 at 2:32 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, On 16 Oct 14:29, Uros Bizjak wrote: + if (mode == V4DImode) + emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); + else if (mode == V2DImode) + emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); Should this be v2di ? Right, copy-and-paste :( + } +} + else if (TARGET_XOP mode == V2DImode) { /* op1: A,B,C,D, op2: E,F,G,H */ op1 = gen_lowpart (V4SImode, op1); Please use function pointers in the added part. Done. Updated patch in the bottom. Is it ok? OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [75/n] Update vec_init.
On Fri, Oct 17, 2014 at 2:57 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 17, 2014 at 04:28:12PM +0400, Kirill Yukhin wrote: I wonder whether for these modes it can ever be beneficial to build them through interleaves/concatenations etc., if it wouldn't be better to build them by storing all values into memory and just reading it back. I've tried this example: #include immintrin.h unsigned char a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63; __m512i foo () { return __extension__ (__m512i)(__v64qi){ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63 }; } w/ and w/o -mavx512bw (and always -mavx512f). When, this code works, we've got 127 lines of assembly to do this init. W/o AVX-512BW we've got 300 lines of code (mostly on GPRs, using sal, and etc.) Then I've looked into actual assembly w/ -mavx512bw and it turns out that no AVX-512BW insn were generated, only AVX-512F (and below). Fixed iterator. Ok, if it is shorter than copying all those into memory and reading from memory, so be it. -(define_mode_iterator VI48F_512 [V16SI V16SF V8DI V8DF]) +(define_mode_iterator VI48F_I12_AVX512BW + [V16SI V16SF V8DI V8DF + (V32HI TARGET_AVX512BW) (V64QI TARGET_AVX512BW)]) What does the I12 stand for? Wasn't it meant to be VI48F_512_AVX512BW or I512? Actually, I am not awere of any name convention for iterators. As far as I understand, name [more or less] for vector mode should reflect: - Type family of the unit: float or int - Size of the unit: 1, 2, 4 etc. bytes - If possible, target predicates to enable certain modes in given iterator. The name is: - Vector (V) - I48F - contains both ints and floats of size 4 and 8 - I12 - contains ints of size 1 and 2 - AVX512BW - affected by the target (according to previous note - to be removed) Maybe it'll be better to name it: VF48_I1248? I'll leave that to Uros, the patch is ok by me. Don't want to bikeshed, but VF48_I1248 looks somehow better to me. Anyway, the patch is OK even without this change. Thanks, Uros.
Re: [PATCH i386 AVX512 Boostrap] [80/n] Extend expand_sse2_mulvxdi3.
On Fri, Oct 17, 2014 at 4:25 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This is fix for bootstrap failure. Is it OK? gcc/ * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Refactor conditions to fix bootstrap. Well, OK. Uros.
Re: [PATCH 5/5] Use preferred_for_speed in i386.md
On Fri, Oct 17, 2014 at 4:54 PM, Richard Sandiford richard.sandif...@arm.com wrote: Undo the original fix for 61630 and use preferred_for_speed in the problematic pattern. I've not written many gcc.target/i386 tests so the markup might need some work. Richard gcc/ * lra.c (lra): Remove call to recog_init. * config/i386/i386.md (preferred_for_speed): New attribute (*floatSWI48:modeMODEF:mode2_sse): Override it instead of enabled. gcc/testsuite/ * gcc.target/i386/conversion-2.c: New test. Please use the attached testcase that is also compatible with 32bit compiles (and removes unnecessary comments in the asm). Please also mention the revert in ChangeLog. OK for x86 part with these changes. Thanks, Uros. /* { dg-do compile } */ /* { dg-options -O2 -fno-toplevel-reorder -mtune=bdver2 } */ /* { dg-additional-options -mregparm=1 -msse -mfpmath=sse { target ia32 } } */ void __attribute__ ((hot)) f1 (int x) { register float f asm (%xmm0) = x; asm volatile ( :: x (f)); } void __attribute__ ((cold)) f2 (int x) { register float f asm (%xmm1) = x; asm volatile ( :: x (f)); } void __attribute__ ((hot)) f3 (int x) { register float f asm (%xmm2) = x; asm volatile ( :: x (f)); } void __attribute__ ((cold)) f4 (int x) { register float f asm (%xmm3) = x; asm volatile ( :: x (f)); } /* { dg-final { scan-assembler sp\\\), %xmm0 } } */ /* { dg-final { scan-assembler (ax|di), %xmm1 } } */ /* { dg-final { scan-assembler sp\\\), %xmm2 } } */ /* { dg-final { scan-assembler (ax|di), %xmm3 } } */
Re: [x86] Replace builtins with vector extensions
On Sun, Oct 12, 2014 at 10:36 PM, Marc Glisse marc.gli...@inria.fr wrote: for the first patch, it is actually easier to repost the old patch with some new testcases. That doesn't mean it has to go in all at once, but you can comment on various things. If that's a problem I'll separate them and repost separately. For simple +-*/ for double, I mostly wonder if __A + __B is good enough or if (__m128d)((__v2df)__A + (__v2df)__B) would be better. Note that for integer vectors (future patch) those casts will be necessary. Maybe I should write the integer patch, for a single header, so you can compare. This is a header, so I think some more casts won't hurt. Also for consistency with integer ops, as you mentioned. I implemented _mm_cvtsi128_si64 with simply __A[0], but the corresponding V4SI operation will need ((__v4si)__A)[0], and I don't know if it is better to cast everywhere for uniformity, or just where it is necessary. I doubt it matters much for the generated code. Let's go with more casts, just to avoid surprises. This is a header after all. Since we are keeping the builtins for Ada, it would be possible to follow Ulrich's suggestion and keep the old version of the intrinsics, protected by a macro. I would not like that much... Nope. I also hope that Ada will convert to ppc's approach someday, so these builtins will be removed. Also, please note that builtins are not published interface. They change from time to time ;) Something like _mm_load[hl]_pd is in my opinion roughly where the complexity limit should be. They would be nice to have, I expect the compiler will almost always generate a sequence at least as good, but we are getting quite far from the user's code so the likelihood of pessimizing somehow may increase. Also, an alternate implementation would be __A[i] = *__B, but IIRC the middle-end optimizers have a harder time handling this form. Those 2 intrinsics should probably not be included in the first batch, but I wanted to show them. Let's start without them, so we convert simple arithmetic first. All testcases fail without the patch. The 3rd testcase almost works, there is a note in RTL saying that the result is 0.0, but it doesn't use it. (I just noticed that all functions are still called myfma, I'll rename those not related to fma to 'f') If I omit the avx512fintrin.h part, I think it is very unlikely this can conflict with Kirill's work in any way (but I can still wait / use a branch). While looking correct, I am a bit nervous about avx512fintrin.h changes, mainly because I have not much experience with these patterns. I have adder Kirill to CC for possible comments. Bootstrap+testsuite on x86_64-linux-gnu. 2014-10-13 Marc Glisse marc.gli...@inria.fr gcc/ * config/i386/xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps, _mm_div_ps, _mm_store_ss, _mm_cvtss_f32): Use vector extensions instead of builtins. * config/i386/avxintrin.h (_mm256_add_pd, _mm256_add_ps, _mm256_div_pd, _mm256_div_ps, _mm256_mul_pd, _mm256_mul_ps, _mm256_sub_pd, _mm256_sub_ps): Likewise. * config/i386/avx512fintrin.h (_mm512_add_pd, _mm512_add_ps, _mm512_sub_pd, _mm512_sub_ps, _mm512_mul_pd, _mm512_mul_ps, _mm512_div_pd, _mm512_div_ps): Likewise. * config/i386/emmintrin.h (_mm_store_sd, _mm_cvtsd_f64, _mm_storeh_pd, _mm_cvtsi128_si64, _mm_cvtsi128_si64x, _mm_add_pd, _mm_sub_pd, _mm_mul_pd, _mm_div_pd, _mm_storel_epi64, _mm_movepi64_pi64, _mm_loadh_pd, _mm_loadl_pd): Likewise. (_mm_sqrt_sd): Fix comment. gcc/testsuite/ * gcc.target/i386/intrinsics_opt-1.c: New testcase. * gcc.target/i386/intrinsics_opt-2.c: Likewise. * gcc.target/i386/intrinsics_opt-3.c: Likewise. * gcc.target/i386/intrinsics_opt-4.c: Likewise. I don't have many comments on simple arithmetic, and changes look trivial. I'd say that simple arithmetic part is OK for branch. One last note: scalar builtins preserve highpart of target register. IIRC, I have tried to convert _mm_frcz_s{s,d} to use scalars, but resulted in a horrible code. Current approach uses __builtin_ia32_movs{s,d} to generate optimal code, but I didn't test if current gcc improved in this part. Thanks, Uros. -- Marc Glisse Index: gcc/config/i386/avx512fintrin.h === --- gcc/config/i386/avx512fintrin.h (revision 216116) +++ gcc/config/i386/avx512fintrin.h (working copy) @@ -10742,26 +10742,21 @@ _mm512_maskz_sqrt_ps (__mmask16 __U, __m (__v16sf) _mm512_setzero_ps (), (__mmask16) __U, _MM_FROUND_CUR_DIRECTION); } extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__,
[PATCH, rtl-optimization]: Remove const_alias_set
Hello! The fix that fixed scheduler issues with AND addresses (the fix prevented early exit for MEM_READONLY_P addresses when AND alignment addresses were involved) caused some fall-out for libgo testsuite. These tests triggered an assert in mems_in_disjoint_alias_sets_p, which checks for zero alias set when flag_strict_aliasing is false. We have had some off-list discussion with Ian Lance Taylor about this issue. The problem was, that Go dynamically switches off flag_strict_aliasing after compilation started and when unsafe package is imported (similar to when__attribute__ ((optimize (-fno-strict-aliasing))) is used in c). To mitigate this issue, the Go frontend called varasm_init_once again to recalculated (= cleared) const_alias_set in this case. As observed in [1], the fix for canon_true_depence [2] that introduced quick exit for a MEM_READONLY_P operands made const_alias_set redundant, it is no longer user for anything. The patch that fixed scheduling of AND operands removed early MEM_READONLY_P exit for memory operands with AND realignment, so operands could reach more complex code later in the function that was able to determine dependence of memory operands. This code includes the call to mems_in_disjoint_alias_sets_p, and the assert triggered again for some MEM_READONLY_P operands that have had non-zero alias set, set from the value, cached in const_alias_set from before flag_strict_aliasing flag was cleared. The proposed solution is to remove const_alias_set altogether. The MEM_READONLY_P successfully supersedes const_alias_set functionality, and this is also confirmed by the removal of the second varasm_init_once call in the Go frontend. In an off-list discussion, Ian agrees that attached patch should also fix the problem. 2014-10-19 Uros Bizjak ubiz...@gmail.com * varasm.c (const_alias_set): Remove. (init_varasm_once): Remove initialization of const_alias_set. (build_constant_desc): Do not set alias set to const_alias_set. The patch was tested on alpha-linux-gnu [3], alphaev68-linux-gnu and x86_64-linux-gnu {,-m32} for all default languages plus Go and obj-c++. The patch fixes all mentioned libgo failures on alpha. OK for mainline? [1] https://gcc.gnu.org/ml/gcc-patches/2013-07/msg01033.html [2] https://gcc.gnu.org/ml/gcc-patches/2010-07/msg01758.html [3] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg02041.html Uros. Index: varasm.c === --- varasm.c(revision 216362) +++ varasm.c(working copy) @@ -98,11 +98,6 @@ tree last_assemble_variable_decl; bool first_function_block_is_cold; -/* We give all constants their own alias set. Perhaps redundant with - MEM_READONLY_P, but pre-dates it. */ - -static alias_set_type const_alias_set; - /* Whether we saw any functions with no_split_stack. */ static bool saw_no_split_stack; @@ -3231,7 +3226,6 @@ build_constant_desc (tree exp) rtl = gen_const_mem (TYPE_MODE (TREE_TYPE (exp)), symbol); set_mem_attributes (rtl, exp, 1); set_mem_alias_set (rtl, 0); - set_mem_alias_set (rtl, const_alias_set); /* We cannot share RTX'es in pool entries. Mark this piece of RTL as required for unsharing. */ @@ -5928,7 +5922,6 @@ init_varasm_once (void) object_block_htab = hash_tableobject_block_hasher::create_ggc (31); const_desc_htab = hash_tabletree_descriptor_hasher::create_ggc (1009); - const_alias_set = new_alias_set (); shared_constant_pool = create_constant_pool (); #ifdef TEXT_SECTION_ASM_OP
Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.
On Mon, Oct 20, 2014 at 3:41 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Oct 20, 2014 at 05:30:36PM +0400, Kirill Yukhin wrote: Unfortunately this caused PR63600. The problem is that VI_AVX2 mode iterator includes V2DI and for AVX2 also V4DI, but for pre-ssse3 ix86_expand_sse2_abs doesn't handle V2DI (and can't easily, we don't have PSRAQ instruction), for ssse3 there is no vpabsq instruction, and for avx2 neither. We can handle V2DI/V4DI only for TARGET_AVX512VL, and V8DI for TARGET_AVX512F. Thus, IMHO the mode iterator on at least (define_insn *absmode2 and on (define_expand absmode2 is wrong, should not include V2DI/V4DI unless TARGET_AVX512VL (so new (or ressurrected, was that VI124_AVX2_48_AVX512F?) specialized mode iterator?). This patch removes absq insn patterns for non-AVX-512 targets. gcc/ * config/i386/sse.md (define_mode_iterator VI_AVX2): Restore to 128-, 256- bit integer modes only. (define_mode_iterator VI_AVX2_AVX512): New. (define_expand negmode2): Use VI_AVX2_AVX512 mode iterator. (define_expand plusminus_insnmode3): Ditto. (define_insn *plusminus_insnmode3): Ditto. (define_expand sse2_avx2_andnotmode3): Ditto. (define_mode_iterator VI1248_AVX512VL_AVX512BW): New. (define_insn absVI1248_AVX512VL_AVX512BW:mode2): Ditto. Bootstrap in progress. AVX-512 tests pass. Is it ok for trunk? I'll certainly leave the review to Uros, whatever he prefers. That said, I was expecting you'd keep VI_AVX2 as is (because from the patch clearly that is what is used most commonly, the V?DI modes are for most insns normal integral vector modes, VI* uses the same modes and VI_AVX2 used to be just like VI, just with TARGET_AVX conditions replaced with TARGET_AVX2), and just add a new mode iterator for the two abs patterns (*absmode2 and absmode2), it can be specialized mode iterator just for the abs with ABS in names or something. Yes, I like this idea, too. Just add IV1248_AVX512VL_AVX512BW and use it in abs patterns. The changed patch is pre-approved, but please still make full bootstrap and regtest cycle. Thanks, Uros.
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On Mon, Oct 20, 2014 at 5:19 PM, Ilya Tocar tocarip.in...@gmail.com wrote: The patch is OK with the above improvement. Will commit version below, if no objections in 24 hours. Sorry, I've missed palignr, which should also have v64qi version, and lost return in expand_vec_perm_palignr case (this caused avx512f-vec-unpack test failures). Patch below fixes it. Ok for trunk? 2014-10-20 Ilya Tocar ilya.to...@intel.com * config/i386/i386.c (expand_vec_perm_1): Fix expand_vec_perm_palignr case. * config/i386/sse.md (ssse3_avx2_palignrmode_mask): Use VI1_AVX512. OK. Thanks, Uros.
[PATCH, fixincludes]: Add pthread.h to glibc_c99_inline_4 fix
On Thu, Oct 16, 2014 at 2:05 PM, Jakub Jelinek ja...@redhat.com wrote: Recent change caused bootstrap failure on CentOS 5.11: /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information. unwind-dw2-fde-dip_s.o: In function `__pthread_cleanup_routine': unwind-dw2-fde-dip.c:(.text+0x1590): multiple definition of `__pthread_cleanup_routine' /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information. unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information. unwind-sjlj_s.o: In function `__pthread_cleanup_routine': unwind-sjlj.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine' unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information. emutls_s.o: In function `__pthread_cleanup_routine': emutls.c:(.text+0x170): multiple definition of `__pthread_cleanup_routine' unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here collect2: error: ld returned 1 exit status gmake[5]: *** [libgcc_s.so] Error 1 $ ld --version GNU ld version 2.17.50.0.6-26.el5 20061020 It looks like a switch-to-c11 fallout. Older glibc versions have issues with c99 (and c11) conformance [1]. Changing extern __inline void __pthread_cleanup_routine (...) in system /usr/include/pthread.h to if __STDC_VERSION__ 199901L extern #endif __inline__ void __pthread_cleanup_routine (...) fixes this issue and allows bootstrap to proceed. However, fixincludes is not yet built in stage1 bootstrap. Is there a way to fix this issue without changing system headers? [1] https://gcc.gnu.org/ml/gcc-patches/2006-11/msg01030.html Yeah, old glibcs are totally incompatible with -fno-gnu89-inline. Not sure if it is easily fixincludable, if yes, then -fgnu89-inline should be used for code like libgcc which is built with the newly built compiler before it is fixincluded. Or we need -fgnu89-inline by default for old glibcs (that is pretty much what we do e.g. in Developer Toolset for RHEL5). At the end of the day, adding pthread.h to glibc_c99_inline_4 fix fixes the bootstrap. The fix applies __attribute__((__gnu_inline__)) to the declaration: extern __inline __attribute__ ((__gnu_inline__)) void __pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame) 2014-10-21 Uros Bizjak ubiz...@gmail.com * inclhack.def (glibc_c99_inline_4): Add pthread.h to files. * fixincl.x: Regenerate. Bootstrapped and regression tested on CentOS 5.11 x86_64-linux-gnu {,-m32}. OK for mainline? Uros. Index: fixincl.x === --- fixincl.x (revision 216501) +++ fixincl.x (working copy) @@ -2,11 +2,11 @@ * * DO NOT EDIT THIS FILE (fixincl.x) * - * It has been AutoGen-ed August 12, 2014 at 02:09:58 PM by AutoGen 5.12 + * It has been AutoGen-ed October 21, 2014 at 10:18:16 AM by AutoGen 5.16.2 * From the definitionsinclhack.def * and the template file fixincl */ -/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Aug 12 14:09:58 MSK 2014 +/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Oct 21 10:18:17 CEST 2014 * * You must regenerate it. Use the ./genfixes script. * @@ -3173,7 +3173,7 @@ * File name selection pattern */ tSCC zGlibc_C99_Inline_4List[] = - sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0; + sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0pthread.h\0*/pthread.h\0; /* * Machine/OS name selection pattern */ Index: inclhack.def === --- inclhack.def(revision 216501) +++ inclhack.def(working copy) @@ -1687,7 +1687,8 @@ */ fix = { hackname = glibc_c99_inline_4; -files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h'; +files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h', +pthread.h, '*/pthread.h'; bypass= __extern_inline|__gnu_inline__; select= (^| )extern __inline; c_fix = format;
Re: [PATCH] Fix ubsan i?86 {add,sub,mul}vmode4 patterns
On Tue, Mar 25, 2014 at 8:18 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Mar 25, 2014 at 04:06:40PM +0100, Jakub Jelinek wrote: On Tue, Mar 25, 2014 at 12:39:18PM +0100, Uros Bizjak wrote: The patch is OK in principle, but we could follow established practice and use separate predicates - please see general_szext_operand mode attribute definition. So like this? I've tried to use non-VOIDmode of the predicates that were used previously (i.e. general_operand or x86_64_general_operand). 2014-03-25 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (general_sext_operand): New mode attr. (addvmode4, subvmode4, mulvmode4): If operands[2] is CONST_INT, don't generate (sign_extend (const_int)). (*addvmode4, *subvmode4, *mulvmode4): Disallow CONST_INT_P operands[2]. Use We constraint instead of i and general_sext_operand predicate instead of general_operand. (*addvmode4_1, *subvmode4_1, *mulvmode4_1): New insns. * config/i386/constraints.md (We): New constraint. * config/i386/predicates.md (x86_64_sext_operand, sext_operand): New predicates. Now successfully bootstrapped/regtested on x86_64-linux and i686-linux. The patch is OK for mainline. Thanks, Uros.