[PATCH][x86] Remove duplicated headers includes

2018-05-30 Thread Peryt, Sebastian
Hi,

I have cleaned up x86intrin.h by removing redundant header includes.
The removed headers were included both in x86intrin.h and in immintrin.h, which
is itself included by x86intrin.h.

Is it ok for trunk?

2018-05-30  Sebastian Peryt  

gcc/

* config/i386/cldemoteintrin.h: Change define from _X86INTRIN_H_INCLUDED
to _IMMINTRIN_H_INCLUDED.
* config/i386/pconfigintrin.h: Ditto.
* config/i386/waitpkgintrin.h: Ditto.
* config/i386/immintrin.h: Add includes for sgxintrin.h,
pconfigintrin.h, waitpkgintrin.h and cldemoteintrin.h.
* config/i386/x86intrin.h: Remove includes for mmintrin.h, xmmintrin.h,
emmintrin.h, pmmintrin.h, tmmintrin.h, smmintrin.h, wmmintrin.h,
bmiintrin.h, bmi2intrin.h, lzcntintrin.h, sgxintrin.h, pconfigintrin.h,
waitpkgintrin.h and cldemoteintrin.h.
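
To illustrate the guard change listed in the ChangeLog above: a sub-header
refuses direct inclusion by testing the macro that its umbrella header
defines. The sketch below models that check inside one translation unit. The
#define stands in for including immintrin.h, and demo_guard_visible is a
hypothetical helper that exists only for this illustration.

```c
/* Stand-in for what the umbrella header does before it pulls in its
   sub-headers.  In real code this macro comes from <immintrin.h>,
   never from user code.  */
#define _IMMINTRIN_H_INCLUDED

/* Shape of the check at the top of a sub-header such as cldemoteintrin.h
   after the change: the tested macro is _IMMINTRIN_H_INCLUDED rather
   than _X86INTRIN_H_INCLUDED.  */
#if !defined _IMMINTRIN_H_INCLUDED
# error "Never use <cldemoteintrin.h> directly; include <immintrin.h> instead."
#endif

/* Hypothetical helper for the demo: reports whether the guard macro
   is visible at this point of the translation unit.  */
static int
demo_guard_visible (void)
{
#ifdef _IMMINTRIN_H_INCLUDED
  return 1;
#else
  return 0;
#endif
}
```

Since x86intrin.h includes immintrin.h, code that includes only x86intrin.h
should still see the intrinsics after such a change.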

Thanks,
Sebastian




0001-Headers-changes.patch
Description: 0001-Headers-changes.patch


RE: [RFT PATCH, AVX512]: Implement scalar unsigned int->float conversions with AVX512F

2018-05-23 Thread Peryt, Sebastian
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Tuesday, May 22, 2018 8:43 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Peryt, Sebastian <sebastian.pe...@intel.com>; Jakub Jelinek
> <ja...@redhat.com>
> Subject: [RFT PATCH, AVX512]: Implement scalar unsigned int->float conversions
> with AVX512F
> 
> Hello!
> 
> Attached patch implements scalar unsigned int->float conversions with
> AVX512F.
> 
> 2018-05-22  Uros Bizjak  <ubiz...@gmail.com>
> 
> * config/i386/i386.md (*floatuns2_avx512):
> New insn pattern.
> (floatunssi2): Also enable for AVX512F and TARGET_SSE_MATH.
> Rewrite expander pattern.  Emit gen_floatunssi2_i387_with_xmm
> for non-SSE modes.
> (floatunsdisf2): Rewrite expander pattern.  Handle TARGET_AVX512F.
> (floatunsdidf2): Ditto.
> 
> testsuite/ChangeLog:
> 
> 2018-05-22  Uros Bizjak  <ubiz...@gmail.com>
> 
> * gcc.target/i386/cvt-3.c: New test.
> 
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}, but
> not tested on an AVX512 target.

I have checked it on x86_64-linux-gnu {,-m32} on SKX and don't see any 
stability regressions.

Sebastian

> 
> Uros.
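
For reference, the source-level operation this patch targets is an ordinary
cast from an unsigned integer to a floating-point type. The sketch below is
not the cvt-3.c test from the patch (which is not shown in this thread); the
function names are made up for illustration. With AVX512F enabled, a compiler
may be able to use a single unsigned-convert instruction for each function.

```c
#include <stdint.h>

/* Scalar unsigned 32-bit integer to float.  Without an unsigned-convert
   instruction the compiler has to go through a wider signed conversion
   or a fix-up sequence.  */
float
u32_to_float (uint32_t x)
{
  return (float) x;
}

/* Scalar unsigned 64-bit integer to double.  */
double
u64_to_double (uint64_t x)
{
  return (double) x;
}
```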


RE: [RFT PATCH, AVX512]: Implement scalar float->unsigned int truncations with AVX512F

2018-05-23 Thread Peryt, Sebastian
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- 
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Monday, May 21, 2018 9:55 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jakub Jelinek ; Kirill Yukhin 
> 
> Subject: Re: [RFT PATCH, AVX512]: Implement scalar float->unsigned int 
> truncations with AVX512F
> 
> On Mon, May 21, 2018 at 4:53 PM, Uros Bizjak  wrote:
> > Hello!
> >
> > Attached patch implements scalar float->unsigned int truncations with
> > AVX512F.
> >
> > 2018-05-21  Uros Bizjak  
> >
> > * config/i386/i386.md (fixuns_truncdi2): New insn pattern.
> > (fixuns_truncsi2_avx512f): Ditto.
> > (*fixuns_truncsi2_avx512f_zext): Ditto.
> > (fixuns_truncsi2): Also enable for AVX512F and TARGET_SSE_MATH.
> > Emit fixuns_truncsi2_avx512f for AVX512F targets.
> >
> > testsuite/ChangeLog:
> >
> > 2018-05-21  Uros Bizjak  
> >
> > * gcc.target/i386/cvt-2.c: New test.
> >
> > Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Unfortunately, I have no means to test the patch on an AVX512 target,
> > so to avoid some hidden issue, I'd like to ask someone to test it on
> > a live target.

I've bootstrapped and regression tested your patch on x86_64-linux-gnu {,-m32}
on an SKX machine and I don't see any stability regressions.

Sebastian

> 
> Oops, ssemodesuffix handling was missing in the insn mnemonic. Fixed in
> the attached v-2 patch.
> 
> Uros.
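
As an illustration of what the patch saves: the first function below is the
plain cast that a compiler may turn into one unsigned-convert instruction with
AVX512F. The second is a model of one common fix-up that uses only a signed
32-bit truncation. Both names are hypothetical, and the fix-up is a sketch of
the general idea, not the sequence GCC actually emits.

```c
#include <stdint.h>

/* Direct truncation from float to unsigned 32-bit integer.  */
uint32_t
float_to_u32 (float x)
{
  return (uint32_t) x;
}

/* Fix-up built on a signed truncation only.
   Valid for 0 <= x < 2^32; other inputs are not handled.  */
uint32_t
float_to_u32_via_signed (float x)
{
  const float two31 = 2147483648.0f;	/* 2^31, exactly representable.  */

  if (x < two31)
    return (uint32_t) (int32_t) x;

  /* x - 2^31 is exact for x in [2^31, 2^32) and fits in int32_t.  */
  return (uint32_t) (int32_t) (x - two31) + 0x80000000u;
}
```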


RE: [PATCH][i386] Adding WAITPKG instructions

2018-05-10 Thread Peryt, Sebastian
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Thursday, May 10, 2018 3:26 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][i386] Adding WAITPKG instructions
> 
> On Thu, May 10, 2018 at 2:50 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Hi Uros,
> >
> > Updated patch attached, please find comments below.
> >
> >> -Original Message-
> >> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> >> Sent: Wednesday, May 9, 2018 1:47 PM
> >> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> >> Subject: Re: [PATCH][i386] Adding WAITPKG instructions
> >>
> >> On Tue, May 8, 2018 at 1:34 PM, Peryt, Sebastian
> >> <sebastian.pe...@intel.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > This patch adds support for WAITPKG instructions.
> >> >
> >> > Is it OK for trunk and, after a few days, for backport to GCC-8?
> >> >
> > (Removed)
> >> >
> >> >
> >>
> >> +case IX86_BUILTIN_UMONITOR:
> >> +  arg0 = CALL_EXPR_ARG (exp, 0);
> >> +  op0 = expand_normal (arg0);
> >> +  if (!REG_P (op0))
> >> +    op0 = ix86_zero_extend_to_Pmode (op0);
> >> +
> >> +  emit_insn (ix86_gen_umonitor (op0));
> >> +  return 0;
> >>
> >> Please see how movdir64b handles its address operand. Also, do not
> >> use global ix86_gen_monitor, just expand directly in the same way as
> >> movdir64b.
> >>
> >
> > Fixed.
> >
> >> +case IX86_BUILTIN_UMWAIT:
> >> +case IX86_BUILTIN_TPAUSE:
> >> +  rtx eax, edx, op1_lo, op1_hi;
> >> +  arg0 = CALL_EXPR_ARG (exp, 0);
> >> +  arg1 = CALL_EXPR_ARG (exp, 1);
> >> +  op0 = expand_normal (arg0);
> >> +  op1 = expand_normal (arg1);
> >> +  eax = gen_rtx_REG (SImode, AX_REG);
> >> +  edx = gen_rtx_REG (SImode, DX_REG);
> >> +  if (!REG_P (op0))
> >> +    op0 = copy_to_mode_reg (SImode, op0);
> >> +  if (!REG_P (op1))
> >> +    op1 = copy_to_mode_reg (DImode, op1);
> >> +  op1_lo = gen_lowpart (SImode, op1);
> >> +  op1_hi = expand_shift (RSHIFT_EXPR, DImode, op1,
> >> +                         GET_MODE_BITSIZE (SImode), 0, 1);
> >> +  op1_hi = convert_modes (SImode, DImode, op1_hi, 1);
> >> +  emit_move_insn (eax, op1_lo);
> >> +  emit_move_insn (edx, op1_hi);
> >> +  emit_insn (fcode == IX86_BUILTIN_UMWAIT
> >> +             ? gen_umwait (op0, eax, edx)
> >> +             : gen_tpause (op0, eax, edx));
> >> +
> >> +  /* Return current CF value.  */
> >> +  op3 = gen_rtx_REG (CCCmode, FLAGS_REG);
> >> +  target = gen_rtx_LTU (QImode, op3, const0_rtx);
> >> +
> >> +  return target;
> >>
> >> For the above code, please see how xsetbv expansion and patterns are
> >> handling their input operands. There should be two patterns, one for
> >> 32bit and the other for 64bit targets. The patterns will need to set
> >> FLAGS_REG, otherwise the test will be removed.
> >>
> >
> > I copied what is done for the xsetbv expansion and most likely found a bug
> > in GCC.
> > The problem is that when I use 3 arguments and compile the 64-bit
> > version, the upper part of rax is not cleared. It doesn't happen when I use
> > 2 or 4 function arguments.
> > Most likely the error is caused by rdx being used both as a function
> > input and as an instruction argument.
> 
> There is no need to clear upper parts of 64bit register. As specified in the
> ISA (and modelled with RTX pattern), the instruction (e.g. tpause) reads only
> lower 32 bits from %rax and %rdx. Implicitly, the instruction should ignore
> upper 32 bits by itself, so we can use SUBREGs. If this is not the case, we
> need to use DImode input arguments in RTX pattern and explicitly emit
> zero-extension insns to clear upper 32 bits of input arguments.
> 

Ok, I agree with you regarding clearing.

But there is still one thing bothering me, as explained in my last email. The
problem appears when I use 3 arguments and compile the 64-bit version. The
generated assembly is different from when I add an extra unused argument or
remove one function argument not related to 

RE: [PATCH][i386] Adding CLDEMOTE instruction

2018-05-10 Thread Peryt, Sebastian
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Wednesday, May 9, 2018 1:53 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][i386] Adding CLDEMOTE instruction
> 
> On Tue, May 8, 2018 at 1:58 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Sorry, forgot attachment.
> >
> > Sebastian
> >
> >
> > -Original Message-
> > From: Peryt, Sebastian
> > Sent: Tuesday, May 8, 2018 1:56 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Uros Bizjak <ubiz...@gmail.com>; Kirill Yukhin
> > <kirill.yuk...@gmail.com>; Peryt, Sebastian
> > <sebastian.pe...@intel.com>
> > Subject: [PATCH][i386] Adding CLDEMOTE instruction
> >
> > Hi,
> >
> > This patch adds support for CLDEMOTE instruction.
> >
> > Is it OK for trunk and, after a few days, for backport to GCC-8?
> >
> > 2018-05-08  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/
> >
> > * common/config/i386/i386-common.c
> (OPTION_MASK_ISA_CLDEMOTE_SET,
> > OPTION_MASK_ISA_CLDEMOTE_UNSET): New defines.
> > (ix86_handle_option): Handle -mcldemote.
> > * config.gcc: New header.
> > * config/i386/cldemoteintrin.h: New file.
> > * config/i386/cpuid.h (bit_CLDEMOTE): New bit.
> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> > -mcldemote.
> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> > OPTION_MASK_ISA_CLDEMOTE.
> > * config/i386/i386.c (ix86_target_string): Added -mcldemote.
> > (ix86_valid_target_attribute_inner_p): Ditto.
> > (enum ix86_builtins): Added IX86_BUILTIN_CLDEMOTE.
> > (ix86_init_mmx_sse_builtins): Define __builtin_ia32_cldemote.
> > (ix86_expand_builtin): Expand IX86_BUILTIN_CLDEMOTE.
> > * config/i386/i386.h (TARGET_CLDEMOTE, TARGET_CLDEMOTE_P): New.
> > * config/i386/i386.md (UNSPECV_CLDEMOTE): New.
> > (cldemote): New.
> > * config/i386/i386.opt: Added -mcldemote.
> > * config/i386/x86intrin.h: New header.
> > * doc/invoke.texi: Added -mcldemote.
> >
> > 2018-05-08  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/cldemote-1.c: New test.
> 
> OK for mainline.
> 
> is there a compelling reason why we want this new feature in gcc-8 release
> branch?
>

After some additional internal discussion, I concluded that a backport is not
required for now.
I'll backport it if and when it becomes necessary.
 
> Thanks,
> Uros.

Thanks,
Sebastian
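
A minimal sketch of how the new intrinsic might be used from C, assuming the
compiler defines __CLDEMOTE__ when -mcldemote is in effect and exposes
_cldemote through x86intrin.h. The helper name demote_buffer and the 64-byte
line size are assumptions made for this example. Without -mcldemote the helper
only counts lines, so the code builds everywhere.

```c
#include <stddef.h>

#if defined (__CLDEMOTE__)
# include <x86intrin.h>
#endif

/* Hypothetical helper: hint that the cache lines covering
   [buf, buf + len) may be moved to a more distant cache level.
   Assumes 64-byte cache lines and a 64-byte aligned buf.
   Returns the number of lines visited.  */
size_t
demote_buffer (const void *buf, size_t len)
{
  const char *p = (const char *) buf;
  size_t lines = 0;

  for (size_t off = 0; off < len; off += 64)
    {
#if defined (__CLDEMOTE__)
      _cldemote ((void *) (p + off));	/* one hint per cache line */
#else
      (void) p;				/* no-op fallback */
#endif
      lines++;
    }
  return lines;
}
```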


RE: [PATCH][i386] Adding WAITPKG instructions

2018-05-10 Thread Peryt, Sebastian
Hi Uros,

Updated patch attached, please find comments below.

> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Wednesday, May 9, 2018 1:47 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][i386] Adding WAITPKG instructions
> 
> On Tue, May 8, 2018 at 1:34 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Hi,
> >
> > This patch adds support for WAITPKG instructions.
> >
> > Is it OK for trunk and, after a few days, for backport to GCC-8?
> >
(Removed)
> >
> >
> 
> +case IX86_BUILTIN_UMONITOR:
> +  arg0 = CALL_EXPR_ARG (exp, 0);
> +  op0 = expand_normal (arg0);
> +  if (!REG_P (op0))
> +    op0 = ix86_zero_extend_to_Pmode (op0);
> +
> +  emit_insn (ix86_gen_umonitor (op0));
> +  return 0;
> 
> Please see how movdir64b handles its address operand. Also, do not use global
> ix86_gen_monitor, just expand directly in the same way as movdir64b.
> 

Fixed.

> +case IX86_BUILTIN_UMWAIT:
> +case IX86_BUILTIN_TPAUSE:
> +  rtx eax, edx, op1_lo, op1_hi;
> +  arg0 = CALL_EXPR_ARG (exp, 0);
> +  arg1 = CALL_EXPR_ARG (exp, 1);
> +  op0 = expand_normal (arg0);
> +  op1 = expand_normal (arg1);
> +  eax = gen_rtx_REG (SImode, AX_REG);
> +  edx = gen_rtx_REG (SImode, DX_REG);
> +  if (!REG_P (op0))
> +    op0 = copy_to_mode_reg (SImode, op0);
> +  if (!REG_P (op1))
> +    op1 = copy_to_mode_reg (DImode, op1);
> +  op1_lo = gen_lowpart (SImode, op1);
> +  op1_hi = expand_shift (RSHIFT_EXPR, DImode, op1,
> +                         GET_MODE_BITSIZE (SImode), 0, 1);
> +  op1_hi = convert_modes (SImode, DImode, op1_hi, 1);
> +  emit_move_insn (eax, op1_lo);
> +  emit_move_insn (edx, op1_hi);
> +  emit_insn (fcode == IX86_BUILTIN_UMWAIT
> +             ? gen_umwait (op0, eax, edx)
> +             : gen_tpause (op0, eax, edx));
> +
> +  /* Return current CF value.  */
> +  op3 = gen_rtx_REG (CCCmode, FLAGS_REG);
> +  target = gen_rtx_LTU (QImode, op3, const0_rtx);
> +
> +  return target;
> 
> For the above code, please see how xsetbv expansion and patterns are handling
> their input operands. There should be two patterns, one for 32bit and the
> other for 64bit targets. The patterns will need to set FLAGS_REG, otherwise
> the test will be removed.
> 

I copied what is done for the xsetbv expansion and most likely found a bug in
GCC.
The problem is that when I use 3 arguments and compile the 64-bit version, the
upper part of rax is not cleared. It doesn't happen when I use 2 or 4 function
arguments.
Most likely the error is caused by rdx being used both as a function input and
as an instruction argument.

When using 3 operands:
bar:
.LFB5450:
        .cfi_startproc
        movq    %rdx, %rax
        umonitor        %rdi
        movq    %rdx, %rcx
        shrq    $32, %rcx
        movq    %rcx, %rdx
        umwait  %esi
        setc    %al
        ret
        .cfi_endproc

When using 4 operands:
bar:
.LFB5450:
        .cfi_startproc
        movl    %edx, %esi
        umonitor        %rdi
        movq    %rcx, %rax
        shrq    $32, %rax
        movq    %rax, %rdx
        movl    %ecx, %eax
        umwait  %esi
        setc    %al
        ret
        .cfi_endproc


Can you please suggest how to proceed here? I cannot open a new PR without
adding this instruction first. Or maybe you know how to resolve it?

> +(define_insn "umwait"
> +  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
> +                     (use (match_operand:SI 1 "register_operand" "a"))
> +                     (use (match_operand:SI 2 "register_operand" "d"))]
> +                    UNSPECV_UMWAIT)]
> +  "TARGET_WAITPKG"
> +  "umwait\t{%0}"
> +  [(set_attr "length" "3")])
> 
> No need for "use" RTX here and in other patterns. You should also remove {}
> from insn template, otherwise there will be no operand printed in some asm
> dialect.
> 

Fixed.

> Uros.

Sebastian
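
The operand handling discussed in this message can be modelled in plain C. The
sketch below splits a 64-bit counter into the two 32-bit halves that the
expansion places in EAX and EDX, and shows what a consumer sees if it reads
only the low 32 bits of each 64-bit register, which is how the review reply in
this thread describes the instruction. The struct and function names are
hypothetical; this is a model of the data flow, not GCC code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the EDX:EAX register pair.  */
struct edx_eax
{
  uint32_t eax;	/* low 32 bits of the counter */
  uint32_t edx;	/* high 32 bits of the counter */
};

/* Split a 64-bit counter the way the expansion does.  */
struct edx_eax
split_counter (uint64_t counter)
{
  struct edx_eax r;

  r.eax = (uint32_t) counter;		/* like gen_lowpart (SImode, op1) */
  r.edx = (uint32_t) (counter >> 32);	/* like the logical shift right by 32 */
  return r;
}

/* Value seen by a consumer that reads only the low 32 bits of each
   64-bit register; whatever is in the upper halves is ignored.  */
uint64_t
join_low_halves (uint64_t rax, uint64_t rdx)
{
  return ((uint64_t) (uint32_t) rdx << 32) | (uint32_t) rax;
}
```

Under that reading, the stale upper half of rax in the 3-operand listing would
not change the value the instruction consumes.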


0001-WAITPKG-v2.patch
Description: 0001-WAITPKG-v2.patch


RE: [PATCH 1/3] Add PTWRITE builtins for x86

2018-05-09 Thread Peryt, Sebastian
I have rebased this patch onto the latest trunk and addressed the comments.
Also, there was a test listed in the ChangeLog but missing from the patch
itself; it has now been added.

Is it OK for trunk and for backport to GCC-8 after a few days?

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PTWRITE_SET,
OPTION_MASK_ISA_PTWRITE_UNSET): New.
(ix86_handle_option): Handle OPT_mptwrite.
* config/i386/cpuid.h (bit_PTWRITE): Add.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PTWRITE CPUID.
* config/i386/i386-builtin.def (PTWRITE): Add PTWRITE.
* config/i386/i386-c.c (ix86_target_macros_internal):
Support __PTWRITE__.
* config/i386/i386.c (ix86_target_string): Add -mptwrite.
(ix86_valid_target_attribute_inner_p): Support ptwrite.
(ix86_init_mmx_sse_builtins): Add edges detection for ptwrites
generated by vartrace.
* config/i386/i386.h (TARGET_PTWRITE): Add.
(TARGET_PTWRITE_P): Add.
* config/i386/i386.md: Add ptwrite.
* config/i386/i386.opt: Add -mptwrite.
* config/i386/immintrin.h (target):
(_ptwrite64): Add.
(_ptwrite32): Add.
* doc/extend.texi: Document ptwrite builtins.
* doc/invoke.texi: Document -mptwrite.

gcc/testsuite/

* gcc.target/i386/ptwrite-1.c: New test.

Sebastian
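
A hedged usage sketch for the intrinsics named in the ChangeLog above. It
assumes that __PTWRITE__ is defined when -mptwrite is in effect, that
_ptwrite64 is only available for 64-bit targets, and that both intrinsics take
a single unsigned integer argument. trace_value is a hypothetical wrapper;
without -mptwrite it does nothing and returns 0.

```c
#include <stdint.h>

#if defined (__PTWRITE__)
# include <immintrin.h>
#endif

/* Hypothetical tracing helper.
   Returns 1 if a PTWRITE was issued, 0 if the build has no support.  */
int
trace_value (uint64_t v)
{
#if defined (__PTWRITE__) && defined (__x86_64__)
  _ptwrite64 (v);			/* full 64-bit payload */
  return 1;
#elif defined (__PTWRITE__)
  _ptwrite32 ((uint32_t) v);		/* low 32 bits only */
  return 1;
#else
  (void) v;				/* built without -mptwrite */
  return 0;
#endif
}
```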


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andi Kleen
> Sent: Monday, February 12, 2018 3:53 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Metzger, Markus T ; ubiz...@gmail.com;
> Andi Kleen 
> Subject: [PATCH 1/3] Add PTWRITE builtins for x86
> 
> From: Andi Kleen 
> 
> Add builtins/intrinsics for PTWRITE. PTWRITE is a new instruction on Intel
> Cherry Trail that allows writing values into the Processor Trace log.
> 
> This is fairly straightforward, except I had to add isa2 support for variable
> number of operands.
> 
> gcc/:
> 
> 2018-02-10  Andi Kleen  
> 
>   * common/config/i386/i386-common.c
> (OPTION_MASK_ISA_PTWRITE_SET):
>   (OPTION_MASK_ISA_PTWRITE_UNSET): New.
>   (ix86_handle_option): Handle OPT_mptwrite.
>   * config/i386/cpuid.h (bit_PTWRITE): Add.
>   * config/i386/driver-i386.c (host_detect_local_cpu): Detect
>   PTWRITE CPUID.
>   * config/i386/i386-builtin.def (PTWRITE): Add PTWRITE.
>   * config/i386/i386-c.c (ix86_target_macros_internal):
>   Support __PTWRITE__.
>   * config/i386/i386.c (ix86_target_string): Add -mptwrite.
>   (ix86_valid_target_attribute_inner_p): Support ptwrite.
>   (BDESC_VERIFYS): Verify SPECIAL_ARGS2.
>   (ix86_init_mmx_sse_builtins): Handle special args2.
>   * config/i386/i386.h (TARGET_PTWRITE): Add.
>   (TARGET_PTWRITE_P): Add.
>   * config/i386/i386.md: Add ptwrite.
>   * config/i386/i386.opt: Add -mptwrite.
>   * config/i386/immintrin.h (target):
>   (_ptwrite_u64): Add.
>   (_ptwrite_u32): Add.
>   * doc/extend.texi: Document ptwrite builtins.
>   * doc/invoke.texi: Document -mptwrite.
> 
> gcc/testsuite/:
> 
> 2018-02-10  Andi Kleen  
> 
>   * gcc.target/i386/ptwrite1.c: New test.
>   * gcc.target/i386/ptwrite2.c: New test.


0001-PTWRITE-intrinsics.patch
Description: 0001-PTWRITE-intrinsics.patch


RE: [PATCH][i386] Adding CLDEMOTE instruction

2018-05-08 Thread Peryt, Sebastian
Sorry, forgot attachment.

Sebastian


-Original Message-
From: Peryt, Sebastian 
Sent: Tuesday, May 8, 2018 1:56 PM
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak <ubiz...@gmail.com>; Kirill Yukhin <kirill.yuk...@gmail.com>; 
Peryt, Sebastian <sebastian.pe...@intel.com>
Subject: [PATCH][i386] Adding CLDEMOTE instruction

Hi,

This patch adds support for CLDEMOTE instruction.

Is it OK for trunk and, after a few days, for backport to GCC-8?

2018-05-08  Sebastian Peryt  <sebastian.pe...@intel.com>

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_CLDEMOTE_SET,
OPTION_MASK_ISA_CLDEMOTE_UNSET): New defines.
(ix86_handle_option): Handle -mcldemote.
* config.gcc: New header.
* config/i386/cldemoteintrin.h: New file.
* config/i386/cpuid.h (bit_CLDEMOTE): New bit.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
-mcldemote.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_CLDEMOTE.
* config/i386/i386.c (ix86_target_string): Added -mcldemote.
(ix86_valid_target_attribute_inner_p): Ditto.
(enum ix86_builtins): Added IX86_BUILTIN_CLDEMOTE.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_cldemote.
(ix86_expand_builtin): Expand IX86_BUILTIN_CLDEMOTE.
* config/i386/i386.h (TARGET_CLDEMOTE, TARGET_CLDEMOTE_P): New.
* config/i386/i386.md (UNSPECV_CLDEMOTE): New.
(cldemote): New.
* config/i386/i386.opt: Added -mcldemote.
* config/i386/x86intrin.h: New header.
* doc/invoke.texi: Added -mcldemote.

2018-05-08  Sebastian Peryt  <sebastian.pe...@intel.com>

gcc/testsuite/

* gcc.target/i386/cldemote-1.c: New test.

Thanks,
Sebastian


0002-CLDEMOTE.PATCH
Description: 0002-CLDEMOTE.PATCH


[PATCH][i386] Adding CLDEMOTE instruction

2018-05-08 Thread Peryt, Sebastian
Hi,

This patch adds support for CLDEMOTE instruction.

Is it OK for trunk and, after a few days, for backport to GCC-8?

2018-05-08  Sebastian Peryt  

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_CLDEMOTE_SET,
OPTION_MASK_ISA_CLDEMOTE_UNSET): New defines.
(ix86_handle_option): Handle -mcldemote.
* config.gcc: New header.
* config/i386/cldemoteintrin.h: New file.
* config/i386/cpuid.h (bit_CLDEMOTE): New bit.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
-mcldemote.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_CLDEMOTE.
* config/i386/i386.c (ix86_target_string): Added -mcldemote.
(ix86_valid_target_attribute_inner_p): Ditto.
(enum ix86_builtins): Added IX86_BUILTIN_CLDEMOTE.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_cldemote.
(ix86_expand_builtin): Expand IX86_BUILTIN_CLDEMOTE.
* config/i386/i386.h (TARGET_CLDEMOTE, TARGET_CLDEMOTE_P): New.
* config/i386/i386.md (UNSPECV_CLDEMOTE): New.
(cldemote): New.
* config/i386/i386.opt: Added -mcldemote.
* config/i386/x86intrin.h: New header.
* doc/invoke.texi: Added -mcldemote.

2018-05-08  Sebastian Peryt  

gcc/testsuite/

* gcc.target/i386/cldemote-1.c: New test.

Thanks,
Sebastian


[PATCH][i386] Adding WAITPKG instructions

2018-05-08 Thread Peryt, Sebastian
Hi,

This patch adds support for WAITPKG instructions.

Is it OK for trunk and, after a few days, for backport to GCC-8?

2018-05-08  Sebastian Peryt  

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_WAITPKG_SET,
OPTION_MASK_ISA_WAITPKG_UNSET): New defines.
(ix86_handle_option): Handle -mwaitpkg.
* config.gcc: New header.
* config/i386/cpuid.h (bit_WAITPKG): New bit.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect -mwaitpkg.
* config/i386/i386-builtin-types.def ((UINT8, UNSIGNED, UINT64)): New
function type.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_WAITPKG.
* config/i386/i386.c (ix86_target_string): Added -mwaitpkg.
(ix86_option_override_internal): Added PTA_WAITPKG.
(ix86_valid_target_attribute_inner_p): Added -mwaitpkg.
(enum ix86_builtins): Added IX86_BUILTIN_UMONITOR, IX86_BUILTIN_UMWAIT,
IX86_BUILTIN_TPAUSE.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_umonitor,
__builtin_ia32_umwait and __builtin_ia32_tpause.
(ix86_expand_builtin): Expand IX86_BUILTIN_UMONITOR,
IX86_BUILTIN_UMWAIT, IX86_BUILTIN_TPAUSE.
* config/i386/i386.h (TARGET_WAITPKG, TARGET_WAITPKG_P): New.
* config/i386/i386.opt: Added -mwaitpkg.
* config/i386/sse.md (UNSPECV_UMWAIT, UNSPECV_UMONITOR,
UNSPECV_TPAUSE): New.
(umwait, umonitor_, tpause): New.
* config/i386/waitpkgintrin.h: New file.
* config/i386/x86intrin.h: New header.
* doc/invoke.texi: Added -mwaitpkg.

2018-05-08  Sebastian Peryt  

gcc/testsuite/

* gcc.target/i386/tpause-1.c: New test.
* gcc.target/i386/umonitor-1.c: New test.

Thanks,
Sebastian




0001-WAITPKG.patch
Description: 0001-WAITPKG.patch


RE: [PATCH][i386] PR target/85473, Fix _movdir64b expansion with -mx32

2018-04-25 Thread Peryt, Sebastian
Hi,

Patch has been updated and tested. Now I don't see any new regressions.

Changelog stays the same.

Is it ok for trunk?

Thanks,
Sebastian


> -Original Message-
> From: Peryt, Sebastian
> Sent: Saturday, April 21, 2018 5:36 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Uros Bizjak <ubiz...@gmail.com>; Kirill Yukhin <kirill.yuk...@gmail.com>;
> H.J. Lu <hjl.to...@gmail.com>; Peryt, Sebastian <sebastian.pe...@intel.com>
> Subject: RE: [PATCH][i386] PR target/85473, Fix _movdir64b expansion with
> -mx32
> 
> Hi,
> 
> I just realized this patch introduces some new regressions.
> 
> Sorry, I must have mixed up something in testing. Will update this patch 
> shortly.
> 
> Sebastian
> 
> > -Original Message-
> > From: Peryt, Sebastian
> > Sent: Friday, April 20, 2018 6:38 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Uros Bizjak <ubiz...@gmail.com>; Kirill Yukhin
> > <kirill.yuk...@gmail.com>; H.J. Lu <hjl.to...@gmail.com>; Peryt,
> > Sebastian <sebastian.pe...@intel.com>
> > Subject: [PATCH][i386] PR target/85473, Fix _movdir64b expansion with
> > -mx32
> >
> > Hi,
> >
> > This fixes PR85473 by fixing _movdir64b expansion for -mx32.
> >
> > Ok for trunk?
> >
> > 2018-04-20  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/ChangeLog:
> >
> > PR target/85473
> > * config/i386/i386.c (ix86_expand_builtin): Change memory
> > operand to XI, op0 extend to Pmode.
> > * config/i386/i386.md: Change unspec volatile and operand 1
> > mode to XI, change operand 0 mode to P
> >
> > 2018-04-20  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/85473
> > * gcc.target/i386/pr85473-1.c: New test.
> > * gcc.target/i386/pr85473-2.c: New test.
> >
> > Sebastian
> >



0001-PR85473-fix-v2.patch
Description: 0001-PR85473-fix-v2.patch


RE: [PATCH][i386] PR target/85473, Fix _movdir64b expansion with -mx32

2018-04-21 Thread Peryt, Sebastian
Hi,

I just realized this patch introduces some new regressions.

Sorry, I must have mixed up something in testing. Will update this patch 
shortly.

Sebastian

> -Original Message-
> From: Peryt, Sebastian
> Sent: Friday, April 20, 2018 6:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Uros Bizjak <ubiz...@gmail.com>; Kirill Yukhin <kirill.yuk...@gmail.com>;
> H.J. Lu <hjl.to...@gmail.com>; Peryt, Sebastian <sebastian.pe...@intel.com>
> Subject: [PATCH][i386] PR target/85473, Fix _movdir64b expansion with -mx32
> 
> Hi,
> 
> This fixes PR85473 by fixing _movdir64b expansion for -mx32.
> 
> Ok for trunk?
> 
> 2018-04-20  Sebastian Peryt  <sebastian.pe...@intel.com>
> 
> gcc/ChangeLog:
> 
>   PR target/85473
>   * config/i386/i386.c (ix86_expand_builtin): Change memory
>   operand to XI, op0 extend to Pmode.
>   * config/i386/i386.md: Change unspec volatile and operand 1
>   mode to XI, change operand 0 mode to P
> 
> 2018-04-20  Sebastian Peryt  <sebastian.pe...@intel.com>
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/85473
>   * gcc.target/i386/pr85473-1.c: New test.
>   * gcc.target/i386/pr85473-2.c: New test.
> 
> Sebastian
> 



[PATCH][i386] PR target/85473, Fix _movdir64b expansion with -mx32

2018-04-20 Thread Peryt, Sebastian
Hi,

This fixes PR85473 by fixing _movdir64b expansion for -mx32.

Ok for trunk?

2018-04-20  Sebastian Peryt  

gcc/ChangeLog:

PR target/85473
* config/i386/i386.c (ix86_expand_builtin): Change memory
operand to XI, op0 extend to Pmode.
* config/i386/i386.md: Change unspec volatile and operand 1
mode to XI, change operand 0 mode to P

2018-04-20  Sebastian Peryt  

gcc/testsuite/ChangeLog:

PR target/85473
* gcc.target/i386/pr85473-1.c: New test.
* gcc.target/i386/pr85473-2.c: New test.

Sebastian
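
For context, a source-level use of the intrinsic whose expansion this patch
fixes might look like the sketch below. It assumes _movdir64b takes a
destination and a source pointer, that it is reachable through immintrin.h,
and that __MOVDIR64B__ is defined when -mmovdir64b is in effect. copy_block64
is a hypothetical wrapper with a memcpy fallback so that the code builds
without the option. The Pmode extension described in the ChangeLog happens
inside the compiler and is not visible at this level.

```c
#include <string.h>

#if defined (__MOVDIR64B__)
# include <immintrin.h>
#endif

/* Hypothetical wrapper: copy one 64-byte block from src to dst.
   dst must be 64-byte aligned when the instruction is used.  */
void
copy_block64 (void *dst, const void *src)
{
#if defined (__MOVDIR64B__)
  _movdir64b (dst, src);	/* 64-byte direct store */
#else
  memcpy (dst, src, 64);	/* plain fallback, no direct-store semantics */
#endif
}
```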




fix-PR85473.patch
Description: fix-PR85473.patch


RE: [PATCH][i386] Adding MOVDIRI and MOVDIR64B instructions

2018-04-19 Thread Peryt, Sebastian
> On Thu, Apr 19, 2018 at 3:11 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> >> On Thu, Apr 19, 2018 at 2:35 PM, Peryt, Sebastian
> >> <sebastian.pe...@intel.com>
> >> wrote:
> >> >> On Wed, Apr 18, 2018 at 2:56 PM, Peryt, Sebastian
> >> >> <sebastian.pe...@intel.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This patch enables new instructions - MOVDIRI and MOVDIR64B.
> >> >> >
> >> >> > Is it ok for trunk?
> >> >>
> >> >> Is there a reason that one flag goes to ix86_isa_flags and the
> >> >> other to ix86_isa_flags2?
> >> >
> >> > This is because of usage of OPTION_MASK_ISA_MOVDIRI |
> >> > OPTION_MASK_ISA_64BIT which would end up in different isa flags
> >> > tables. And MOVDIR64B doesn't use this option, so it can be in
> >> > ix86_isa_flags2.
> >>
> >> Ah, indeed.
> >>
> >> The patch is OK for mainline then.
> >
> > Thanks!
> >
> >>
> >> (Please note that until gcc-8 is branched, patches that add new
> >> features won't be approved as we are nearing the release.)
> >
> > Can you please explain what this actually means? I got confused. Also
> > I'd like to mention that I have a few more patches I'm going to send soon.
> >
> > Does this mean I can merge this one into trunk, but there is no guarantee
> > it will be added to GCC-8?
> 
> No, the patch will be included in gcc-8, as gcc-8 has not yet branched from
> the trunk. But since branch date is approaching, we don't want to destabilize
> trunk by accepting patches that introduce new features, so they will have to
> be postponed and committed to the trunk after gcc-8 is branched.
> 
> Uros.

Ok, thank you. That explains it now.

Sebastian


RE: [PATCH][i386] Adding MOVDIRI and MOVDIR64B instructions

2018-04-19 Thread Peryt, Sebastian
> On Thu, Apr 19, 2018 at 2:35 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> >> On Wed, Apr 18, 2018 at 2:56 PM, Peryt, Sebastian
> >> <sebastian.pe...@intel.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > This patch enables new instructions - MOVDIRI and MOVDIR64B.
> >> >
> >> > Is it ok for trunk?
> >>
> >> Is there a reason that one flag goes to ix86_isa_flags and the other
> >> to ix86_isa_flags2?
> >
> > This is because of usage of OPTION_MASK_ISA_MOVDIRI |
> > OPTION_MASK_ISA_64BIT which would end up in different isa flags
> > tables. And MOVDIR64B doesn't use this option, so it can be in 
> > ix86_isa_flags2.
> 
> Ah, indeed.
> 
> The patch is OK for mainline then.

Thanks!

> 
> (Please note that until gcc-8 is branched, patches that add new features won't
> be approved as we are nearing the release.)

Can you please explain what this actually means? I got confused. Also, I'd like
to mention that I have a few more patches I'm going to send soon.

Does this mean I can merge this one into trunk, but there is no guarantee it
will be added to GCC-8?

Thanks,
Sebastian

> 
> Thanks,
> Uros.
> 
> > Sebastian
> >
> >>
> >> Uros.
> >>
> >> > 2018-04-18  Sebastian Peryt  <sebastian.pe...@intel.com>
> >> >
> >> > gcc/
> >> >
> >> > * common/config/i386/i386-common.c
> >> > (OPTION_MASK_ISA_MOVDIRI_SET,
> >> OPTION_MASK_ISA_MOVDIR64B_SET,
> >> > OPTION_MASK_ISA_MOVDIRI_UNSET,
> >> > OPTION_MASK_ISA_MOVDIR64B_UNSET): New defines.
> >> > (ix86_handle_option): Handle -mmovdiri and -mmovdir64b.
> >> > * config.gcc (movdirintrin.h): New header.
> >> > * config/i386/cpuid.h (bit_MOVDIRI,
> >> > bit_MOVDIR64B): New bits.
> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> >> > -mmovdiri and -mmovdir64b.
> >> > * config/i386/i386-builtin-types.def ((VOID, PUNSIGNED, 
> >> > UNSIGNED),
> >> > (VOID, PVOID, PCVOID)): New function types.
> >> > * config/i386/i386-builtin.def (__builtin_ia32_directstoreu_u32,
> >> > __builtin_ia32_directstoreu_u64, __builtin_ia32_movdir64b):
> >> > New
> >> builtins.
> >> > * config/i386/i386-c.c (__MOVDIRI__, __MOVDIR64B__): New.
> >> > * config/i386/i386.c (ix86_target_string): Added
> >> > -mmovdir64b and -mmovdiri.
> >> > (ix86_valid_target_attribute_inner_p): Ditto.
> >> > (ix86_expand_special_args_builtin):  Added
> >> VOID_FTYPE_PUNSIGNED_UNSIGNED
> >> > and VOID_FTYPE_PUNSIGNED_UNSIGNED.
> >> > (ix86_expand_builtin): Expand IX86_BUILTIN_MOVDIR64B.
> >> > * config/i386/i386.h (TARGET_MOVDIRI, TARGET_MOVDIRI_P,
> >> > TARGET_MOVDIR64B, TARGET_MOVDIR64B_P): New.
> >> > * config/i386/i386.md (UNSPECV_MOVDIRI, UNSPECV_MOVDIR64B):
> >> New.
> >> > (movdiri, movdir64b_): New.
> >> > * config/i386/i386.opt: Add -mmovdiri and -mmovdir64b.
> >> > * config/i386/immintrin.h: Include movdirintrin.h.
> >> > * config/i386/movdirintrin.h: New file.
> >> > * doc/invoke.texi: Added -mmovdiri and -mmovdir64b.
> >> >
> >> > 2018-04-18  Sebastian Peryt  <sebastian.pe...@intel.com>
> >> >
> >> > gcc/testsuite/
> >> >
> >> > * gcc.target/i386/movdir-1.c: New test.
> >> >
> >> >
> >> > Thanks,
> >> > Sebastian


RE: [PATCH][i386] Adding MOVDIRI and MOVDIR64B instructions

2018-04-19 Thread Peryt, Sebastian
> On Wed, Apr 18, 2018 at 2:56 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Hi,
> >
> > This patch enables new instructions - MOVDIRI and MOVDIR64B.
> >
> > Is it ok for trunk?
> 
> Is there a reason that one flag goes to ix86_isa_flags and the other to
> ix86_isa_flags2?

This is because MOVDIRI is used as OPTION_MASK_ISA_MOVDIRI |
OPTION_MASK_ISA_64BIT; with MOVDIRI in ix86_isa_flags2 the two masks would end
up in different ISA flags tables. MOVDIR64B is not combined with
OPTION_MASK_ISA_64BIT, so it can be in ix86_isa_flags2.

Sebastian
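
The constraint described above can be sketched in a few lines of C. Two
separate 64-bit words model ix86_isa_flags and ix86_isa_flags2; a requirement
written as one OR-ed mask can only be tested against a single word, so both of
its bits have to live in the same word. All names and bit positions below are
hypothetical and chosen only for the illustration.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define DEMO_ISA_64BIT		(UINT64_C (1) << 0)	/* first word */
#define DEMO_ISA_MOVDIRI	(UINT64_C (1) << 1)	/* first word */
#define DEMO_ISA2_MOVDIR64B	(UINT64_C (1) << 0)	/* second word */

struct demo_isa
{
  uint64_t flags;	/* models ix86_isa_flags */
  uint64_t flags2;	/* models ix86_isa_flags2 */
};

/* A builtin that needs MOVDIRI and 64-bit mode can be gated by one
   OR-ed mask only because both bits sit in the same word.  */
int
demo_movdiri64_enabled (const struct demo_isa *isa)
{
  const uint64_t need = DEMO_ISA_MOVDIRI | DEMO_ISA_64BIT;

  return (isa->flags & need) == need;
}

/* MOVDIR64B has no such combined requirement, so a single bit in the
   second word is enough.  */
int
demo_movdir64b_enabled (const struct demo_isa *isa)
{
  return (isa->flags2 & DEMO_ISA2_MOVDIR64B) != 0;
}
```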

> 
> Uros.
> 
> > 2018-04-18  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/
> >
> > * common/config/i386/i386-common.c
> > (OPTION_MASK_ISA_MOVDIRI_SET,
> OPTION_MASK_ISA_MOVDIR64B_SET,
> > OPTION_MASK_ISA_MOVDIRI_UNSET,
> > OPTION_MASK_ISA_MOVDIR64B_UNSET): New defines.
> > (ix86_handle_option): Handle -mmovdiri and -mmovdir64b.
> > * config.gcc (movdirintrin.h): New header.
> > * config/i386/cpuid.h (bit_MOVDIRI,
> > bit_MOVDIR64B): New bits.
> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> > -mmovdiri and -mmovdir64b.
> > * config/i386/i386-builtin-types.def ((VOID, PUNSIGNED, UNSIGNED),
> > (VOID, PVOID, PCVOID)): New function types.
> > * config/i386/i386-builtin.def (__builtin_ia32_directstoreu_u32,
> > __builtin_ia32_directstoreu_u64, __builtin_ia32_movdir64b): New
> builtins.
> > * config/i386/i386-c.c (__MOVDIRI__, __MOVDIR64B__): New.
> > * config/i386/i386.c (ix86_target_string): Added -mmovdir64b and
> > -mmovdiri.
> > (ix86_valid_target_attribute_inner_p): Ditto.
> > (ix86_expand_special_args_builtin):  Added
> VOID_FTYPE_PUNSIGNED_UNSIGNED
> > and VOID_FTYPE_PUNSIGNED_UNSIGNED.
> > (ix86_expand_builtin): Expand IX86_BUILTIN_MOVDIR64B.
> > * config/i386/i386.h (TARGET_MOVDIRI, TARGET_MOVDIRI_P,
> > TARGET_MOVDIR64B, TARGET_MOVDIR64B_P): New.
> > * config/i386/i386.md (UNSPECV_MOVDIRI, UNSPECV_MOVDIR64B):
> New.
> > (movdiri, movdir64b_): New.
> > * config/i386/i386.opt: Add -mmovdiri and -mmovdir64b.
> > * config/i386/immintrin.h: Include movdirintrin.h.
> > * config/i386/movdirintrin.h: New file.
> > * doc/invoke.texi: Added -mmovdiri and -mmovdir64b.
> >
> > 2018-04-18  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/movdir-1.c: New test.
> >
> >
> > Thanks,
> > Sebastian


[PATCH][i386] Adding MOVDIRI and MOVDIR64B instructions

2018-04-18 Thread Peryt, Sebastian
Hi,

This patch enables new instructions - MOVDIRI and MOVDIR64B.

Is it ok for trunk?


2018-04-18  Sebastian Peryt  

gcc/

* common/config/i386/i386-common.c 
(OPTION_MASK_ISA_MOVDIRI_SET, OPTION_MASK_ISA_MOVDIR64B_SET,
OPTION_MASK_ISA_MOVDIRI_UNSET,
OPTION_MASK_ISA_MOVDIR64B_UNSET): New defines.
(ix86_handle_option): Handle -mmovdiri and -mmovdir64b.
* config.gcc (movdirintrin.h): New header.
* config/i386/cpuid.h (bit_MOVDIRI,
bit_MOVDIR64B): New bits.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect -mmovdiri
and -mmovdir64b.
* config/i386/i386-builtin-types.def ((VOID, PUNSIGNED, UNSIGNED),
(VOID, PVOID, PCVOID)): New function types.
* config/i386/i386-builtin.def (__builtin_ia32_directstoreu_u32,
__builtin_ia32_directstoreu_u64, __builtin_ia32_movdir64b): New 
builtins.
* config/i386/i386-c.c (__MOVDIRI__, __MOVDIR64B__): New.
* config/i386/i386.c (ix86_target_string): Added -mmovdir64b and 
-mmovdiri.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_expand_special_args_builtin):  Added VOID_FTYPE_PUNSIGNED_UNSIGNED
and VOID_FTYPE_PUNSIGNED_UNSIGNED.
(ix86_expand_builtin): Expand IX86_BUILTIN_MOVDIR64B.
* config/i386/i386.h (TARGET_MOVDIRI, TARGET_MOVDIRI_P,
TARGET_MOVDIR64B, TARGET_MOVDIR64B_P): New.
* config/i386/i386.md (UNSPECV_MOVDIRI, UNSPECV_MOVDIR64B): New.
(movdiri, movdir64b_): New.
* config/i386/i386.opt: Add -mmovdiri and -mmovdir64b.
* config/i386/immintrin.h: Include movdirintrin.h.
* config/i386/movdirintrin.h: New file.
* doc/invoke.texi: Added -mmovdiri and -mmovdir64b.

2018-04-18  Sebastian Peryt  

gcc/testsuite/

* gcc.target/i386/movdir-1.c: New test.


Thanks,
Sebastian


0001-MOVDIRI.PATCH
Description: 0001-MOVDIRI.PATCH
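
For readers who want to try the new intrinsics, here is a hedged usage sketch. It assumes the intrinsics are exposed through immintrin.h as `_directstoreu_u32` and `_movdir64b` and that `__MOVDIRI__` / `__MOVDIR64B__` are defined when the corresponding options are active, as the ChangeLog above suggests. The wrapper names `store_u32` and `copy_64_bytes` are hypothetical, and the fallbacks are only there so that the code also builds without -mmovdiri / -mmovdir64b.

```c
#include <string.h>

#if defined (__MOVDIRI__) || defined (__MOVDIR64B__)
#include <immintrin.h>
#endif

/* Store one 32-bit value; uses a direct store when built with
   -mmovdiri, a plain store otherwise.  */
static void
store_u32 (unsigned int *dst, unsigned int val)
{
#ifdef __MOVDIRI__
  _directstoreu_u32 (dst, val);
#else
  *dst = val;
#endif
}

/* Copy one 64-byte block.  The destination should be 64-byte aligned,
   which MOVDIR64B requires.  */
static void
copy_64_bytes (void *dst, const void *src)
{
#ifdef __MOVDIR64B__
  _movdir64b (dst, src);
#else
  memcpy (dst, src, 64);
#endif
}
```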


RE: [PATCH][i386,AVX] Fix PR84783 - backport missing permutexvar to GCC7

2018-03-27 Thread Peryt, Sebastian
Hi Jakub,

Gentle ping.

Thanks,
Sebastian

> -Original Message-
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com]
> Sent: Friday, March 23, 2018 6:49 AM
> To: ja...@redhat.com; Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: 'gcc-patches@gcc.gnu.org' <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH][i386,AVX] Fix PR84783 - backport missing permutexvar to
> GCC7
> 
> Hello Sebastian!
> 
> On 22 Mar 13:01, Peryt, Sebastian wrote:
> > Hi,
> >
> > This patch adds missing permutexvar intrinsics for backporting to GCC 7 to
> resolve PR84783.
> >
> > 2018-03-22  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc:
> > PR84783
> > * config/i386/avx512vlintrin.h (_mm256_permutexvar_epi64)
> > (_mm256_permutexvar_epi32, _mm256_permutex_epi64): New
> intrinsics.
> >
> > gcc/testsuite:
> > PR84783
> >
> > * gcc.target/i386/avx512vl-vpermd-1.c (_mm256_permutexvar_epi32):
> > Test new intrinsic.
> > * gcc.target/i386/avx512vl-vpermq-imm-1.c
> (_mm256_permutex_epi64):
> > Ditto.
> > * gcc.target/i386/avx512vl-vpermq-var-1.c
> (_mm256_permutexvar_epi64):
> > Ditto.
> > * gcc.target/i386/avx512f-vpermd-2.c: Do not check for AVX512F_LEN.
> > * gcc.target/i386/avx512f-vpermq-imm-2.c: Ditto.
> > * gcc.target/i386/avx512f-vpermq-var-2.c: Ditto.
> >
> > Is it ok for merge?
> Your patch is pretty much simple and is OK to me.
> 
> However, since you're aiming at GCC 7, I'd like to hear GM's OK here as well.
> 
> --
> Thanks, K
> 
> >
> > Thanks,
> > Sebastian
> 



[PATCH][i386,AVX] Fix PR84783 - backport missing permutexvar to GCC7

2018-03-22 Thread Peryt, Sebastian
Hi,

This patch adds missing permutexvar intrinsics for backporting to GCC 7 to 
resolve PR84783.

2018-03-22  Sebastian Peryt  

gcc:
PR84783
* config/i386/avx512vlintrin.h (_mm256_permutexvar_epi64)
(_mm256_permutexvar_epi32, _mm256_permutex_epi64): New intrinsics.

gcc/testsuite:
PR84783

* gcc.target/i386/avx512vl-vpermd-1.c (_mm256_permutexvar_epi32):
Test new intrinsic.
* gcc.target/i386/avx512vl-vpermq-imm-1.c (_mm256_permutex_epi64):
Ditto.
* gcc.target/i386/avx512vl-vpermq-var-1.c (_mm256_permutexvar_epi64):
Ditto.
* gcc.target/i386/avx512f-vpermd-2.c: Do not check for AVX512F_LEN.
* gcc.target/i386/avx512f-vpermq-imm-2.c: Ditto.
* gcc.target/i386/avx512f-vpermq-var-2.c: Ditto.

Is it ok for merge?

Thanks,
Sebastian


PR84783.patch
Description: PR84783.patch
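
As a reading aid, the lane selection performed by the three intrinsics can be written as a scalar reference model. This is only a sketch of the documented VPERMD/VPERMQ semantics, not code from the patch; the `*_model` function names are hypothetical, and `dst` must not alias the source array.

```c
#include <stdint.h>

/* Scalar model of _mm256_permutexvar_epi32 (idx, a): each of the eight
   32-bit result lanes takes the source lane selected by the low three
   bits of the matching index lane.  */
static void
permutexvar_epi32_model (const uint32_t idx[8], const uint32_t a[8],
                         uint32_t dst[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = a[idx[i] & 7];
}

/* Scalar model of _mm256_permutexvar_epi64 (idx, a): four 64-bit lanes,
   selected by the low two bits of each index lane.  */
static void
permutexvar_epi64_model (const uint64_t idx[4], const uint64_t a[4],
                         uint64_t dst[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = a[idx[i] & 3];
}

/* Scalar model of _mm256_permutex_epi64 (a, imm): result lane i is
   selected by bits [2*i+1 : 2*i] of the 8-bit immediate.  */
static void
permutex_epi64_model (const uint64_t a[4], unsigned int imm,
                      uint64_t dst[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = a[(imm >> (2 * i)) & 3];
}
```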


[PATCH][x86] Fix PR84460

2018-02-19 Thread Peryt, Sebastian
Hi,

This is a fix for PR84460.

gcc/testsuite
   PR target/84460
   * gcc.target/i386/pr57193.c (dg-options): Add -mtune=generic.

Is it ok for trunk?

Thanks,
Sebastian




PR84460.patch
Description: PR84460.patch


[PATCH][i386] Fix PR83546 - missing RDRND for -march=silvermont

2018-01-15 Thread Peryt, Sebastian
Hi,

This patch re-enables RDRND for Silvermont. It got lost in r206178, as pointed
out in the PR.
Bootstrapped and tested.

2018-01-15  Sebastian Peryt  

gcc/

PR target/83546
* config/i386/i386.c (ix86_option_override_internal): Add PTA_RDRND
to PTA_SILVERMONT.

2018-01-15  Sebastian Peryt  

gcc/testsuite/

PR target/83546
* gcc.target/i386/pr83546.c: New test.

Is it ok for trunk?

Sebastian


0001-PR83546.patch
Description: 0001-PR83546.patch
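
A small sketch of what the fix means for user code, assuming that `__RDRND__` is predefined whenever RDRND is enabled (for example by -mrdrnd or, with this patch, by -march=silvermont) and that `_rdrand32_step` comes from immintrin.h. The helper name `hw_random_u32` and the retry budget of 10 are hypothetical choices for this example.

```c
#ifdef __RDRND__
#include <immintrin.h>
#endif

/* Try to obtain a hardware random number.  Returns 1 and fills *out on
   success; returns 0 when RDRND was not enabled at compile time or the
   hardware did not deliver a value within the retry budget.  */
static int
hw_random_u32 (unsigned int *out)
{
#ifdef __RDRND__
  for (int attempt = 0; attempt < 10; attempt++)
    if (_rdrand32_step (out))
      return 1;
#else
  (void) out;   /* unused without RDRND */
#endif
  return 0;
}
```

Without the fix, a build with -march=silvermont would take the fallback branch even though the CPU supports the instruction.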


[Patch][x86, backport] Backport to GCC-6 vzeroupper patches

2017-11-29 Thread Peryt, Sebastian
Hi,

I'd like to ask for approval to backport the vzeroupper generation patches
from trunk to the GCC-6 branch. They resolve 3 PRs:
PR target/82941
PR target/82942
PR target/82990

Two patches were combined into one and rebased. Bootstrapped and tested.
Is it ok for merge?

Changelog:

Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction
should be inserted before a transfer of control flow out of the function.
It is turned on by default unless we are tuning for KNL.  Users can always
use -mvzeroupper or -mno-vzeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* config/i386/i386.c (pass_insert_vzeroupper): Remove
TARGET_AVX512F check from gate condition.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this. Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this. Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found. Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
(ix86_option_override_internal): Set MASK_VZEROUPPER if
neither -mvzeroupper nor -mno-vzeroupper is used and
TARGET_EMIT_VZEROUPPER is set.
* config/i386/i386.h: (host_detect_local_cpu): New define.
(TARGET_EMIT_VZEROUPPER): New.
* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/testsuite/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* gcc.target/i386/pr82941-1.c: New test.
* gcc.target/i386/pr82941-2.c: Likewise.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82942-2.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-2.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-4.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr82990-6.c: Likewise.
* gcc.target/i386/pr82990-7.c: Likewise.

Thanks,
Sebastian


0001-backportPR82941-GCC-6.patch
Description: 0001-backportPR82941-GCC-6.patch
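
The option handling described in the changelog above can be summarised by a short model. This is a sketch of the described behaviour, not the code in ix86_option_override_internal; the enum, its values and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical tri-state for the user's command-line choice.  */
enum opt_state { OPT_UNSET, OPT_ON, OPT_OFF };

/* Returns whether vzeroupper insertion ends up enabled.  An explicit
   option always wins; otherwise the tuning flag
   (X86_TUNE_EMIT_VZEROUPPER in the patch) decides.  */
static bool
vzeroupper_enabled (enum opt_state user_opt, bool tune_emit_vzeroupper)
{
  if (user_opt == OPT_ON)
    return true;
  if (user_opt == OPT_OFF)
    return false;
  return tune_emit_vzeroupper;
}
```

With the tuning flag off (as described for KNL) and no explicit option, the function returns false; in all other tunings the default is true.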


[Patch][x86, backport] Backport to GCC-7 vzeroupper patches

2017-11-29 Thread Peryt, Sebastian
Hi,

I'd like to ask for approval to backport the vzeroupper generation patches
from trunk to the GCC-7 branch. They resolve 3 PRs:
PR target/82941
PR target/82942
PR target/82990

Two patches were combined into one and rebased. Bootstrapped and tested.
Is it ok for merge?

Changelog:

Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
be inserted before a transfer of control flow out of the function.  It is
turned on by default unless we are tuning for KNL.  Users can always use
-mvzeroupper or -mno-vzeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* config/i386/i386.c (pass_insert_vzeroupper): Remove
TARGET_AVX512F check from gate condition.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this. Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this. Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found. Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
(ix86_option_override_internal): Set MASK_VZEROUPPER if
neither -mvzeroupper nor -mno-vzeroupper is used and
TARGET_EMIT_VZEROUPPER is set.
* config/i386/i386.h: (host_detect_local_cpu): New define.
(TARGET_EMIT_VZEROUPPER): New.
* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/testsuite/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* gcc.target/i386/pr82941-1.c: New test.
* gcc.target/i386/pr82941-2.c: Likewise.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82942-2.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-2.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-4.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr82990-6.c: Likewise.
* gcc.target/i386/pr82990-7.c: Likewise.


Thanks,
Sebastian


0001-backportPR82942-GCC-7.patch
Description: 0001-backportPR82942-GCC-7.patch


RE: [PATCH, committed] Add myself to MAINTAINERS

2017-11-16 Thread Peryt, Sebastian
Message didn't get thru for some reason. Resending.

Sebastian

From: Peryt, Sebastian 
Sent: Wednesday, November 15, 2017 1:44 PM
To: gcc-patches@gcc.gnu.org
Cc: Peryt, Sebastian <sebastian.pe...@intel.com>
Subject: [PATCH, committed] Add myself to MAINTAINERS

ChangeLog:

2017-11-15  Sebastian Peryt  <sebastian.pe...@intel.com>

    * MAINTAINERS (write after approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 254760)
+++ MAINTAINERS (working copy)
@@ -532,6 +532,7 @@
Devang Patel   <dpa...@apple.com>
Andris Pavenis <andris.pave...@iki.fi>
Fernando Pereira   <prone...@gmail.com>
+Sebastian Peryt    <sebastian.pe...@intel.com>
Kaushik Phatak <kaushik.pha...@kpitcummins.com>
Nicolas Pitre  <n...@cam.org>
Paul Pluzhnikov    <ppluzhni...@google.com>

Sebastian



RE: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Peryt, Sebastian
Attached is the fixed patch.

Sebastian
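
To make the gate discussion in the quoted review below easier to follow, here is a hedged model of the three variants of the condition. The struct and function names are hypothetical; the boolean fields stand in for the TARGET_* macros and flags that appear in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the flags the gate looks at.  */
struct target_flags
{
  bool avx, avx512f, avx512pf, avx512er;
  bool vzeroupper, expensive_optimizations, optimize_size;
};

/* The part of the gate that none of the variants change.  */
static bool
gate_common (const struct target_flags *t)
{
  return t->avx && t->vzeroupper && t->expensive_optimizations
         && !t->optimize_size;
}

/* Before the patch: any AVX512F target skips the pass.  */
static bool
gate_before (const struct target_flags *t)
{
  return gate_common (t) && !t->avx512f;
}

/* v2 of the patch: skip when AVX512PF or AVX512ER is present.  */
static bool
gate_v2 (const struct target_flags *t)
{
  return gate_common (t) && !t->avx512pf && !t->avx512er;
}

/* Reviewer's suggestion: test AVX512ER only, since only that feature
   is described as unique to Xeon Phi.  */
static bool
gate_suggested (const struct target_flags *t)
{
  return gate_common (t) && !t->avx512er;
}
```

For a Skylake-server-like target (AVX512F without PF/ER) the old gate is false and both newer ones are true; for a Xeon-Phi-like target (with ER) all three are false.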


> -Original Message-
> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> Sent: Tuesday, November 14, 2017 1:18 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: Jakub Jelinek <ja...@redhat.com>; gcc-patches@gcc.gnu.org; Uros Bizjak
> <ubiz...@gmail.com>; Kirill Yukhin <kirill.yuk...@gmail.com>; Lu, Hongjiu
> <hongjiu...@intel.com>
> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
> for SKX
> 
> On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > I have updated tests and changelog according to Jakub's suggestions.
> > Please find attached v2 of my patch.
> >
> >
> > 14.11.2017  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/
> >
> > PR target/82941
> > PR target/82942
> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> > to return true on Xeon and not on Xeon Phi.
> > (ix86_check_avx256_register): Changed to ...
> > (ix86_check_avx_upper_register): ... this. Add extra check for
> > VALID_AVX512F_REG_OR_XI_MODE.
> > (ix86_avx_u128_mode_needed): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_check_avx256_stores): Changed to ...
> > (ix86_check_avx_upper_stores): ... this. Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_avx_u128_mode_after): Changed
> > avx_reg256_found to avx_upper_reg_found. Changed
> > ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> > (ix86_avx_u128_mode_entry): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_avx_u128_mode_exit): Ditto.
> > * config/i386/i386.h: (host_detect_local_cpu): New define.
> 
> @@ -2497,7 +2497,7 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return TARGET_AVX && !TARGET_AVX512F
> +  return TARGET_AVX && !TARGET_AVX512PF && !TARGET_AVX512ER
> ^  Please remove this.
> 
> From glibc commit:
> 
> commit 4cb334c4d6249686653137ec273d081371b3672d
> Author: H.J. Lu <hjl.to...@gmail.com>
> Date:   Tue Apr 18 14:01:45 2017 -0700
> 
> x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
> 
> On Skylake server, AVX512 load/store instructions in memcpy/memset may
> lead to lower CPU turbo frequency in certain situations.  Use of AVX2
> in memcpy/memset has been observed to have improved overall performance
> in many workloads due to the higher frequency.
> 
> Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
> if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
> used on Skylake server.
> 
> Only AVX512ER is really unique to Xeon Phi.
> 
>&& TARGET_VZEROUPPER && flag_expensive_optimizations
>&& !optimize_size;
>  }
> 
> > 14.11.2017  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/testsuite/
> >
> > PR target/82941
> > PR target/82942
> > * gcc.target/i386/pr82941-1.c: New test.
> > * gcc.target/i386/pr82941-2.c: New test.
> > * gcc.target/i386/pr82942-1.c: New test.
> > * gcc.target/i386/pr82942-2.c: New test.
> >
> >
> > Thanks,
> > Sebastian
> >
> >> -Original Message-
> >> From: Jakub Jelinek [mailto:ja...@redhat.com]
> >> Sent: Tuesday, November 14, 2017 10:51 AM
> >> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com>; Kirill
> >> Yukhin <kirill.yuk...@gmail.com>; Lu, Hongjiu <hongjiu...@intel.com>
> >> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper
> >> generation for SKX
> >>
> >> On Tue, Nov 14, 2017 at 09:45:12AM +, Peryt, Sebastian wrote:
> >> > Hi,
> >> >
> >> > This patch fixes PR82941 and PR82942 by adding vzeroupper
> >> > generation on
> >> SKX.
> >> > Bootstrapped and tested.
> >> >
> >> > 14.11.2017  Sebastian Peryt  <sebastian.pe...@intel.com>
> >> >
> >> > gcc/
> >>
> >> In that case the ChangeLog entry should list the PRs, i.e.
> >>   PR targ

RE: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Peryt, Sebastian
I have updated tests and changelog according to Jakub's suggestions.
Please find attached v2 of my patch.


14.11.2017  Sebastian Peryt  <sebastian.pe...@intel.com>

gcc/

PR target/82941
PR target/82942
* config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
to return true on Xeon and not on Xeon Phi.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this. Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this. Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found. Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
* config/i386/i386.h: (host_detect_local_cpu): New define.

14.11.2017  Sebastian Peryt  <sebastian.pe...@intel.com>

gcc/testsuite/

PR target/82941
PR target/82942
* gcc.target/i386/pr82941-1.c: New test.
* gcc.target/i386/pr82941-2.c: New test.
* gcc.target/i386/pr82942-1.c: New test.
* gcc.target/i386/pr82942-2.c: New test.


Thanks,
Sebastian

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, November 14, 2017 10:51 AM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com>; Kirill Yukhin
> <kirill.yuk...@gmail.com>; Lu, Hongjiu <hongjiu...@intel.com>
> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
> for SKX
> 
> On Tue, Nov 14, 2017 at 09:45:12AM +, Peryt, Sebastian wrote:
> > Hi,
> >
> > This patch fixes PR82941 and PR82942 by adding vzeroupper generation on
> SKX.
> > Bootstrapped and tested.
> >
> > 14.11.2017  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > gcc/
> 
> In that case the ChangeLog entry should list the PRs, i.e.
>   PR target/82941
>   PR target/82942
> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> > to return true on Xeon and not on Xeon Phi.
> > (ix86_check_avx256_register): Changed to ...
> > (ix86_check_avx_upper_register): ... this.
> > (ix86_check_avx_upper_register): Add extra check for
> > VALID_AVX512F_REG_OR_XI_MODE.
> 
> The way this is usually written is instead:
>   (ix86_check_avx256_register): Changed to ...
>   (ix86_check_avx_upper_register): ... this.  Add extra check for
>   VALID_AVX512F_REG_OR_XI_MODE.
> i.e. don't duplicate the function name, just continue mentioning further 
> changes.
> 
> > (ix86_avx_u128_mode_needed): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_check_avx256_stores): Changed to ...
> > (ix86_check_avx_upper_stores): ... this.
> > (ix86_check_avx_upper_stores): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> 
> Likewise.
> 
> > gcc/testsuite/
> > * gcc.target/i386/pr82941.c: New test.
> > * gcc.target/i386/pr82942.c: New test.
> 
> Shouldn't there also be a test with -march=knl, and another one with
> -mavx512f -mavx512er, checking that we don't emit any vzeroupper?
> 
>   Jakub


0001-VZEROUPPER_v2.patch
Description: 0001-VZEROUPPER_v2.patch


[PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Peryt, Sebastian
Hi,

This patch fixes PR82941 and PR82942 by adding vzeroupper generation on SKX.
Bootstrapped and tested.

14.11.2017  Sebastian Peryt  

gcc/
* config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
to return true on Xeon and not on Xeon Phi.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this.
(ix86_check_avx_upper_register): Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this.
(ix86_check_avx_upper_stores): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found.
(ix86_avx_u128_mode_after): Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
* config/i386/i386.h: (host_detect_local_cpu): New define.

gcc/testsuite/
* gcc.target/i386/pr82941.c: New test.
* gcc.target/i386/pr82942.c: New test.

Is it ok for trunk?

Thanks,
Sebastian



0001-VZEROUPPER.patch
Description: 0001-VZEROUPPER.patch


RE: [Patch, testcase] PR82767 Fix scan-assembler patterns in i386/pr71321.c

2017-11-05 Thread Peryt, Sebastian
> On Sun, Nov 5, 2017 at 12:14 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Hi,
> >
> > After r253934 gcc.target/i386/pr71321.c started to fail due to the wrong
> number of scan-assembler - 2 instead of 3. This patch is fixing that.
> 
> Are you sure that there is no problem with the code generation? Did you
> investigate original PR for what it is testing and why it is testing for 
> these 3
> LEAs?

Well, the problem is due to the change in the cost model. It can be reverted by
a simple modification:

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index c7ac70e..bb5b3e2 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2253,7 +2253,7 @@ struct processor_costs core_cost = {
COSTS_N_INSNS (4),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
-  {COSTS_N_INSNS (8),  /* cost of a divide/mod for QI */
+  {COSTS_N_INSNS (18), /* cost of a divide/mod for QI */
COSTS_N_INSNS (8),  /*  HI */
/* 8-11 */
COSTS_N_INSNS (11), /*  SI */

The original PR was about generating better code for division and modulo of
small integers.

Ok, maybe I missed something. I'll get back to the PR and see whether any other
solution is proposed, since for now I have nothing.

> 
> > 2017-11-05  Sebastian Peryt  <sebastian.pe...@intel.com>
> >
> > PR testsuite/82767
> > * gcc.target/i386/pr71321.c: Fix invalid testcase.
> 
> There is nothing wrong with the testcase.
> 
> > Is it ok for trunk?
> >
> > Thanks,
> > Sebastian
> >


[Patch, testcase] PR82767 Fix scan-assembler patterns in i386/pr71321.c

2017-11-05 Thread Peryt, Sebastian
Hi,

After r253934 gcc.target/i386/pr71321.c started to fail because the number of
scan-assembler matches is wrong: 2 instead of 3. This patch fixes that.

2017-11-05  Sebastian Peryt  

PR testsuite/82767
* gcc.target/i386/pr71321.c: Fix invalid testcase.

Is it ok for trunk?

Thanks,
Sebastian



PR82767.patch
Description: PR82767.patch


[patch][i386, AVX] Adding missing CMP* intrinsics

2017-10-20 Thread Peryt, Sebastian
Hi,

This patch, written by Olga Makhotina, adds the missing intrinsics listed below:
_mm512_[mask_]cmpeq_[pd|ps]_mask
_mm512_[mask_]cmple_[pd|ps]_mask
_mm512_[mask_]cmplt_[pd|ps]_mask
_mm512_[mask_]cmpneq_[pd|ps]_mask
_mm512_[mask_]cmpnle_[pd|ps]_mask
_mm512_[mask_]cmpnlt_[pd|ps]_mask
_mm512_[mask_]cmpord_[pd|ps]_mask
_mm512_[mask_]cmpunord_[pd|ps]_mask

20.10.2017  Olga Makhotina  

gcc/
* config/i386/avx512fintrin.h (_mm512_cmpeq_pd_mask,
_mm512_cmple_pd_mask, _mm512_cmplt_pd_mask,
_mm512_cmpneq_pd_mask, _mm512_cmpnle_pd_mask,
_mm512_cmpnlt_pd_mask, _mm512_cmpord_pd_mask,
_mm512_cmpunord_pd_mask, _mm512_mask_cmpeq_pd_mask,
_mm512_mask_cmple_pd_mask, _mm512_mask_cmplt_pd_mask,
_mm512_mask_cmpneq_pd_mask, _mm512_mask_cmpnle_pd_mask,
_mm512_mask_cmpnlt_pd_mask, _mm512_mask_cmpord_pd_mask,
_mm512_mask_cmpunord_pd_mask, _mm512_cmpeq_ps_mask,
_mm512_cmple_ps_mask, _mm512_cmplt_ps_mask,
_mm512_cmpneq_ps_mask, _mm512_cmpnle_ps_mask,
_mm512_cmpnlt_ps_mask, _mm512_cmpord_ps_mask,
_mm512_cmpunord_ps_mask, _mm512_mask_cmpeq_ps_mask,
_mm512_mask_cmple_ps_mask, _mm512_mask_cmplt_ps_mask,
_mm512_mask_cmpneq_ps_mask, _mm512_mask_cmpnle_ps_mask,
_mm512_mask_cmpnlt_ps_mask, _mm512_mask_cmpord_ps_mask,
_mm512_mask_cmpunord_ps_mask): New intrinsics.

20.10.2017  Olga Makhotina  

gcc/testsuite/
* gcc.target/i386/avx512f-vcmpps-1.c (_mm512_cmpeq_ps_mask,
_mm512_cmple_ps_mask, _mm512_cmplt_ps_mask,
_mm512_cmpneq_ps_mask, _mm512_cmpnle_ps_mask,
_mm512_cmpnlt_ps_mask, _mm512_cmpord_ps_mask,
_mm512_cmpunord_ps_mask, _mm512_mask_cmpeq_ps_mask,
_mm512_mask_cmple_ps_mask, _mm512_mask_cmplt_ps_mask,
_mm512_mask_cmpneq_ps_mask, _mm512_mask_cmpnle_ps_mask,
_mm512_mask_cmpnlt_ps_mask, _mm512_mask_cmpord_ps_mask,
_mm512_mask_cmpunord_ps_mask): Test new intrinsics.
* gcc.target/i386/avx512f-vcmpps-2.c (_mm512_cmpeq_ps_mask,
_mm512_cmple_ps_mask, _mm512_cmplt_ps_mask, 
_mm512_cmpneq_ps_mask, _mm512_cmpnle_ps_mask,
_mm512_cmpnlt_ps_mask, _mm512_cmpord_ps_mask,
_mm512_cmpunord_ps_mask, _mm512_mask_cmpeq_ps_mask,
_mm512_mask_cmple_ps_mask, _mm512_mask_cmplt_ps_mask,
_mm512_mask_cmpneq_ps_mask, _mm512_mask_cmpnle_ps_mask,
_mm512_mask_cmpnlt_ps_mask, _mm512_mask_cmpord_ps_mask,
_mm512_mask_cmpunord_ps_mask): Test new intrinsics.
* gcc.target/i386/avx512f-vcmppd-1.c (_mm512_cmpeq_pd_mask,
_mm512_cmple_pd_mask, _mm512_cmplt_pd_mask,
_mm512_cmpneq_pd_mask, _mm512_cmpnle_pd_mask,
_mm512_cmpnlt_pd_mask, _mm512_cmpord_pd_mask,
_mm512_cmpunord_pd_mask, _mm512_mask_cmpeq_pd_mask,
_mm512_mask_cmple_pd_mask, _mm512_mask_cmplt_pd_mask,
_mm512_mask_cmpneq_pd_mask, _mm512_mask_cmpnle_pd_mask,
_mm512_mask_cmpnlt_pd_mask, _mm512_mask_cmpord_pd_mask,
_mm512_mask_cmpunord_pd_mask): Test new intrinsics.
* gcc.target/i386/avx512f-vcmppd-2.c (_mm512_cmpeq_pd_mask,
_mm512_cmple_pd_mask, _mm512_cmplt_pd_mask,
_mm512_cmpneq_pd_mask, _mm512_cmpnle_pd_mask,
_mm512_cmpnlt_pd_mask, _mm512_cmpord_pd_mask,
_mm512_cmpunord_pd_mask, _mm512_mask_cmpeq_pd_mask,
_mm512_mask_cmple_pd_mask, _mm512_mask_cmplt_pd_mask,
_mm512_mask_cmpneq_pd_mask, _mm512_mask_cmpnle_pd_mask,
_mm512_mask_cmpnlt_pd_mask, _mm512_mask_cmpord_pd_mask,
_mm512_mask_cmpunord_pd_mask): Test new intrinsics.

Is it ok for trunk?
 
Thanks,
Sebastian



0001-vcmpp-d-s.patch
Description: 0001-vcmpp-d-s.patch
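
For reference, the way these comparisons build a mask can be modelled in scalar C. This is a sketch of the semantics as I understand them (the negated predicates are true for NaN operands, and the masked forms clear result bits where the input mask is zero), not code from the patch; `mask8` and all function names are hypothetical stand-ins.

```c
#include <stdint.h>

typedef uint8_t mask8;   /* stands in for __mmask8 */

/* x != x is true exactly when x is NaN.  */
static int is_unord (double a, double b) { return a != a || b != b; }

/* Predicates mirroring the intrinsic name suffixes.  */
static int pred_eq (double a, double b) { return a == b; }
static int pred_lt (double a, double b) { return a < b; }
static int pred_le (double a, double b) { return a <= b; }
static int pred_neq (double a, double b) { return !(a == b); }
static int pred_nlt (double a, double b) { return !(a < b); }
static int pred_nle (double a, double b) { return !(a <= b); }
static int pred_ord (double a, double b) { return !is_unord (a, b); }
static int pred_unord (double a, double b) { return is_unord (a, b); }

/* Model of _mm512_cmp<op>_pd_mask: bit i of the result is the outcome
   of the predicate on lane i.  */
static mask8
cmp_pd_mask_model (const double a[8], const double b[8],
                   int (*pred) (double, double))
{
  mask8 m = 0;
  for (int i = 0; i < 8; i++)
    if (pred (a[i], b[i]))
      m |= (mask8) (1u << i);
  return m;
}

/* Model of _mm512_mask_cmp<op>_pd_mask: result bits are cleared
   wherever the input mask k is zero.  */
static mask8
mask_cmp_pd_mask_model (mask8 k, const double a[8], const double b[8],
                        int (*pred) (double, double))
{
  return (mask8) (k & cmp_pd_mask_model (a, b, pred));
}
```

The ps variants follow the same pattern with sixteen float lanes and a 16-bit mask.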


Missing REDUCE[SD,SS] intrinsics

2017-10-16 Thread Peryt, Sebastian
Hi,

This patch, written by Olga Makhotina, adds missing intrinsics for REDUCE[SD,SS].

16.10.2017 Olga Makhotina 

gcc/
* config/i386/avx512dqintrin.h (_mm_mask_reduce_sd,
_mm_maskz_reduce_sd, _mm_mask_reduce_ss, 
_mm_maskz_reduce_ss): New intrinsics.
* config/i386/i386-builtin.def (__builtin_ia32_reducesd_mask,
__builtin_ia32_reducess_mask): New builtin.
(__builtin_ia32_reducesd, __builtin_ia32_reducess): Remove.
* config/i386/sse.md (reduces): Renamed to ...
(reduces): ... this.
(vreduce\t{%3, %2, %1, %0|%0, %1, %2, %3}): 
Changed to ...
(vreduce\t{%3, %2, %1, %0|
%0, %1, %2, %3}): ... this.

gcc/testsuite/
* gcc.target/i386/avx512dq-vreducesd-1.c (_mm_mask_reduce_sd,
_mm_maskz_reduce_sd): Test new intrinsics.
* gcc.target/i386/avx512dq-vreducesd-2.c: New.
* gcc.target/i386/avx512dq-vreducess-1.c (_mm_mask_reduce_ss,
_mm_maskz_reduce_ss): Test new intrinsics.
* gcc.target/i386/avx512dq-vreducess-2.c: New.
* gcc.target/i386/avx-1.c (__builtin_ia32_reducesd,
__builtin_ia32_reducess): Remove builtin.
(__builtin_ia32_reducesd_mask,
__builtin_ia32_reducess_mask): Test new builtin.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

Is it ok for trunk?
 
Thanks,
Sebastian



0001-reduce_ss-reduce_sd.patch
Description: 0001-reduce_ss-reduce_sd.patch


RE: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-20 Thread Peryt, Sebastian
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Tuesday, September 19, 2017 11:23 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> 
> On Tue, Sep 19, 2017 at 9:01 AM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> 
> >> >> >> > This patch adds  options -march=/-mtune=knm for Knights Mill.
> >> >> >> >
> >> >> >> > 2017-09-14  Sebastian Peryt  <sebastian.pe...@intel.com> gcc/
> >> >> >> >
> >> >> >> > * config.gcc: Support "knm".
> >> >> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> "knm".
> >> >> >> > * config/i386/i386-c.c (ix86_target_macros_internal): 
> >> >> >> > Handle
> >> >> >> > PROCESSOR_KNM.
> >> >> >> > * config/i386/i386.c (m_KNM): Define.
> >> >> >> > (processor_target_table): Add "knm".
> >> >> >> > (PTA_KNM): Define.
> >> >> >> > (ix86_option_override_internal): Add "knm".
> >> >> >> > (ix86_issue_rate): Add PROCESSOR_KNM.
> >> >> >> > (ix86_adjust_cost): Ditto.
> >> >> >> > (ia32_multipass_dfa_lookahead): Ditto.
> >> >> >> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
> >> >> >> > (fold_builtin_cpu): Define M_INTEL_KNM.
> >> >> >> > * config/i386/i386.h (TARGET_KNM): Define.
> >> >> >> > (processor_type): Add PROCESSOR_KNM.
> >> >> >> > * config/i386/x86-tune.def: Add m_KNM.
> >> >> >> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
> >> >> >> >
> >> >> >> >
> >> >> >> > gcc/testsuite/
> >> >> >> >
> >> >> >> > * gcc.target/i386/funcspec-5.c: Test knm.
> >> >> >> >
> >> >> >> > Is it ok for trunk?
> >> >> >>
> >> >> >> You also have to update libgcc/cpuinfo.h together with
> >> >> >> fold_builtin_cpu from i386.c. Please note that all new
> >> >> >> processor types and subtypes have to be added at the end of the enum.
> >> >> >>
> >> >> >
> >> >> > Uros,
> >> >> >
> >> >> > I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I
> >> >> > understood that CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types
> >> >> > is some kind of barrier, this is why I put KNM before that. Is that 
> >> >> > correct
> thinking?
> >> >> > As for fold_builtin_cpu in i386.c I already have something like this:
> >> >> >
> >> >> > @@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >> >  M_AMDFAM15H,
> >> >> >  M_INTEL_SILVERMONT,
> >> >> >  M_INTEL_KNL,
> >> >> > +M_INTEL_KNM,
> >> >> >  M_AMD_BTVER1,
> >> >> >  M_AMD_BTVER2,
> >> >> >  M_CPU_SUBTYPE_START,
> >> >> > @@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >> >{"bonnell", M_INTEL_BONNELL},
> >> >> >{"silvermont", M_INTEL_SILVERMONT},
> >> >> >{"knl", M_INTEL_KNL},
> >> >> > +  {"knm", M_INTEL_KNM},
> >> >> >{"amdfam10h", M_AMDFAM10H},
> >> >> >{"barcelona", M_AMDFAM10H_BARCELONA},
> >> >> >{"shanghai", M_AMDFAM10H_SHANGHAI},
> >> >> >
> >> >> > I couldn't find any other place where I'm supposed to add anything
> extra.
> >> >>
> >> >> Please look at libgcc/config/i386/cpuinfo.h. The comment here says that:
> >> >>
> >> >> /* Any new types or subtypes have to be inserted at the end. */
> >> >>
> >> >> The above patch should then add M_INTE
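
The "insert at the end" rule quoted above can be illustrated with a short sketch. The three enums are hypothetical, shortened lists and not the real ones from cpuinfo.h; my reading is that the numeric values are shared between the compiler and already-built code, so existing entries must keep their numbers.

```c
/* Released layout (hypothetical, shortened).  */
enum released
{ R_INTEL_KNL = 1, R_AMD_BTVER1, R_AMD_BTVER2, R_TYPE_MAX };

/* New entry inserted in the middle: every later value shifts by one.  */
enum inserted
{ I_INTEL_KNL = 1, I_INTEL_KNM, I_AMD_BTVER1, I_AMD_BTVER2, I_TYPE_MAX };

/* New entry appended just before the MAX sentinel: the old entries
   keep their numbers.  */
enum appended
{ A_INTEL_KNL = 1, A_AMD_BTVER1, A_AMD_BTVER2, A_INTEL_KNM, A_TYPE_MAX };

/* Returns 1 when the pre-existing entries kept their values.  */
static int
inserted_layout_is_compatible (void)
{
  return (int) I_INTEL_KNL == (int) R_INTEL_KNL
         && (int) I_AMD_BTVER1 == (int) R_AMD_BTVER1
         && (int) I_AMD_BTVER2 == (int) R_AMD_BTVER2;
}

static int
appended_layout_is_compatible (void)
{
  return (int) A_INTEL_KNL == (int) R_INTEL_KNL
         && (int) A_AMD_BTVER1 == (int) R_AMD_BTVER1
         && (int) A_AMD_BTVER2 == (int) R_AMD_BTVER2;
}
```

Only the appended layout leaves the old values untouched, which is presumably why the comment in cpuinfo.h asks for new types to go at the end.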

RE: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-19 Thread Peryt, Sebastian
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Monday, September 18, 2017 9:10 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> 
> On Mon, Sep 18, 2017 at 12:42 PM, Peryt, Sebastian
> <sebastian.pe...@intel.com> wrote:
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> Sent: Monday, September 18, 2017 12:23 PM
> >> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> >> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> >>
> >> On Mon, Sep 18, 2017 at 12:17 PM, Peryt, Sebastian
> >> <sebastian.pe...@intel.com> wrote:
> >> >> -Original Message-
> >> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> >> Sent: Sunday, September 17, 2017 6:14 PM
> >> >> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> >> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin
> >> >> <kirill.yuk...@gmail.com>
> >> >> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> >> >>
> >> >> On Thu, Sep 14, 2017 at 1:47 PM, Peryt, Sebastian
> >> >> <sebastian.pe...@intel.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This patch adds options -march=/-mtune=knm for Knights Mill.
> >> >> >
> >> >> > 2017-09-14  Sebastian Peryt  <sebastian.pe...@intel.com> gcc/
> >> >> >
> >> >> > * config.gcc: Support "knm".
> >> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect 
> >> >> > "knm".
> >> >> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> >> >> > PROCESSOR_KNM.
> >> >> > * config/i386/i386.c (m_KNM): Define.
> >> >> > (processor_target_table): Add "knm".
> >> >> > (PTA_KNM): Define.
> >> >> > (ix86_option_override_internal): Add "knm".
> >> >> > (ix86_issue_rate): Add PROCESSOR_KNM.
> >> >> > (ix86_adjust_cost): Ditto.
> >> >> > (ia32_multipass_dfa_lookahead): Ditto.
> >> >> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
> >> >> > (fold_builtin_cpu): Define M_INTEL_KNM.
> >> >> > * config/i386/i386.h (TARGET_KNM): Define.
> >> >> > (processor_type): Add PROCESSOR_KNM.
> >> >> > * config/i386/x86-tune.def: Add m_KNM.
> >> >> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
> >> >> >
> >> >> >
> >> >> > gcc/testsuite/
> >> >> >
> >> >> > * gcc.target/i386/funcspec-5.c: Test knm.
> >> >> >
> >> >> > Is it ok for trunk?
> >> >>
> >> >> You also have to update libgcc/cpuinfo.h together with
> >> >> fold_builtin_cpu from i386.c. Please note that all new processor
> >> >> types and subtypes have to be added at the end of the enum.
> >> >>
> >> >
> >> > Uros,
> >> >
> >> > I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I understood
> >> > that CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types is some kind
> >> > of barrier, this is why I put KNM before that. Is that correct thinking?
> >> > As for fold_builtin_cpu in i386.c I already have something like this:
> >> >
> >> > @@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >  M_AMDFAM15H,
> >> >  M_INTEL_SILVERMONT,
> >> >  M_INTEL_KNL,
> >> > +M_INTEL_KNM,
> >> >  M_AMD_BTVER1,
> >> >  M_AMD_BTVER2,
> >> >  M_CPU_SUBTYPE_START,
> >> > @@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >{"bonnell", M_INTEL_BONNELL},
> >> >{"silvermont", M_INTEL_SILVERMONT},
> >> >{"knl", M_INTEL_KNL},
> >> > +  {"knm", M_INTEL_KNM},
> >> >{"amdfam10h", M_AMDFAM10H},
> >> >{"barcelona", M_AMDFAM10H_BARCELONA},
> >> >{"shanghai", M_AMDFAM10H_SHANGHAI},
> >> >
> >> > I couldn't find any other place where I'm supposed to add anything extra.
> >>
> >> Please look at libgcc/config/i386/cpuinfo.h. The comment here says that:
> >>
> >> /* Any new types or subtypes have to be inserted at the end. */
> >>
> >> The above patch should then add M_INTEL_KNM as the last entry
> >> *before* M_CPU_SUBTYPE_START.
> >>
> >
> > Sorry, I didn't notice this value at first. I believe now it's correct.
> 
> OK for mainline SVN (with updated ChangeLog).
> 

Can you please commit for me?

Thanks,
Sebastian

> Thanks,
> Uros.
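
For illustration only (not part of the patch): a minimal sketch of why new
entries must go after the existing ones. The enum names below are hypothetical,
shortened stand-ins rather than the real libgcc or i386.c lists; the assumption
is only that numeric values already handed out to compiled programs must not
change.

```c
#include <assert.h>

/* Hypothetical enum as shipped in an earlier release.  */
enum old_types { OLD_SILVERMONT = 1, OLD_KNL, OLD_BTVER1, OLD_BTVER2, OLD_MAX };

/* New entry inserted in the middle: BTVER1 and BTVER2 are renumbered.  */
enum mid_types { MID_SILVERMONT = 1, MID_KNL, MID_KNM, MID_BTVER1, MID_BTVER2,
		 MID_MAX };

/* New entry appended after the existing ones: old values are preserved.  */
enum end_types { END_SILVERMONT = 1, END_KNL, END_BTVER1, END_BTVER2, END_KNM,
		 END_MAX };

/* Return 1 if every pre-existing entry kept its old numeric value.  */
int
mid_is_compatible (void)
{
  return (int) MID_BTVER1 == (int) OLD_BTVER1
	 && (int) MID_BTVER2 == (int) OLD_BTVER2;
}

int
end_is_compatible (void)
{
  return (int) END_BTVER1 == (int) OLD_BTVER1
	 && (int) END_BTVER2 == (int) OLD_BTVER2;
}

/* Small self-check tying the two cases together.  */
void
check_enum_layouts (void)
{
  assert (!mid_is_compatible ());  /* insertion in the middle shifts values */
  assert (end_is_compatible ());   /* appending keeps them stable */
}
```

Only the terminating marker moves when an entry is appended, which is harmless
as long as the marker is used as a bound and not stored anywhere.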


RE: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-18 Thread Peryt, Sebastian
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Monday, September 18, 2017 12:23 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> 
> On Mon, Sep 18, 2017 at 12:17 PM, Peryt, Sebastian
> <sebastian.pe...@intel.com> wrote:
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> Sent: Sunday, September 17, 2017 6:14 PM
> >> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> >> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> >>
> >> On Thu, Sep 14, 2017 at 1:47 PM, Peryt, Sebastian
> >> <sebastian.pe...@intel.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > This patch adds options -march=/-mtune=knm for Knights Mill.
> >> >
> >> > 2017-09-14  Sebastian Peryt  <sebastian.pe...@intel.com> gcc/
> >> >
> >> > * config.gcc: Support "knm".
> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect 
> >> > "knm".
> >> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> >> > PROCESSOR_KNM.
> >> > * config/i386/i386.c (m_KNM): Define.
> >> > (processor_target_table): Add "knm".
> >> > (PTA_KNM): Define.
> >> > (ix86_option_override_internal): Add "knm".
> >> > (ix86_issue_rate): Add PROCESSOR_KNM.
> >> > (ix86_adjust_cost): Ditto.
> >> > (ia32_multipass_dfa_lookahead): Ditto.
> >> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
> >> > (fold_builtin_cpu): Define M_INTEL_KNM.
> >> > * config/i386/i386.h (TARGET_KNM): Define.
> >> > (processor_type): Add PROCESSOR_KNM.
> >> > * config/i386/x86-tune.def: Add m_KNM.
> >> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
> >> >
> >> >
> >> > gcc/testsuite/
> >> >
> >> > * gcc.target/i386/funcspec-5.c: Test knm.
> >> >
> >> > Is it ok for trunk?
> >>
> >> You also have to update libgcc/cpuinfo.h together with
> >> fold_builtin_cpu from i386.c. Please note that all new processor
> >> types and subtypes have to be added at the end of the enum.
> >>
> >
> > Uros,
> >
> > I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I understood
> > that CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types is some kind of
> > barrier, this is why I put KNM before that. Is that correct thinking?
> > As for fold_builtin_cpu in i386.c I already have something like this:
> >
> > @@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >  M_AMDFAM15H,
> >  M_INTEL_SILVERMONT,
> >  M_INTEL_KNL,
> > +M_INTEL_KNM,
> >  M_AMD_BTVER1,
> >  M_AMD_BTVER2,
> >  M_CPU_SUBTYPE_START,
> > @@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >{"bonnell", M_INTEL_BONNELL},
> >{"silvermont", M_INTEL_SILVERMONT},
> >{"knl", M_INTEL_KNL},
> > +  {"knm", M_INTEL_KNM},
> >{"amdfam10h", M_AMDFAM10H},
> >{"barcelona", M_AMDFAM10H_BARCELONA},
> >{"shanghai", M_AMDFAM10H_SHANGHAI},
> >
> > I couldn't find any other place where I'm supposed to add anything extra.
> 
> Please look at libgcc/config/i386/cpuinfo.h. The comment here says that:
> 
> /* Any new types or subtypes have to be inserted at the end. */
> 
> The above patch should then add M_INTEL_KNM as the last entry *before*
> M_CPU_SUBTYPE_START.
> 

Sorry, I didn't notice this value at first. I believe now it's correct.

Sebastian

> > Additionally I updated one extra test I found -
> > gcc.target/i386/funcspec-56.inc
> >
> >> Oops, and AMDFAM17H processor type should not be there in cpuinfo.h.
> >
> > Sorry, I don't understand - it shouldn't be at this position, or in this 
> > enum at all?
> 
> This means I have to synchronize gcc part with libgcc. I'll do it later today.
> 
> Uros.


KNM_enabling_v3.patch
Description: KNM_enabling_v3.patch


RE: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-18 Thread Peryt, Sebastian
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Sunday, September 17, 2017 6:14 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> 
> On Thu, Sep 14, 2017 at 1:47 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Hi,
> >
> > This patch adds options -march=/-mtune=knm for Knights Mill.
> >
> > 2017-09-14  Sebastian Peryt  <sebastian.pe...@intel.com> gcc/
> >
> > * config.gcc: Support "knm".
> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect "knm".
> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> > PROCESSOR_KNM.
> > * config/i386/i386.c (m_KNM): Define.
> > (processor_target_table): Add "knm".
> > (PTA_KNM): Define.
> > (ix86_option_override_internal): Add "knm".
> > (ix86_issue_rate): Add PROCESSOR_KNM.
> > (ix86_adjust_cost): Ditto.
> > (ia32_multipass_dfa_lookahead): Ditto.
> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
> > (fold_builtin_cpu): Define M_INTEL_KNM.
> > * config/i386/i386.h (TARGET_KNM): Define.
> > (processor_type): Add PROCESSOR_KNM.
> > * config/i386/x86-tune.def: Add m_KNM.
> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
> >
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/funcspec-5.c: Test knm.
> >
> > Is it ok for trunk?
> 
> You also have to update libgcc/cpuinfo.h together with fold_builtin_cpu from
> i386.c. Please note that all new processor types and subtypes have to be added
> at the end of the enum.
> 

Uros,

I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I understood that
CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types is some kind of barrier,
which is why I put KNM before it. Is that correct thinking? As for
fold_builtin_cpu in i386.c, I already have something like this:

@@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
 M_AMDFAM15H,
 M_INTEL_SILVERMONT,
 M_INTEL_KNL,
+M_INTEL_KNM,
 M_AMD_BTVER1,
 M_AMD_BTVER2,
 M_CPU_SUBTYPE_START,
@@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
   {"bonnell", M_INTEL_BONNELL},
   {"silvermont", M_INTEL_SILVERMONT},
   {"knl", M_INTEL_KNL},
+  {"knm", M_INTEL_KNM},
   {"amdfam10h", M_AMDFAM10H},
   {"barcelona", M_AMDFAM10H_BARCELONA},
   {"shanghai", M_AMDFAM10H_SHANGHAI},

I couldn't find any other place where I'm supposed to add anything extra.
Additionally I updated one extra test I found - gcc.target/i386/funcspec-56.inc

> Oops, and AMDFAM17H processor type should not be there in cpuinfo.h.

Sorry, I don't understand - it shouldn't be at this position, or in this enum 
at all?
> 
> Uros.

Thanks,
Sebastian

2017-09-18  Sebastian Peryt  <sebastian.pe...@intel.com> 

gcc/

* config.gcc: Support "knm".
* config/i386/driver-i386.c (host_detect_local_cpu): Detect "knm".
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_KNM.
* config/i386/i386.c (m_KNM): Define.
(processor_target_table): Add "knm".
(PTA_KNM): Define.
(ix86_option_override_internal): Add "knm".
(ix86_issue_rate): Add PROCESSOR_KNM.
(ix86_adjust_cost): Ditto.
(ia32_multipass_dfa_lookahead): Ditto.
(get_builtin_code_for_version): Handle PROCESSOR_KNM.
(fold_builtin_cpu): Define M_INTEL_KNM.
* config/i386/i386.h (TARGET_KNM): Define.
(processor_type): Add PROCESSOR_KNM.
* config/i386/x86-tune.def: Add m_KNM.
* doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.

libgcc/
* config/i386/cpuinfo.h (processor_types): Add INTEL_KNM.
* config/i386/cpuinfo.c (get_intel_cpu): Detect Knights Mill.

gcc/testsuite/

* gcc.target/i386/funcspec-5.c: Test knm.
* gcc.target/i386/funcspec-56.inc: Test arch=knm.


KNM_enabling_v2.patch
Description: KNM_enabling_v2.patch


[PATCH][x86] Knights Mill -march/-mtune options

2017-09-14 Thread Peryt, Sebastian
Hi,

This patch adds options -march=/-mtune=knm for Knights Mill.

2017-09-14  Sebastian Peryt  
gcc/

* config.gcc: Support "knm".
* config/i386/driver-i386.c (host_detect_local_cpu): Detect "knm".
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_KNM.
* config/i386/i386.c (m_KNM): Define.
(processor_target_table): Add "knm".
(PTA_KNM): Define.
(ix86_option_override_internal): Add "knm".
(ix86_issue_rate): Add PROCESSOR_KNM.
(ix86_adjust_cost): Ditto.
(ia32_multipass_dfa_lookahead): Ditto.
(get_builtin_code_for_version): Handle PROCESSOR_KNM.
(fold_builtin_cpu): Define M_INTEL_KNM.
* config/i386/i386.h (TARGET_KNM): Define.
(processor_type): Add PROCESSOR_KNM.
* config/i386/x86-tune.def: Add m_KNM.
* doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.


gcc/testsuite/

* gcc.target/i386/funcspec-5.c: Test knm.

Is it ok for trunk?

Thanks,
Sebastian




KNM_enabling.patch
Description: KNM_enabling.patch


RE: [PATCH] i386: Rewrite check for AVX512 features

2017-08-04 Thread Peryt, Sebastian
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Sunday, July 30, 2017 11:02 AM
> To: H.J. Lu <hjl.to...@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; Koval, Julia <julia.ko...@intel.com>; Peryt,
> Sebastian <sebastian.pe...@intel.com>
> Subject: Re: [PATCH] i386: Rewrite check for AVX512 features
> 
> On Sat, Jul 29, 2017 at 3:06 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> > Add a new file, avx512-check.h, to check all AVX512 features.  The
> > test is skipped if any requested AVX512 features are unavailable.
> >
> > Tested on Skylake server and Haswell.  OK for trunk?
> 
> No, I'd rather leave them the way they are now, so tests can include
> individual checks.
> 
> Uros.
> 
Uros,

Can you please suggest an alternative approach? The main problem with the
current one used in avx512f-helper.h is that it doesn't take into account
situations where two features are required, but only one is supported by the
CPU.

That's exactly the case with AVX512VL and AVX512VBMI on SKX. Once
avx512vl-check.h verifies that AVX512VL exists on SKX, it starts to execute the
test, which fails because AVX512VBMI is not supported but was never checked
before test execution.

Honestly, I cannot think of any solution that would allow for both individual
include files (beside what HJ already did in those few remaining tests) and
verification of multiple features. Also, I think it's worth taking into account
that not many tests actually use individual include files instead of
avx512f-helper.h.

Thanks,
Sebastian
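
To make the failure mode concrete, here is a minimal sketch in plain C. The
feature bits and function names are hypothetical and do not match anything in
the testsuite headers; it only models the difference between checking one
feature and checking every feature a test requires.

```c
#include <assert.h>

/* Hypothetical feature bits for this sketch; not GCC's internal values.  */
#define FEAT_AVX512F	(1u << 0)
#define FEAT_AVX512VL	(1u << 1)
#define FEAT_AVX512VBMI	(1u << 2)

/* Models a per-feature check header: looks at one feature only.  */
int
single_feature_present (unsigned available, unsigned feature)
{
  return (available & feature) != 0;
}

/* Models a combined check: every required feature must be available.  */
int
all_features_present (unsigned available, unsigned required)
{
  return (available & required) == required;
}

/* A machine with AVX512F and AVX512VL but no AVX512VBMI, running a test
   that needs both AVX512VL and AVX512VBMI.  */
void
check_two_feature_case (void)
{
  unsigned available = FEAT_AVX512F | FEAT_AVX512VL;
  unsigned required = FEAT_AVX512VL | FEAT_AVX512VBMI;

  /* The single-feature check lets the test run, and it then fails.  */
  assert (single_feature_present (available, FEAT_AVX512VL));
  /* The combined check reports the test as not runnable here.  */
  assert (!all_features_present (available, required));
}
```

In a real test the "available" mask would come from CPUID; that part is left
out here on purpose.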
> >
> > H.J.
> > ---
> > PR target/81590
> > * gcc.target/i386/avx512-check.h: New file.
> > * gcc.target/i386/avx5124fmaps-check.h: Removed.
> > * gcc.target/i386/avx5124vnniw-check.h: Likewise.
> > * gcc.target/i386/avx512cd-check.h: Likewise.
> > * gcc.target/i386/avx512ifma-check.h: Likewise.
> > * gcc.target/i386/avx512vbmi-check.h: Likewise.
> > * gcc.target/i386/avx512vpopcntdq-check.h: Likewise.
> > * gcc.target/i386/avx512bw-check.h: Rewrite.
> > * gcc.target/i386/avx512dq-check.h: Likewise.
> > * gcc.target/i386/avx512er-check.h: Likewise.
> > * gcc.target/i386/avx512f-check.h: Likewise.
> > * gcc.target/i386/avx512vl-check.h: Likewise.
> > * gcc.target/i386/avx512f-helper.h: Include "avx512-check.h"
> > only.
> > (test_512): Removed.
> > (avx512*_test): Likewise.
> > * gcc.target/i386/avx512f-pr71559.c (TEST): Undef.
> > ---
> >  gcc/testsuite/gcc.target/i386/avx512-check.h   | 113
> +
> >  gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h |  47 -
> > gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h |  47 -
> >  gcc/testsuite/gcc.target/i386/avx512bw-check.h |  50 +
> >  gcc/testsuite/gcc.target/i386/avx512cd-check.h |  46 -
> >  gcc/testsuite/gcc.target/i386/avx512dq-check.h |  50 +
> >  gcc/testsuite/gcc.target/i386/avx512er-check.h |  49 +
> >  gcc/testsuite/gcc.target/i386/avx512f-check.h  |  49 +
> >  gcc/testsuite/gcc.target/i386/avx512f-helper.h |  64 +---
> >  gcc/testsuite/gcc.target/i386/avx512f-pr71559.c|   1 +
> >  gcc/testsuite/gcc.target/i386/avx512ifma-check.h   |  46 -
> >  gcc/testsuite/gcc.target/i386/avx512vbmi-check.h   |  46 -
> >  gcc/testsuite/gcc.target/i386/avx512vl-check.h |  51 +-
> >  .../gcc.target/i386/avx512vpopcntdq-check.h|  47 -
> >  14 files changed, 130 insertions(+), 576 deletions(-)  create mode
> > 100644 gcc/testsuite/gcc.target/i386/avx512-check.h
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-check.h
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512ifma-check.h
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-check.h
> >  delete mode 100644
> > gcc/testsuite/gcc.target/i386/avx512vpopcntdq-check.h
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > new file mode 100644
> > index 000..bfe14960100
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -0,0 +1,113 @@
> > +#include 
> > +#include "cpuid.h"
> > +#include "m512-check.h"
> > +#include "avx512f-os-support.h"

[PATCH][x86] Add missing intrinsics for VGETMANT[SD,SS] and VGETEXP[SD,SS]

2017-07-06 Thread Peryt, Sebastian
Hi,

This patch adds missing intrinsics for VGETEXPSD, VGETEXPSS, VGETMANTSD, 
VGETMANTSS.

2017-07-06  Sebastian Peryt  

gcc/
* config/i386/avx512fintrin.h (_mm_mask_getexp_round_ss, 
_mm_maskz_getexp_round_ss,  _mm_mask_getexp_round_sd, 
_mm_maskz_getexp_round_sd, _mm_mask_getmant_round_sd,
_mm_maskz_getmant_round_sd, _mm_mask_getmant_round_ss, 
_mm_maskz_getmant_round_ss, _mm_mask_getexp_ss, _mm_maskz_getexp_ss, 
_mm_mask_getexp_sd, _mm_maskz_getexp_sd, _mm_mask_getmant_sd, 
_mm_maskz_getmant_sd, _mm_mask_getmant_ss, 
_mm_maskz_getmant_ss): New intrinsics.
(__builtin_ia32_getexpss128_mask): Changed to ...
__builtin_ia32_getexpss128_round ... this.
(__builtin_ia32_getexpsd128_mask): Changed to ...
__builtin_ia32_getexpsd128_round ... this.
* config/i386/i386-builtin-types.def 
((V2DF, V2DF, V2DF, INT, V2DF, UQI, INT),
(V4SF, V4SF, V4SF, INT, V4SF, UQI, INT)): New function type aliases.
* config/i386/i386-builtin.def (__builtin_ia32_getexpsd_mask_round, 
__builtin_ia32_getexpss_mask_round, 
__builtin_ia32_getmantsd_mask_round, 
__builtin_ia32_getmantss_mask_round): New builtins.
* config/i386/i386.c (V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT,
V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT): Handle new types.
(CODE_FOR_avx512f_vgetmantv2df_mask_round, 
CODE_FOR_avx512f_vgetmantv4sf_mask_round): New cases.
* config/i386/sse.md 
(avx512f_sgetexp): Changed to ...
avx512f_sgetexp
 ... this.
(vgetexp\t{%2, %1, %0|
%0, %1, %2}): Changed to ...
vgetexp
\t{%2, %1, %0|
%0, %1, %2} ... 
this.
(avx512f_vgetmant): Changed to ...
avx512f_vgetmant
 ... this.
(vgetmant\t{%3, %2, %1, %0|
%0, %1, %2, %3}): Changed to ...
vgetmant
\t{%3, %2, %1, %0|
%0, %1, %2
, %3} ... this.
* config/i386/subst.md (mask_scalar_operand4, 
round_saeonly_scalar_mask_operand4, round_saeonly_scalar_mask_op4, 
round_saeonly_scalar_nimm_predicate): New subst attributes.

gcc/testsuite/
* gcc.target/i386/avx512f-vgetexpsd-1.c (_mm_mask_getexp_sd, 
_mm_maskz_getexp_sd, _mm_mask_getexp_round_sd, 
_mm_maskz_getexp_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vgetexpss-1.c (_mm_mask_getexp_ss, 
_mm_maskz_getexp_ss, _mm_mask_getexp_round_ss, 
_mm_maskz_getexp_round_ss): Ditto.
* gcc.target/i386/avx512f-vgetmantsd-1.c (_mm_mask_getmant_sd, 
_mm_maskz_getmant_sd, _mm_mask_getmant_round_sd, 
_mm_maskz_getmant_round_sd): Ditto.
* gcc.target/i386/avx512f-vgetmantss-1.c (_mm_mask_getmant_ss, 
_mm_maskz_getmant_ss, _mm_mask_getmant_round_ss, 
_mm_maskz_getmant_round_ss): Ditto.
* gcc.target/i386/avx512f-vgetexpsd-2.c (_mm_mask_getexp_sd, 
_mm_maskz_getexp_sd, _mm_getexp_round_sd, _mm_mask_getexp_round_sd, 
_mm_maskz_getexp_round_sd): New runtime tests.
* gcc.target/i386/avx512f-vgetexpss-2.c (_mm_mask_getexp_ss, 
_mm_maskz_getexp_ss, _mm_getexp_round_ss, _mm_mask_getexp_round_ss, 
_mm_maskz_getexp_round_ss): Ditto.
* gcc.target/i386/avx512f-vgetmantsd-2.c (_mm_mask_getmant_sd, 
_mm_maskz_getmant_sd, _mm_getmant_round_sd, _mm_mask_getmant_round_sd, 
_mm_maskz_getmant_round_sd): Ditto.
* gcc.target/i386/avx512f-vgetmantss-2.c (_mm_mask_getmant_ss, 
_mm_maskz_getmant_ss, _mm_getmant_round_ss, _mm_mask_getmant_round_ss, 
_mm_maskz_getmant_round_ss): Ditto.
* gcc.target/i386/avx-1.c (__builtin_ia32_getexpsd_mask_round, 
__builtin_ia32_getexpss_mask_round, 
__builtin_ia32_getmantsd_mask_round, 
__builtin_ia32_getmantss_mask_round): Test new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto. 
* gcc.target/i386/sse-14.c (_mm_maskz_getexp_round_sd, 
_mm_maskz_getexp_round_ss, _mm_mask_getmant_round_sd, 
_mm_maskz_getmant_round_sd, _mm_mask_getmant_round_ss,
_mm_maskz_getmant_round_ss, _mm_mask_getexp_round_sd, 
_mm_mask_getexp_round_ss): Test new intrinsics.
* gcc.target/i386/testround-1.c: Ditto.
* gcc.target/i386/sse-22.c (_mm_maskz_getmant_round_sd, 
_mm_maskz_getmant_round_ss, _mm_mask_getmant_round_sd, 
_mm_mask_getmant_round_ss): Test new intrinsics.
* gcc.target/i386/testimm-10.c (_mm_mask_getmant_sd, 
_mm_maskz_getmant_sd, _mm_mask_getmant_ss, 
_mm_maskz_getmant_ss): Test new intrinsics.

Is it ok for trunk?

Thanks,
Sebastian


Missing_GETEXP_GETMANT.patch
Description: Missing_GETEXP_GETMANT.patch


RE: [PATCH][x86] Scalar mask and round RTL templates

2017-07-05 Thread Peryt, Sebastian
Tests were added. I also updated the ChangeLog and limited the line length to
79 characters.

gcc/
* config/i386/subst.md (mask_scalar, round_scalar,
round_saeonly_scalar): New meta-templates.
(mask_scalar_name, mask_scalar_operand3, round_scalar_name,
round_scalar_mask_operand3, round_scalar_mask_op3,
round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
round_saeonly_scalar_constraint, 
round_saeonly_scalar_prefix): New subst attribute.
* config/i386/sse.md
(_vm3): Renamed to ...
_vm3
 ... this.
(_vm3): Renamed to 
...
_vm3
 ... this.
(_vm3): Renamed to ...
_vm3
 ... this.
(v
\t{%2, %1, %0|
%0, %1, %2}): Changed to ...
v
\t{%2, %1, %0|
%0, %1, %2} ... this.
(v
\t{%2, %1, %0|
%0, %1, %2}): Changed to ...
v
\t{%2, %1, %0|
%0, %1, %2} ... this.
(v
\t{%2, %1, %0|
%0, %1, %2}): Changed to 
...
v
\t{%2, %1, %0|
%0, %1, %2
} ... this.

gcc/testsuite
* gcc.target/i386/avx512f-vaddsd-3.c: New test for mask 0 verification.
* gcc.target/i386/avx512f-vaddss-3.c: Ditto.
* gcc.target/i386/avx512f-vdivsd-3.c: Ditto.
* gcc.target/i386/avx512f-vdivss-3.c: Ditto.
* gcc.target/i386/avx512f-vmaxsd-3.c: Ditto.
* gcc.target/i386/avx512f-vmaxss-3.c: Ditto.
* gcc.target/i386/avx512f-vminsd-3.c: Ditto.
* gcc.target/i386/avx512f-vminss-3.c: Ditto.
* gcc.target/i386/avx512f-vmulsd-3.c: Ditto.
* gcc.target/i386/avx512f-vmulss-3.c: Ditto.
* gcc.target/i386/avx512f-vsubsd-3.c: Ditto.
* gcc.target/i386/avx512f-vsubss-3.c: Ditto.

Is it ok for trunk?

Thanks,
Sebastian

-Original Message-
From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com] 
Sent: Wednesday, July 5, 2017 12:36 PM
To: Peryt, Sebastian <sebastian.pe...@intel.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][x86] Scalar mask and round RTL templates

On 05 Jul 06:38, Peryt, Sebastian wrote:
> Hi Kirill,
> 
> Sorry for this confusion. I meant to write MDs for intrinsics. Those 
> intrinsics are all masked ones for ADD[SD,SS], SUB[SD,SS], MUL[SD,SS], 
> DIV[SD,SS], MIN[SD,SS] and MAX[SD,SS]. What I found is that for mask equal 0 
> they were producing wrong results when old mask meta-template was used.
What you're talking about looks like a bug. Could you pls add a regression test
to your patch?

> Modified changelog below.
> 
> 2017-07-05  Sebastian Peryt  <sebastian.pe...@intel.com>
> 
> gcc/
>   * config/i386/subst.md (mask_scalar, round_scalar, 
> round_saeonly_scalar): New meta-templates.
>   (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
>   round_scalar_mask_operand3, round_scalar_mask_op3,
>   round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
>   round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
>   round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> subst attribute.
>   * config/i386/sse.md
>   (_vm3): Renamed to ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to 
> ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to ...
>   _vm3 ... 
> this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to ...
>   v\t{%2, 
> %1, %0|
>   %0, %1, %2} ... this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to ...
>   v\t{%2, 
> %1, %0|
>   %0, %1, %2} ... this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to 
> ...
>   
> v\t{%2, %1, 
> %0|
>   %0, %1, %2} 
> ... this.
Max line length is 79 characters I suppose.

--
Thanks, K
> 
> Is it ok for trunk?
> 
> Thanks,
> Sebastian
> 
> -Original Message-
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com]
> Sent: Tuesday, July 4, 2017 7:45 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com>
> Subject: Re: [PATCH][x86] Scalar mask and round RTL templates
> 
> Hello Sebastian,
> On 23 Jun 09:00, Peryt, Sebastian wrote:
> > Hi,
> > 
> > This patch adds three extra RTL meta-templates for scalar round and mask. 
> > Additionally fixes errors caused by previous mask and round usage in some 
> > of the intrinsics that I found.
> Could you pls point out which intrinsics you fixed (or which errors)?
> I see only MD changes in your patch.
> 
> > 
> > 2017-06-23  Sebastian Peryt  <sebastian.pe...@intel.

RE: [PATCH][x86] Scalar mask and round RTL templates

2017-07-05 Thread Peryt, Sebastian
Hi Kirill,

Sorry for the confusion. I meant to write MDs for intrinsics. Those intrinsics
are all the masked ones for ADD[SD,SS], SUB[SD,SS], MUL[SD,SS], DIV[SD,SS],
MIN[SD,SS] and MAX[SD,SS]. What I found is that with a mask equal to 0 they
produced wrong results when the old mask meta-template was used.
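
For reference, a plain C model of what the masked scalar add is expected to
return. This is only a sketch: vec2d is a hypothetical stand-in for __m128d,
the function names are made up, and the code emulates the documented behaviour
of the merge-masked and zero-masked forms, it does not use the intrinsics
themselves.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit vector of two doubles.  */
typedef struct { double lo; double hi; } vec2d;

/* Merge-masking model: if bit 0 of K is clear, the low element comes from
   SRC; the upper element is always copied from A.  */
vec2d
mask_add_sd_model (vec2d src, unsigned char k, vec2d a, vec2d b)
{
  vec2d r;
  r.lo = (k & 1) ? a.lo + b.lo : src.lo;
  r.hi = a.hi;
  return r;
}

/* Zero-masking model: if bit 0 of K is clear, the low element is zero.  */
vec2d
maskz_add_sd_model (unsigned char k, vec2d a, vec2d b)
{
  vec2d r;
  r.lo = (k & 1) ? a.lo + b.lo : 0.0;
  r.hi = a.hi;
  return r;
}

/* The mask == 0 case is the one this thread is about.  */
void
check_mask_zero_case (void)
{
  vec2d src = { 9.0, 9.0 }, a = { 1.0, 2.0 }, b = { 3.0, 4.0 };

  vec2d merged = mask_add_sd_model (src, 0, a, b);
  assert (merged.lo == 9.0 && merged.hi == 2.0);

  vec2d zeroed = maskz_add_sd_model (0, a, b);
  assert (zeroed.lo == 0.0 && zeroed.hi == 2.0);
}
```

A runtime test for the mask 0 case would compare the intrinsic's result against
a model of this kind.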

Modified changelog below.

2017-07-05  Sebastian Peryt  <sebastian.pe...@intel.com>

gcc/
* config/i386/subst.md (mask_scalar, round_scalar, 
round_saeonly_scalar): New meta-templates.
(mask_scalar_name, mask_scalar_operand3, round_scalar_name,
round_scalar_mask_operand3, round_scalar_mask_op3,
round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
subst attribute.
* config/i386/sse.md
(_vm3): Renamed to ...
_vm3 
... this.
(_vm3): Renamed to 
...
_vm3 
... this.
(_vm3): Renamed to ...
_vm3 ... 
this.
(v\t{%2, %1, 
%0|
%0, %1, %2}): Changed to ...
v\t{%2, 
%1, %0|
%0, %1, %2} ... this.
(v\t{%2, %1, 
%0|
%0, %1, %2}): Changed to ...
v\t{%2, 
%1, %0|
%0, %1, %2} ... this.
(v\t{%2, %1, 
%0|
%0, %1, %2}): Changed to 
...

v\t{%2, %1, 
%0|
%0, %1, %2} 
... this.

Is it ok for trunk?

Thanks,
Sebastian

-Original Message-
From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com] 
Sent: Tuesday, July 4, 2017 7:45 PM
To: Peryt, Sebastian <sebastian.pe...@intel.com>
Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com>
Subject: Re: [PATCH][x86] Scalar mask and round RTL templates

Hello Sebastian,
On 23 Jun 09:00, Peryt, Sebastian wrote:
> Hi,
> 
> This patch adds three extra RTL meta-templates for scalar round and mask. 
> Additionally fixes errors caused by previous mask and round usage in some of 
> the intrinsics that I found.
Could you pls point out which intrinsics you fixed (or which errors)?
I see only MD changes in your patch.

> 
> 2017-06-23  Sebastian Peryt  <sebastian.pe...@intel.com>
> 
> gcc/
>   * config/i386/subst.md (mask_scalar, round_scalar, 
> round_saeonly_scalar): New templates.
I'd call it meta-templates.
>   (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
>   round_scalar_mask_operand3, round_scalar_mask_op3,
>   round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
>   round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
>   round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> subst attribute.
>   * config/i386/sse.md
>   (_vm3): Renamed to ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to 
> ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to ...
>   _vm3 ... 
> this.
>   (v\t{%2, %1, 
> %0|%0, %1, %2}): Changed 
> to ...
>   v\t{%2, 
> %1, %0|%0, %1, 
> %2} ... this.
>   (v\t{%2, %1, 
> %0|%0, %1, %2}): Changed 
> to ...
>   v\t{%2, 
> %1, %0|%0, %1, 
> %2} ... this.
>   (v\t{%2, %1, 
> %0|%0, %1, %2}): 
> Changed to ...
>   
> v\t{%2, %1, 
> %0|%0, %1, 
> %2} ... this.
We need to obey conventions. Pls break long lines here.

--
Thanks, K
> 
> Is it ok for trunk?
> 
> Thanks,
> Sebastian




[PATCH][x86] Add permutex[var]_epi[32,64] intrinsics

2017-06-28 Thread Peryt, Sebastian
Hi,

This patch adds missing intrinsics:
- _mm256_permutexvar_epi32
- _mm256_permutex_epi64
- _mm256_permutexvar_epi64
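
As a reminder of the expected semantics, here is a scalar model of
_mm256_permutexvar_epi32 in plain C. It is a sketch only: the function names
are hypothetical and plain arrays stand in for __m256i; the one behaviour it
encodes is that each destination dword is selected by the low three bits of
the matching index element.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst[i] = a[idx[i] & 7] for the eight 32-bit elements.  */
void
permutexvar_epi32_model (const uint32_t idx[8], const int32_t a[8],
			 int32_t dst[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = a[idx[i] & 7];
}

/* Return 1 if a descending index vector reverses the elements.  */
int
permute_reverses (void)
{
  const uint32_t idx[8] = { 7, 6, 5, 4, 3, 2, 1, 0 };
  const int32_t a[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  int32_t dst[8];

  permutexvar_epi32_model (idx, a, dst);
  for (int i = 0; i < 8; i++)
    if (dst[i] != a[7 - i])
      return 0;
  return 1;
}

/* Return 1 if index bits above the low three are ignored.  */
int
permute_ignores_high_bits (void)
{
  const uint32_t idx[8] = { 8, 9, 10, 11, 12, 13, 14, 15 };
  const int32_t a[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  int32_t dst[8];

  permutexvar_epi32_model (idx, a, dst);
  for (int i = 0; i < 8; i++)
    if (dst[i] != a[i])
      return 0;
  return 1;
}

void
check_permute_model (void)
{
  assert (permute_reverses ());
  assert (permute_ignores_high_bits ());
}
```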

gcc/
* config/i386/avx512vlintrin.h (_mm256_permutexvar_epi64, 
_mm256_permutexvar_epi32,
_mm256_permutex_epi64): New intrinsics.

gcc/testsuite/
* gcc.target/i386/avx512vl-vpermd-1.c (_mm256_permutexvar_epi32): Test 
new intrinsic.
* gcc.target/i386/avx512vl-vpermq-imm-1.c (_mm256_permutex_epi64): 
Ditto.
* gcc.target/i386/avx512vl-vpermq-var-1.c (_mm256_permutexvar_epi64): 
Ditto.
* gcc.target/i386/avx512f-vpermd-2.c: Removed define length constraint.
* gcc.target/i386/avx512f-vpermq-imm-2.c: Ditto.
* gcc.target/i386/avx512f-vpermq-var-2.c: Ditto.

Is it ok for trunk?

Thanks,
Sebastian


permutex.patch
Description: permutex.patch


[PATCH][x86] Scalar mask and round RTL templates

2017-06-23 Thread Peryt, Sebastian
Hi,

This patch adds three extra RTL meta-templates for scalar round and mask.
It additionally fixes errors that I found in some of the intrinsics, caused by
the previous mask and round usage.

2017-06-23  Sebastian Peryt  

gcc/
* config/i386/subst.md (mask_scalar, round_scalar, 
round_saeonly_scalar): New templates.
(mask_scalar_name, mask_scalar_operand3, round_scalar_name,
round_scalar_mask_operand3, round_scalar_mask_op3,
round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
subst attribute.
* config/i386/sse.md
(_vm3): Renamed to ...
_vm3 
... this.
(_vm3): Renamed to 
...
_vm3 
... this.
(_vm3): Renamed to ...
_vm3 ... 
this.
(v\t{%2, %1, 
%0|%0, %1, %2}): Changed to 
...
v\t{%2, 
%1, %0|%0, %1, 
%2} ... this.
(v\t{%2, %1, 
%0|%0, %1, %2}): Changed to 
...
v\t{%2, 
%1, %0|%0, %1, 
%2} ... this.
(v\t{%2, %1, 
%0|%0, %1, %2}): 
Changed to ...

v\t{%2, %1, 
%0|%0, %1, 
%2} ... this.

Is it ok for trunk?

Thanks,
Sebastian


Scalar-templates.patch
Description: Scalar-templates.patch


[PATCH][x86] Add missing mask intrinsics for MAX[SD,SS] and MIN[SD,SS]

2017-05-30 Thread Peryt, Sebastian
Hi,

This patch adds missing intrinsics for MAX[SD,SS] and MIN[SD,SS] listed below:
- _mm_mask_max_sd,
- _mm_maskz_max_sd,
- _mm_mask_max_ss,
- _mm_maskz_max_ss,
 
- _mm_mask_min_sd,
- _mm_maskz_min_sd,
- _mm_mask_min_ss,
- _mm_maskz_min_ss.

gcc/
* config/i386/avx512fintrin.h (_mm_mask_max_sd,
_mm_maskz_max_sd, _mm_mask_max_ss, _mm_maskz_max_ss,
_mm_mask_min_sd, _mm_maskz_min_sd, _mm_mask_min_ss,
_mm_maskz_min_ss): New intrinsics.

gcc/testsuite/
* gcc.target/i386/avx512f-vmaxsd-1.c (_mm_mask_max_sd,
_mm_maskz_max_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vmaxsd-2.c (_mm_mask_max_sd,
_mm_maskz_max_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vmaxss-1.c (_mm_mask_max_ss,
_mm_maskz_max_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vmaxss-2.c (_mm_mask_max_ss,
_mm_maskz_max_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vminsd-1.c (_mm_mask_min_sd,
_mm_maskz_min_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vminsd-2.c (_mm_mask_min_sd,
_mm_maskz_min_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vminss-1.c (_mm_mask_min_ss,
_mm_maskz_min_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vminss-2.c (_mm_mask_min_ss,
_mm_maskz_min_ss): Test new intrinsics.

Is it ok for trunk?

Thanks,
Sebastian


MASK_MAX[SD,SS]_MIN[SD,SS].patch
Description: MASK_MAX[SD,SS]_MIN[SD,SS].patch


RE: [PATCH][x86]Fix for false-positives results of runtime tests on machines not supporting AVX512F

2017-05-30 Thread Peryt, Sebastian
Thank you very much for the clarification. Yes, you are right; it would be 
better if such a test were marked UNSUPPORTED.

Sebastian

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Tuesday, May 30, 2017 8:23 AM
To: Peryt, Sebastian <sebastian.pe...@intel.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][x86]Fix for false-positives results of runtime tests on 
machines not supporting AVX512F

On Tue, May 30, 2017 at 7:59 AM, Peryt, Sebastian <sebastian.pe...@intel.com> 
wrote:
> Hi,
>
> The attached patch fixes false-positive results from runtime tests on 
> machines that do not support AVX512F. Currently, when a runtime test 
> intended for AVX512F is run on a non-AVX512F machine, the most it does to 
> signal this is print SKIPPED, and only if debug is enabled. In every case 
> the return value is 0, exactly as if the test had passed, which can be 
> misleading when looking at the gcc.sum summary values. With this patch, such 
> tests are properly reported during make check as unexpected failures.
>
> gcc/testsuite/
> * gcc.target/i386/avx512f-check.h: Return value modified for skipped 
> test.
>
>
>
> Please let me know if such a fix can be accepted.

No, this is by design. It is not a failure if the target doesn't support the 
requested runtime feature. The test should be marked UNSUPPORTED in this case, 
but I don't think the DejaGnu infrastructure allows that.

Uros.


[PATCH][x86]Fix for false-positives results of runtime tests on machines not supporting AVX512F

2017-05-30 Thread Peryt, Sebastian
Hi,

The attached patch fixes false-positive results from runtime tests on machines 
that do not support AVX512F. Currently, when a runtime test intended for 
AVX512F is run on a non-AVX512F machine, the most it does to signal this is 
print SKIPPED, and only if debug is enabled. In every case the return value is 
0, exactly as if the test had passed, which can be misleading when looking at 
the gcc.sum summary values. With this patch, such tests are properly reported 
during make check as unexpected failures.
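
To make the problem concrete, here is a minimal sketch of a runtime-check wrapper. It is not the contents of avx512f-check.h; run_check, do_test, and the skip_status parameter are hypothetical names used only to show why a skipped run and a passing run look the same to the caller when both return 0.

```c
#include <stdio.h>

/* Hypothetical placeholder for the real test body; 0 means the check passed. */
static int do_test(void)
{
  return 0;
}

/* Simplified model of the wrapper.
   cpu_has_feature: nonzero when the CPU supports the required feature.
   skip_status:     exit status to return when the test is skipped. */
static int run_check(int cpu_has_feature, int skip_status)
{
  if (!cpu_has_feature) {
#ifdef DEBUG
    printf("SKIPPED\n");   /* only visible when debug output is enabled */
#endif
    return skip_status;
  }
  if (do_test() != 0)
    return 1;              /* genuine failure */
#ifdef DEBUG
  printf("PASSED\n");
#endif
  return 0;
}
```

With skip_status set to 0, run_check(0, 0) and run_check(1, 0) return the same value, which is the ambiguity described above; a nonzero skip_status makes the two cases distinguishable by exit status alone.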

gcc/testsuite/
* gcc.target/i386/avx512f-check.h: Return value modified for skipped 
test.



Please let me know if such a fix can be accepted.

Thanks,
Sebastian


AVX512F_TESTS_VERIFICATION_PATCH.patch
Description: AVX512F_TESTS_VERIFICATION_PATCH.patch


RE: [PATCH] Match x86 family machine constraints section with constarints.md

2017-05-25 Thread Peryt, Sebastian
Hi,

Thank you very much for the answers. Can someone please commit this patch for 
me?

Thanks,
Sebastian

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Wednesday, May 24, 2017 3:31 PM
To: Sandra Loosemore <san...@codesourcery.com>
Cc: Peryt, Sebastian <sebastian.pe...@intel.com>; gcc-patches@gcc.gnu.org; 
Koval, Julia <julia.ko...@intel.com>; kirill.yuk...@gmail.com
Subject: Re: [PATCH] Match x86 family machine constraints section with 
constarints.md

On Tue, May 23, 2017 at 5:33 PM, Sandra Loosemore <san...@codesourcery.com> 
wrote:
> On 04/28/2017 03:30 AM, Peryt, Sebastian wrote:
>>
>> Hi,
>>
>> Thank you for your comments. I edited my patch accordingly. As for 
>> some of your doubts:
>> - REX is the opcode prefix used to access the 64-bit register 
>> extensions introduced in IA-32e mode.
>> - EVEX is the encoding prefix that applies to SIMD instructions 
>> operating on XMM, YMM and ZMM registers. It was introduced with the 
>> AVX-512 instructions.
>> - "number factor of four" means that the sources start on a 
>> multiple-of-4 boundary. This is used by some instructions.
>>
>> Also, I'd like to add that this whole patch is based strictly on the 
>> docstrings of constraints that are present in 
>> config/i386/constraints.md but not in the documentation (the md.texi 
>> file). There is no new content (new as in nonexistent in the code).
>>
>> I'm also adding Kirill Yukhin to CC, because I believe he is the 
>> right person to catch any technical errors that may have slipped in.
>
>
> The grammar/markup/etc are OK now, but I can't comment on technical 
> correctness of the information.

LGTM.

Thanks,
Uros.


RE: [PATCH] Match x86 family machine constraints section with constarints.md

2017-05-23 Thread Peryt, Sebastian
Gentle ping.

Thanks,
Sebastian

-----Original Message-----
From: Peryt, Sebastian 
Sent: Friday, April 28, 2017 11:31 AM
To: Sandra Loosemore <san...@codesourcery.com>; gcc-patches@gcc.gnu.org
Cc: ubiz...@gmail.com; Koval, Julia <julia.ko...@intel.com>; 
kirill.yuk...@gmail.com
Subject: RE: [PATCH] Match x86 family machine constraints section with 
constarints.md

Hi,

Thank you for your comments. I edited my patch accordingly. As for some of your 
doubts:
- REX is the opcode prefix used to access the 64-bit register extensions 
introduced in IA-32e mode.
- EVEX is the encoding prefix that applies to SIMD instructions operating on 
XMM, YMM and ZMM registers. It was introduced with the AVX-512 instructions.
- "number factor of four" means that the sources start on a multiple-of-4 
boundary. This is used by some instructions.

Also, I'd like to add that this whole patch is based strictly on the docstrings 
of constraints that are present in config/i386/constraints.md but not in the 
documentation (the md.texi file). There is no new content (new as in 
nonexistent in the code).

I'm also adding Kirill Yukhin to CC, because I believe he is the right person 
to catch any technical errors that may have slipped in.

Thanks,
Sebastian

-----Original Message-----
From: Sandra Loosemore [mailto:san...@codesourcery.com]
Sent: Thursday, April 27, 2017 10:17 PM
To: Peryt, Sebastian <sebastian.pe...@intel.com>; gcc-patches@gcc.gnu.org
Cc: ubiz...@gmail.com; Koval, Julia <julia.ko...@intel.com>
Subject: Re: [PATCH] Match x86 family machine constraints section with 
constarints.md

On 04/26/2017 08:29 AM, Peryt, Sebastian wrote:
> Hi,
>
> This patch updates the x86 family machine constraints in the '16.8.5 
> Constraints for Particular Machines' section to match those in 
> 'config/i386/constraints.md'.
>
> gcc/
>   * doc/md.texi (Machine Constraints): Update x86 family machine
>   constraints section to match 'config/i386/constraints.md'.
>
> Is it ok for trunk?

I have a few comments on grammar and markup, but I can't comment intelligently 
on whether the technical content is correct.

> @@ -4013,24 +4015,94 @@ Top of 80387 floating-point stack (@code{%st(0)}).
>  @item u
>  Second from top of 80387 floating-point stack (@code{%st(1)}).
>
> +@ifset INTERNALS
> +@item Yk
> +Any mask register that can be used as predicate, i.e. k1-k7.

s/predicate/a predicate/

Other places in this section use @code markup on literal register names.

> +
> +@item k
> +Any mask register.
> +@end ifset
> +
>  @item y
>  Any MMX register.
>
>  @item x
>  Any SSE register.
>
> +@item v
> +Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).
> +
> +@ifset INTERNALS
> +@item w
> +Any bound register.
> +@end ifset
> +
>  @item Yz
>  First SSE register (@code{%xmm0}).
>
>  @ifset INTERNALS
> -@item Y2
> -Any SSE register, when SSE2 is enabled.
> -
>  @item Yi
>  Any SSE register, when SSE2 and inter-unit moves are enabled.
>
> +@item Yj
> +Any SSE register, when SSE2 and inter-unit moves from vector registers are 
> enabled.
> +
>  @item Ym
>  Any MMX register, when inter-unit moves are enabled.
> +
> +@item Yn
> +Any MMX register, when inter-unit moves from vector registers are enabled.
> +
> +@item Yp
> +Any integer register when TARGET_PARTIAL_REG_STALL is disabled.

@code markup on that.

> +
> +@item Ya
> +Any integer register when zero extensions with AND are disabled.

I'm not sure what "AND" is, but it probably needs @code markup too.
> +
> +@item Yb
> +Any register that can be used as the GOT base when calling ___tls_get_addr:

@code{___tls_get_addr}

> +that is, any general register except @code{a} and @code{sp} 
> +registers, for -fno-plt if linker supports it. Otherwise, @code{b} register.

@option{-fno-plt}

> +
> +@item Yf
> +Any x87 register when 80387 FP arithmetic is enabled.

Is "FP" a literal feature name used in the processor documentation, or do you 
just mean "floating-point arithmetic" here?

> +
> +@item Yr
> +Lower SSE register when avoiding REX prefix and all SSE registers otherwise.

I don't know what "avoiding REX prefix" means, and don't see the string "REX" 
in any other GCC documentation.

> +
> +@item Yv
> +For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), 
> +otherwise any SSE register.

This should probably be "EVEX-encodable", whatever that means.

> +
> +@item Yh
> +Any EVEX encodable SSE register, which has number factor of four.

Same here, but what is "number factor of four"?  Also, if this is supposed to 
designate a subset of the EVEX-encodable SSE registers rather than describe all 
of them, you need "that" instead of "which".

[PATCH][x86] Add missing intrinsics for MAX[SD,SS] and MIN[SD,SS]

2017-05-09 Thread Peryt, Sebastian
Hi,

This patch adds missing intrinsics for MAXSD, MAXSS, MINSD and MINSS 
instructions.

2017-05-09  Sebastian Peryt  

gcc/
* config/i386/avx512fintrin.h (_mm_mask_max_round_sd,
_mm_maskz_max_round_sd, _mm_mask_max_round_ss,
_mm_maskz_max_round_ss, _mm_mask_min_round_sd,
_mm_maskz_min_round_sd, _mm_mask_min_round_ss,
_mm_maskz_min_round_ss): New intrinsics.
* config/i386/i386-builtin-types.def (V2DF, V2DF, V2DF, V2DF, UQI, INT,
V4SF, V4SF, V4SF, V4SF, UQI, INT): New function type aliases.
* config/i386/i386-builtin.def (__builtin_ia32_maxsd_mask_round,
__builtin_ia32_maxss_mask_round, __builtin_ia32_minsd_mask_round,
__builtin_ia32_minss_mask_round): New builtins.
* config/i386/i386.c (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): Handle new types.
* config/i386/sse.md (_vm3): 
Renamed to ...
(_vm3): ... this.
(v\t{%2, %1, 
%0|%0, %1, %2}): Changed to ...
(v\t{%2, %1, 
%0|%0, %1, %2}): 
... this.

gcc/testsuite/
* gcc.target/i386/avx512f-vmaxsd-1.c (_mm_mask_max_round_sd,
_mm_maskz_max_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vmaxsd-2.c: New.
* gcc.target/i386/avx512f-vmaxss-1.c (_mm_mask_max_round_ss,
_mm_maskz_max_round_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vmaxss-2.c: New.
* gcc.target/i386/avx512f-vminsd-1.c (_mm_mask_min_round_sd,
_mm_maskz_min_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vminsd-2.c: New.
* gcc.target/i386/avx512f-vminss-1.c (_mm_mask_min_round_ss,
_mm_maskz_min_round_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vminss-2.c: New.
* gcc.target/i386/avx-1.c (__builtin_ia32_maxsd_mask_round,
__builtin_ia32_maxss_mask_round, __builtin_ia32_minsd_mask_round,
__builtin_ia32_minss_mask_round): Test new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c (_mm_maskz_max_round_sd,
_mm_maskz_max_round_ss, _mm_maskz_min_round_sd,
_mm_maskz_min_round_ss, _mm_mask_max_round_sd,
_mm_mask_max_round_ss, _mm_mask_min_round_sd,
_mm_mask_min_round_ss): Test new intrinsics.
* gcc.target/i386/testround-1.c: Ditto.

Is it ok for trunk?

Thanks,
Sebastian


MAX[SD_SS]_MIN[SD_SS]_patch.patch
Description: MAX[SD_SS]_MIN[SD_SS]_patch.patch


[PATCH][x86] Add missing intrinsics for DIV[SD,SS] and MUL[SD,SS]

2017-05-09 Thread Peryt, Sebastian
Hi,

This patch adds missing intrinsics for DIVSD, DIVSS, MULSD and MULSS 
instructions.
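
As a side note on the _round_ variants: to the best of my knowledge, the rounding argument of the arithmetic intrinsics is documented as either _MM_FROUND_CUR_DIRECTION or one of the four explicit modes combined with _MM_FROUND_NO_EXC. The sketch below models that rule under this assumption. The FROUND_* names are hypothetical local copies of the usual _MM_FROUND_* values so that it builds without x86 headers, and is_valid_rounding is not a GCC function; the compiler's actual diagnostics may differ.

```c
/* Local copies of the usual _MM_FROUND_* values (hypothetical names). */
enum {
  FROUND_TO_NEAREST_INT = 0x00,
  FROUND_TO_NEG_INF     = 0x01,
  FROUND_TO_POS_INF     = 0x02,
  FROUND_TO_ZERO        = 0x03,
  FROUND_CUR_DIRECTION  = 0x04,
  FROUND_NO_EXC         = 0x08
};

/* Returns 1 when `rounding` is one of the five documented combinations:
   CUR_DIRECTION alone, or an explicit mode (0..3) OR-ed with NO_EXC. */
static int is_valid_rounding(int rounding)
{
  if (rounding == FROUND_CUR_DIRECTION)
    return 1;
  return rounding >= 0
         && (rounding & FROUND_NO_EXC) != 0
         && (rounding & ~FROUND_NO_EXC) <= FROUND_TO_ZERO;
}
```

For example, FROUND_TO_ZERO | FROUND_NO_EXC is accepted by this model, while FROUND_TO_ZERO on its own is not.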

2017-05-09  Sebastian Peryt  

gcc/
* config/i386/avx512fintrin.h (_mm_mask_mul_round_sd,
_mm_maskz_mul_round_sd, _mm_mask_mul_round_ss,
_mm_maskz_mul_round_ss, _mm_mask_div_round_sd,
_mm_maskz_div_round_sd, _mm_mask_div_round_ss,
_mm_maskz_div_round_ss, _mm_mask_mul_sd, _mm_maskz_mul_sd,
_mm_mask_mul_ss, _mm_maskz_mul_ss, _mm_mask_div_sd,
_mm_maskz_div_sd, _mm_mask_div_ss, _mm_maskz_div_ss): New intrinsics.
* config/i386/i386-builtin-types.def (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): New function type aliases.
* config/i386/i386-builtin.def (__builtin_ia32_divsd_mask_round,
__builtin_ia32_divss_mask_round, __builtin_ia32_mulsd_mask_round,
__builtin_ia32_mulss_mask_round): New builtins.
* config/i386/i386.c (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): Handle new types.
* config/i386/sse.md (_vm3): 
Renamed to ...
(_vm3): ... this.
(v\t{%2, %1, %0|%0, 
%1, %2}): Changed to ...
(v\t{%2, %1, 
%0|%0, %1, %2}): ... this.

gcc/testsuite/
* gcc.target/i386/avx512f-vdivsd-1.c (_mm_mask_div_sd,
_mm_maskz_div_sd, _mm_mask_div_round_sd,
_mm_maskz_div_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vdivsd-2.c: New.
* gcc.target/i386/avx512f-vdivss-1.c (_mm_mask_div_ss,
_mm_maskz_div_ss, _mm_mask_div_round_ss,
_mm_maskz_div_round_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vdivss-2.c: New.
* gcc.target/i386/avx512f-vmulsd-1.c (_mm_mask_mul_sd,
_mm_maskz_mul_sd, _mm_mask_mul_round_sd,
_mm_maskz_mul_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vmulsd-2.c: New.
* gcc.target/i386/avx512f-vmulss-1.c (_mm_mask_mul_ss,
_mm_maskz_mul_ss, _mm_mask_mul_round_ss,
_mm_maskz_mul_round_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vmulss-2.c: New.
* gcc.target/i386/avx-1.c (__builtin_ia32_divsd_mask_round,
__builtin_ia32_divss_mask_round, __builtin_ia32_mulsd_mask_round,
__builtin_ia32_mulss_mask_round): Test new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c (_mm_maskz_div_round_sd,
_mm_maskz_div_round_ss, _mm_maskz_mul_round_sd,
_mm_maskz_mul_round_ss): Test new intrinsics.
* gcc.target/i386/testround-1.c: Ditto.

Is it ok for trunk?

Sebastian


DIV[SD_SS]_MUL[SD_SS]_patch.patch
Description: DIV[SD_SS]_MUL[SD_SS]_patch.patch


RE: [PATCH][x86] Fix ADD[SD,SS] and SUB[SD,SS] runtime tests

2017-05-09 Thread Peryt, Sebastian
Hi,

Can you please commit it for me?

Thanks,
Sebastian

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Tuesday, May 9, 2017 10:40 AM
To: Peryt, Sebastian <sebastian.pe...@intel.com>
Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com
Subject: Re: [PATCH][x86] Fix ADD[SD,SS] and SUB[SD,SS] runtime tests

On Mon, May 8, 2017 at 9:53 AM, Peryt, Sebastian <sebastian.pe...@intel.com> 
wrote:
> Hi,
>
> This patch fixes errors in runtime tests for ADDSD, ADDSS, SUBSD and SUBSS 
> instructions.
>
> gcc/testsuite/
> * gcc.target/i386/avx512f-vaddsd-2.c: Test fixed.
> * gcc.target/i386/avx512f-vaddss-2.c: Ditto.
> * gcc.target/i386/avx512f-vsubsd-2.c: Ditto.
> * gcc.target/i386/avx512f-vsubss-2.c: Ditto.
>
> Is it ok for trunk?

OK.

Thanks,
Uros.


[PATCH][x86] Fix ADD[SD,SS] and SUB[SD,SS] runtime tests

2017-05-08 Thread Peryt, Sebastian
Hi,

This patch fixes errors in runtime tests for ADDSD, ADDSS, SUBSD and SUBSS 
instructions.

gcc/testsuite/
* gcc.target/i386/avx512f-vaddsd-2.c: Test fixed.
* gcc.target/i386/avx512f-vaddss-2.c: Ditto.
* gcc.target/i386/avx512f-vsubsd-2.c: Ditto.
* gcc.target/i386/avx512f-vsubss-2.c: Ditto.

Is it ok for trunk?

Thanks,
Sebastian


ADD[SD_SS]_SUB[SD_SS]_runtime_tests_fix.patch
Description: ADD[SD_SS]_SUB[SD_SS]_runtime_tests_fix.patch


RE: [PATCH][x86] Add missing intrinsics for ADD[SD,SS] and SUB[SD,SS]

2017-05-03 Thread Peryt, Sebastian
Thank you!

Sebastian

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Tuesday, May 2, 2017 3:08 PM
To: Peryt, Sebastian <sebastian.pe...@intel.com>
Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com
Subject: Re: [PATCH][x86] Add missing intrinsics for ADD[SD,SS] and SUB[SD,SS]

On Tue, May 2, 2017 at 11:39 AM, Peryt, Sebastian <sebastian.pe...@intel.com> 
wrote:
> Hi,
> Can you please commit it for me?

Done.

Uros.


RE: [PATCH][x86] Add missing intrinsics for ADD[SD,SS] and SUB[SD,SS]

2017-05-02 Thread Peryt, Sebastian
Hi,
Can you please commit it for me?

Thanks,
Sebastian

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Monday, May 1, 2017 11:28 AM
To: Peryt, Sebastian <sebastian.pe...@intel.com>
Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com
Subject: Re: [PATCH][x86] Add missing intrinsics for ADD[SD,SS] and SUB[SD,SS]

On Thu, Apr 27, 2017 at 10:22 AM, Peryt, Sebastian <sebastian.pe...@intel.com> 
wrote:
> Hi,
>
> This patch adds missing intrinsics for ADDSD, ADDSS, SUBSD and SUBSS 
> instructions.
>
> gcc/
> * config/i386/avx512fintrin.h (_mm_mask_add_round_sd,
> _mm_maskz_add_round_sd, _mm_mask_add_round_ss,
> _mm_maskz_add_round_ss, _mm_mask_sub_round_sd,
> _mm_maskz_sub_round_sd, _mm_mask_sub_round_ss,
> _mm_maskz_sub_round_ss, _mm_mask_add_sd,
> _mm_maskz_add_sd, _mm_mask_add_ss, _mm_maskz_add_ss,
> _mm_mask_sub_sd, _mm_maskz_sub_sd, _mm_mask_sub_ss,
> _mm_maskz_sub_ss): New intrinsics.
> * config/i386/i386-builtin-types.def 
> (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
> V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): New function type aliases.
> * config/i386/i386-builtin.def (__builtin_ia32_addsd_mask_round,
> __builtin_ia32_addss_mask_round, __builtin_ia32_subsd_mask_round,
> __builtin_ia32_subss_mask_round): New builtins.
> * config/i386/i386.c (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
> V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): Handle new types.
> * config/i386/sse.md (_vm3): 
> Renamed to  ...
> (_vm3): ... this.
> (v\t{%2, %1, 
> %0|%0, %1, %2}): Changed to ...
> (v\t{%2, %1, 
> %0|%0, %1, %2}): ... this.
>
> gcc/testsuite/
> * gcc.target/i386/avx512f-vaddsd-1.c (_mm_mask_add_sd,
> _mm_maskz_add_sd, _mm_mask_add_round_sd,
> _mm_maskz_add_round_sd): Test new intrinsics.
> * gcc.target/i386/avx512f-vaddsd-2.c: New.
> * gcc.target/i386/avx512f-vaddss-1.c (_mm_mask_add_ss,
> _mm_maskz_add_ss, _mm_mask_add_round_ss,
> _mm_maskz_add_round_ss): Test new intrinsics.
> * gcc.target/i386/avx512f-vaddss-2.c: New.
> * gcc.target/i386/avx512f-vsubsd-1.c (_mm_mask_sub_sd,
> _mm_maskz_sub_sd, _mm_mask_sub_round_sd,
> _mm_maskz_sub_round_sd): Test new intrinsics.
> * gcc.target/i386/avx512f-vsubsd-2.c: New.
> * gcc.target/i386/avx512f-vsubss-1.c (_mm_mask_sub_ss,
> _mm_maskz_sub_ss, _mm_mask_sub_round_ss,
> _mm_maskz_sub_round_ss): Test new intrinsics.
> * gcc.target/i386/avx512f-vsubss-2.c: New.
> * gcc.target/i386/avx-1.c (__builtin_ia32_addsd_mask_round,
> __builtin_ia32_addss_mask_round, __builtin_ia32_subsd_mask_round,
> __builtin_ia32_subss_mask_round): Test new builtins.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * gcc.target/i386/sse-14.c (_mm_maskz_add_round_sd,
> _mm_maskz_add_round_ss, _mm_maskz_sub_round_sd,
> _mm_maskz_sub_round_ss, _mm_mask_add_round_sd,
> _mm_mask_add_round_ss, _mm_mask_sub_round_sd,
> _mm_mask_sub_round_ss): Test new intrinsics.
> * gcc.target/i386/testround-1.c: Ditto.
>
> Is it ok for trunk?

OK.

Thanks,
Uros.


RE: [PATCH] Match x86 family machine constraints section with constarints.md

2017-04-28 Thread Peryt, Sebastian
Hi,

Thank you for your comments. I edited my patch accordingly. As for some of your 
doubts:
- REX is the opcode prefix used to access the 64-bit register extensions 
introduced in IA-32e mode.
- EVEX is the encoding prefix that applies to SIMD instructions operating on 
XMM, YMM and ZMM registers. It was introduced with the AVX-512 instructions.
- "number factor of four" means that the sources start on a multiple-of-4 
boundary. This is used by some instructions.
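
The three points above can be sketched as simple predicates on an SSE register number (0 for xmm0 through 31 for xmm31). This is a simplified model, not code from GCC: the function names are hypothetical, the REX rule assumes legacy (non-VEX) SSE encoding, and reading "multiple-of-4 boundary" as a condition on the register number is an interpretation of the constraint description.

```c
/* xmm8..xmm15: in legacy SSE encoding these need a REX prefix. */
static int needs_rex(int regno)
{
  return regno >= 8 && regno <= 15;
}

/* xmm16..xmm31: only reachable through EVEX encoding. */
static int needs_evex(int regno)
{
  return regno >= 16 && regno <= 31;
}

/* "Number factor of four": register number is a multiple of four
   (xmm0, xmm4, ..., xmm28), assumed here to be the meaning intended. */
static int is_multiple_of_four(int regno)
{
  return regno >= 0 && regno <= 31 && regno % 4 == 0;
}
```

Under this model, a constraint such as Yr would restrict the allocator to registers for which needs_rex is false when the REX prefix is being avoided.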

Also, I'd like to add that this whole patch is based strictly on the docstrings 
of constraints that are present in config/i386/constraints.md but not in the 
documentation (the md.texi file). There is no new content (new as in 
nonexistent in the code).

I'm also adding Kirill Yukhin to CC, because I believe he is the right person 
to catch any technical errors that may have slipped in.

Thanks,
Sebastian

-----Original Message-----
From: Sandra Loosemore [mailto:san...@codesourcery.com] 
Sent: Thursday, April 27, 2017 10:17 PM
To: Peryt, Sebastian <sebastian.pe...@intel.com>; gcc-patches@gcc.gnu.org
Cc: ubiz...@gmail.com; Koval, Julia <julia.ko...@intel.com>
Subject: Re: [PATCH] Match x86 family machine constraints section with 
constarints.md

On 04/26/2017 08:29 AM, Peryt, Sebastian wrote:
> Hi,
>
> This patch updates the x86 family machine constraints in the '16.8.5 
> Constraints for Particular Machines' section to match those in 
> 'config/i386/constraints.md'.
>
> gcc/
>   * doc/md.texi (Machine Constraints): Update x86 family machine
>   constraints section to match 'config/i386/constraints.md'.
>
> Is it ok for trunk?

I have a few comments on grammar and markup, but I can't comment intelligently 
on whether the technical content is correct.

> @@ -4013,24 +4015,94 @@ Top of 80387 floating-point stack (@code{%st(0)}).
>  @item u
>  Second from top of 80387 floating-point stack (@code{%st(1)}).
>
> +@ifset INTERNALS
> +@item Yk
> +Any mask register that can be used as predicate, i.e. k1-k7.

s/predicate/a predicate/

Other places in this section use @code markup on literal register names.

> +
> +@item k
> +Any mask register.
> +@end ifset
> +
>  @item y
>  Any MMX register.
>
>  @item x
>  Any SSE register.
>
> +@item v
> +Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).
> +
> +@ifset INTERNALS
> +@item w
> +Any bound register.
> +@end ifset
> +
>  @item Yz
>  First SSE register (@code{%xmm0}).
>
>  @ifset INTERNALS
> -@item Y2
> -Any SSE register, when SSE2 is enabled.
> -
>  @item Yi
>  Any SSE register, when SSE2 and inter-unit moves are enabled.
>
> +@item Yj
> +Any SSE register, when SSE2 and inter-unit moves from vector registers are 
> enabled.
> +
>  @item Ym
>  Any MMX register, when inter-unit moves are enabled.
> +
> +@item Yn
> +Any MMX register, when inter-unit moves from vector registers are enabled.
> +
> +@item Yp
> +Any integer register when TARGET_PARTIAL_REG_STALL is disabled.

@code markup on that.

> +
> +@item Ya
> +Any integer register when zero extensions with AND are disabled.

I'm not sure what "AND" is, but it probably needs @code markup too.
> +
> +@item Yb
> +Any register that can be used as the GOT base when calling ___tls_get_addr:

@code{___tls_get_addr}

> +that is, any general register except @code{a} and @code{sp} 
> +registers, for -fno-plt if linker supports it. Otherwise, @code{b} register.

@option{-fno-plt}

> +
> +@item Yf
> +Any x87 register when 80387 FP arithmetic is enabled.

Is "FP" a literal feature name used in the processor documentation, or do you 
just mean "floating-point arithmetic" here?

> +
> +@item Yr
> +Lower SSE register when avoiding REX prefix and all SSE registers otherwise.

I don't know what "avoiding REX prefix" means, and don't see the string "REX" 
in any other GCC documentation.

> +
> +@item Yv
> +For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), 
> +otherwise any SSE register.

This should probably be "EVEX-encodable", whatever that means.

> +
> +@item Yh
> +Any EVEX encodable SSE register, which has number factor of four.

Same here, but what is "number factor of four"?  Also, if this is supposed to 
designate a subset of the EVEX-encodable SSE registers rather than describe all 
of them, you need "that" instead of "which".

> +
> +@item Bf
> +Flags register operand.
> +
> +@item Bg
> +GOT memory operand.
> +
> +@item Bm
> +Vector memory operand.
> +
> +@item Bc
> +Constant memory operand.
> +
> +@item Bn
> +Memory operand without REX prefix.
> +
> +@item Bs
> +Sibcall memory operand.
> +
> +@item Bw
> +Call mem

[PATCH][x86] Add missing intrinsics for ADD[SD,SS] and SUB[SD,SS]

2017-04-27 Thread Peryt, Sebastian
Hi,

This patch adds missing intrinsics for ADDSD, ADDSS, SUBSD and SUBSS 
instructions.
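
As a usage illustration, the non-rounding forms could be called roughly as follows. This is a sketch under assumptions: the wrapper names mask_add_low and maskz_sub_low are hypothetical, and the intrinsic path is only compiled when the compiler defines __AVX512F__ (for example with -mavx512f); otherwise a plain C fallback with the same expected result is used so the example builds anywhere.

```c
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Low-lane result of a merge-masked scalar add:
   a + b when bit 0 of k is set, otherwise src. */
static double mask_add_low(double src, unsigned char k, double a, double b)
{
#if defined(__AVX512F__)
  __m128d r = _mm_mask_add_sd(_mm_set_sd(src), (__mmask8)k,
                              _mm_set_sd(a), _mm_set_sd(b));
  return _mm_cvtsd_f64(r);
#else
  /* Portable fallback modelling the same low-lane behaviour. */
  return (k & 1) ? a + b : src;
#endif
}

/* Low-lane result of a zero-masked scalar subtract:
   a - b when bit 0 of k is set, otherwise 0.0. */
static double maskz_sub_low(unsigned char k, double a, double b)
{
#if defined(__AVX512F__)
  __m128d r = _mm_maskz_sub_sd((__mmask8)k, _mm_set_sd(a), _mm_set_sd(b));
  return _mm_cvtsd_f64(r);
#else
  return (k & 1) ? a - b : 0.0;
#endif
}
```

Only the low lane is inspected here; in the intrinsic forms the upper element of the result is copied from the first vector operand.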

gcc/
* config/i386/avx512fintrin.h (_mm_mask_add_round_sd, 
_mm_maskz_add_round_sd, _mm_mask_add_round_ss,
_mm_maskz_add_round_ss, _mm_mask_sub_round_sd,
_mm_maskz_sub_round_sd, _mm_mask_sub_round_ss,
_mm_maskz_sub_round_ss, _mm_mask_add_sd,
_mm_maskz_add_sd, _mm_mask_add_ss, _mm_maskz_add_ss,
_mm_mask_sub_sd, _mm_maskz_sub_sd, _mm_mask_sub_ss,
_mm_maskz_sub_ss): New intrinsics.
* config/i386/i386-builtin-types.def (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): New function type aliases.
* config/i386/i386-builtin.def (__builtin_ia32_addsd_mask_round,
__builtin_ia32_addss_mask_round, __builtin_ia32_subsd_mask_round,
__builtin_ia32_subss_mask_round): New builtins.
* config/i386/i386.c (V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT): Handle new types.
* config/i386/sse.md (_vm3): 
Renamed to  ...
(_vm3): ... this.
(v\t{%2, %1, %0|%0, 
%1, %2}): Changed to ...
(v\t{%2, %1, 
%0|%0, %1, %2}): ... this.

gcc/testsuite/
* gcc.target/i386/avx512f-vaddsd-1.c (_mm_mask_add_sd,
_mm_maskz_add_sd, _mm_mask_add_round_sd,
_mm_maskz_add_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vaddsd-2.c: New.
* gcc.target/i386/avx512f-vaddss-1.c (_mm_mask_add_ss,
_mm_maskz_add_ss, _mm_mask_add_round_ss,
_mm_maskz_add_round_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vaddss-2.c: New.
* gcc.target/i386/avx512f-vsubsd-1.c (_mm_mask_sub_sd,
_mm_maskz_sub_sd, _mm_mask_sub_round_sd,
_mm_maskz_sub_round_sd): Test new intrinsics.
* gcc.target/i386/avx512f-vsubsd-2.c: New.
* gcc.target/i386/avx512f-vsubss-1.c (_mm_mask_sub_ss,
_mm_maskz_sub_ss, _mm_mask_sub_round_ss,
_mm_maskz_sub_round_ss): Test new intrinsics.
* gcc.target/i386/avx512f-vsubss-2.c: New.
* gcc.target/i386/avx-1.c (__builtin_ia32_addsd_mask_round,
__builtin_ia32_addss_mask_round, __builtin_ia32_subsd_mask_round,
__builtin_ia32_subss_mask_round): Test new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c (_mm_maskz_add_round_sd,
_mm_maskz_add_round_ss, _mm_maskz_sub_round_sd,
_mm_maskz_sub_round_ss, _mm_mask_add_round_sd,
_mm_mask_add_round_ss, _mm_mask_sub_round_sd,
_mm_mask_sub_round_ss): Test new intrinsics.
* gcc.target/i386/testround-1.c: Ditto.

Is it ok for trunk?

Sebastian


ADD[SD_SS]_SUB[SD_SS]_patch.patch
Description: ADD[SD_SS]_SUB[SD_SS]_patch.patch


[PATCH] Match x86 family machine constraints section with constarints.md

2017-04-26 Thread Peryt, Sebastian
Hi,

This patch updates the x86 family machine constraints in the '16.8.5 
Constraints for Particular Machines' section to match those in 
'config/i386/constraints.md'.

gcc/
* doc/md.texi (Machine Constraints): Update x86 family machine
constraints section to match 'config/i386/constraints.md'.

Is it ok for trunk?

Sebastian


x86_constraints_doc.patch
Description: x86_constraints_doc.patch