[Bug target/95766] Failure to directly use vpbroadcastd for _mm_set1_epi32 when passing unsigned short

2020-08-03 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95766

--- Comment #11 from Kirill Yukhin  ---
(In reply to Jakub Jelinek from comment #10)
> Kirill, any thoughts on that?

I'd prefer your variant, w/o unspecs.
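
For readers without the bug open, here is a minimal sketch of the kind of source the bug title refers to: an unsigned short passed to _mm_set1_epi32. The function name broadcast_u16_as_i32 and the scalar fallback are assumptions made for illustration and are not taken from the bug report. Whether GCC emits a single vpbroadcastd for this depends on the compiler version and flags.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: broadcast a 16-bit unsigned value into four
   32-bit lanes.  The argument is promoted (zero-extended) to int
   before _mm_set1_epi32 sees it. */
void broadcast_u16_as_i32(unsigned short x, int32_t out[4])
{
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi32(x);
    _mm_storeu_si128((__m128i *)out, v);   /* unaligned store is fine here */
#else
    /* Scalar fallback so the sketch also builds on non-x86 targets. */
    for (int i = 0; i < 4; ++i)
        out[i] = x;
#endif
}
```

Compiling this with -O2 and an AVX2 or AVX-512 -march value and inspecting the assembly is one way to check which broadcast sequence a given GCC produces.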

[Bug target/95144] Many AVX-512 functions take an int instead of unsigned int

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95144

Kirill Yukhin  changed:

   What|Removed |Added

   Last reconfirmed||2020-06-16
 CC||kyukhin at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #2 from Kirill Yukhin  ---
Similar bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65744

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 68633, which changed state.

Bug 68633 Summary: [i386, AVX-512] Spec2006/434.zeus miscompares when executed 
on KNL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68633

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug other/84613] [meta-bug] SPEC compiler performance issues

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84613
Bug 84613 depends on bug 68633, which changed state.

Bug 68633 Summary: [i386, AVX-512] Spec2006/434.zeus miscompares when executed 
on KNL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68633

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/68633] [i386, AVX-512] Spec2006/434.zeus miscompares when executed on KNL

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68633

Kirill Yukhin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Kirill Yukhin  ---
Fixed.

[Bug other/84613] [meta-bug] SPEC compiler performance issues

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84613
Bug 84613 depends on bug 68627, which changed state.

Bug 68627 Summary: [i386, AVX-512] Illegal insn generated while compiling 
spec2k6/437.leslie3d for KNL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68627

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/68627] [i386, AVX-512] Illegal insn generated while compiling spec2k6/437.leslie3d for KNL

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68627

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Kirill Yukhin  ---
Fixed.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2020-06-16 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 68627, which changed state.

Bug 68627 Summary: [i386, AVX-512] Illegal insn generated while compiling 
spec2k6/437.leslie3d for KNL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68627

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/83828] FAIL: gcc.target/i386/avx512vpopcntdqvl-vpopcntq-1.c execution test

2018-02-11 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83828

--- Comment #12 from Kirill Yukhin  ---
Author: kyukhin
Date: Mon Feb 12 06:14:15 2018
New Revision: 257579

URL: https://gcc.gnu.org/viewcvs?rev=257579&root=gcc&view=rev
Log:
Fix AVX-512 popcnt and bitalg tests.

gcc/testsuite/
PR target/83828
* gcc.target/i386/avx512bitalg-vpopcntb-1.c: Fix test.
* gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
* gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c: Ditto.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c

[Bug target/83828] FAIL: gcc.target/i386/avx512vpopcntdqvl-vpopcntq-1.c execution test

2018-02-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83828

--- Comment #10 from Kirill Yukhin  ---
HJ, I cannot reproduce this failure on a recent SDE.

Here's what I have in gcc.log:

spawn -ignore SIGHUP /export/kyukhin/gcc/bld-svn/build-x86_64-linux/gcc/xgcc
-B/export/kyukhin/gcc/bld-svn/build-x86_64-linux/gcc/
/export/kyukhin/gcc/svn/trunk/gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -mavx512vl
-mavx512bitalg -mavx512bw -lm -o ./avx512bitalgvl-vpopcntb-1.exe
PASS: gcc.target/i386/avx512bitalgvl-vpopcntb-1.c (test for excess errors)
Setting LD_LIBRARY_PATH to
:/export/kyukhin/gcc/bld-svn/build-x86_64-linux/gcc:/export/kyukhin/gcc/bld-svn/build-x86_64-linux/gcc/32:
spawn /home/kyukhin/bin/dejagnu/sde-sim ./avx512bitalgvl-vpopcntb-1.exe
PASS: gcc.target/i386/avx512bitalgvl-vpopcntb-1.c execution test

I've also verified manually that the test PASSes and is not SKIPPED.

Could you please send some more info on the failure?

[Bug target/83828] FAIL: gcc.target/i386/avx512vpopcntdqvl-vpopcntq-1.c execution test

2018-01-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83828

--- Comment #8 from Kirill Yukhin  ---
Author: kyukhin
Date: Tue Jan 30 08:21:22 2018
New Revision: 257173

URL: https://gcc.gnu.org/viewcvs?rev=257173&root=gcc&view=rev
Log:
Fix AVX-512BITALG test failures

gcc/testsuite
PR target/83828
* gcc.target/i386/avx512bitalg-vpopcntb-1.c: Fix test.
* gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto.


Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c

[Bug target/83828] FAIL: gcc.target/i386/avx512vpopcntdqvl-vpopcntq-1.c execution test

2018-01-29 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83828

--- Comment #7 from Kirill Yukhin  ---
On the other hand, if a masked variant of vpopcnt[b,w] is issued, there is
no way for reload to put a 32/64-bit mask into a mask register, since kmov[d,q]
are only available under the -mavx512bw switch.

We can require the user to pass -mavx512bw along w/ -mavx512bitalg if she is
going to use masked variants of the corresponding intrinsics. Then only the
tests need to be fixed.
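
To make the mask-width point concrete, here is a scalar sketch of a merge-masked 512-bit word popcount. It is a model only, not the intrinsic or GCC's implementation, and the names popcount16 and masked_popcnt_epi16_model are hypothetical. A 512-bit vector holds 32 lanes of 16 bits, so the mask needs 32 bits (64 bits for the byte variant), which is wider than the 16-bit masks AVX-512F alone can move.

```c
#include <stdint.h>

/* Count set bits in a 16-bit value without compiler builtins. */
static unsigned popcount16(uint16_t v)
{
    unsigned n = 0;
    while (v) {
        v &= (uint16_t)(v - 1);   /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Scalar model (an assumption for illustration): lanes whose mask bit
   is set receive the popcount of src; all other lanes keep dst. */
void masked_popcnt_epi16_model(uint16_t dst[32], uint32_t mask,
                               const uint16_t src[32])
{
    for (int i = 0; i < 32; ++i)
        if ((mask >> i) & 1u)
            dst[i] = (uint16_t)popcount16(src[i]);
}
```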

[Bug target/83828] FAIL: gcc.target/i386/avx512vpopcntdqvl-vpopcntq-1.c execution test

2018-01-29 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83828

Kirill Yukhin  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||kyukhin at gcc dot gnu.org

--- Comment #6 from Kirill Yukhin  ---
It looks like the avx512bw requirement in avx512bitalgintrin.h is excessive.

[Bug target/82983] [8 Regression] ICE in extract_insn, at recog.c:2305 w/ GFMI

2017-11-15 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82983

--- Comment #1 from Kirill Yukhin  ---
Author: kyukhin
Date: Thu Nov 16 06:14:54 2017
New Revision: 254797

URL: https://gcc.gnu.org/viewcvs?rev=254797&root=gcc&view=rev
Log:
Fix GFNI check which didn't work properly in gfni+sse case

gcc/
PR target/82983
* config/i386/gfniintrin.h: Add sse check.
* config/i386/i386.c (ix86_expand_builtin): Fix gfni check.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/gfniintrin.h
trunk/gcc/config/i386/i386.c

[Bug target/82812] ICE in emit_move_insn, at expr.c:3706

2017-11-07 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82812

--- Comment #3 from Kirill Yukhin  ---
Author: kyukhin
Date: Tue Nov  7 19:11:08 2017
New Revision: 254507

URL: https://gcc.gnu.org/viewcvs?rev=254507&root=gcc&view=rev
Log:
Fix SSE bits dependencies.

gcc/
PR target/82812
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET): Remove MPX from flag.
(ix86_handle_option): Move MPX to isa_flags2 and GFNI to isa_flags.
* config/i386/i386-c.c (ix86_target_macros_internal): Ditto.
* config/i386/i386.opt: Ditto.
* config/i386/i386.c (ix86_target_string): Ditto.
(ix86_option_override_internal): Ditto.
(ix86_init_mpx_builtins): Move MPX to args2.
(ix86_expand_builtin): Special handling for OPTION_MASK_ISA_GFNI.
* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineinvqb_v64qi,
__builtin_ia32_vgf2p8affineinvqb_v64qi_mask,
__builtin_ia32_vgf2p8affineinvqb_v32qi,
__builtin_ia32_vgf2p8affineinvqb_v32qi_mask,
__builtin_ia32_vgf2p8affineinvqb_v16qi,
__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): Move to ARGS array.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/common/config/i386/i386-common.c
trunk/gcc/config/i386/i386-builtin.def
trunk/gcc/config/i386/i386-c.c
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.opt

[Bug tree-optimization/80133] [bootstrap] ICE during build on PPC64-linux.

2017-07-10 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80133

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Kirill Yukhin  ---
I was trying to build GCC w/ a really old host compiler.
After I upgraded the host GCC to 4.6, the issue was resolved.

[Bug testsuite/81058] FAIL: gcc.target/i386/avx512bw-vpmovu?swb-1.c scan-assembler-times vpmovu?swb.*

2017-06-29 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81058

--- Comment #4 from Kirill Yukhin  ---
Confirmed.

[Bug target/81022] invalid address with pointer type casting

2017-06-09 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81022

--- Comment #2 from Kirill Yukhin  ---
The Intrinsics Guide [1] describes this intrinsic as follows:
Store the lower double-precision (64-bit) floating-point element from a into
memory. mem_addr does not need to be aligned on any particular boundary.

[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_store_sd&expand=5157

[Bug target/73350] AVX512: GCC optimizes away rounding flags

2017-06-08 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350

--- Comment #8 from Kirill Yukhin  ---
Author: kyukhin
Date: Thu Jun  8 11:24:50 2017
New Revision: 249009

URL: https://gcc.gnu.org/viewcvs?rev=249009&root=gcc&view=rev
Log:
[PR73350][PR80862] Improve subst for RC-capable insns.

PR target/73350,80862
gcc/
* config/i386/subst.md (round): Fix round pattern.
* config/i386/i386.c (ix86_erase_embedded_rounding):
Fix erasing rounding for the fixed pattern.

gcc/testsuite/
* gcc.target/i386/pr73350.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr73350.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/subst.md
trunk/gcc/testsuite/ChangeLog

[Bug bootstrap/80133] [bootstrap] ICE during build on PPC64-linux.

2017-03-23 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80133

--- Comment #2 from Kirill Yukhin  ---
Caused by r241649.

[Bug bootstrap/80133] [bootstrap] ICE during build on PPC64-linux.

2017-03-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80133

--- Comment #1 from Kirill Yukhin  ---
I am not familiar with Power; maybe this helps:
[kyukhin@localhost build2]$ lscpu
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          2
Model:                 IBM,9117-MMA
L1d cache:             64K
L1i cache:             64K
L2 cache:              4096K
L3 cache:              32768K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7

[Bug bootstrap/80133] New: [bootstrap] ICE during build on PPC64-linux.

2017-03-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80133

Bug ID: 80133
   Summary: [bootstrap] ICE during build on PPC64-linux.
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

I see on recent trunk:
[kyukhin@localhost build2]$ cd powerpc64-unknown-linux-gnu/libgcc/
[kyukhin@localhost libgcc]$ make
# If this is the top-level multilib, build all the other
# multilibs.
/export/kyukhin/gcc/build2/./gcc/xgcc -B/export/kyukhin/gcc/build2/./gcc/
-B/usr/local/powerpc64-unknown-linux-gnu/bin/
-B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
/usr/local/powerpc64-unknown-linux-gnu/include -isystem
/usr/local/powerpc64-unknown-linux-gnu/s\
ys-include    -g -O2 -O2  -g -O2 -DIN_GCC    -W -Wall -Wwrite-strings
-Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128
-mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-prote\
ctor   -fPIC -mlong-double-128 -mno-minimal-toc -I. -I. -I../.././gcc
-I/export/kyukhin/gcc/git/gcc/libgcc -I/export/kyukhin/gcc/git/gcc/libgcc/.
-I/export/kyukhin/gcc/git/gcc/libgcc/../gcc
-I/export/kyukhin/gcc/git/gcc/libgcc/../include
-I/export/kyukhin/gcc/git/gcc/lib\
gcc/../libdecnumber/dpd -I/export/kyukhin/gcc/git/gcc/libgcc/../libdecnumber
-DHAVE_CC_TLS  -o _muldi3.o -MT _muldi3.o -MD -MP -MF _muldi3.dep -DL_muldi3 -c
/export/kyukhin/gcc/git/gcc/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
In file included from /export/kyukhin/gcc/git/gcc/libgcc/libgcc2.c:56:0:
/export/kyukhin/gcc/git/gcc/libgcc/libgcc2.c: In function ‘__multi3’:
/export/kyukhin/gcc/git/gcc/libgcc/libgcc2.h:203:20: internal compiler error:
Segmentation fault
 #define __NDW(a,b) __ ## a ## ti ## b
^
/export/kyukhin/gcc/git/gcc/libgcc/libgcc2.h:273:18: note: in expansion of
macro ‘__NDW’
 #define __muldi3 __NDW(mul,3)
  ^
/export/kyukhin/gcc/git/gcc/libgcc/libgcc2.c:547:1: note: in expansion of macro
‘__muldi3’
 __muldi3 (DWtype u, DWtype v)
 ^~~~
0x10e3e383 crash_signal
/export/kyukhin/gcc/git/gcc/gcc/toplev.c:337
0x11366fb0 inchash::add_expr(tree_node const*, inchash::hash&, unsigned int)
/export/kyukhin/gcc/git/gcc/gcc/tree.c:
0x10e8fd93 iterative_hash_expr
/export/kyukhin/gcc/git/gcc/gcc/tree.h:4794
0x10e94767 tree_operand_hash::hash(tree_node* const&)
/export/kyukhin/gcc/git/gcc/gcc/tree-hash-traits.h:34
0x11a2f30b hash
/export/kyukhin/gcc/git/gcc/gcc/hash-map-traits.h:48
0x11a2e9f3 get
/export/kyukhin/gcc/git/gcc/gcc/hash-map.h:150
0x11a2d1ff execute
/export/kyukhin/gcc/git/gcc/gcc/gimple-ssa-store-merging.c:1456
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make: *** [_muldi3.o] Error 1

Configured on RHEL6.5:
$ /export/kyukhin/gcc/git/gcc/configure --with-mpfr=/home/kyukhin/bin
--with-gmp=/home/kyukhin/bin --with-mpc=/home/kyukhin/bin
--enable-languages=c,c++,fortran,lto --disable-multilib

I can build gcc-6-branch with this config.

[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature

2017-01-12 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731

--- Comment #10 from Kirill Yukhin  ---
(In reply to Andrew Senkevich from comment #8)
> I think we should follow here declarations from icc headers to be compatible
> with it.
Okay. Could you please state which rules ICC follows for all gather/scatter
intrinsics?
Could we use void const * for base in all gather intrinsics?
What about scatters?

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-07-01 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729

--- Comment #28 from Kirill Yukhin  ---
Author: kyukhin
Date: Fri Jul  1 09:42:01 2016
New Revision: 237907

URL: https://gcc.gnu.org/viewcvs?rev=237907&root=gcc&view=rev
Log:
PR tree-optimization/70729

gcc/
* tree-vectorizer.c (adjust_simduid_builtins): Nullify safelen field
of loop since it can be not valid after transformation.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vectorizer.c
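
As background for the safelen field mentioned in the log above, here is a minimal source-level sketch of what safelen expresses. It does not show the vectorizer internals touched by the commit, and the function name shift_add is hypothetical. Each iteration reads the element written 8 iterations earlier, so only iterations fewer than 8 apart may safely execute together.

```c
#include <stddef.h>

/* Hypothetical loop with a loop-carried dependence of distance 8.
   safelen(8) tells the compiler that iterations fewer than 8 apart
   may run concurrently in SIMD lanes.  Built without -fopenmp or
   -fopenmp-simd, the pragma is ignored and the loop runs serially
   with the same result. */
void shift_add(float *a, size_t n)
{
#pragma omp simd safelen(8)
    for (size_t i = 8; i < n; ++i)
        a[i] = a[i - 8] + 1.0f;
}
```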

[Bug target/71346] [AVX-512] AVX-512VL insn emitted when it is disabled

2016-05-31 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71346

--- Comment #3 from Kirill Yukhin  ---
Author: kyukhin
Date: Tue May 31 08:05:24 2016
New Revision: 236909

URL: https://gcc.gnu.org/viewcvs?rev=236909&root=gcc&view=rev
Log:
AVX-512. Limit constraint for scalar operand in split to AVX-512VL.

PR target/71346
gcc/
* config/i386/sse.md (define_insn_and_split "*vec_extractv4sf_0"): Use
`Yv' for scalar operand.
testsuite/
* gcc.target/i386/pr71346.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr71346.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog

[Bug target/71346] [AVX-512] AVX-512VL insn emitted when it is disabled

2016-05-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71346

--- Comment #2 from Kirill Yukhin  ---
It looks like the issue is in the split.
This one-liner solves it:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b348f2d..1267897 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6837,7 +6837,7 @@
   "operands[1] = gen_lowpart (SFmode, operands[1]);")

 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
(vec_select:SF
  (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
  (parallel [(match_operand:SI 2 "const_0_to_3_operand"
"n,n,n,n,n")])))]

[Bug target/71346] [AVX-512] AVX-512VL insn emitted when it is disabled

2016-05-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71346

--- Comment #1 from Kirill Yukhin  ---
Created attachment 38598
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38598&action=edit
Reproducer

[Bug target/71346] [AVX-512] AVX-512VL insn emitted when it is disabled

2016-05-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71346

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-05-30
   Assignee|unassigned at gcc dot gnu.org  |kyukhin at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug target/71346] New: [AVX-512] AVX-512VL insn emitted when it is disabled

2016-05-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71346

Bug ID: 71346
   Summary: [AVX-512] AVX-512VL insn emitted when it is disabled
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Testcase attached.
Started from Jakub's r235968.

Reproduce:
./cc1  1.c -dp -m64 -march=knl -Ofast   -quiet -o repro.s 2>/dev/null ; cat
repro.s |grep shufps |grep xmm17

Those insns belong to AVX-512VL:
vshufps $255, %xmm15, %xmm15, %xmm17    # 680  sse_shufps_v4sf/2   [length = 7]
vshufps $85, %xmm12, %xmm12, %xmm17     # 682  sse_shufps_v4sf/2   [length = 7]

[Bug target/70981] [7 regression] gcc.target/i386/avx512f-vprord-1.c FAILs

2016-05-17 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70981

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-05-17
 CC||kyukhin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Kirill Yukhin  ---
Confirmed.

[Bug target/70902] [7 Regression] GCC freezes while compiling for 'skylake-avx512' target

2016-05-04 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70902

Kirill Yukhin  changed:

   What|Removed |Added

 CC||kyukhin at gcc dot gnu.org,
   ||ubizjak at gmail dot com

--- Comment #2 from Kirill Yukhin  ---
Started from Uros's r235523.

[Bug target/70728] GCC trunk emits invalid assembly for knl target

2016-04-27 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70728

Kirill Yukhin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Kirill Yukhin  ---
Done

[Bug target/70728] GCC trunk emits invalid assembly for knl target

2016-04-27 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70728

--- Comment #4 from Kirill Yukhin  ---
Author: kyukhin
Date: Wed Apr 27 12:09:45 2016
New Revision: 235487

URL: https://gcc.gnu.org/viewcvs?rev=235487&root=gcc&view=rev
Log:
AVX-512. PR target/70728. Use separate constraint for AVX-512BW

PR target/70728
gcc/
* gcc/config/i386/sse.md (define_insn
"3"):
Extract AVX-512BW constraint from AVX.
gcc/testsuite/
* gcc.target/i386/pr70728.c: New test.


Added:
branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr70728.c
Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/config/i386/sse.md
branches/gcc-6-branch/gcc/testsuite/ChangeLog

[Bug tree-optimization/68030] Redundant address calculations in vectorized loop

2016-04-25 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68030

Kirill Yukhin  changed:

   What|Removed |Added

 CC||amker.cheng at gmail dot com

--- Comment #4 from Kirill Yukhin  ---
Hello Bin,
Is it possible to handle the issue using current ivopt?

[Bug target/70728] GCC trunk emits invalid assembly for knl target

2016-04-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70728

--- Comment #3 from Kirill Yukhin  ---
Author: kyukhin
Date: Thu Apr 21 15:29:29 2016
New Revision: 235344

URL: https://gcc.gnu.org/viewcvs?rev=235344&root=gcc&view=rev
Log:
AVX-512. PR target/70728. Use separate constraint for AVX-512BW


PR target/70728
gcc/
* gcc/config/i386/sse.md (define_insn
"3"):
Extract AVX-512BW constraint from AVX.
gcc/testsuite/
* gcc.target/i386/pr70728.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70728.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog

[Bug target/70728] GCC trunk emits invalid assembly for knl target

2016-04-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70728

--- Comment #2 from Kirill Yukhin  ---
This is a 5/6 regression

[Bug target/70728] GCC trunk emits invalid assembly for knl target

2016-04-19 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70728

Kirill Yukhin  changed:

   What|Removed |Added

 Target||i?86/x86_64
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-04-19
 CC||kyukhin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Kirill Yukhin  ---
I'll take a look.

[Bug target/70662] vpbroadcastq assemble failure with -masm=intel -mavx512vbmi

2016-04-19 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70662

Kirill Yukhin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Kirill Yukhin  ---
Done

[Bug target/70662] vpbroadcastq assemble failure with -masm=intel -mavx512vbmi

2016-04-15 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70662

--- Comment #6 from Kirill Yukhin  ---
Author: kyukhin
Date: Fri Apr 15 15:17:31 2016
New Revision: 235038

URL: https://gcc.gnu.org/viewcvs?rev=235038&root=gcc&view=rev
Log:
AVX-512. Fix mode size check.

PR target/70662
gcc/   
   * config/i386/sse.md(define_insn "_vec_dup"):
Fix mode size check.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md

[Bug target/70662] vpbroadcastq assemble failure with -masm=intel -mavx512vbmi

2016-04-15 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70662

--- Comment #5 from Kirill Yukhin  ---
Author: kyukhin
Date: Fri Apr 15 15:13:42 2016
New Revision: 235037

URL: https://gcc.gnu.org/viewcvs?rev=235037&root=gcc&view=rev
Log:
AVX-512, Fix mode size check.

PR target/70662
gcc/
* config/i386/sse.md(define_insn "_vec_dup"):
Fix mode size check.

Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/sse.md

[Bug target/70662] vpbroadcastq assemble failure with -masm=intel -mavx512vbmi

2016-04-15 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70662

--- Comment #3 from Kirill Yukhin  ---
Author: kyukhin
Date: Fri Apr 15 09:36:31 2016
New Revision: 235013

URL: https://gcc.gnu.org/viewcvs?rev=235013&root=gcc&view=rev
Log:
AVX-512. Use proper mem ops modifier for Intel syntax in broadcast pattern.

PR target/70662
gcc/
* config/i386/sse.md: Use proper memory operand
modifiers.
gcc/testsuite.
* gcc.target/i386/pr70662.c: New test.

Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr70662.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/sse.md
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug target/70662] vpbroadcastq assemble failure with -masm=intel -mavx512vbmi

2016-04-15 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70662

--- Comment #2 from Kirill Yukhin  ---
Author: kyukhin
Date: Fri Apr 15 08:25:49 2016
New Revision: 235008

URL: https://gcc.gnu.org/viewcvs?rev=235008&root=gcc&view=rev
Log:
AVX-512. Fix mem operand modifier for Intel syntax.

PR target/70662
gcc/
* config/i386/sse.md: Use proper memory operand
modifiers.
testsuite/gcc/
* gcc.target/i386/pr70662.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70662.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog

[Bug target/70662] vpbroadcastq assemble failure with -masm=intel -mavx512vbmi

2016-04-14 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70662

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-04-14
   Assignee|unassigned at gcc dot gnu.org  |kyukhin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Kirill Yukhin  ---
I'll take a look.

[Bug tree-optimization/70577] [6 regression] tree-ssa/prefetch-5.c scan-tree-dump-times aprefetch failures

2016-04-11 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70577

Kirill Yukhin  changed:

   What|Removed |Added

 CC||kyukhin at gcc dot gnu.org

--- Comment #8 from Kirill Yukhin  ---
This commit caused a miscompare of spec2000/178.galgel with -march=skylake-avx512
(-Ofast -flto -funroll-loops):
   Newton iteration #  0    Maximal derivative = 0.1526E-07
   Newton iteration #  0    Maximal derivative = 0.3901E-07

[Bug target/59683] ICE: in classify_argument, at config/i386/i386.c:6637 with #pragma GCC target("avx512f")

2016-04-07 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59683

--- Comment #3 from Kirill Yukhin  ---
This hunk from Jakub's fix for PR61925 makes the test work:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a41efa4..6aebaed 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4962,6 +4962,15 @@ static GTY(()) tree ix86_previous_fndecl;
 void
 ix86_reset_previous_fndecl (void)
 {
+  tree new_tree = target_option_current_node;
+  cl_target_option_restore (&global_options, TREE_TARGET_OPTION (new_tree));
+  if (TREE_TARGET_GLOBALS (new_tree))
+restore_target_globals (TREE_TARGET_GLOBALS (new_tree));
+  else if (new_tree == target_option_default_node)
+restore_target_globals (&default_target_globals);
+  else
+TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
+
   ix86_previous_fndecl = NULL_TREE;
 }

[Bug target/64386] ICE: in extract_insn, at recog.c:2327 (unrecognizable insn) with -mavx512bw

2016-04-07 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64386

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Kirill Yukhin  ---
Done.

[Bug target/70510] ICE: output_operand: invalid %-code with -mavx512bw -masm=intel when emitting vpbroatcast

2016-04-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70510

--- Comment #3 from Kirill Yukhin  ---
(In reply to Uroš Bizjak from comment #2)
> (In reply to Kirill Yukhin from comment #1)
> > will take a look.
> 
> I have patch in testing:
> 
Oh, great! Thanks!

[Bug target/70510] ICE: output_operand: invalid %-code with -mavx512bw -masm=intel when emitting vpbroatcast

2016-04-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70510

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-04-05
 Ever confirmed|0   |1

[Bug target/64387] ICE: in extract_insn, at recog.c:2327 (unrecognizable insn) with -ffloat-store -mavx512er

2016-04-04 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64387

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||kyukhin at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #6 from Kirill Yukhin  ---
Done

[Bug target/64393] ICE: in extract_insn, at recog.c:2327 (unrecognizable insn) with -mavx512vbmi

2016-04-03 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64393

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Kirill Yukhin  ---
Done

[Bug target/70510] ICE: output_operand: invalid %-code with -mavx512bw -masm=intel when emitting vpbroatcast

2016-04-03 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70510

Kirill Yukhin  changed:

   What|Removed |Added

 CC||kyukhin at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |kyukhin at gcc dot gnu.org

--- Comment #1 from Kirill Yukhin  ---
will take a look.

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-04-01 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

Kirill Yukhin  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Kirill Yukhin  ---
Done.

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-03-31 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

--- Comment #7 from Kirill Yukhin  ---
Author: kyukhin
Date: Thu Mar 31 15:25:33 2016
New Revision: 234635

URL: https://gcc.gnu.org/viewcvs?rev=234635&root=gcc&view=rev
Log:
Fix PR target/70453.

gcc/
* config/i386/sse.md (define_mode_attr shuffletype): Fix typo.

gcc/testsuite/
* gcc.target/i386/pr70453.c: New test.

Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr70453.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/sse.md
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-03-31 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

--- Comment #6 from Kirill Yukhin  ---
Author: kyukhin
Date: Thu Mar 31 15:23:29 2016
New Revision: 234634

URL: https://gcc.gnu.org/viewcvs?rev=234634&root=gcc&view=rev
Log:
Fix PR target/70453.

gcc/
* config/i386/sse.md (define_mode_attr shuffletype): Fix typo.

gcc/testsuite/
* gcc.target/i386/pr70453.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70453.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/70479] FMA is not reassociated causing x2 slowdown vs. ICC

2016-03-31 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70479

--- Comment #3 from Kirill Yukhin  ---
(In reply to Richard Biener from comment #2)
> You mean we fail to handle ternary associative tree codes in GIMPLE reassoc?
> Yes, that's true.  It's not going to be easy to retro-fit there
> implementation-wise.  With rebalancing you mean handling reassoc-width > 1?

Hi Richard, yes to both.
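
To make "reassoc-width > 1" concrete, here is a minimal C sketch. It is not the attached reproducer: the function names and the width of 4 are assumptions picked for illustration. The first function has the shape of the single dependent chain into %zmm0 seen in the GCC hot spot; the second keeps several independent accumulators and combines them only at the end, which is roughly what the ICC code appears to do.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each multiply-add has to wait for the previous one,
   like the single %zmm0 chain in the GCC hot spot.  */
float dot_chain(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];      /* may be contracted into an FMA */
    return acc;
}

/* WIDTH independent accumulators (WIDTH = 4 is an arbitrary choice).
   The partial sums are combined only after the loop.  This changes the
   rounding order, so it is valid only under -Ofast-like rules.  */
enum { WIDTH = 4 };

float dot_split(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc[WIDTH] = { 0.0f };
    size_t i = 0;
    for (; i + WIDTH <= n; i += WIDTH)
        for (size_t j = 0; j < WIDTH; j++)
            acc[j] += a[i + j] * b[i + j];
    for (; i < n; i++)           /* leftover elements */
        acc[0] += a[i] * b[i];
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

Both functions agree whenever every intermediate sum is exactly representable; in general the results may differ in the last bits, which is why the transformation needs fast-math semantics.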

[Bug tree-optimization/70479] FMA is not reassociated causing x2 slowdown vs. ICC

2016-03-31 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70479

--- Comment #1 from Kirill Yukhin  ---
(In reply to Kirill Yukhin from comment #0)
> Compile:
>   GCC: g++ -march=haswell -Ofast -flto -fopenmp-simd -fpermissive m.cpp -o
> m.gcc
>   ICC: icpc -O3 -ipo -fpermissive -xAVX2 -qopenmp m.cpp -o m.icc
Correct compile commands (original are for Haswell)
   GCC: g++ -march=knl -Ofast -flto -fopenmp-simd -fpermissive m.cpp -o m.gcc
   ICC: icpc -O3 -ipo -fpermissive -xMIC-AVX512 -qopenmp m.cpp -o m.icc

[Bug tree-optimization/70479] New: FMA is not reassociated causing x2 slowdown vs. ICC

2016-03-31 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70479

Bug ID: 70479
   Summary: FMA is not reassociated causing x2 slowdown vs. ICC
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38146
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38146&action=edit
Reproducer

Attached example demonstrates the issue.
GCC is recent trunk. ICC is v16.

Compile:
  GCC: g++ -march=haswell -Ofast -flto -fopenmp-simd -fpermissive m.cpp -o m.gcc
  ICC: icpc -O3 -ipo -fpermissive -xAVX2 -qopenmp m.cpp -o m.icc

Run:
  GCC: time ./m.gcc 2 2
  ICC: time ./m.icc 2 2

Hot spot generated by GCC (annotated w/ perf hit counts):
157 │8d0:┌─→vbroad 0x4(%r13),%zmm0
193 ││  lea0x1(%rdx),%edx
173 ││  vmulps (%r14,%rax,1),%zmm0,%zmm0
2943││  vbroad 0x60(%r13),%zmm1
166 ││  vbroad 0x5c(%r13),%zmm2
151 ││  vbroad 0x58(%r13),%zmm3
144 ││  vbroad 0x54(%r13),%zmm4
164 ││  vbroad 0x50(%r13),%zmm5
170 ││  vbroad 0x4c(%r13),%zmm6
162 ││  vbroad 0x48(%r13),%zmm7
162 ││  vbroad 0x44(%r13),%zmm8
154 ││  vbroad 0x40(%r13),%zmm9
172 ││  vbroad 0x3c(%r13),%zmm10
167 ││  vbroad 0x38(%r13),%zmm11
172 ││  vbroad 0x34(%r13),%zmm12
171 ││  vbroad 0x30(%r13),%zmm13
161 ││  vbroad 0x2c(%r13),%zmm14
176 ││  vbroad 0x28(%r13),%zmm15
139 ││  vbroad 0x24(%r13),%zmm16
180 ││  vbroad 0x20(%r13),%zmm17
158 ││  vbroad 0x1c(%r13),%zmm18
165 ││  vbroad 0x18(%r13),%zmm19
140 ││  vbroad 0x10(%r13),%zmm21
179 ││  vbroad 0xc(%r13),%zmm22
146 ││  vbroad 0x8(%r13),%zmm23
170 ││  vbroad 0x0(%r13),%zmm24
170 ││  vbroad 0x14(%r13),%zmm20
168 ││  vfmadd (%r15,%rax,1),%zmm24,%zmm0
2732││  mov0xb8(%rsp),%rcx
172 ││  vfmadd (%r11,%rax,1),%zmm23,%zmm0
1649││  vfmadd (%rsi,%rax,1),%zmm22,%zmm0
3413││  vfmadd (%rcx,%rax,1),%zmm21,%zmm0
3653││  mov0xc0(%rsp),%rcx
182 ││  vfmadd (%rcx,%rax,1),%zmm20,%zmm0
2806││  mov0xc8(%rsp),%rcx
176 ││  vfmadd (%rcx,%rax,1),%zmm19,%zmm0
2439││  mov0xd0(%rsp),%rcx
179 ││  vfmadd (%rcx,%rax,1),%zmm18,%zmm0
2562││  mov0xd8(%rsp),%rcx
197 ││  vfmadd (%rcx,%rax,1),%zmm17,%zmm0
2867││  mov0xe0(%rsp),%rcx
141 ││  vfmadd (%rcx,%rax,1),%zmm16,%zmm0
3200││  mov0xe8(%rsp),%rcx
156 ││  vfmadd (%rcx,%rax,1),%zmm15,%zmm0
3557││  mov0xf0(%rsp),%rcx
158 ││  vfmadd (%rcx,%rax,1),%zmm14,%zmm0
││  mov0xf8(%rsp),%rcx
143 ││  vfmadd (%rcx,%rax,1),%zmm13,%zmm0
3004││  mov0x100(%rsp),%rcx
177 ││  vfmadd (%rcx,%rax,1),%zmm12,%zmm0
2876││  mov0x108(%rsp),%rcx
144 ││  vfmadd (%rcx,%rax,1),%zmm11,%zmm0
2838││  mov0x110(%rsp),%rcx
168 ││  vfmadd (%rcx,%rax,1),%zmm10,%zmm0
2503││  mov0x118(%rsp),%rcx
203 ││  vfmadd (%rcx,%rax,1),%zmm9,%zmm0
2471││  mov0x120(%rsp),%rcx
185 ││  vfmadd (%rcx,%rax,1),%zmm8,%zmm0
2153││  mov0x128(%rsp),%rcx
152 ││  vfmadd (%r12,%rax,1),%zmm7,%zmm0
2091││  vfmadd (%rbx,%rax,1),%zmm6,%zmm0
3049││  vfmadd (%r10,%rax,1),%zmm5,%zmm0
3737││  vfmadd (%r9,%rax,1),%zmm4,%zmm0
3665││  vfmadd (%r8,%rax,1),%zmm3,%zmm0
3627││  vfmadd (%rdi,%rax,1),%zmm2,%zmm0
3804││  vfmadd (%rcx,%rax,1),%zmm1,%zmm0
4052││  mov0x130(%rsp),%rcx
160 ││  cmp0x138(%rsp),%edx
534 ││  vmovup %zmm0,(%rcx,%rax,1)
3235││  lea0x40(%rax),%rax
161 │└──jb 8d0

Hot spot generated by ICC (annotated w/ perf hit counts):
   344 │47a:┌─→vmulps 0x204c(%r11,%r14,4),%zmm27,%zmm2
   821 ││  vmulps 0x2050(%r11,%r14,4),%zmm4,%zmm1
   318 ││  vmulps 0x2040(%r11,%r14,4),%zmm6,%zmm29
   818 ││  vmulps 0x1840(%r11,%r14,4),%zmm7,%zmm31
   275 ││  vfmadd 0x1838(%r11,%r14,4),%zmm9,%zmm31
  1234 ││  vfmadd 0x183c(%r11,%r14,4),%zmm8,%zmm29
   442 ││  vfmadd 0x2044(%r11,%r14,4),%zmm5,%zmm2
  1110 ││  vfmadd 0x2048(%r11,%r14,4),%zmm28,%zmm1
   337 ││  vaddps %zmm29,%zmm31,%zmm0
  1047 ││  vaddps %zmm1,%zmm2,%zmm3
   655 ││  vmulps 0x1830(%r11,%r14,4),%zmm11,%zmm30
   956 ││  vmulps 0x1834(%r11,%r14,4),%zmm10,%zmm2
   296 ││  vmulps 0x1024(%r11,%r14,4),%zmm15,%zmm1
  1050 ││  vmulps 0x1028(%r11,%r14,4),%zmm14,%zmm31
   294 ││  vaddps %zmm0,%zmm3,%zmm3
  1057 ││  vfmadd 0x102c(%r11,%r14,4),%zmm13,%zmm30
   344 ││  vfmadd 0x1030(%r11,%r14,4),%zmm12,%zmm2
   911 ││  vfmadd 0x1020(%r11,%r14,4),%zmm16,%zmm31
   332 ││  vfmadd 0x820(%r11,%r14,4),%zmm17,%zmm1
   885 ││  vaddps %z

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-03-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

Kirill Yukhin  changed:

   What|Removed |Added

  Attachment #38133|0   |1
is obsolete||

--- Comment #5 from Kirill Yukhin  ---
Created attachment 38135
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38135&action=edit
The patch

Woops, this one.

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-03-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

--- Comment #3 from Kirill Yukhin  ---
Created attachment 38133
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38133&action=edit
Proposed patch

I am reg-testing a trivial patch.

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-03-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-03-30
 Ever confirmed|0   |1

--- Comment #2 from Kirill Yukhin  ---
Confirmed

[Bug target/70453] gcc generates invalid instruction vextractu64x4 (should be: vextracti64x4)

2016-03-30 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70453

Kirill Yukhin  changed:

   What|Removed |Added

 CC||kyukhin at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |kyukhin at gcc dot 
gnu.org

--- Comment #1 from Kirill Yukhin  ---
Will look

[Bug target/70429] Wrong code with -O1.

2016-03-28 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70429

Kirill Yukhin  changed:

   What|Removed |Added

 CC||kyukhin at gcc dot gnu.org

--- Comment #4 from Kirill Yukhin  ---
It seems the combiner performs an invalid reassociation. This trivial addition to
Jakub's PR70222 fix makes the test work:

--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10526,7 +10526,7 @@ simplify_shift_const_1 (enum rtx_code code, machine_mode result_mode,
        {
          /* For ((unsigned) (cstULL >> count)) >> cst2 we have to make
             sure the result will be masked.  See PR70222.  */
-         if (code == LSHIFTRT
+         if ((code == LSHIFTRT || code == ASHIFTRT)
              && mode != result_mode
              && !merge_outer_ops (&outer_op, &outer_const, AND,
                                   GET_MODE_MASK (result_mode)
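
For context, a small C sketch of why the PR70222 masking matters when two right shifts around a narrowing conversion are merged into one. The function names are invented for this example, the code is not taken from combine.c, and it models only the logical-shift case; the arithmetic-shift case that the patch above adds is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: shift in 64 bits, narrow to 32 bits, shift again.  */
uint32_t shift_narrow_shift(uint64_t x, unsigned c1, unsigned c2)
{
    assert(c1 < 64 && c2 < 32);
    return (uint32_t)(x >> c1) >> c2;
}

/* Naive merge into one wide shift: bits of x above position c1 + 31
   leak into the top c2 bits of the result.  */
uint32_t merged_unmasked(uint64_t x, unsigned c1, unsigned c2)
{
    assert(c1 + c2 < 64 && c2 < 32);
    return (uint32_t)(x >> (c1 + c2));
}

/* Merge plus an AND with the 32-bit mode mask shifted right by c2,
   which clears the leaked bits again.  */
uint32_t merged_masked(uint64_t x, unsigned c1, unsigned c2)
{
    assert(c1 + c2 < 64 && c2 < 32);
    return (uint32_t)(x >> (c1 + c2)) & (UINT32_MAX >> c2);
}
```

With x = 0xFFFFFFFFFFFFFFFF, c1 = 8 and c2 = 4, the reference and the masked merge both give 0x0FFFFFFF, while the unmasked merge gives 0xFFFFFFFF.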

[Bug target/70406] ICE: in extract_insn, at recog.c:2287 (unrecognizable insn) with -mtune=pentium2 -mavx512f

2016-03-28 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70406

Kirill Yukhin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Kirill Yukhin  ---
Done.

[Bug target/70406] ICE: in extract_insn, at recog.c:2287 (unrecognizable insn) with -mtune=pentium2 -mavx512f

2016-03-28 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70406

--- Comment #5 from Kirill Yukhin  ---
Author: kyukhin
Date: Mon Mar 28 08:01:56 2016
New Revision: 234501

URL: https://gcc.gnu.org/viewcvs?rev=234501&root=gcc&view=rev
Log:
PR target/70406.

gcc/
 * config/i386/i386.md (define_split, andn): Fix modes.

gcc/testsuite/
 * gcc.target/i386/pr70406.c: New test.

Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr70406.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/i386.md
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug target/70406] ICE: in extract_insn, at recog.c:2287 (unrecognizable insn) with -mtune=pentium2 -mavx512f

2016-03-28 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70406

--- Comment #4 from Kirill Yukhin  ---
Author: kyukhin
Date: Mon Mar 28 07:59:44 2016
New Revision: 234500

URL: https://gcc.gnu.org/viewcvs?rev=234500&root=gcc&view=rev
Log:
PR target/70406

gcc/
 * config/i386/i386.md (define_split, andn): Fix modes.

gcc/testsuite/
 * gcc.target/i386/pr70406.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70406.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog

[Bug target/70406] ICE: in extract_insn, at recog.c:2287 (unrecognizable insn) with -mtune=pentium2 -mavx512f

2016-03-25 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70406

--- Comment #3 from Kirill Yukhin  ---
Created attachment 38095
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38095&action=edit
Bootstrapped/regtested patch

Will submit to gcc-patches shortly

[Bug target/70406] ICE: in extract_insn, at recog.c:2287 (unrecognizable insn) with -mtune=pentium2 -mavx512f

2016-03-25 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70406

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-03-25
 Ever confirmed|0   |1

--- Comment #2 from Kirill Yukhin  ---
Reproduced.

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-23 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

Kirill Yukhin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Kirill Yukhin  ---
Done.

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-22 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

--- Comment #6 from Kirill Yukhin  ---
Author: kyukhin
Date: Tue Mar 22 11:13:44 2016
New Revision: 234396

URL: https://gcc.gnu.org/viewcvs?rev=234396&root=gcc&view=rev
Log:
PR target/70325.

gcc/
* config/i386/i386.c (def_builtin): Handle
OPTION_MASK_ISA_AVX512VL to be and-ed with other
bits.
(const struct builtin_description bdesc_special_args[]):
Remove duplicate ISA bits.
gcc/testsuite/
* gcc.target/i386/pr70325.c: New test.

Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr70325.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/i386.c
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-22 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

--- Comment #5 from Kirill Yukhin  ---
Author: kyukhin
Date: Tue Mar 22 11:09:03 2016
New Revision: 234395

URL: https://gcc.gnu.org/viewcvs?rev=234395&root=gcc&view=rev
Log:

PR target/70325
gcc/
* config/i386/i386.c (def_builtin): Handle
OPTION_MASK_ISA_AVX512VL to be and-ed with other
bits.
(const struct builtin_description bdesc_special_args[]):
Remove duplicate ISA bits.
gcc/testsuite/
* gcc.target/i386/pr70325.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70325.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog

[Bug target/70293] [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-22 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Kirill Yukhin  ---
Done

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-22 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

Kirill Yukhin  changed:

   What|Removed |Added

 Status|RESOLVED|ASSIGNED
   Last reconfirmed||2016-03-22
 Resolution|FIXED   |---
 Ever confirmed|0   |1

--- Comment #4 from Kirill Yukhin  ---
Sorry, closed by mistake

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-22 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Kirill Yukhin  ---
Done.

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

--- Comment #2 from Kirill Yukhin  ---
I am testing this patch:
commit e88ceeabc50634012fa21f47625934d9a2c2e160
Author: Kirill Yukhin 
Date:   Mon Mar 21 14:28:58 2016 +0300

AVX-512. Fix PR70325.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3d8dbc4..2c56ee7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -32431,7 +32431,7 @@ def_builtin (HOST_WIDE_INT mask, const char *name,

   mask &= ~OPTION_MASK_ISA_64BIT;
   if (mask == 0
- || (mask & ix86_isa_flags) != 0
+ || (mask & ix86_isa_flags) == mask
  || (lang_hooks.builtin_function
  == lang_hooks.builtin_function_ext_scope))

diff --git a/gcc/testsuite/gcc.target/i386/pr70325.c b/gcc/testsuite/gcc.target/i386/pr70325.c
new file mode 100644
index 0000000..e2b9342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr70325.c
@@ -0,0 +1,12 @@
+/* PR target/70325 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+
+typedef char C __attribute((__vector_size__(32)));
+typedef int I __attribute((__vector_size__(32)));
+
+void
+f(int a,I b)
+{
+  __builtin_ia32_storedquqi256_mask((C*)f,(C)b,a); /* { dg-warning "implicit declaration of function" } */
+}
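
The one-line change in def_builtin replaces an "any required bit is enabled" test with an "all required bits are enabled" test. A minimal C sketch of the difference follows; the flag values and function names are made up for this example and are not GCC's OPTION_MASK_ISA_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up flag bits for this example only.  */
#define EX_ISA_AVX512BW ((uint64_t)1 << 0)
#define EX_ISA_AVX512VL ((uint64_t)1 << 1)

/* Old check: true as soon as one required bit is enabled.  */
bool any_bit_enabled(uint64_t mask, uint64_t isa_flags)
{
    return (mask & isa_flags) != 0;
}

/* New check: true only if every required bit is enabled.  */
bool all_bits_enabled(uint64_t mask, uint64_t isa_flags)
{
    return (mask & isa_flags) == mask;
}

/* A builtin that needs both extensions, compiled with only one on.  */
void example_vl_only(void)
{
    uint64_t builtin_mask = EX_ISA_AVX512BW | EX_ISA_AVX512VL;
    uint64_t enabled = EX_ISA_AVX512VL;   /* think: -mavx512vl alone */

    assert(any_bit_enabled(builtin_mask, enabled));    /* old: declared */
    assert(!all_bits_enabled(builtin_mask, enabled));  /* new: rejected */
}
```

This matches the new test's expectation: with -mavx512vl alone the builtin is presumably no longer declared, so the call is diagnosed as an implicit declaration instead of reaching the expander.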

[Bug target/70293] [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

--- Comment #5 from Kirill Yukhin  ---
Author: kyukhin
Date: Mon Mar 21 10:53:50 2016
New Revision: 234364

URL: https://gcc.gnu.org/viewcvs?rev=234364&root=gcc&view=rev
Log:
PR target/70293.

gcc/
* config/i386 (define_insn "*vec_dup"/AVX2): Block
third alternative for AVX-512VL target,
gcc/testsuite/
* gcc.target/i386/pr70293.c: New test.

Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr70293.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/sse.md
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug target/70293] [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

--- Comment #4 from Kirill Yukhin  ---
Author: kyukhin
Date: Mon Mar 21 10:51:04 2016
New Revision: 234363

URL: https://gcc.gnu.org/viewcvs?rev=234363&root=gcc&view=rev
Log:
PR target/70293

gcc/
* config/i386 (define_insn "*vec_dup"/AVX2): Block
third alternative for AVX-512VL target,

gcc/testsuite/
* gcc.target/i386/pr70293.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70293.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog

[Bug target/70325] ICE on __builtin_ia32_storedquqi256_mask

2016-03-21 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70325

Kirill Yukhin  changed:

   What|Removed |Added

 CC||kyukhin at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |kyukhin at gcc dot 
gnu.org

--- Comment #1 from Kirill Yukhin  ---
Reproducible with:
 ./xg++ -B. -O2 -S 1.c

[Bug target/70293] New: [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-20 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

Bug ID: 70293
   Summary: [ICE, AVX-512] Wrong reg constraints in vec_dup
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38018
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38018&action=edit
Reproducer

Attached testcase ICEs when compiled as:
./xgcc -B. -mtune=broadwell -mavx512vl -O2 -S ~/pixman-sse.i

1_0.32.6-r0/pixman-0.32.6/pixman/pixman-sse2.c: In function ‘fast_composite_scaled_bilinear_sse2__8__none_OVER’:
/home/donn/c/8.x/wrl-projects/intel-skylake-standard-glibc_std/bitbake_build/tmp/work/skylake-avx512-64-wrs-linux/pixman/1_0.32.6-r0/pixman-0.32.6/pixman/pixman-sse2.c:6059:1: error: insn does not satisfy its constraints:
(insn 5050 5049 1065 58 (set (reg/v:V8HI 56 xmm19 [orig:670 D.27517 ] [670])
        (vec_duplicate:V8HI (vec_select:HI (reg/v:V8HI 56 xmm19 [orig:670 D.27517 ] [670])
                (parallel [
                        (const_int 0 [0])
                    ]
/home/donn/c/8.x/wrl-projects/intel-skylake-standard-glibc_std/bitbake_build/tmp/sysroots/x86_64-linux/usr/lib/x86_64-wrs-linux/gcc/x86_64-wrs-linux/5.2.0/include/emmintrin.h:606 4153 {avx2_pbroadcastv8hi}
     (nil))
/home/donn/c/8.x/wrl-projects/intel-skylake-standard-glibc_std/bitbake_build/tmp/work/skylake-avx512-64-wrs-linux/pixman/1_0.32.6-r0/pixman-0.32.6/pixman/pixman-sse2.c:6059:1: internal compiler error: in extract_constrain_insn, at recog.c:2190
0xdaccab _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /export/users/kyukhin/gcc/git/gcc2/gcc/rtl-error.c:108
0xdacd0b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /export/users/kyukhin/gcc/git/gcc2/gcc/rtl-error.c:119
0xd50b31 extract_constrain_insn(rtx_insn*)
        /export/users/kyukhin/gcc/git/gcc2/gcc/recog.c:2190
0xd5f1d3 copyprop_hardreg_forward_1
        /export/users/kyukhin/gcc/git/gcc2/gcc/regcprop.c:774
0xd60afe execute
        /export/users/kyukhin/gcc/git/gcc2/gcc/regcprop.c:1280
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

[Bug target/70293] [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-19 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

--- Comment #1 from Kirill Yukhin  ---
We've got duplication of patterns (make mddump):
;; /export/users/kyukhin/gcc/git/gcc2/gcc/config/i386/sse.md: 17107
(define_insn ("avx2_pbroadcastv8hi")
 [
(set (match_operand:V8HI 0 ("register_operand") ("=x"))
(vec_duplicate:V8HI (vec_select:HI (match_operand:V8HI 1 ("nonimmediate_operand") ("xm"))
(parallel [
(const_int 0 [0])
]
] ("TARGET_AVX2") ("vpbroadcastw\t{%1, %0|%0, %w1}")
 [
(set_attr ("type") ("ssemov"))
(set_attr ("prefix_extra") ("1"))
(set_attr ("prefix") ("vex"))
(set_attr ("mode") ("TI"))
])
...
(define_insn ("avx512vl_vec_dupv8hi")
 [
(set (match_operand:V8HI 0 ("register_operand") ("=v"))
(vec_duplicate:V8HI (vec_select:HI (match_operand:V8HI 1 ("nonimmediate_operand") ("vm"))
(parallel [
(const_int 0 [0])
]
] ("(TARGET_AVX512BW) && (TARGET_AVX512VL)") ("vpbroadcastw\t{%1, %0|%0, %1}")
 [
(set_attr ("type") ("ssemov"))
(set_attr ("prefix") ("evex"))
(set_attr ("mode") ("TI"))
])

That's why we've got unsatisfied constraints on xmmN, N>15.

[Bug target/70293] [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-18 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

--- Comment #2 from Kirill Yukhin  ---
Created attachment 38020
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38020&action=edit
Proposed patch

The attached patch solves the issue by blocking the AVX2 broadcast pattern's
alternative $r->Yi, which is the subject of split2.

[Bug target/70293] [ICE, AVX-512] Wrong reg constraints in vec_dup

2016-03-18 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70293

--- Comment #3 from Kirill Yukhin  ---
Regtest is in progress

[Bug target/70028] Error: operand size mismatch for `kmovw' (wrong assembly generated) with -mavx512bw -masm=intel

2016-03-02 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70028

--- Comment #4 from Kirill Yukhin  ---
(In reply to Jakub Jelinek from comment #3)
> Created attachment 37835 [details]
> gcc6-pr70028.patch
> 
> So what about this patch then?  I don't see kmov* used with %k in other
> patterns, where "m" could appear.

Hi Jakub, the patch looks fine to me.

[Bug target/70028] Error: operand size mismatch for `kmovw' (wrong assembly generated) with -mavx512bw -masm=intel

2016-03-01 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70028

Kirill Yukhin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-03-01
 CC||kyukhin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Kirill Yukhin  ---
Confirmed.
The issue is that the operand modifier used in the .md file is %k1,
which stands for SI mode.
The operand should be either a 32-bit register or 16-bit memory, i.e. %ebx or a WORD memory reference.

[Bug tree-optimization/69980] New: [6 regression] Supposedly wrong SLP code emitted

2016-02-26 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69980

Bug ID: 69980
   Summary: [6 regression] Supposedly wrong SLP code emitted
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 37806
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37806&action=edit
Reproducer

Hello,
The attached test fails at run time when compiled as follows:
$ gfortran -m64 -Ofast repro.f90 -msse

When compiled w/ -O2 it works fine.

The second loop nest is just for verification.

The issue lives here:
  mumax = 0;
  do k=1,26
 do i=1,3
mumax(i) = max(mumax(i), mu(i,k)+mu(i,k))
 end do
  end do

Looks like SLP emits some wrong permutations here.
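
For reference, a scalar C rendering of what the loop nest above is expected to compute, handy as a check against the vectorized result. This is a sketch, not part of the attached reproducer: the element type (double) and the function name are assumptions, and Fortran's column-major mu(i,k) is mapped to mu[k][i].

```c
#include <assert.h>
#include <stddef.h>

enum { NI = 3, NK = 26 };

/* Scalar reference: for each i, twice the largest mu(i,k) over k,
   clamped from below by the initial value 0.  */
void mumax_reference(double mu[NK][NI], double mumax[NI])
{
    assert(mu != NULL && mumax != NULL);
    for (size_t i = 0; i < NI; i++)
        mumax[i] = 0.0;
    for (size_t k = 0; k < NK; k++) {
        for (size_t i = 0; i < NI; i++) {
            double v = mu[k][i] + mu[k][i];
            if (v > mumax[i])
                mumax[i] = v;
        }
    }
}
```

Each of the three reductions is independent, so a correct SLP vectorization must keep lane i paired with mumax(i) in every iteration.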

[Bug tree-optimization/69956] New: [ICE] Wrong vector type @ fold-const

2016-02-25 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69956

Bug ID: 69956
   Summary: [ICE] Wrong vector type @ fold-const
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 37789
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37789&action=edit
Reproducer

Hello,
The attached testcase produces an ICE when compiled as follows:
gcc -S -O2 -march=skylake-avx512 repro.i -ftree-vectorize

I have observed the ICE since 02.02.2016.

/nfs/ims/home/kyukhin/repro.i:2:1: internal compiler error: tree check: expected vector_type, have integer_type in const_unop, at fold-const.c:1665
 fn1() {
 ^~~
0xda1f9c tree_check_failed(tree_node const*, char const*, int, char const*, ...)
        /export/users/gnutester/stability/svn/trunk/gcc/tree.c:9637
0x860742 tree_check(tree_node*, char const*, int, char const*, tree_code)
        /export/users/gnutester/stability/svn/trunk/gcc/tree.h:3006
0x860742 const_unop(tree_code, tree_node*, tree_node*)
        /export/users/gnutester/stability/svn/trunk/gcc/fold-const.c:1665
0xe7f639 gimple_resimplify1(gimple**, code_helper*, tree_node*, tree_node**, tree_node* (*)(tree_node*))
        /export/users/gnutester/stability/svn/trunk/gcc/gimple-match-head.c:85
0xee84b3 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**, tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
        /export/users/gnutester/stability/svn/trunk/gcc/gimple-match-head.c:622
0x8a0933 gimple_fold_stmt_to_constant_1(gimple*, tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
        /export/users/gnutester/stability/svn/trunk/gcc/gimple-fold.c:4981
0xc409d2 back_propagate_equivalences
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-dom.c:881
0xc409d2 record_temporary_equivalences(edge_def*, const_and_copies*, avail_exprs_stack*)
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-dom.c:963
0xd0663a thread_through_normal_block
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-threadedge.c:858
0xd07a22 thread_across_edge(gcond*, edge_def*, bool, const_and_copies*, avail_exprs_stack*, tree_node* (*)(gimple*, gimple*, avail_exprs_stack*))
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-threadedge.c:1005
0xc404c0 dom_opt_dom_walker::thread_across_edge(edge_def*)
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-dom.c:989
0xc406eb dom_opt_dom_walker::after_dom_children(basic_block_def*)
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-dom.c:1423
0x11a47a7 dom_walker::walk(basic_block_def*)
        /export/users/gnutester/stability/svn/trunk/gcc/domwalk.c:307
0xc432a0 execute
        /export/users/gnutester/stability/svn/trunk/gcc/tree-ssa-dom.c:614
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

I suspect scalar masks.

[Bug tree-optimization/69882] New: [6 regression] Excessive reduction statements generated by SLP

2016-02-20 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69882

Bug ID: 69882
   Summary: [6 regression] Excessive reduction statements
generated by SLP
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 37743
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37743&action=edit
Reproducer

Hello,
For the attached test case, GCC emits wrong reduction statements.

Compile:
$ trunk/64/20160220/bin/gfortran -o repro -static -m64 -Ofast -mavx repro.f90

Execution ABORTs

Works fine when compiled w/ -O0

Extract from vectorizer dump:
  :
  # k_239 = PHI <k.4_11(48), k_266(56)>
  # c_I_lsm.10_241 = PHI <c_I_lsm.10_48(48), M.0_249(56)>
  # c_I_lsm.11_242 = PHI <c_I_lsm.11_3(48), M.0_252(56)>
  # vectp_a.47_406 = PHI <vectp_a.48_402(48), vectp_a.47_407(56)>
  # vect_M.50_410 = PHI <vect_cst__412(48), vect_M.50_411(56)>
  # ivtmp_420 = PHI <0(48), ivtmp_421(56)>
  _245 = (integer(kind=8)) k_239;
  _246 = _245 * 4;
  _247 = _246 + -4;
  _248 = *a_22(D)[_247];
  M.0_249 = MAX_EXPR <_248, c_I_lsm.10_241>;
  _250 = _246 + -3;
  vect__248.49_408 = MEM[(real(kind=8) *)vectp_a.47_406]; <-- SLP
  vectp_a.47_409 = vectp_a.47_406 + 32;
  _251 = *a_22(D)[_250];
  vect_M.50_411 = MAX_EXPR <vect__248.49_408, vect_M.50_410>; <-- SLP
  M.0_252 = MAX_EXPR <_251, c_I_lsm.11_242>;
  k_266 = k_239 + 1;
  vectp_a.47_407 = vectp_a.47_409 + 32; <-- SLP
  ivtmp_421 = ivtmp_420 + 1;
  if (ivtmp_421 >= bnd.44_361)
goto ;
  else
goto ;

  :
  ... # REMAINDER
  k_377 = k_365 + 1;
  if (k_365 == 26)
goto ;
  else
goto ;

  :
  goto ;

  :
  # k_381 = PHI <k_266(49)>
  # c_I_lsm.10_384 = PHI <M.0_249(49)>
  # c_I_lsm.11_386 = PHI <M.0_252(49)>
  # c_I_lsm.13_389 = PHI <c_I_lsm.13_84(49)>
  # c_I_lsm.12_392 = PHI <c_I_lsm.12_13(49)>
  # vect_M.50_413 = PHI <vect_M.50_411(49)>
  stmp_M.51_414 = BIT_FIELD_REF <vect_M.50_413, 64, 0>;
  stmp_M.51_415 = BIT_FIELD_REF <vect_M.50_413, 64, 64>;
  stmp_M.51_416 = BIT_FIELD_REF <vect_M.50_413, 64, 128>;
  stmp_M.51_417 = BIT_FIELD_REF <vect_M.50_413, 64, 192>;
  stmp_M.51_418 = MAX_EXPR <stmp_M.51_414, stmp_M.51_416>;  # <-- WHOT??
  stmp_M.51_419 = MAX_EXPR <stmp_M.51_415, stmp_M.51_417>;  # <-- DITTO.
  _401 = (integer(kind=4)) ratio_mult_vf.45_364;
  tmp.46_400 = k.4_11 + _401;
  if (niters.42_358 == ratio_mult_vf.45_364)
goto ;
  else
goto ;

Those 2 SSA names are then stored to the 1st and 2nd array elements.

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-18 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #24 from Kirill Yukhin  ---
(In reply to rguent...@suse.de from comment #23)
> On Wed, 17 Feb 2016, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671
> > 
> > --- Comment #22 from Jakub Jelinek  ---
> > Created attachment 37722 [details]
> >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37722&action=edit
> > gcc6-pr69671.patch
> > 
> > Actually, on a closer look, I believe the only problem are the patterns that
> > use a vector_move_operand "0C" inside of vec_select with only constants as 
> > the
> > parallel's operands.  Because fwprop is able to propagate constants into
> > instructions (thus undo the CSE effect), but doesn't do anything on these,
> > because it also simplifies them, so instead of the expected say
> > (vec_select:V4QI (const_vector:V16QI [
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > ])
> > (parallel [
> > (const_int 0 [0])
> > (const_int 1 [0x1])
> > (const_int 2 [0x2])
> > (const_int 3 [0x3])
> > ]))
> > we get in there simplified:
> > (const_vector:V4QI [
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > (const_int 0 [0])
> > ])
> > So, by adding extra patterns for that simplification fwprop is able to do 
> > its
> > job even if CSE did a better job.
> 
> Of course then I wonder why we didn't simplify this in the first place
> when generating RTL and need to wait for forwprop ...
> 
> But yes, sounds like the easiest way to go forward.

Agree.

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-17 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #21 from Kirill Yukhin  ---
I am going to fix the issue in v7 for sure.
But as things stand, this is going to be a major pattern refactoring,
and hence the patch will be thousands of lines.
If this might be ported, I can put an XFAIL on the tests.

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-12 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #14 from Kirill Yukhin  ---
Okay,
I've tried:
1. Ran AVX-512 testing on Spec2006 and saw no impact from the one-liner:
Geomeans:
INT : 5.11  5.11  +0.05%
FP  : 2.73  2.73  -0.08%
ALL : 3.54  3.54  -0.02%

2. Tried Uroš's proposal: adding a condition like this to the guilty pattern:
  "TARGET_AVX512VL
   && ((REG_P (operands[2]) && REG_P (operands[0])
        && REGNO (operands[0]) == REGNO (operands[2]))
       || (operands[2] == CONST0_RTX (mode)))"

  No success either. The problem is that zero-masked built-ins have a register as
the second source at expand, which is then rematerialized to zero. So, setting this
condition will lead to an ICE in recog @ expand.

So, for v6 it looks like we need to remove the one-liner.

For v7 we need to extend define_subst a bit to allow multiple output patterns.
E.g. currently:
(define_subst "mask"
  [(set (match_operand:SUBST_V 0)
(match_operand:SUBST_V 1))]
  "TARGET_AVX512F"
  [(set (match_dup 0)
(vec_merge:SUBST_V
  (match_dup 1)
  (match_operand:SUBST_V 2 "vector_move_operand" "0C")
  (match_operand: 3 "register_operand" "Yk")))])

It would solve the problem if we had this instead:
(define_subst "mask"
  [(set (match_operand:SUBST_V 0)
(match_operand:SUBST_V 1))]
  "TARGET_AVX512F"
  [(set (match_dup 0)
(vec_merge:SUBST_V
  (match_dup 1)
  (match_dup 0)
  (match_operand: 3 "register_operand" "Yk")))])
  [(set (match_dup 0)
(vec_merge:SUBST_V
  (match_dup 1)
  (match_operand:SUBST_V 2 "const0_operand" "C")
  (match_operand: 3 "register_operand" "Yk")))])

Opinions?
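
For readers less familiar with the two output templates proposed above, here is a
minimal C sketch of what they mean at the lane level. It is only a toy model,
not GCC code: the names vec_merge_model, merge_masked and zero_masked, the
4-lane width and the 8-bit mask are all assumptions made for illustration. It
relies on the documented meaning of vec_merge (a set mask bit selects the lane
from the first vector, a clear bit selects it from the second).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };  /* hypothetical vector width for this toy model */

/* Toy model of (vec_merge a b mask): lane i is taken from a when
   bit i of mask is set, otherwise from b.  */
static void
vec_merge_model (int32_t *dst, const int32_t *a, const int32_t *b,
                 uint8_t mask)
{
  assert (dst != NULL && a != NULL && b != NULL);
  for (size_t i = 0; i < LANES; i++)
    dst[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* First template, second source is (match_dup 0):
   unselected lanes keep the previous contents of the destination.  */
static void
merge_masked (int32_t *dst, const int32_t *src, uint8_t mask)
{
  int32_t old[LANES];
  for (size_t i = 0; i < LANES; i++)
    old[i] = dst[i];
  vec_merge_model (dst, src, old, mask);
}

/* Second template, second source is a zero constant:
   unselected lanes are cleared.  */
static void
zero_masked (int32_t *dst, const int32_t *src, uint8_t mask)
{
  static const int32_t zero[LANES] = { 0 };
  vec_merge_model (dst, src, zero, mask);
}
```

With a mask of 0x5, merge_masked overwrites lanes 0 and 2 and leaves lanes 1
and 3 untouched, while zero_masked overwrites lanes 0 and 2 and clears lanes 1
and 3. A second source that is an arbitrary register fits neither case, which
is the situation the split is meant to rule out.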

[Bug libfortran/69651] Usage of unitialized pointer io/list_read.c

2016-02-07 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69651

--- Comment #3 from Kirill Yukhin  ---
Created attachment 37627
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37627&action=edit
Reproducer src

Reproducer

[Bug libfortran/69651] Usage of unitialized pointer io/list_read.c

2016-02-07 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69651

--- Comment #4 from Kirill Yukhin  ---
Created attachment 37628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37628&action=edit
Reproducer input

[Bug libfortran/69651] Usage of unitialized pointer io/list_read.c

2016-02-07 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69651

--- Comment #5 from Kirill Yukhin  ---
A bug in Fortran's I/O runtime emerged on 21 Apr 2016,
between r54 and r92;
it looks like it's caused by the same revision, r71
(libgfortran/io/list_read.c), which probably just triggers
another hidden bug.

Trying two builds (as of 21 and 22 Apr):
$  gfortran-20160421 -O0 T.f90 -static  
$ ./a.out 
 res, (1) ==1 !
### Ok 

$  gfortran-20160422 -O0 T.f90 -static  
$ ./a.out  
 res, (1) ==   80  @p¼B
### FAIL – garbage is read in 

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #5 from Kirill Yukhin  ---
(In reply to ktkachov from comment #3)
> CC'ing Kirill for AVX512 opinion

I suppose that there's something wrong w/ the MD patterns.
E.g. for the example provided, the pattern is:
;; /export/users/kyukhin/gcc/git/gcc/gcc/config/i386/sse.md: 9199
(define_insn ("avx512vl_truncatev4siv4qi2_mask")
 [
(set (match_operand:V16QI 0 ("register_operand") ("=v"))
(vec_concat:V16QI (vec_merge:V4QI (truncate:V4QI
(match_operand:V4SI 1 ("register_operand") ("v")))
(vec_select:V4QI (match_operand:V16QI 2
("vector_move_operand") ("0C"))
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
]))
(match_operand:QI 3 ("register_operand") ("Yk")))
(const_vector:V12QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
Right now I think that the predicate of the 2nd operand is not correct.
It should be const0_rtx (of the corresponding mode) or a duplicate of operand 0
(the result, actually).
This is what the constraint says.

However, the predicate simply says that the operand is either const0_rtx or
nonimmediate: no connection with operand 0.

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

Kirill Yukhin  changed:

           What|Removed                        |Added

         Status|NEW                            |ASSIGNED
       Assignee|unassigned at gcc dot gnu.org  |kyukhin at gcc dot gnu.org

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #6 from Kirill Yukhin  ---
This bug seems to be mine.

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #8 from Kirill Yukhin  ---
(In reply to Jakub Jelinek from comment #7)
> So do you want to use reg_or_0_operand?  I don't think we usually tie output
> with input already in the predicates, except when match_dup is used.

That is the issue. reg_or_0_operand won't work (although it is better than
"vector_move_operand" since it prohibits memory).

We want the 2nd operand to be either:
1. const0_rtx
2. match_dup 0

I cannot see in gcc/genpreds.c whether a predicate for one operand can refer to
another operand.

We might invent some complicated subst. But the patterns look too complicated
for that.

Maybe extend genpreds.c and friends, introducing a new version of predicate
which takes (op, mode, operands) instead of (op, mode).
Not sure about the amount of effort though.
I really hope there's some simpler solution.
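
To make the (op, mode) -> (op, mode, operands) idea concrete, here is a small C
sketch of the two predicate shapes. It is a toy model under stated
assumptions: struct toy_operand, enum toy_kind and both toy_* functions are
hypothetical stand-ins invented for illustration, not GCC's rtx type or
anything generated by genpreds.c, and the mode argument is left out for
brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for operands; not GCC's real data structures.  */
enum toy_kind { TOY_REG, TOY_CONST0, TOY_MEM };

struct toy_operand
{
  enum toy_kind kind;
  unsigned regno;   /* meaningful only when kind == TOY_REG */
};

/* Current shape: the predicate sees only the operand being tested,
   so it cannot express "same register as operand 0".  */
static bool
toy_reg_or_0_operand (const struct toy_operand *op)
{
  assert (op != NULL);
  return op->kind == TOY_REG || op->kind == TOY_CONST0;
}

/* Extended shape: the predicate also receives the operand array and
   accepts either zero or a register identical to operand 0.  */
static bool
toy_dup0_or_0_operand (const struct toy_operand *op,
                       const struct toy_operand *operands)
{
  assert (op != NULL && operands != NULL);
  if (op->kind == TOY_CONST0)
    return true;
  return op->kind == TOY_REG
         && operands[0].kind == TOY_REG
         && op->regno == operands[0].regno;
}
```

The first predicate accepts any register, which is too permissive for the
masked patterns; the second rejects a register that differs from operand 0,
and also rejects memory.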

[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-05 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #10 from Kirill Yukhin  ---
(In reply to Jakub Jelinek from comment #9)
> But something like that might remove the flexibility from the register
> allocator.
> 
> Wonder why the RA in this case doesn't see that the value loaded into that
> pseudo register is CONST0_RTX which satisfies the C constraint and doesn't
> undo CSE (rematerialize) in that case if it doesn't have that value already
> loaded in the matching register to the output one.

Then I see two options:
1. Split all patterns into match_dup and 0_operand variants by hand.
2. Implement a dedicated subst for such patterns which will do item 1 while
processing the MD. Not sure it'll be easy.

[Bug libfortran/69651] New: Usage of unitialized pointer io/list_read.c

2016-02-03 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69651

Bug ID: 69651
   Summary: Usage of unitialized pointer io/list_read.c
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libfortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kyukhin at gcc dot gnu.org
  Target Milestone: ---

Unfortunately I have no testcase.

But the code itself looks awful to me:
/* Worker function to save a KIND=4 character to a string buffer,
   enlarging the buffer as necessary.  */

static void
push_char4 (st_parameter_dt *dtp, int c)
{
  gfc_char4_t *new, *p = (gfc_char4_t *) dtp->u.p.saved_string;

  if (p == NULL)
{
  dtp->u.p.saved_string = xcalloc (SCRATCH_SIZE, sizeof (gfc_char4_t));
  dtp->u.p.saved_length = SCRATCH_SIZE;
  dtp->u.p.saved_used = 0;
  p = (gfc_char4_t *) dtp->u.p.saved_string;
}

  if (dtp->u.p.saved_used >= dtp->u.p.saved_length)
{
  dtp->u.p.saved_length = 2 * dtp->u.p.saved_length;
  p = xrealloc (p, dtp->u.p.saved_length * sizeof (gfc_char4_t));

  memset4 (new + dtp->u.p.saved_used, 0, // <-- ??? new==junk ???
  dtp->u.p.saved_length - dtp->u.p.saved_used);
}

  p[dtp->u.p.saved_used++] = c;
}

It was introduced w/ r210948
(https://gcc.gnu.org/ml/fortran/2014-05/msg00149.html). Before that new was [at
least] initialized.
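
For comparison, here is a minimal standalone C sketch of the same grow-and-append
pattern with the tail zeroed through the pointer that realloc actually
returned. This is not the libgfortran fix: struct toy_buf, toy_push_char4 and
TOY_SCRATCH_SIZE are hypothetical names, plain calloc/realloc/memset stand in
for xcalloc/xrealloc/memset4, and storing the grown pointer back into the
buffer structure is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_SCRATCH_SIZE = 4 };  /* hypothetical initial capacity */

struct toy_buf
{
  uint32_t *data;   /* 4-byte characters */
  size_t length;    /* allocated capacity, in characters */
  size_t used;      /* characters stored so far */
};

/* Append one character, doubling the buffer when it is full.
   Returns 0 on success, -1 if an allocation fails.  */
static int
toy_push_char4 (struct toy_buf *b, uint32_t c)
{
  assert (b != NULL);

  if (b->data == NULL)
    {
      b->data = calloc (TOY_SCRATCH_SIZE, sizeof *b->data);
      if (b->data == NULL)
        return -1;
      b->length = TOY_SCRATCH_SIZE;
      b->used = 0;
    }

  if (b->used >= b->length)
    {
      size_t new_length = 2 * b->length;
      uint32_t *grown = realloc (b->data, new_length * sizeof *grown);
      if (grown == NULL)
        return -1;
      /* Zero the new tail via the pointer realloc returned, never via
         an uninitialized variable.  */
      memset (grown + b->used, 0, (new_length - b->used) * sizeof *grown);
      b->data = grown;
      b->length = new_length;
    }

  b->data[b->used++] = c;
  return 0;
}
```

Pushing ten characters into an empty toy_buf grows it from 4 to 8 to 16
entries, with every entry past the tenth still zero.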

[Bug target/69120] sse2_shufpd_v2df_mask has wrong name

2016-02-03 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69120

--- Comment #2 from Kirill Yukhin  ---
Looked closely.
The name was chosen intentionally to simplify the "sse2_shufpd"
expand. If we want to fix this name, a new subst attribute needs to be
introduced and
if ()
  emit_insn (avx512vl_...
else
  emit_insn (sse2_...
inserted into the expand.

Besides the expand, this template is never called by name.

So, I vote to leave the name unchanged and keep things simple.

[Bug target/69120] sse2_shufpd_v2df_mask has wrong name

2016-02-03 Thread kyukhin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69120

--- Comment #1 from Kirill Yukhin  ---
Will fix.
