[Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?

2018-07-30 Thread janus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 janus at gcc dot gnu.org changed:
What | Removed | Added
CC   |         | janus at gcc dot gnu.org

2018-02-09 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 sergey.shalnov at intel dot com changed:
What   | Removed | Added
Status | NEW     | RESOLVED

2018-02-08 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #37 from uros at gcc dot gnu.org --- Author: uros Date: Thu Feb 8 22:31:15 2018 New Revision: 257505 URL: https://gcc.gnu.org/viewcvs?rev=257505&root=gcc&view=rev Log: PR target/83008 * config/i386/x86-tune-costs.h

2018-02-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #36 from sergey.shalnov at intel dot com --- The patch that fixes the issue for SKX is in https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html I will close the PR after the patch has been merged. Thank you very much to all involved.

2018-02-07 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #35 from Christophe Lyon --- Author: clyon Date: Wed Feb 7 09:12:48 2018 New Revision: 257438 URL: https://gcc.gnu.org/viewcvs?rev=257438&root=gcc&view=rev Log: [testsuite] Fix gcc.dg/cse_recip.c for AArch64 after r257181. 2018-02-07

2018-01-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #34 from Richard Biener --- Author: rguenth Date: Tue Jan 30 11:19:47 2018 New Revision: 257181 URL: https://gcc.gnu.org/viewcvs?rev=257181&root=gcc&view=rev Log: 2018-01-30 Richard Biener PR

2018-01-29 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #33 from sergey.shalnov at intel dot com --- Richard, I'm not sure whether it is a regression or not. I see the code has been visibly refactored in this commit https://github.com/gcc-mirror/gcc/commit/ee6e9ba576099aed29f1097195c649fc796ecf5e in

2018-01-29 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #32 from rguenther at suse dot de --- On Fri, 26 Jan 2018, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #31 from sergey.shalnov at intel dot com --- > Richard, > Thank

2018-01-26 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #31 from sergey.shalnov at intel dot com --- Richard, Thank you for your latest patch. This patch is exactly what I’ve discussed in this issue request. I tested it with SPEC2006 and SPEC2017 and see no performance/stability degradation.

2018-01-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 Richard Biener changed:
What                          | Removed | Added
Attachment #43084 is obsolete | 0       | 1

2018-01-19 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #29 from sergey.shalnov at intel dot com --- Richard, Thank you for your latest patch. I would like to clarify the multiple_p() function usage in the if() clause. First of all, I assume that architectures with a fixed size of HW

2018-01-17 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #28 from sergey.shalnov at intel dot com --- Richard, Thank you for your comments. I see that TYPE_VECTOR_SUBPARTS is constant for the test case but multiple_p (group_size, const_nunits) returns 1 in the code: if

2018-01-10 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #27 from rguenther at suse dot de --- On Wed, 10 Jan 2018, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #26 from sergey.shalnov at intel dot com --- > Sorry, did you

2018-01-10 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #26 from sergey.shalnov at intel dot com --- Sorry, did you mean "arm_sve.h" on ARM? In this case we have machine-specific code in the common part of the gcc code. Should we make it a machine-dependent callback function because having

2018-01-10 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #25 from rguenther at suse dot de --- On Wed, 10 Jan 2018, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #24 from sergey.shalnov at intel dot com --- > Richard, > The

2018-01-10 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #24 from sergey.shalnov at intel dot com --- Richard, The latest "SLP costing for constants/externs improvement" patch generates the same code as baseline for the test example. Are you sure that "num_vects_to_check" should be 1 if

2018-01-10 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #22 from rguenther at suse dot de --- On Wed, 10 Jan 2018, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #21 from sergey.shalnov at intel dot com --- > Thanks Richard

2018-01-10 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #23 from Richard Biener --- Created attachment 43084 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43084&action=edit SLP costing for constants/externs improvement

2018-01-10 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #21 from sergey.shalnov at intel dot com --- Thanks Richard for your comments. Based on our discussion I've produced the patch attached and run it on SPEC2017intrate/fprate on skylake server (with [-Ofast -flto -march=skylake-avx512

2018-01-02 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #20 from sergey.shalnov at intel dot com --- Richard, I did a quick static analysis of your latest patch. Using the command line “-g -Ofast -mfpmath=sse -funroll-loops -march=znver1”, your latest patch doesn’t affect the issue I discussed

2018-01-02 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #19 from rguenther at suse dot de --- On Sun, 24 Dec 2017, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #18 from sergey.shalnov at intel dot com --- > Yes, I agree that

2017-12-24 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #18 from sergey.shalnov at intel dot com --- Yes, I agree that the vector_store stage has its own vectorization cost. And each vector_store has a vector_construction stage. These stages are different in gcc slp (as you know). To better

2017-12-15 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #17 from rguenther at suse dot de --- On Fri, 15 Dec 2017, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #16 from sergey.shalnov at intel dot com --- > «it's one

2017-12-15 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #16 from sergey.shalnov at intel dot com --- «it's one vec_construct operation - it's the task of the target to turn this into a cost comparable to vector_store» I agree that the vec_construct operation cost is based on the target cost

2017-12-15 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #15 from rguenther at suse dot de --- On Fri, 15 Dec 2017, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #14 from sergey.shalnov at intel dot com --- > " we have a

2017-12-15 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #14 from sergey.shalnov at intel dot com --- " we have a basic-block vectorizer. Do you propose to remove it? " Definitely not! SLP vectorizer is very good to have! “What's the rationale for not using vector registers” I just tried

2017-12-08 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #13 from rguenther at suse dot de --- On Fri, 8 Dec 2017, sergey.shalnov at intel dot com wrote: > And it uses xmm+ vpbroadcastd to spill tmp[] to stack > ... > 1e7: 62 d2 7d 08 7c c9 vpbroadcastd %r9d,%xmm1 > 1ed: c4 c1

2017-12-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #12 from sergey.shalnov at intel dot com --- Richard, Your last proposal changed the generated code a bit. Currently it shows: test_bugzilla1.c:6:5: note: Cost model analysis:. Vector inside of loop cost: 62576 Vector prologue

2017-12-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #11 from sergey.shalnov at intel dot com --- Richard, “Is this about the "stupid" attempt to use as little AVX512 as possible” No, it is not. I provided the asm listing at the beginning with zmm only to illustrate the issue more

2017-12-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #10 from Richard Biener --- Just to note this is _basic block vectorization_ triggering. Of course we do vectorize basic blocks even when we do not vectorize any loop. Is this about the "stupid" attempt to use as little AVX512 as

2017-12-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #9 from sergey.shalnov at intel dot com --- Created attachment 42813 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42813&action=edit New reproducer Slightly changed first loop

2017-12-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #8 from sergey.shalnov at intel dot com --- Richard, These are great changes and I see the first loop became vectorized for the test example I provided with gcc-8.0 main trunk. But I think the issue is a bit more complicated. Vectorization

2017-12-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #7 from Richard Biener --- Note the first loop is now vectorized fine thus the strange code is gone. -> fixed? (probably by the fix for PR83202)

2017-12-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #6 from sergey.shalnov at intel dot com --- I found the issue request related to the vectorization issues in the second loop (reduction uint->int). https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65930

2017-12-08 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #5 from sergey.shalnov at intel dot com --- (In reply to Richard Biener from comment #2) > The strange code is because we perform basic-block vectorization resulting in > > vect_cst__249 = {_251, _251, _251, _251, _334, _334, _334,

2017-11-20 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #4 from rguenther at suse dot de --- On Sun, 19 Nov 2017, hubicka at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > Jan Hubicka changed: > >What|Removed |Added

2017-11-19 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 Jan Hubicka changed:
What             | Removed     | Added
Status           | UNCONFIRMED | NEW
Last reconfirmed |             |

2017-11-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 Richard Biener changed:
What     | Removed | Added
Keywords |         | missed-optimization

2017-11-15 Thread sergey.shalnov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 --- Comment #1 from sergey.shalnov at intel dot com --- Created attachment 42616 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42616&action=edit reproducer