[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2018-02-08 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #27 from Alexander Nesterovskiy --- Place of interest here is a loop in mat_times_vec function. For r253678 a mat_times_vec.constprop._loopfn.0 is created with autopar. For r256990 the mat_times_vec is inlined into bi_cgstab_block

2018-02-08 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #26 from Alexander Nesterovskiy --- Created attachment 43361 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43361&action=edit r253678 vs r256990_work_spin

2018-02-05 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #25 from amker at gcc dot gnu.org --- (In reply to Alexander Nesterovskiy from comment #24) > Yes, it looks like more time is being spent in synchronizing. > r256990 really changes the way autopar works: > For r253679...r256989 the

2018-02-02 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #24 from Alexander Nesterovskiy --- Yes, it looks like more time is being spent in synchronizing. r256990 really changes the way autopar works: For r253679...r256989 the most of work was in main thread0 mostly (thread0 ~91%,

2018-02-02 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #23 from Alexander Nesterovskiy --- Created attachment 43326 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43326&action=edit r253678 vs r256990

2018-01-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #22 from Jakub Jelinek --- libgomp has many ways to tweak the wait behavior, look for OMP_WAIT_POLICY and GOMP_SPINCOUNT environment variables to tweak the spinning vs. futex waiting. Is the work scheduled for each thread roughly the

2018-01-30 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #21 from amker at gcc dot gnu.org --- (In reply to Alexander Nesterovskiy from comment #20) > I've made test runs on Broadwell and Skylake, RHEL 7.3. > 410.bwaves became faster after r256990 but not as fast as it was on r253678. >

2018-01-30 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #20 from Alexander Nesterovskiy --- I've made test runs on Broadwell and Skylake, RHEL 7.3. 410.bwaves became faster after r256990 but not as fast as it was on r253678. Comparing 410.bwaves performance, "-Ofast -funroll-loops -flto

2018-01-29 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
Jeffrey A. Law changed:
  What   | Removed | Added
  Status | NEW     | RESOLVED
  CC     |

2018-01-23 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #18 from amker at gcc dot gnu.org --- Author: amker Date: Tue Jan 23 16:47:03 2018 New Revision: 256990 URL: https://gcc.gnu.org/viewcvs?rev=256990&root=gcc&view=rev Log: PR tree-optimization/82604 * tree-loop-distribution.c

2018-01-18 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #17 from rguenther at suse dot de --- On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 > > --- Comment #16 from amker at gcc dot gnu.org --- > I hope it's possible to break

2018-01-18 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #16 from amker at gcc dot gnu.org --- I hope it's possible to break the dependence by reordering passes so that graphite/parallelization could be moved earlier. There are several issues like this IIRC.

2018-01-18 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #15 from rguenther at suse dot de --- On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 > > --- Comment #14 from amker at gcc dot gnu.org --- > (In reply to rguent...@suse.de

2018-01-18 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #14 from amker at gcc dot gnu.org --- (In reply to rguent...@suse.de from comment #13) > On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 > > > > --- Comment #12 from

2018-01-18 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #13 from rguenther at suse dot de --- On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 > > --- Comment #12 from amker at gcc dot gnu.org --- > (In reply to rguent...@suse.de

2018-01-18 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #12 from amker at gcc dot gnu.org --- (In reply to rguent...@suse.de from comment #11) > On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote: > > > I think the zeroing stmt can be distributed into a separate loop nest > (up to

2018-01-18 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #11 from rguenther at suse dot de --- On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 > > --- Comment #10 from amker at gcc dot gnu.org --- > For the record, there is

2018-01-18 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #10 from amker at gcc dot gnu.org --- For the record, there is another possible fix. Quoted loop nest from gcc/testsuite/gfortran.dg/pr81303.f:
      do j=1,ny
         jm1=mod(j+ny-2,ny)+1
         jp1=mod(j,ny)+1

2018-01-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
Richard Biener changed:
  What             | Removed     | Added
  Status           | UNCONFIRMED | NEW
  Last reconfirmed |

2017-11-27 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #8 from amker at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #7) > The #c5 approach sounds better to me, we can have memsets in the IL even > from the user, so would be nice if we handled those in the dr analysis too.

2017-11-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
Jakub Jelinek changed:
  What | Removed | Added
  CC   |         | jakub at gcc dot gnu.org
--- Comment #7

2017-11-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #6 from Richard Biener --- failed loop-distribution hack: (still needs dependence analysis fixes) Could even preserve TBAA if we use a {} of correct element array type. For constant sizes this should be always a win. Index:

2017-11-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #5 from Richard Biener --- Sth like the following which only works up to the point of dependence analysis trying to disambiguate this ref against others ... I suppose some dr_may_alias_p tweaks to consider niter information and

2017-11-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #4 from Richard Biener --- So it still parallelizes the loop(s) but at one level deeper (line 176 vs. 173). This is because dependence analysis does not handle calls and loop distribution distributed a memset. ISL dependence
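
Comment #4 attributes the deeper parallelization level to loop distribution turning a zeroing statement into a memset call, which dependence analysis does not handle. The C sketch below only illustrates that general shape. It is not the bwaves source or GCC's actual output, and the names nest_fused, nest_distributed, results_match and the sizes N and M are hypothetical, chosen for illustration.

```c
#include <assert.h>   /* available for callers that assert on results_match() */
#include <string.h>

#define N 4   /* hypothetical outer trip count */
#define M 8   /* hypothetical inner trip count */

/* Shape before distribution: the zeroing statement and the accumulation
   share one inner loop, so the nest contains only plain array references. */
static void nest_fused(double y[N][M], double a[N][M], double x[M])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            y[i][j] = 0.0;
            y[i][j] += a[i][j] * x[j];
        }
}

/* Shape after distribution (assumed, written by hand): the zeroing has been
   split into its own loop and replaced by a memset call. The outer loop body
   now contains a call, which is the situation comment #4 describes. */
static void nest_distributed(double y[N][M], double a[N][M], double x[M])
{
    for (int i = 0; i < N; i++) {
        memset(y[i], 0, sizeof y[i]);   /* all-bits-zero is 0.0 on IEEE 754 targets */
        for (int j = 0; j < M; j++)
            y[i][j] += a[i][j] * x[j];
    }
}

/* Returns 1 when both shapes produce identical results, 0 otherwise. */
int results_match(void)
{
    double a[N][M], x[M], y_fused[N][M], y_dist[N][M];

    for (int j = 0; j < M; j++)
        x[j] = 1.0 + j;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            a[i][j] = 0.5 * i + j;
            y_fused[i][j] = -1.0;   /* stale values the zeroing must overwrite */
            y_dist[i][j] = -2.0;
        }

    nest_fused(y_fused, a, x);
    nest_distributed(y_dist, a, x);
    return memcmp(y_fused, y_dist, sizeof y_fused) == 0;
}
```

The two functions compute the same values. The difference that matters for the discussion is only that the second form places a library call between the outer and inner loops.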

2017-10-19 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #3 from rguenther at suse dot de --- On Thu, 19 Oct 2017, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 > > --- Comment #2 from amker at gcc dot gnu.org --- > (In reply to Richard Biener from

2017-10-19 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #2 from amker at gcc dot gnu.org --- (In reply to Richard Biener from comment #1) > I suppose loop distribution inserted a version copy turning this into a > non-perfect nest for outer loops and thus disabling autopar there. > > What

2017-10-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
Richard Biener changed:
  What     | Removed | Added
  Keywords |         | missed-optimization