[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-06-02 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #50 from Thomas Koenig  ---
Author: tkoenig
Date: Sun Jun  2 15:18:22 2019
New Revision: 271844

URL: https://gcc.gnu.org/viewcvs?rev=271844&root=gcc&view=rev
Log:
2019-06-02  Thomas Koenig  

PR fortran/90539
* trans-expr.c (gfc_conv_subref_array_arg): If the size of the
expression can be determined to be one, treat it as contiguous.
Set likelihood of presence of an actual argument according to
PRED_FORTRAN_ABSENT_DUMMY and likelihood of being contiguous
according to PRED_FORTRAN_CONTIGUOUS.

2019-06-02  Thomas Koenig  

PR fortran/90539
* predict.def (PRED_FORTRAN_CONTIGUOUS): New predictor.

2019-06-02  Thomas Koenig  

PR fortran/90539
* gfortran.dg/internal_pack_24.f90: New test.


Added:
trunk/gcc/testsuite/gfortran.dg/internal_pack_24.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/trans-expr.c
trunk/gcc/predict.def
trunk/gcc/testsuite/ChangeLog
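
The two ideas in the log above (a section of size one is always contiguous, and the contiguous case is marked as the likely branch) can be sketched in C. This is only an illustration under stated assumptions: `section_t`, `section_is_contiguous` and `section_needs_pack` are hypothetical names, not gfortran internals, and `__builtin_expect` (a GCC/Clang builtin) stands in for the compiler-internal predictors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a rank-1 array section. */
typedef struct {
    double   *data;    /* address of the first element */
    ptrdiff_t stride;  /* distance between elements, counted in elements */
    size_t    count;   /* number of elements in the section */
} section_t;

/* GCC/Clang builtin: tell the compiler the condition is usually true. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

/* A section holding at most one element is contiguous whatever its stride. */
static int section_is_contiguous(const section_t *s)
{
    assert(s != NULL);
    if (s->count <= 1)
        return 1;
    return s->stride == 1;
}

/* Returns 1 when a packed temporary would be needed, 0 otherwise.
   The common, contiguous case is marked as the likely branch. */
static int section_needs_pack(const section_t *s)
{
    if (LIKELY(section_is_contiguous(s)))
        return 0;
    return 1;
}
```

The hint only affects code layout; the result of `section_needs_pack` is the same with or without it.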

2019-05-30 Thread marxin at gcc dot gnu.org

--- Comment #49 from Martin Liška  ---
(In reply to Martin Liška from comment #48)
> I see the performance is back as seen here:
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=21.270.0
> 
> -Ofast periodic tester hasn't finished yet, but I would close the PR.
> Thank you Thomas!

-Ofast -march=native is also fine:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=23.270.0

2019-05-30 Thread marxin at gcc dot gnu.org

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #48 from Martin Liška  ---
I see the performance is back as seen here:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=21.270.0

-Ofast periodic tester hasn't finished yet, but I would close the PR.
Thank you Thomas!

2019-05-30 Thread tkoenig at gcc dot gnu.org

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |WAITING

--- Comment #47 from Thomas Koenig  ---
Waiting for feedback on the speed issue.

2019-05-29 Thread tkoenig at gcc dot gnu.org

--- Comment #46 from Thomas Koenig  ---
Let's see if the failures go away (they should) and also what the
performance impact is now.

2019-05-29 Thread tkoenig at gcc dot gnu.org

--- Comment #45 from Thomas Koenig  ---
Author: tkoenig
Date: Wed May 29 20:30:45 2019
New Revision: 271751

URL: https://gcc.gnu.org/viewcvs?rev=271751&root=gcc&view=rev
Log:
2019-05-29  Thomas Koenig  

PR fortran/90539
* gfortran.h (gfc_has_dimen_vector_ref): Add prototype.
* trans.h (gfc_conv_subref_array_arg): Add argument check_contiguous.
(gfc_conv_is_contiguous_expr): Add prototype.
* frontend-passes.c (has_dimen_vector_ref): Remove prototype,
rename to
(gfc_has_dimen_vector_ref): New function name.
(matmul_temp_args): Use gfc_has_dimen_vector_ref.
(inline_matmul_assign): Likewise.
* trans-array.c (gfc_conv_array_parameter): Also check for absence
of a vector subscript before calling gfc_conv_subref_array_arg.
Pass additional argument to gfc_conv_subref_array_arg.
* trans-expr.c (gfc_conv_subref_array_arg): Add argument
check_contiguous. If that is true, check if the argument
is contiguous and do not repack in that case.
* trans-intrinsic.c (gfc_conv_intrinsic_is_contiguous): Split
away most of the work into, and call
(gfc_conv_is_contiguous_expr): New function.

2019-05-29  Thomas Koenig  

PR fortran/90539
* gfortran.dg/internal_pack_21.f90: Adjust scan patterns.
* gfortran.dg/internal_pack_22.f90: New test.
* gfortran.dg/internal_pack_23.f90: New test.


Added:
trunk/gcc/testsuite/gfortran.dg/internal_pack_22.f90
trunk/gcc/testsuite/gfortran.dg/internal_pack_23.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/frontend-passes.c
trunk/gcc/fortran/gfortran.h
trunk/gcc/fortran/trans-array.c
trunk/gcc/fortran/trans-expr.c
trunk/gcc/fortran/trans-intrinsic.c
trunk/gcc/fortran/trans.h
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gfortran.dg/internal_pack_21.f90

2019-05-29 Thread tkoenig at gcc dot gnu.org

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch, wrong-code

--- Comment #44 from Thomas Koenig  ---
Patch here: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01882.html

2019-05-29 Thread marxin at gcc dot gnu.org

--- Comment #43 from Martin Liška  ---
(In reply to Thomas Koenig from comment #42)
> Created attachment 46428 [details]
> Patch which should finally work
> 
> So, this does not regress, apparently.
> 
> Martin, could you give this one a shot?

I can verify that segmentation faults for both benchmarks are gone.
Thank you very much!

2019-05-28 Thread tkoenig at gcc dot gnu.org

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #46427|0                           |1
        is obsolete|                            |

--- Comment #42 from Thomas Koenig  ---
Created attachment 46428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46428&action=edit
Patch which should finally work

So, this does not regress, apparently.

Martin, could you give this one a shot?

2019-05-28 Thread tkoenig at gcc dot gnu.org

--- Comment #41 from Thomas Koenig  ---
Just noticed that this causes a regression in
gfortran.fortran-torture/execute/arrayarg.f90 , but only at certain
optimization levels.

Oh well... need to look some more.

2019-05-28 Thread tkoenig at gcc dot gnu.org

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #46420|0                           |1
        is obsolete|                            |

--- Comment #40 from Thomas Koenig  ---
Created attachment 46427
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46427&action=edit
Updated patch

OK, so this patch fixes the shortened test case and netcdf.

It is basically the earlier one with two lines interchanged.

The idea of the patch is simple: Do the same as the library
version and don't repack if the array in question is
already contiguous.

Martin, can you check whether this fixes the SPEC problem, too?

If so, we can commit and then worry about fine-tuning of when
to use this and when to use the library version.  I could imagine
that, for a procedure with very many arguments, using a library
function could be a win because the inlined version would
use more icache.
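
The "don't repack if already contiguous" idea can be sketched in C as follows. This is a minimal illustration, not the gfortran or libgfortran code: `strided_view`, `pack_if_needed` and `unpack_if_needed` are hypothetical names. The point it tries to show is that a temporary is created only for non-contiguous data, and that the write-back on exit happens only if a temporary was actually created, so storage that was never copied is never written to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rank-1 view: count elements, stride counted in elements. */
typedef struct {
    double   *data;
    ptrdiff_t stride;
    size_t    count;
} strided_view;

/* Return contiguous storage for the view: the original data when it is
   already contiguous, otherwise a freshly allocated packed copy.
   Returns NULL only if the allocation fails. */
static double *pack_if_needed(const strided_view *v)
{
    assert(v != NULL);
    if (v->count <= 1 || v->stride == 1)
        return v->data;                 /* no copy, no write-back later */

    double *tmp = malloc(v->count * sizeof *tmp);
    if (tmp == NULL)
        return NULL;
    for (size_t i = 0; i < v->count; i++)
        tmp[i] = v->data[(ptrdiff_t)i * v->stride];
    return tmp;
}

/* Copy results back and release the temporary, but only if one was made. */
static void unpack_if_needed(const strided_view *v, double *packed)
{
    if (packed == NULL || packed == v->data)
        return;                         /* original storage is left untouched */
    for (size_t i = 0; i < v->count; i++)
        v->data[(ptrdiff_t)i * v->stride] = packed[i];
    free(packed);
}
```

A caller would wrap the call to the routine that expects contiguous storage between `pack_if_needed` and `unpack_if_needed`, passing the returned pointer to the callee.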

2019-05-28 Thread kargl at gcc dot gnu.org

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #39 from kargl at gcc dot gnu.org ---
(In reply to Thomas Koenig from comment #38)
> So, I finally have a self-contained test case:
> 
> module t2
>   implicit none
> contains
>   subroutine foo(a)
>     real, dimension(*) :: a
>   end subroutine foo
> end module t2
> module t1
>   use t2
>   implicit none
> contains
>   subroutine bar(a)
>     real, dimension(:) :: a
>     call foo(a)
>   end subroutine bar
> end module t1
> 
> program main
>   use t1
>   call bar([1.0, 2.0])
> end program main

This looks like an optimizer bug.  Compiling with -fdump-tree-original
-fdump-tree-optimized -O  gives

(in a.f90.004t.original)
MAIN__ ()
{
  {
    static real(kind=4) A.5[2] = {1.0e+0, 2.0e+0};
    struct array01_real(kind=4) parm.6;

    parm.6.span = 4;
    parm.6.dtype = {.elem_len=4, .rank=1, .type=3};
    parm.6.dim[0].lbound = 0;
    parm.6.dim[0].ubound = 1;
    parm.6.dim[0].stride = 1;
    parm.6.data = (void *) &A.5[0];
    parm.6.offset = 0;
    bar (&parm.6);
  }
}

(in a.f90.231t.optimized)

main (integer(kind=4) argc, character(kind=1) * * argv)
{
  struct array01_real(kind=4) parm.9;
  static integer(kind=4) options.10[7] = {2116, 4095, 0, 0, 1, 0, 31};

  <bb 2> [local count: 1073741824]:
  _gfortran_set_args (argc_2(D), argv_3(D));
  _gfortran_set_options (7, &options.10[0]);
  # DEBUG INLINE_ENTRY MAIN__
  parm.9.span = 4;
  MEM[(struct dtype_type *)&parm.9 + 24B] = {};
  parm.9.dtype.elem_len = 4;
  parm.9.dtype.rank = 1;
  parm.9.dtype.type = 3;
  parm.9.dim[0].lbound = 0;
  parm.9.dim[0].ubound = 1;
  parm.9.dim[0].stride = 1;
  parm.9.data = &A.8[0];
  parm.9.offset = 0;
  bar (&parm.9);
  parm.9 ={v} {CLOBBER};
  return 0;

}

Note that 'static real(kind=4) A.5[2] = {1.0e+0, 2.0e+0};' in *.original
appears to be A.8 in *.optimized, but the static declaration is
gone.  Perhaps the Fortran FE needs to mark actual arguments
as "used" by gfc_mark_ss_chain_used() or TREE_USED().
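
For readers unfamiliar with the descriptor fields in the dumps above, here is a rough C mirror of them. It is a sketch under assumptions: the type and function names are hypothetical, the field order is not claimed to match gfortran's actual layout, and the addressing rule used (byte address = data + (offset + index * stride) * span) is an assumption consistent with the values shown in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the fields visible in the dump. */
typedef struct {
    size_t elem_len;   /* size of one element in bytes */
    int    rank;       /* number of dimensions */
    int    type;       /* type code (3 in the dump for real) */
} dtype_sketch;

typedef struct {
    ptrdiff_t stride;  /* in elements */
    ptrdiff_t lbound;
    ptrdiff_t ubound;
} dim_sketch;

typedef struct {
    void        *data;
    ptrdiff_t    offset;
    dtype_sketch dtype;
    ptrdiff_t    span;     /* bytes between consecutive elements at stride 1 */
    dim_sketch   dim[1];   /* rank-1 only in this sketch */
} desc1_sketch;

/* Number of elements described; zero if the bounds are empty. */
static size_t desc1_extent(const desc1_sketch *d)
{
    assert(d != NULL);
    ptrdiff_t n = d->dim[0].ubound - d->dim[0].lbound + 1;
    return n > 0 ? (size_t)n : 0;
}

/* Address of the element with index i (assumed addressing rule, see above). */
static void *desc1_element(const desc1_sketch *d, ptrdiff_t i)
{
    assert(i >= d->dim[0].lbound && i <= d->dim[0].ubound);
    return (char *)d->data + (d->offset + i * d->dim[0].stride) * d->span;
}
```

With the values from the dump (lbound 0, ubound 1, stride 1, offset 0, span 4) this describes two elements placed directly after one another.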

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #38 from Thomas Koenig  ---
So, I finally have a self-contained test case:

module t2
  implicit none
contains
  subroutine foo(a)
    real, dimension(*) :: a
  end subroutine foo
end module t2
module t1
  use t2
  implicit none
contains
  subroutine bar(a)
    real, dimension(:) :: a
    call foo(a)
  end subroutine bar
end module t1

program main
  use t1
  call bar([1.0, 2.0])
end program main

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #37 from Thomas Koenig  ---
Hm, with that patch, there still seems to be a failure in netcdf :-(
I will keep looking (possibly some small problem with the patch).

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #36 from Thomas Koenig  ---
... which should be

Index: testsuite/gfortran.dg/internal_pack_21.f90
===
--- testsuite/gfortran.dg/internal_pack_21.f90  (Revision 271629)
+++ testsuite/gfortran.dg/internal_pack_21.f90  (Arbeitskopie)
@@ -20,5 +20,5 @@
 USE M1
 CALL S2()
 END
-! { dg-final { scan-tree-dump-times "optional" 4 "original" } }
+! { dg-final { scan-tree-dump-times "arg_ptr" 5 "original" } }
 ! { dg-final { scan-tree-dump-not "_gfortran_internal_unpack" "original" } }

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #35 from Thomas Koenig  ---
(In reply to Thomas Koenig from comment #34)
> Created attachment 46420 [details]
> Patch which includes a check for being contiguous
> 
> This patch looks like it could do the job.  I'll have to work a bit
> more on test cases and ChangeLog before I can submit this, but
> at least it survives regression testing.

... except for a tree dump scan.  I will look at this later.

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #34 from Thomas Koenig  ---
Created attachment 46420
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46420&action=edit
Patch which includes a check for being contiguous

This patch looks like it could do the job.  I'll have to work a bit
more on test cases and ChangeLog before I can submit this, but
at least it survives regression testing.

2019-05-27 Thread marxin at gcc dot gnu.org

--- Comment #33 from Martin Liška  ---
(In reply to Thomas Koenig from comment #32)
> Hi Martin,
> 
> this
> 
> 3822   ierr = pio_put_var (tape(t)%File, ps0var, (/ps0/))
> 
> looks like the culprit (or rather, where gfortran currently
> generates wrong code).  This is consistent with the problem seen
> in netcdf, so I feel pretty confident that this is indeed the problem.
> 
> To double-check, could you maybe do the following? Assume ps0 is a
> real(kind=8) variable, do
> 
> ...
> 
>    real(kind=8) :: ps0_array(1) ! Use the same type as ps0
> 
> and then
> 
> ps0_array(1) = ps0
> ierr = pio_put_var (tape(t)%File, ps0var, ps0_array)
> 
> and see if the segfault goes away, or at least if this one has
> been removed, and there is a different one now :-)

Yes, I can confirm it helps. I then see a segfault later on.
Thank you.

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #32 from Thomas Koenig  ---
Hi Martin,

this

3822 ierr = pio_put_var (tape(t)%File, ps0var, (/ps0/))

looks like the culprit (or rather, where gfortran currently
generates wrong code).  This is consistent with the problem seen
in netcdf, so I feel pretty confident that this is indeed the problem.

To double-check, could you maybe do the following? Assume ps0 is a
real(kind=8) variable, do

...

   real(kind=8) :: ps0_array(1) ! Use the same type as ps0

and then

ps0_array(1) = ps0
ierr = pio_put_var (tape(t)%File, ps0var, ps0_array)

and see if the segfault goes away, or at least if this one has
been removed, and there is a different one now :-)

2019-05-27 Thread marxin at gcc dot gnu.org

--- Comment #31 from Martin Liška  ---
I see this:

(gdb) frame
#2  0x00453b06 in pionfput_mod::put_var_vdesc_1d_double (file=...,
vardesc=..., ival=...) at pionfput_mod.fppized.f90:2468
2468    ierr = put_var_1d_double (File, vardesc%varid, ival)
(gdb) up
#3  0x005be633 in cam_history::h_define (restart=4294949912, t=-17380)
at cam_history.fppized.f90:3822
3822 ierr = pio_put_var (tape(t)%File, ps0var, (/ps0/))
(gdb) up
#4  cam_history::wshist (rgnht_in=) at cam_history.fppized.f90:4461
4461  call h_define (t, restart)
(gdb) up
#5  0x007811dc in cam_comp::cam_run4 (cam_out=..., cam_in=...,
rstwr=.FALSE., nlend=.FALSE., yr_spec=0, mon_spec=1, day_spec=1, sec_spec=1800)
at cam_comp.fppized.f90:325
325     call wshist ()
(gdb) up
#6  0x0079d809 in atm_comp_mct::atm_run_mct (eclock=..., cdata_a=...,
x2a_a=..., a2x_a=...) at atm_comp_mct.fppized.f90:513
513 yr_spec=yr_sync, mon_spec=mon_sync, day_spec=day_sync,
sec_spec=tod_sync)
(gdb) up
#7  0x007deb03 in ccsm_comp_mod::ccsm_run () at
ccsm_comp_mod.fppized.f90:2408
2408 call atm_run_mct( EClock_a, cdata_aa, x2a_aa, a2x_aa)
(gdb) up
#8  0x00403772 in ccsm_driver () at ccsm_driver.fppized.f90:58
58 call ccsm_run()
(gdb) up
#9  main (argc=argc@entry=1, argv=0x7fffdffe) at ccsm_driver.fppized.f90:25
25 use shr_sys_mod,   only: shr_sys_abort
(gdb) up
#10 0x779b6b7b in __libc_start_main (main=0x403740 , argc=1,
argv=0x7fffdb98, init=, fini=,
rtld_fini=, stack_end=0x7fffdb88) at ../csu/libc-start.c:308
308   result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
(gdb) up
#11 0x004037ba in _start () at ../sysdeps/x86_64/start.S:120
120 ../sysdeps/x86_64/start.S: No such file or directory.

2019-05-27 Thread tkoenig at gcc dot gnu.org

--- Comment #30 from Thomas Koenig  ---
Hi,

what I mean is if you use "up" several times and list the
source of the calling routines, do you encounter something like

  call foo([1.0, 2.0, 3.0, 4.0])

or

  call foo((/1.0, 2.0, 3.0, 4.0/))

?

This is what I see for netcdf, and then I can also understand what
goes wrong. Such an array constructor would be in read-only memory,
and the current version would try to write back to it on exit -
ouch :-)

2019-05-27 Thread marxin at gcc dot gnu.org

--- Comment #29 from Martin Liška  ---
(In reply to Thomas Koenig from comment #28)
> https://gcc.gnu.org/ml/fortran/2019-05/msg00173.html reports
> the same symptoms for netcdf-fortran-4.4.5, presumably due
> to the same issue.
> 
> I'll try to fix that one and then see if the SPEC failure disappears
> along with it.
> 
> Martin, one additional question: When you step up from the segfault
> in the executable, is an array constructor passed as an argument
> somewhere up the call chain?  This is what appears to cause the trouble
> in netcdf.

How can I investigate that? Backtrace:

#0  0x008f706c in netcdf::nf90_put_var_1d_eightbytereal (ncid=7,
varid=23, values=..., start=, count=, 
stride=, map=...) at
netcdf_expanded.f90:1471
#1  0x00453a94 in pionfput_mod::put_var_1d_double (file=..., varid=23,
ival=...) at pionfput_mod.fppized.f90:1476
#2  0x00453b06 in pionfput_mod::put_var_vdesc_1d_double (file=...,
vardesc=..., ival=...) at pionfput_mod.fppized.f90:2468
#3  0x005be633 in cam_history::h_define (restart=4294949912, t=-17380)
at cam_history.fppized.f90:3822
#4  cam_history::wshist (rgnht_in=) at cam_history.fppized.f90:4461
#5  0x007811dc in cam_comp::cam_run4 (cam_out=..., cam_in=...,
rstwr=.FALSE., nlend=.FALSE., yr_spec=0, mon_spec=1, day_spec=1, sec_spec=1800)
at cam_comp.fppized.f90:325
#6  0x0079d809 in atm_comp_mct::atm_run_mct (eclock=..., cdata_a=...,
x2a_a=..., a2x_a=...) at atm_comp_mct.fppized.f90:513
#7  0x007deb03 in ccsm_comp_mod::ccsm_run () at
ccsm_comp_mod.fppized.f90:2408
#8  0x00403772 in ccsm_driver () at ccsm_driver.fppized.f90:58
#9  main (argc=argc@entry=1, argv=0x7fffdffe) at ccsm_driver.fppized.f90:25
#10 0x779b6b7b in __libc_start_main (main=0x403740 , argc=1,
argv=0x7fffdb98, init=, fini=,
rtld_fini=, stack_end=0x7fffdb88) at ../csu/libc-start.c:308
#11 0x004037ba in _start () at ../sysdeps/x86_64/start.S:120

#2  0x00453b06 in pionfput_mod::put_var_vdesc_1d_double (file=...,
vardesc=..., ival=...) at pionfput_mod.fppized.f90:2468
2468    ierr = put_var_1d_double (File, vardesc%varid, ival)
(gdb) info locals 
ierr = 

2019-05-27 Thread tkoenig at gcc dot gnu.org

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |tkoenig at gcc dot gnu.org

--- Comment #28 from Thomas Koenig  ---
https://gcc.gnu.org/ml/fortran/2019-05/msg00173.html reports
the same symptoms for netcdf-fortran-4.4.5, presumably due
to the same issue.

I'll try to fix that one and then see if the SPEC failure disappears
along with it.

Martin, one additional question: When you step up from the segfault
in the executable, is an array constructor passed as an argument
somewhere up the call chain?  This is what appears to cause the trouble
in netcdf.

2019-05-27 Thread marxin at gcc dot gnu.org

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seurer at gcc dot gnu.org

--- Comment #27 from Martin Liška  ---
*** Bug 90619 has been marked as a duplicate of this bug. ***

2019-05-26 Thread tkoenig at gcc dot gnu.org

--- Comment #26 from Thomas Koenig  ---
Author: tkoenig
Date: Sun May 26 14:02:51 2019
New Revision: 271630

URL: https://gcc.gnu.org/viewcvs?rev=271630&root=gcc&view=rev
Log:
2019-05-26  Thomas Koenig  

PR fortran/90539
* trans-types.c (get_formal_from_actual_arglist): Set rank
and lower bound for assumed size arguments.


Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/trans-types.c

2019-05-23 Thread marxin at gcc dot gnu.org

--- Comment #25 from Martin Liška  ---
(In reply to Thomas Koenig from comment #22)
> I've been trying out some things, and I cannot construct a failing
> test case.
> 
> A sane way to build such an interface would be
> 
> $ cat tst.f90
> module x
>   use, intrinsic :: iso_c_binding, only : c_double
>   implicit none
>   interface
>      subroutine foo(a) bind(c)
>        import
>        real(kind=c_double) :: a(*)
>      end subroutine foo
>   end interface
>   private
>   public :: bar
> 
> contains
>   subroutine bar(a)
>     real(kind=c_double), dimension(:) :: a
>     a = 42._c_double
>     call foo(a)
>   end subroutine bar
> end module x
> 
> program main
>   use, intrinsic :: iso_c_binding, only : c_double
>   use x
>   implicit none
>   real(kind=c_double), dimension(1) :: a
>   call bar(a)
> end program main
> $ cat foo.c
> #include <stdio.h>
> 
> void foo (double *a)
> {
>   printf("%f\n", *a);
> }
> $ gfortran -flto -O tst.f90 foo.c
> $ ./a.out
> 42.00
> 
> This works as expected.
> 
> What I do not understand is (comment #17)
> 
> (gdb) p debug(fsym)
> || symbol: '_formal_107'  
>   type spec : (REAL 8)
>   attributes: (VARIABLE  DIMENSION DUMMY)
>   Array spec:(0 [0])
> 
> 
> This means that the dummy parameter has rank zero. How, then,
> is it possible to pass a rank-1 argument to it?
> 
> (gdb) p debug(expr)
> nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8)
> 
> (gdb) p *expr->ref
> $8 = {
>   type = REF_ARRAY, 
>   u = {
> ar = {
>   type = AR_FULL, 
>   dimen = 1, 
>   codimen = 0, 
> 
> Something very fishy going on here.
> 
> Please look up the Fortran interface to the C function that is called,
> nc_put_vara_double.
> 
> Also, please break on gfc_conv_procedure_call for the call
> in question and do
> 
> $ call debug(sym)
> $ p args
> $ call debug(args->expr)
> $ p args->next
> $ call debug(args->next->expr)

(gdb) call debug(sym)
|| symbol: 'nf_put_vara_double' 
  type spec : (INTEGER 4)
  attributes: (PROCEDURE EXTERNAL-PROC IMPLICIT-SAVE EXTERNAL FUNCTION)
  result: nf_put_vara_double
  Formal arglist: _formal_103 _formal_104 _formal_105 _formal_106 _formal_107
(gdb) p args
$4 = (gfc_actual_arglist *) 0x2a766f0
(gdb) call debug(args->expr)
nf90_put_var_1d_eightbytereal:ncid (INTEGER 4)
(gdb) p args->next
$5 = (gfc_actual_arglist *) 0x2a72150
(gdb) call debug(args->next->expr)
nf90_put_var_1d_eightbytereal:varid (INTEGER 4)
(gdb) call debug(args->next->next->expr)
nf90_put_var_1d_eightbytereal:localstart(FULL) (INTEGER 4)
(gdb) call debug(args->next->next->next->expr)
nf90_put_var_1d_eightbytereal:localcount(FULL) (INTEGER 4)
(gdb) call debug(args->next->next->next->next->expr)
nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8)

> 
> ... and so on, until args->...->next becomes a null pointer.
> 
> I am starting to suspect that this is, in fact, another piece of SPEC
> bugware where they made some sort of broken interface between C
> and Fortran, which is exposed by my patch.

That's likely :) I hope my remote gdb session helped.

> 
> Hmpf...

2019-05-23 Thread marxin at gcc dot gnu.org

--- Comment #24 from Martin Liška  ---
One more note: the problematic code lives in src/netcdf/*, and the
same code is also contained in:
benchspec/CPU/521.wrf_r/src/netcdf/
and
benchspec/CPU/628.pop2_s/src/netcdf/

So that would explain also the segfault of the wrf benchmark.

2019-05-23 Thread marxin at gcc dot gnu.org

--- Comment #23 from Martin Liška  ---
(In reply to Thomas Koenig from comment #21)
> OK, if the callee is a C function... what is its declaration
> on the Fortran side?  Is there any interface, bind(c) or otherwise?
> 
> I suppose there must be something, otherwise nf_put_vara_double would
> have a trailing underscore.
> 
> On the caller side, I see that an array is passed, but the fsym
> has rank=0.  I think this would be flagged otherwise.

So ncfortran.h contains:
#define nf_put_vara_double  nf_put_vara_double_

And Fortran interface is defined in netcdf/include/netcdf.inc:

  integer nf_put_vara_double
! (integer ncid,
!  integer varid,
!  integer start(1),
!  integer count(1),
!  doubleprecision dvals(1))
  external nf_put_vara_double
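
The naming and calling convention implied by the `#define` and the `external` declaration above can be sketched in C. With an implicit interface, gfortran by default calls a lower-case symbol with a trailing underscore and passes every argument by reference, so the C side needs a thin wrapper. The function names below (`count_positive`, `count_positive_`) are hypothetical and only illustrate the pattern; they are not part of netcdf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine with an ordinary C interface. */
static int count_positive(const double *values, size_t n)
{
    assert(values != NULL || n == 0);
    int hits = 0;
    for (size_t i = 0; i < n; i++)
        if (values[i] > 0.0)
            hits++;
    return hits;
}

/* Entry point as an implicit-interface Fortran caller would reach it:
   trailing underscore in the symbol name, all arguments by reference.
   The array arrives as a bare pointer to its first element, so the
   caller must hand over contiguous storage. */
int count_positive_(const double *values, const int *n)
{
    return count_positive(values, (size_t)*n);
}
```

On the Fortran side this would correspond to declaring `integer count_positive` and `external count_positive`, in the same style as the netcdf.inc excerpt.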

2019-05-22 Thread tkoenig at gcc dot gnu.org

--- Comment #22 from Thomas Koenig  ---
I've been trying out some things, and I cannot construct a failing
test case.

A sane way to build such an interface would be

$ cat tst.f90
module x
  use, intrinsic :: iso_c_binding, only : c_double
  implicit none
  interface
     subroutine foo(a) bind(c)
       import
       real(kind=c_double) :: a(*)
     end subroutine foo
  end interface
  private
  public :: bar

contains
  subroutine bar(a)
    real(kind=c_double), dimension(:) :: a
    a = 42._c_double
    call foo(a)
  end subroutine bar
end module x

program main
  use, intrinsic :: iso_c_binding, only : c_double
  use x
  implicit none
  real(kind=c_double), dimension(1) :: a
  call bar(a)
end program main
$ cat foo.c
#include <stdio.h>

void foo (double *a)
{
  printf("%f\n", *a);
}
$ gfortran -flto -O tst.f90 foo.c
$ ./a.out
42.00

This works as expected.

What I do not understand is (comment #17)

(gdb) p debug(fsym)
|| symbol: '_formal_107'  
  type spec : (REAL 8)
  attributes: (VARIABLE  DIMENSION DUMMY)
  Array spec:(0 [0])


This means that the dummy parameter has rank zero. How, then,
is it possible to pass a rank-1 argument to it?

(gdb) p debug(expr)
nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8)

(gdb) p *expr->ref
$8 = {
  type = REF_ARRAY, 
  u = {
ar = {
  type = AR_FULL, 
  dimen = 1, 
  codimen = 0, 

Something very fishy going on here.

Please look up the Fortran interface to the C function that is called,
nc_put_vara_double.

Also, please break on gfc_conv_procedure_call for the call
in question and do

$ call debug(sym)
$ p args
$ call debug(args->expr)
$ p args->next
$ call debug(args->next->expr)

... and so on, until args->...->next becomes a null pointer.

I am starting to suspect that this is, in fact, another piece of SPEC
bugware where they made some sort of broken interface between C
and Fortran, which is exposed by my patch.

Hmpf...

2019-05-22 Thread tkoenig at gcc dot gnu.org

--- Comment #21 from Thomas Koenig  ---
OK, if the callee is a C function... what is its declaration
on the Fortran side?  Is there any interface, bind(c) or otherwise?

I suppose there must be something, otherwise nf_put_vara_double would
have a trailing underscore.

On the caller side, I see that an array is passed, but the fsym
has rank=0.  I think this would be flagged otherwise.

2019-05-22 Thread marxin at gcc dot gnu.org

--- Comment #20 from Martin Liška  ---
(In reply to Thomas Koenig from comment #19)
> Thanks.
> 
> A bit more:
> 
> What are the declarations of the actual argument,
> of the dummy argument (on the callee side),
> and what is the argument in the call list?
> 
> 
> I'll try to construct a test case tonight then.

So the callee is actually a C function:

;; Function nf_put_vara_double_ (null)
;; enabled by -tree-original


{
  size_t B3[512];
  size_t B4[512];
  int A0;

  # DEBUG BEGIN STMT;
size_t B3[512];
  # DEBUG BEGIN STMT;
size_t B4[512];
  # DEBUG BEGIN STMT;
int A0;
  # DEBUG BEGIN STMT;
  A0 = nc_put_vara_double (*fncid, *fvarid + -1, (const size_t *) f2c_coords
(*fncid, *fvarid + -1, (const int *) A3, (size_t *) &B3), (const size_t *)
f2c_counts (*fncid, *fvarid + -1, (const int *) A4, (size_t *) &B4), A5);
  # DEBUG BEGIN STMT;
  return A0;
}

where nc_put_vara_double is defined as:

int
nc_put_vara_double(int ncid, int varid,
     const size_t *start, const size_t *edges, const double *value)
{
    int status = NC_NOERR;
    NC *ncp;
    const NC_var *varp;
    int ii;
    size_t iocount;

    status = NC_check_id(ncid, &ncp);
    if(status != NC_NOERR)
        return status;

    if(NC_readonly(ncp))
        return NC_EPERM;

    if(NC_indef(ncp))
        return NC_EINDEFINE;

    varp = NC_lookupvar(ncp, varid);
    if(varp == NULL)
        return NC_ENOTVAR; /* TODO: lost NC_EGLOBAL */

    if(varp->type == NC_CHAR)
        return NC_ECHAR;

    status = NCcoordck(ncp, varp, start);
    if(status != NC_NOERR)
        return status;
    status = NCedgeck(ncp, varp, start, edges);
    if(status != NC_NOERR)
        return status;

    if(varp->ndims == 0) /* scalar variable */
    {
        return( putNCv_double(ncp, varp, start, 1, value) );
    }

    if(IS_RECVAR(varp))
    {
        status = NCvnrecs(ncp, *start + *edges);
        if(status != NC_NOERR)
            return status;

        if(varp->ndims == 1
            && ncp->recsize <= varp->len)
        {
            /* one dimensional && the only record variable  */
            return( putNCv_double(ncp, varp, start, *edges, value) );
        }
    }

    /*
     * find max contiguous
     *   and accumulate max count for a single io operation
     */
    ii = NCiocount(ncp, varp, edges, &iocount);

    if(ii == -1)
    {
        return( putNCv_double(ncp, varp, start, iocount, value) );
    }

    assert(ii >= 0);


    { /* inline */
    ALLOC_ONSTACK(coord, size_t, varp->ndims);
    ALLOC_ONSTACK(upper, size_t, varp->ndims);
    const size_t index = ii;

    /* copy in starting indices */
    (void) memcpy(coord, start, varp->ndims * sizeof(size_t));

    /* set up in maximum indices */
    set_upper(upper, start, edges, &upper[varp->ndims]);

    /* ripple counter */
    while(*coord < *upper)
    {
        const int lstatus = putNCv_double(ncp, varp, coord, iocount,
                         value);
        if(lstatus != NC_NOERR)
        {
            if(lstatus != NC_ERANGE)
            {
                status = lstatus;
                /* fatal for the loop */
                break;
            }
            /* else NC_ERANGE, not fatal for the loop */
            if(status == NC_NOERR)
                status = lstatus;
        }
        value += iocount;
        odo1(start, upper, coord, &upper[index], &coord[index]);
    }

    FREE_ONSTACK(upper);
    FREE_ONSTACK(coord);
    } /* end inline */

    return status;
}


that calls:

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #19 from Thomas Koenig  ---
Thanks.

A bit more:

What are the declarations of the actual argument,
of the dummy argument (on the callee side),
and what is the argument in the call list?


I'll try to construct a test case tonight then.

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #18 from Martin Liška  ---
$ cat -n netcdf/netcdf_expanded.f90:
...
  1470  print *,shape(values)
  1471  print *,size(values)
  1472  print *,is_contiguous(values)
  1473  
  1474 nf90_put_var_1D_EightByteReal = &
  1475    nf_put_vara_double(ncid, varid, localStart, localCount, values)
  1476   end if
  1477 end function nf90_put_var_1D_EightByteReal
...

gets me:

   1
   1
 T

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f955f316b40 in ???
#1  0x7f955f315d75 in ???
#2  0x7f955efc3e0f in ???
    at /usr/src/debug/glibc-2.29-5.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3  0x8e905c in __netcdf_MOD_nf90_put_var_1d_eightbytereal
    at /home/marxin/Programming/cpu2017/benchspec/CPU/527.cam4_r/build/build_peak_gcc7-m64./netcdf_expanded.f90:1475

So print result is: 1, 1, T.

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #17 from Martin Liška  ---
(In reply to Thomas Koenig from comment #16)
> Hi Martin,
> 
> Is this for the slowdown or for the wrong-code issue?

It's the wrong code for cam4_r benchmark.

> 
> To get another view, from a gdb session of the compiler:
> 
> call debug(expr)
> call debug(fsym)

(gdb) p debug(expr)
nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8)
$3 = void
(gdb) p debug(fsym)
|| symbol: '_formal_107'  
  type spec : (REAL 8)
  attributes: (VARIABLE  DIMENSION DUMMY)
  Array spec:(0 [0])
$4 = void


> 
> a look at expr->symtree->n.sym (I think call debug(expr->symtree->n.sym)
> will also work),

(gdb) call debug(expr->symtree->n.sym)
|| symbol: 'values'   
  type spec : (REAL 8)
  attributes: (VARIABLE  DIMENSION DUMMY(IN))
  Array spec:(1 [0] AS_ASSUMED_SHAPE 1 () )


> 
> a look at expr->ref (follow a few pointers)
> 

(gdb) p *expr->ref
$8 = {
  type = REF_ARRAY, 
  u = {
ar = {
  type = AR_FULL, 
  dimen = 1, 
  codimen = 0, 
  in_allocate = false, 
  team = 0x0, 
  stat = 0x0, 
  where = {
nextc = 0x0, 
lb = 0x0
  }, 
  as = 0x27d7ee0, 
  c_where = {{
  nextc = 0x0, 
  lb = 0x0
} }, 
  start = {0x0 }, 
  end = {0x0 }, 
  stride = {0x0 }, 
  dimen_type = {DIMEN_RANGE, 0 }
}, 
c = {
  component = 0x10001, 
  sym = 0x0
}, 
ss = {
  start = 0x10001, 
  end = 0x0, 
  length = 0x0
}, 
i = INQUIRY_IM
  }, 
  next = 0x0
}

> a look at fsym->as (also follow non-zero pointers).

(gdb) p *fsym->as 
$9 = {
  rank = 0, 
  corank = 0, 
  type = AS_ASSUMED_SIZE, 
  cotype = 0, 
  lower = {0x0 }, 
  upper = {0x0 }, 
  cray_pointee = false, 
  cp_was_assumed = false, 
  resolved = false
}

> 
> Also, if you have
> 
> call foo(...,a, ...)
> 
> you can put
> 
> print *,shape(a)
> print *,size(a)
> print *,is_contiguous(a)

Let me work on this..

> 
> into the source, run it and see what you get.
> 
> Also, look into the callee if there is a bounds violation - what
> is the dummy argument declared as on the callee's side?
> 
> Maybe you could also put
> 
> subroutine foo (..., a, ...)
> 
> print *,shape(a)
> print *,size(a)
> print *,is_contiguous(a)
> 
> into the source code and paste the output.
> 
> Regards
> 
> Thomas

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #16 from Thomas Koenig  ---
Hi Martin,

Is this for the slowdown or for the wrong-code issue?

To get another view, from a gdb session of the compiler:

call debug(expr)
call debug(fsym)

a look at expr->symtree->n.sym (I think call debug(expr->symtree->n.sym)
will also work),

a look at expr->ref (follow a few pointers)

a look at fsym->as (also follow non-zero pointers).

Also, if you have

call foo(...,a, ...)

you can put

print *,shape(a)
print *,size(a)
print *,is_contiguous(a)

into the source, run it and see what you get.

Also, look into the callee if there is a bounds violation - what
is the dummy argument declared as on the callee's side?

Maybe you could also put

subroutine foo (..., a, ...)

print *,shape(a)
print *,size(a)
print *,is_contiguous(a)

into the source code and paste the output.

Regards

Thomas

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #15 from Martin Liška  ---
Resulting difference in original dump file is:

BEFORE:

D.20757 = _gfortran_internal_pack ();
__result_nf90_put_var_1d_eigh = nf_put_vara_double ((integer(kind=4) *) ncid, (integer(kind=4) *) varid, , , D.20757);
if ((real(kind=8)[0:] *) parm.2491.data != (real(kind=8)[0:] *) D.20757)
  {
_gfortran_internal_unpack (, D.20757);
__builtin_free (D.20757);
  }

AFTER:

D.20757 = offset.2468;
D.20758 = ubound.2466;
D.20759 = D.20758 + -1;
typedef real(kind=8) [0:];
atmp.2492.dtype = {.elem_len=8, .rank=1, .type=3};
atmp.2492.dim[0].stride = 1;
atmp.2492.dim[0].lbound = 0;
atmp.2492.dim[0].ubound = D.20759;
D.20767 = D.20759 < 0;
D.20768 = D.20759 + 1;
atmp.2492.span = 8;
D.20769 = (void * restrict) __builtin_malloc (D.20767 ? 1 : MAX_EXPR <(unsigned long) (D.20768 * 8), 1>);
D.20770 = D.20769;
atmp.2492.data = D.20770;
atmp.2492.offset = 0;
{
  integer(kind=8) S.2493;
  integer(kind=8) D.20772;

  D.20772 = stride.2467;
  S.2493 = 0;
  while (1)
{
  if (S.2493 > D.20759) goto L.778;
  (*(real(kind=8)[0:] * restrict) atmp.2492.data)[S.2493] = (*values.0)[(S.2493 + 1) * D.20772 + D.20757];
  S.2493 = S.2493 + 1;
}
  L.778:;
}
__result_nf90_put_var_1d_eigh = nf_put_vara_double ((integer(kind=4) *) ncid, (integer(kind=4) *) varid, , , (real(kind=8)[0:] * restrict) atmp.2492.data);
D.20774 = offset.2468;
D.20775 = ubound.2466;
{
  integer(kind=8) S.2494;
  integer(kind=8) D.20778;

  D.20778 = stride.2467;
  D.20776 = -1;
  S.2494 = 1;
  while (1)
{
  if (S.2494 > D.20775) goto L.779;
  (*values.0)[S.2494 * D.20778 + D.20774] = (*(real(kind=8)[0:] * restrict) atmp.2492.data)[S.2494 + D.20776];
  S.2494 = S.2494 + 1;
}
  L.779:;
}
__builtin_free ((void *) atmp.2492.data);

@Thomas: Can you please provide another hint what to do now?
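
For readers comparing the two dumps above, here is a minimal sketch in C of the difference in strategy. It is not gfortran's generated code and not the real _gfortran_internal_pack. The descriptor type desc1d and every function name are hypothetical, and the rank is fixed at one for brevity. The BEFORE form copies only when the data is not already contiguous; the AFTER form, as dumped, allocates and copies in and out unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified rank-1 array descriptor (not libgfortran's layout). */
typedef struct {
    double *data;      /* first element of the section */
    ptrdiff_t stride;  /* distance between elements, in elements */
    ptrdiff_t n;       /* number of elements */
} desc1d;

static int copies_made;  /* counts temporaries, for illustration only */

/* Allocate a temporary and gather the strided elements into it. */
static double *gather(const desc1d *d)
{
    double *tmp = malloc((size_t)(d->n > 0 ? d->n : 1) * sizeof *tmp);
    assert(tmp != NULL);
    for (ptrdiff_t i = 0; i < d->n; i++)
        tmp[i] = d->data[i * d->stride];
    copies_made++;
    return tmp;
}

/* Scatter the temporary back into the strided section and free it. */
static void scatter(const desc1d *d, double *tmp)
{
    for (ptrdiff_t i = 0; i < d->n; i++)
        d->data[i * d->stride] = tmp[i];
    free(tmp);
}

/* BEFORE-style: hand back the original pointer when no copy is needed. */
static double *pack_if_needed(const desc1d *d)
{
    return d->stride == 1 ? d->data : gather(d);
}

static void unpack_if_needed(const desc1d *d, double *p)
{
    if (p != d->data)   /* same test as in the BEFORE dump */
        scatter(d, p);
}

/* AFTER-style: always allocate, copy in, and later copy out. */
static double *pack_always(const desc1d *d)
{
    return gather(d);
}

static void unpack_always(const desc1d *d, double *p)
{
    scatter(d, p);
}
```

With a contiguous argument, pack_if_needed returns the caller's pointer and copies_made stays at zero, while pack_always still pays for malloc plus two loops on every call. That extra work on a hot call path would be one plausible source of a slowdown of the kind reported here.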

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #14 from Martin Liška  ---
Ok, so I isolated that to a single file and one gfc_conv_subref_array_arg call.
The problematic file is netcdf/netcdf.f90 and the gfc_conv_subref_array_arg
call happens for:

(gdb) p *expr
$3 = {
  expr_type = EXPR_VARIABLE, 
  ts = {
type = BT_REAL, 
kind = 8, 
u = {
  derived = 0x0, 
  cl = 0x0, 
  pad = 0
}, 
interface = 0x0, 
is_c_interop = 0, 
is_iso_c = 0, 
f90_type = BT_UNKNOWN, 
deferred = false, 
interop_kind = 0x0
  }, 
  rank = 1, 
  shape = 0x0, 
  symtree = 0x27d3570, 
  ref = 0x2b83f20, 
  where = {
nextc = 0x23bc358, 
lb = 0x23bc230
  }, 
  base_expr = 0x0, 
  is_boz = 0, 
  is_snan = 0, 
  error = 0, 
  user_operator = 0, 
  mold = 0, 
  must_finalize = 0, 
  no_bounds_check = 0, 
  external_blas = 0, 
  do_not_resolve_again = 0, 
  do_not_warn = 0, 
  representation = {
length = 0, 
string = 0x0
  }, 
  value = {
logical = 0, 
iokind = M_READ, 
integer = {{
_mp_alloc = 0, 
_mp_size = 0, 
_mp_d = 0x0
  }}, 
real = {{
_mpfr_prec = 0, 
_mpfr_sign = 0, 
_mpfr_exp = 0, 
_mpfr_d = 0x0
  }}, 
complex = {{
re = {{
_mpfr_prec = 0, 
_mpfr_sign = 0, 
_mpfr_exp = 0, 
_mpfr_d = 0x0
  }}, 
im = {{
_mpfr_prec = 0, 
_mpfr_sign = 0, 
_mpfr_exp = 0, 
_mpfr_d = 0x0
  }}
  }}, 
op = {
  op = GFC_INTRINSIC_BEGIN, 
  uop = 0x0, 

  op1 = 0x0, 
  op2 = 0x0
}, 
function = {
  actual = 0x0, 
  name = 0x0, 
  isym = 0x0, 
  esym = 0x0
}, 
compcall = {
  actual = 0x0, 
  name = 0x0, 
  base_object = 0x0, 
  tbp = 0x0, 
  ignore_pass = 0, 
  assign = 0
}, 
character = {
  length = 0, 
  string = 0x0
}, 
constructor = 0x0
  }, 
  param_list = 0x0
}

proc_name=0x15068d20 "nf_put_vara_double"

(gdb) p *fsym
$5 = {
  name = 0x144a2c20 "_formal_107", 
  module = 0x0, 
  declared_at = {
nextc = 0x23c86d4, 
lb = 0x23c8590
  }, 
  ts = {
type = BT_REAL, 
kind = 8, 
u = {
  derived = 0x0, 
  cl = 0x0, 
  pad = 0
}, 
interface = 0x0, 
is_c_interop = 0, 
is_iso_c = 0, 
f90_type = BT_UNKNOWN, 
deferred = false, 
interop_kind = 0x0
  }, 
  attr = {
allocatable = 0, 
dimension = 1, 
codimension = 0, 
external = 0, 
intrinsic = 0, 
optional = 0, 
pointer = 0, 
target = 0, 
value = 0, 
volatile_ = 0, 
temporary = 0, 
dummy = 1, 
result = 0, 
assign = 0, 
threadprivate = 0, 
not_always_present = 0, 
implied_index = 0, 
subref_array_pointer = 0, 
proc_pointer = 0, 
asynchronous = 0, 
contiguous = 0, 
fe_temp = 0, 
automatic = 0, 
class_pointer = 0, 
save = SAVE_NONE, 
data = 0, 
is_protected = 0, 
use_assoc = 0, 
used_in_submodule = 0, 
use_only = 0, 
use_rename = 0, 
imported = 0, 
host_assoc = 0, 
in_namelist = 0, 
in_common = 0, 
in_equivalence = 0, 
function = 0, 
subroutine = 0, 
procedure = 0, 
generic = 0, 
generic_copy = 0, 
implicit_type = 0, 
untyped = 0, 
is_bind_c = 0, 
extension = 0, 
is_class = 0, 
class_ok = 0, 
vtab = 0, 
vtype = 0, 
is_c_interop = 0, 
is_iso_c = 0, 
sequence = 0, 
elemental = 0, 
pure = 0, 
recursive = 0, 
unmaskable = 0, 
masked = 0, 
contained = 0, 
mod_proc = 0, 
abstract = 0, 
module_procedure = 0, 
public_used = 0, 
implicit_pure = 0, 
array_outer_dependency = 0, 
noreturn = 0, 
entry = 0, 
entry_master = 0, 
mixed_entry_master = 0, 
always_explicit = 0, 
artificial = 0, 
referenced = 0, 
is_main_program = 0, 
access = ACCESS_UNKNOWN, 
intent = INTENT_UNKNOWN, 
flavor = FL_VARIABLE, 
if_source = IFSRC_UNKNOWN, 
proc = PROC_UNKNOWN, 
cray_pointer = 0, 
cray_pointee = 0, 
alloc_comp = 0, 
pointer_comp = 0, 
proc_pointer_comp = 0, 
private_comp = 0, 
zero_comp = 0, 
coarray_comp = 0, 
lock_comp = 0, 
event_comp = 0, 
defined_assign_comp = 0, 
unlimited_polymorphic = 0, 
has_dtio_procs = 0, 
caf_token = 0, 
select_type_temporary = 0, 
associate_var = 0, 
pdt_kind = 0, 
pdt_len = 0, 
pdt_type = 0, 
pdt_template = 0, 
pdt_array = 0, 
pdt_string = 0, 
omp_udr_artificial_var = 0, 
omp_declare_target = 0, 
omp_declare_target_link = 0, 
oacc_declare_create = 0, 
oacc_declare_copyin = 0, 
oacc_declare_deviceptr = 0, 
oacc_declare_device_resident = 0, 
oacc_declare_link = 0, 
oacc_routine_lop = OACC_ROUTINE_LOP_NONE, 
ext_attr = 0, 
volatile_ns = 

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #13 from Thomas Koenig  ---
I'm afraid the tree dumps will not help a lot - I know what they
look like before and after, but I don't know what is wrong with it.

I would therefore ask you to reduce the test case, maybe starting
with the wrong-code issue.

I'm describing now what I would do, if I had access to SPEC.

One possibility is using -Os. This restores the behavior of using
the library function for packing / unpacking. You can check which file(s)
you need to compile using that flag to make that problem go away.
(A fancier way would be to introduce, in my local tree, a new
option to specifically disable that optimization.) The relevant part is
in trans-array.c:

  /* When optimizing, we can use gfc_conv_subref_array_arg for
     making the packing and unpacking operation visible to the
     optimizers.  */

  if (g77 && optimize && !optimize_size && expr->expr_type == EXPR_VARIABLE
      && !is_pointer (expr) && (fsym == NULL
                                || fsym->ts.type != BT_ASSUMED))
    {
      gfc_conv_subref_array_arg (se, expr, g77,
                                 fsym ? fsym->attr.intent : INTENT_INOUT,
                                 false, fsym, proc_name, sym);
      return;
    }

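
To make the effect of -Os on the condition quoted above easier to see, here is a small stand-alone sketch in C. It only mirrors the shape of the test; the struct pack_ctx and its field names are hypothetical stand-ins for the compiler state, not GCC data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs to the quoted condition. */
typedef struct {
    bool g77;                  /* callee expects a bare contiguous array */
    int optimize;              /* optimization level, 0 means none */
    bool optimize_size;        /* assumed to be set by -Os */
    bool expr_is_variable;     /* expr->expr_type == EXPR_VARIABLE */
    bool expr_is_pointer;      /* is_pointer (expr) */
    bool has_fsym;             /* fsym != NULL */
    bool fsym_is_assumed_type; /* fsym->ts.type == BT_ASSUMED */
} pack_ctx;

/* True when the inline pack/unpack path would be chosen instead of
   the library call, following the structure of the quoted condition. */
static bool use_inline_packing(const pack_ctx *c)
{
    return c->g77 && c->optimize != 0 && !c->optimize_size
           && c->expr_is_variable && !c->expr_is_pointer
           && (!c->has_fsym || !c->fsym_is_assumed_type);
}
```

Setting optimize_size in an otherwise qualifying context flips the result to false, which is why compiling individual files with -Os can serve as a bisection tool here.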

Once the file is known, I would set a breakpoint at the call to
gfc_conv_subref_array_arg and look at expr, fsym and proc_name to pinpoint
which part of the source code is affected.

Once that is known, I would debug the compiled program and check the
conditions at the time of the call - what kind of array is passed, what is
its rank, what are the dimensions, is it contiguous, and what does the dummy
argument on the callee's side look like - and work on reducing the test case
from there.

Another point - maybe it would be a good idea to see how at least
one of the regular Fortran people could get access to SPEC.
I would be willing to sign an NDA, but I would _not_ be willing
to pay for it.  I suppose it would be no good to ask the FSF,
they would probably go bananas :-)

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #11 from Martin Liška  ---
Created attachment 46394
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46394&action=edit
521.wrf_r valgrind report

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #12 from Martin Liška  ---
Created attachment 46395
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46395&action=edit
527.cam4_r valgrind report

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #10 from Martin Liška  ---
(In reply to Thomas Koenig from comment #8)
> (In reply to Martin Liška from comment #6)
> > So there's somebody who is having the file in a public git repository.
> > That's probably violating SPEC rules :) But anyway, the .f90 file is here:
> > https://gitlab.bcamath.org/atrucchia/randomfront-wrfsfire-lsfire/blob/
> > 152f8c92b89b20021403acba9536553fda7a527b/wrfv2_fire/share/solve_interface.f90
> > 
> > @Thomas: Is it enough info?
> 
> I'm afraid not, it is neither complete nor self-contained (nor is the
> bug report in comment#7).  So, this is not a valid bug report according
> to https://www.gnu.org/software/gcc/bugs/ .  It's a heads-up, nothing
> more.

I'm sorry that we play the SPEC game with software that is not open source.
Still, I'm willing to provide as much info as you need.

Here are all the tree dumps before and after your revision:
https://drive.google.com/file/d/1rzT3B0n6iMDIFNv0Y8dbKG2G8zhA_1fA/view?usp=sharing
https://drive.google.com/file/d/1obnhaGDhXg6DmF5iEmchb7d7lNlf9fj-/view?usp=sharing


> 
> SPEC is proprietary software, none of the Fortran maintainers has
> access to it. I will deal with this bug the same as I deal with
> all other bug reports - no test case, no possibility of fixing.

That's unfortunate, yes.

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-21 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #9 from nsz at gcc dot gnu.org ---
spec2017 521.wrf_r never finishes on aarch64

gcc rev 271291 runs fine
gcc rev 271380 does not finish (possibly a crash that the spec scripts don't
detect)

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-21 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Thomas Koenig  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #8 from Thomas Koenig  ---
(In reply to Martin Liška from comment #6)
> So there's somebody who is having the file in a public git repository.
> That's probably violating SPEC rules :) But anyway, the .f90 file is here:
> https://gitlab.bcamath.org/atrucchia/randomfront-wrfsfire-lsfire/blob/
> 152f8c92b89b20021403acba9536553fda7a527b/wrfv2_fire/share/solve_interface.f90
> 
> @Thomas: Is it enough info?

I'm afraid not, it is neither complete nor self-contained (nor is the
bug report in comment#7).  So, this is not a valid bug report according
to https://www.gnu.org/software/gcc/bugs/ .  It's a heads-up, nothing
more.

SPEC is proprietary software, none of the Fortran maintainers has
access to it. I will deal with this bug the same as I deal with
all other bug reports - no test case, no possibility of fixing.

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-21 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #7 from Martin Liška  ---
Note that the patch is also responsible for a 521.wrf_r segfault with -Ofast
-march=native on a Zen machine (with ulimit -s == unlimited):



Contents of wrf.err

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0  0x14c7dd128e0f in ???
    at /usr/src/debug/glibc-2.29-5.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1  0x1605961 in __module_ra_rrtm_MOD_rtrn
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_ra_rrtm.fppized.f90:6413
#2  0x161dddf in __module_ra_rrtm_MOD_rrtm
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_ra_rrtm.fppized.f90:2256
#3  0x161edd2 in __module_ra_rrtm_MOD_rrtmlwrad
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_ra_rrtm.fppized.f90:1994
#4  0x1631f8d in __module_radiation_driver_MOD_radiation_driver
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_radiation_driver.fppized.f90:1106
#5  0x11776e3 in __module_first_rk_step_part1_MOD_first_rk_step_part1
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_first_rk_step_part1.fppized.f90:367
#6  0x18b6c15 in solve_em_
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/solve_em.fppized.f90:837
#7  0x1906eb3 in solve_interface_
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/solve_interface.fppized.f90:135
#8  0x127a53b in __module_integrate_MOD_integrate
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_integrate.fppized.f90:306
#9  0x17e5171 in __module_wrf_top_MOD_wrf_run
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_wrf_top.fppized.f90:309
#10  0x404b02 in wrf
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/wrf.fppized.f90:28
#11  0x404b02 in main
    at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/wrf.fppized.f90:6



[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-21 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #6 from Martin Liška  ---
So there's somebody who is having the file in a public git repository. That's
probably violating SPEC rules :) But anyway, the .f90 file is here:
https://gitlab.bcamath.org/atrucchia/randomfront-wrfsfire-lsfire/blob/152f8c92b89b20021403acba9536553fda7a527b/wrfv2_fire/share/solve_interface.f90

@Thomas: Is it enough info?

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-21 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #5 from Martin Liška  ---
Ok, looking at perf report:

$ head -n20  before.report.txt
# Overhead  Command          Shared Object            Symbol
# ........  ...............  .......................  ......
#
     7.45%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_advect_em_MOD_advect_scalar
     5.54%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_w
     5.48%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_uv
     5.45%  wrf_peak.amd64-  libc-2.29.so             [.] __memset_avx2_unaligned_erms
     4.51%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_calc_cq
     3.84%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_bl_ysu_MOD_ysu2d
     3.80%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_calc_p_rho
     3.71%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_mu_t
     3.55%  wrf_peak.amd64-  libmvec-2.29.so          [.] _ZGVdN8vv_powf_avx2
     3.45%  wrf_peak.amd64-  libc-2.29.so             [.] __memmove_avx_unaligned_erms
     2.82%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_sumflux
     2.69%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_em_MOD_rk_update_scalar
     2.45%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_clphy1d
     2.24%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_lin_et_al
     2.16%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_small_step_prep
     2.09%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_curvature
     2.07%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_em_MOD_rk_addtend_dry

$ head -n20  after.report.txt
# Overhead  Command          Shared Object            Symbol
# ........  ...............  .......................  ......
#
    19.91%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] solve_interface_
     5.99%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_advect_em_MOD_advect_scalar
     4.59%  wrf_peak.amd64-  libc-2.29.so             [.] __memset_avx2_unaligned_erms
     4.45%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_w
     4.30%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_uv
     3.63%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_calc_cq
     3.11%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_calc_p_rho
     3.10%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_bl_ysu_MOD_ysu2d
     3.02%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_mu_t
     2.77%  wrf_peak.amd64-  libc-2.29.so             [.] __memmove_avx_unaligned_erms
     2.75%  wrf_peak.amd64-  libmvec-2.29.so          [.] _ZGVdN8vv_powf_avx2
     2.31%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_sumflux
     2.10%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_em_MOD_rk_update_scalar
     1.87%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_clphy1d
     1.78%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_lin_et_al
     1.69%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_small_step_prep
     1.68%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_curvature

The difference is in solve_interface_, I'll analyze that..

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-20 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Thomas Koenig  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #4 from Thomas Koenig  ---
Waiting for a test case.

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-20 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #3 from Thomas Koenig  ---
I think I have an idea what might be the problem.

Does the code do something like

call foo(a)

...

subroutine foo(a)
  real, dimension(:) :: a
  call bar(a,size(a))

...

subroutine bar(a,n)
  real, dimension(n) :: a

?

What might be missing for good performance is the
check for contiguous memory when calling bar.
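
As an illustration of what such a check for contiguous memory could look like at run time, here is a minimal sketch in C. The type sketch_dim and the function is_contiguous are hypothetical and are not taken from gfortran or libgfortran; strides are counted in elements and the first dimension varies fastest, as in Fortran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-dimension information of an array section. */
typedef struct {
    ptrdiff_t stride;  /* distance between consecutive elements, in elements */
    ptrdiff_t extent;  /* number of elements along this dimension */
} sketch_dim;

/* Returns true when the section occupies one dense block of memory,
   so it could be passed to an explicit-shape dummy without a copy. */
static bool is_contiguous(const sketch_dim *dim, int rank)
{
    /* An empty section has no elements that could be out of place. */
    for (int k = 0; k < rank; k++)
        if (dim[k].extent <= 0)
            return true;

    ptrdiff_t expected = 1;  /* stride a dense layout would have here */
    for (int k = 0; k < rank; k++) {
        /* The stride of a dimension with a single element is never used. */
        if (dim[k].extent > 1 && dim[k].stride != expected)
            return false;
        expected *= dim[k].extent;
    }
    return true;
}
```

Under these assumptions, a caller would pass the original data pointer when is_contiguous returns true and fall back to packing into a temporary otherwise. A section with exactly one element always passes the check, whatever its stride.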

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-20 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #2 from Thomas Koenig  ---
I am a bit surprised that the library version
of packing seems to be faster than the inlined one.

Or maybe some argument is now packed which should not be.

Increased code size is sort of expected, copying inline
is bigger than calling a library function. This is why
this is not done at -Os.

Is it possible to get a reduced test case that shows the
slowdown?

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #1 from Richard Biener  ---
Haswell as well
(https://gcc.opensuse.org/gcc-old/SPEC/CFP/sb-czerny-head-64-2006/recent.html)
but only 10% and not bisected.

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Martin Liška  changed:

           What    |Removed                     |Added

             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-05-20
                 CC|                            |tkoenig at gcc dot gnu.org
      Known to work|                            |9.1.0
            Version|unknown                     |10.0
             Blocks|                            |26163
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1
      Known to fail|                            |10.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)